Semantic Arts Admin, Author at Semantic Arts

Six Enterprise Knowledge Graph Anti-Patterns

November 10, 2022 by Semantic Arts Admin

The truth is that despite the outsider’s perception that the world of technology just keeps getting better, faster and less expensive, over 70% of enterprise digital transformations fail to achieve their objectives (McKinsey, 2019). Most commentary on the subject points the finger at executive priorities, strategic goal-setting and organizational management. But the seldom discussed root cause of this failure is complexity, more specifically integration debt. Integration debt is the ongoing cost of integrating data from a large number of enterprise applications that each have their own local incompatible data models.

Leading organizations are starting to address this proliferation of local data models by adopting a more “data-centric” architecture. A data-centric architecture models a singular representation of the organization’s core data in a knowledge graph. The knowledge graph simplifies and harmonizes data from enterprise applications. Organizations can see a 1000-fold decrease in complexity across the enterprise data landscape.

As organizations make this move, they stumble falter and fail. This leads to what Gartner’s calls the trough of disillusionment. But it doesn’t have to. The technology is ready to deliver with the right skills and approach. With careful execution and the right Sherpa, you can make it up the mountain. The ascent is worth it. Knowledge graphs are changing the game for those who succeed. We are sharing six common anti-patterns for enterprise knowledge graphs to keep you heading in the right direction.

Anti-pattern #1 — Agreeing with the Status Quo

“This is how enterprise architecture is done”

The most progress deadening perspective you can hold is that the future is going to be just like the present. The most challenging aspect of moving to an enterprise knowledge graph is not a technical one, it is the mindset that embraces the status quo. In order to break through to those who are entrenched is to educate them and show them a better way preferably with their own data.

People who make decisions in your organization must view this technology for the radical departure that it is from relational databases, data warehouses or data lakes. If you don’t secure this distinction, you will be in danger of creating yet another silo of data, in this case a semantic silo.

You must personally step boldly enough into this mindset and educate yourself to the point that you can see a coherent vision of this next generation architecture. Your vision has to be nearly white-hot, because in the beginning no one will understand you and they will disbelieve that such a revolutionary approach is possible. Convince yourself it is possible.

But it is not just the boldness of vision but its broadness as well. The vision is a transformation for the entire organization, and you must keep this in mind. If you can find a few advocates across functional boundaries, you may be able to demonstrate what data harmonization is capable of: astounding flexibility, intricate insights, a broad purview of your enterprise, holistic yet simple.

Anti-pattern #2 — Fad Surfing

“This is cool, lets buy it and see what it does”

The tech world moves quickly from one favored technology to the next. Fad surfing refers to plucking technologies from the latest trend and trying them out. We have seen NoSQL, Hadoop, Kafka, Data Lakes, Data Warehouses, AI and Snowflake all fly by and get picked up by organizations struggling to get a handle on their data. The problem with fad surfing is that organizations get used to trying and quickly discarding technologies. This might work if you were trying out an application or a shirt but with enterprise architecture it’s a little like trying on a new foundation for your house to see how it works.

Data-centric thinking is not strictly speaking a technology. Is it an approach that uses a knowledge graph, semantic modeling and model driven everything. The move needs to square with corporate strategic planning and have people dedicated to making a longer-term investment in design, execution and integration of the core ontology. Impulse buying is not a recipe for success. There is too great a maturation phase than is sustainable by someone trying a technology out and then showing it off to gain support. With this mindset you will be forced to move on quickly perhaps feeling like you have exercised due diligence but rejecting it because either your organization or the technology was “not ready”.

Anti-pattern #3 — Too Small

“We can get that from the old system”

Have you ever done a proof of concept successfully only to have it shelfed because there was no enthusiasm for a larger implementation. It happens, a lot. The same thing will happen with your enterprise knowledge graph proof of concept if the scope is too small. If you show someone a graph solution that embodies a single domain, they feel they could have gotten the answer using the status-quo relational system.

The way to avoid this frustration is by selecting 2–3 domains that are currently difficult to reconcile. This demonstrates that even if queries could be answered before, they would have been time consuming and difficult one-offs and required a great deal of specialized knowledge to complete. This demonstrates the knowledge graph’s serviceability, its ability to answer questions easily that span difficult to reconcile data sources with different levels of abstraction or other domains that do not share a common nomenclature. In addition, it demonstrates the simplification and logical grounding of semantic concepts. By simplifying the entire structure non-experts can understand and query the data. A working example of data democratization.

Anti-pattern #4 — Too Big

“Ok, now what?”

It is possible to develop an enterprise knowledge graph that encompasses nearly the entire organization within nine months to a year if you have a team of ontologists ready to go. The problem is that when presented with an all-encompassing enterprise knowledge graph, few technical leaders know what to do with it, or even if they want to deploy this model at scale. The prospect of making a change of this magnitude drains the energy and enthusiasm of the people who contemplate it. You may have to swallow an elephant but perhaps the waiter could bring just the ear to start with.

The paradox is that you must include enough of the enterprise perspective so that you do not create an overall design that locks you into design (ontological) commitments that are structurally incompatible with other areas of your business. The key is to survey enough of the rest of the enterprise at a sufficient but not exhaustive level such that you can future proof the design to a large degree and rework is minimal. One good thing is that with a graph data structure even rework is less destructive than relational models that have rigid schemas and deep structural commitments.

The use of a minimalist upper-level ontology designed for such a purpose can help by establishing reusable patterns that can be applied in one domain today and somewhere else a year from now with consistent results. We are biased but Gist, the ontology from Semantic Arts, has over 100 enterprise implementations under its belt and is both simple and comprehensive in scope. It provides a set of covering concepts, semantic primitives, that can be combined to describe your organization and sub domains with minimal extension. So think big, looking broadly at the enterprise, but start small with a couple of sub-domains.

Anti-pattern #5 — Data Governance

“We’ve got this covered”

Data governance initiatives are a growth industry, with good reason. Organizations know they face an overwhelming challenge to make sense of all of their data. They also have the obligations to secure personally identifiable PII data, comply with domestic and international data regulations and be able to provide audit support. This is all before enhancing the ability to find and reuse data for the purposes of analysis and operational improvements.

So called master data management initiatives designate certain data in the organization as “golden records” then establish a practice around managing and proliferating the most up-to-date version of those golden records. This is seen as a solution in today’s enterprise environment. What it really is is the overgrowth of a set of technologies and practices that are built to correct for the shortcomings of the application-centric world we live in. We are becoming expert at solving the wrong problem and waste effort paying the wages of integration debt. The solution is data-centric. In a data-centric world there would not be literally thousands of instances of the same social security number (a real example).

The anti-pattern here is to believe that since you have a master data management initiative you are solving the problem of organizing your data and making it maximally findable and usable. You are not. In some ways it makes the problem worse by entrenching systems of records and adding additional layers of automation to ship updated data around at greater cost and effort.

Anti-pattern #6 — Data Hoarding

“It’s too valuable to share”

Some organizations are better than others at sharing data among internal groups. There are good reasons why data should be kept secure. We have all heard of incidents of data breeches that have cost companies lots of money and prestige. Data hording though, is a practice of going beyond protecting data to preventing its use by others with legitimate needs. This is a pathological behavior that is typically borne from the perception that the data is valuable and sharing it with others would reduce me or my team’s value to the organization. Currently, people must come and have permission to get summary level data from the data hoarders, and they like it.

Two lessons here, first if you or someone you know is a hoarder try to get help. If data hoarders cannot be reasoned with, and some cannot, you should identify them early in the process and not count on them to participate. Instead, develop the system elsewhere until it has enough demonstrable value and political support to perhaps persuade them to come along. The process can be a positive one. Sometimes data hoarders can be made to see the value of the harmonization of data. They themselves can see how combining other’s data with their own can provide value to them. Seeing this, it may be possible that they come around to seeing value in living in harmony in the graph.

Need a sherpa to get up the mountain?

CONTACT US

Originally posted at Medium.com

The Greatest Sin of Tabular Data

September 9, 2022September 6, 2022 by Semantic Arts Admin

We recently came across this great article titled “The greatest sin of tabular data”. It is an excellent summary of the kind of work we do for our clients and how they benefit.

You can read it at The greatest sin of tabular data · A blog @ nonodename.com

The journey of capturing the meaning to data is an elusive process. If 80% of data science is simply data wrangling, how can we do better actually providing value by making sense of that data?

With a disciplined approach and levering RDF capabilities, Semantic Arts can help to create clear, defined data, saving time and money and driving true value instead of getting bogged down in simply trying to understand data.

As stated by the author, “We can do better!”

Reach out to Semantic Arts today to see how we can help.

Original article at nonodename.com/Dan Bennett via LinkedIn Post.

Get the gist: start building simplicity now

January 20, 2025July 21, 2022 by Semantic Arts Admin

While organizing data has always been important, a noticeably profound interest in optimizing information models with Semantic Knowledge graphs has arisen. LinkedIn, AirBnB, in addition to giants Google and Amazon use graphs, but without a model for connecting concepts with rules for membership buyer recommendations and enhanced searchability (follow your nose) capabilities would lack accuracy.

Drum roll please … Introduce the ontology.

It is a model that supports semantic knowledge graph reasoning, inference, and provenance enablement. Think of an ontology as the brain giving messages to the nervous systems (the knowledge graph). An ontology organizes data into well-defined categories with clearly defined relationships. This model represents a foundational starting point that allows humans and machines to read, understand, and infer knowledge based on its classification. In short, this automatically figures out what is similar and what is different.

We’re asked often, where do I start?

Enter ‘gist’ a minimalist business ontology (model) to springboard transitioning information into knowledge. With more than a decade of refinement grounded in simplicity, ‘gist’ is designed to have the maximum coverage of typical business ontology concepts with the fewest number of primitives and least amount of ambiguity. ‘gist’ is available for free under a Creative Commons license and is being applied and extended within a number of business use cases and utilized by countless industries.

Recently, senior Ontologist Michael Uschold has been sharing an introductory overview of ‘gist’, maintained by Semantic Arts.

One compelling difference from most publicly available ontologies, ‘gist’ has an active governance and best practices community, called the gist Council. The council meets virtually on the first Thursday of every month to discuss how to use ‘gist’ and make suggestions on its evolution.

See Part I of Michael’s introduction here:

See Part II of Michael’s introduction here:

Stay tuned for the final installment!

Interested in gist? Visit Semantic Arts – gist

See more informative videos on Semantic Arts – YouTube

The Data-Centric Revolution: Headless BI and the Metrics Layer

June 29, 2022June 22, 2022 by Semantic Arts Admin

Read more from Dave McComb in his recent article on The Data Administration Newsletter.

“The data-centric approach to metrics puts the definition of the metrics in the shared data. Not in the BI tool, not in code in an API. It’s in the data, right along with the measurement itself.”

Link: The Data-Centric Revolution: Headless BI and the Metrics Layer – TDAN.com

Read more of Dave’s articles: mccomb – TDAN.com

The 90s Are Over, Let’s Stop Basing Capital Cost Decisions on Lagging Indicators

June 16, 2021February 12, 2021 by Semantic Arts Admin

Let’s Stop Basing Capital Cost Decisions on Lagging Indicators. Remember the good old days of emerging digital technology? Accessing information through a dial-up internet connection. Saving data to floppy discs or CDs. Sending emails to have them printed for storage. Mobile connectivity was new, exciting, and… slow compared to what we have today.

In the energy sector, data access limitations influenced the structure of traditional execution workflows for capital projects. It was common – and still is – for project execution models to focus on document-based deliverables over raw data.

The inherent problem with a document-centric approach is that documents take time to produce. Let’s imagine the workflow for a technology evaluation study that:

Begins with initial input from multiple departments.
Gets reviewed by 2-3 management layers on the project organizational chart.
Finally lands on the desk of a senior decision-maker.

This process could easily take two weeks or longer. But what happens during those two weeks? Work doesn’t get paused. The project continues to progress. The information initially collected for the study no longer represents current project conditions. By the time it gets to the decision-maker, the study is based on two-week-old lagging indicators.

A lot can change on a project in that amount of time. Execution workflows built around lagging indicators tend to:

Lead to costly and unnecessary errors caused by decisions based on old information.
Stymie innovation with rigid and slow processes that limit experimentation.

Click here to read more.

Originally posted in: Digital Transformation

Click here to Read an Advanced Chapter from the Data-Centric Revolution by Dave McComb

A Data-Centric Approach to Managing Customer Data

June 9, 2021January 11, 2021 by Semantic Arts Admin

by Phil Blackwood, Ph.D.

Without a doubt every business needs to have a clear idea of who its customers are and would love to have a 360 degree view of each customer. However, this customer data is typically scattered across hundreds of applications, its meaning is embedded in code written years ago, and much of its value is locked away in silos. Compounding the problem, stakeholders in different parts of the business are likely to have different views of what the word “customer” means because they support different kinds of interactions with customers.

In this post, we’ll outline how to tackle these issues and unlock the value of customer data. We’ll use semantics to establish simple common terminology, show how a knowledge graph can provide 360 degree views, and explain how to classify data without writing code.

The semantic analysis will have three parts: first we consider the simple use case illustrated in the diagram below, then take a much broader view by looking at Events, and finally we will dive deeper into the meaning of the diagram by using use the concept of Agreements.

Every business needs to have a clear idea of who its customers are and would love to have a 360 degree view of their customer data.

The diagram shows an event in which a customer purchases a shirt from a shop. Ask stakeholders around your company what types of events customers participate in, and you are likely to get a long list. It might look something like this (the verbs are from the viewpoint of your company):

Answer general questions about products and services
Create billing account for products and services
Create usage account for a product or service
Deliver product or service (including right-to-use)
Finalize contract for sale of product or service
Help a customer use a product or service
Identify a visitor to our web site.
Determine a recommender of a product or service
Find a user of a product or service
Migrate a customer from one service to another
Migrate a service from one customer to another
Prepare a proposal for sale of product or service
Receive customer agreement to terms and conditions
Receive payment for product or service
Rent product or service
Sell product or service
Send bill for product or service
Ship product

We can model these events using classes from the gist ontology, with one new class consisting of the categories of events listed above. When we load data into our knowledge graph, we link each item to its class and we relate the items to each other with object properties. For example, an entry for one event might look like:

Every business needs to have a clear idea of who its customers are and would love to have a 360 degree view of their customer data.

By using categories instead of creating 18 new classes of events, we keep the model simple and flexible. We can round out the picture by realizing that the Person could instead be an Organization (company, non-profit, or government entity) and the Product could instead be a Service (e.g. window washing).

In a green-field scenario, the model and the data are seamlessly linked in a knowledge graph and we can answer many different questions about our customers. However, in most companies a considerable amount of customer data exists in application-centric silos. To unlock existing customer data, we have to first understand its meaning and then we can link it into the knowledge graph by using the R2RML data mapping language. This data federation allows us to write queries using the simple, standard semantic model and get results that include the existing data.

For any node in the knowledge graph, we have a 360 degree view of the data about the node and its context. A Person node can be enriched with data from social media. An Organization node can be enriched with data about corporate structure, subsidiaries, or partnerships.

Now let’s pivot from the broad event-based perspective to look more closely at the meaning of the original example. Implicit in the idea of a sale is an agreement between the buyer and the seller; once the agreement is made, the seller is obligated to deliver something, while the buyer must pay for it. The “something” is a product or service. We can model the transaction like this: Every business needs to have a clear idea of who its customers are and would love to have a 360 degree view of their customer data.

This basic pattern of agreement and obligation covers many use cases. The agreement could be the simple act of placing the shirt on the check-out counter, or it could be a contract. Delivery and payment could coincide in time, or not. Payments or deliveries, or both, could be monthly.

If our Contract Administration group wants a simple way to identify all the customers who have a contract, we can create a Class named ContractCustomer and populate it automatically from the data in our knowledge graph. To do this, we would write an expression similar to a query that defines what we mean by ContractCustomer, declare the Class to be equivalent to the expression, and then run an off-the-shelf, standards-based inference engine to populate the new class. With no code needed … it’s model-driven.

This method of automatically populating classes can be used to support the wide variety of needs of stakeholders in different parts of the company, even though they do not have the same definition of customer. For example, you could provide classes like PayingCustomer and ProductUsers that can be used to simplify the way the data is accessed or to become building blocks in the model to build upon. With this approach, there is no need to try to get everyone to agree on a single definition of customer. It lets everyone stay focused on what will help them run their part of the business.

While many refinements are possible, we’ve outlined the core of a data-centric solution to the knotty problem of managing customer data. The semantic analysis reveals a simple way to capture information about customer interactions and agreements. A knowledge graph supports 360 degree views of the data, and an inference engine allows us to populate classes automatically without writing a single line of code.

I hope you can glean some ideas from this discussion to help your business, and that you get a sense of why semantics, knowledge graphs, and model-driven-everything are three keys to data-centric architecture.

Dispose, Delete, and Discard: Keep your Enterprise Data Tidy Part 3

June 16, 2021December 16, 2020 by Semantic Arts Admin

Those who are familiar with Marie Kondo know that she is a ruthless disposer. If you’ve read parts one and two of this series, you know that the process is more nuanced than just “throw it all away,” but we’ve come to the point in the process where it’s important to focus on discarding. If you haven’t read parts one and two of this series, please do so; they provide context for the content of this post. Armed with categories that work for your organization and a solid set of values that the data you keep must uphold to be useful to your business, this part of the process is primarily dedicating time to pruning your files and records, and documentation.

Data Lifecycle Policies

“The fact that you possess a surplus of things that you can’t bring yourself to discard doesn’t mean you are taking good care of them. In fact, it is quite the opposite.” It’s interesting to note that, while there are many book collectors who lament Kondo’s popularity and cry, “You can pry my books out of my cold, dead hands,” there aren’t many librarians who hold this sentiment. Professionals know that collections must be pruned and managed. In fact, your organization may have one or more policies about managing data and documents. At a minimum, data lifecycle policies cover three points of a document’s existence within an organization: creation or acquisition, use and storage, and disposition. These policies may be driven by the systems used to manage your documents (Microsoft SharePoint comes to mind) or they may be driven by government mandates. These should be your guide on what and when you discard. If your organization has these policies outlined clearly, the hard work is already done, and you can begin using parts one and two as your guide to systematically deleting unneeded data and documentation. It may also be that some of this lifecycle management functionality is encoded in your systems, but it’s important to understand the policies if you’re making the decisions about data disposition. If your organization does not have a data lifecycle policy, you can explore creating one while you work on becoming data centric.

Data Configuration Management

Outside of an overarching strategy or policy for managing your organization’s data and information, your organization may have various configuration management tools in place (e.g., Git or Subversion) to manage drafts and backups. Many large organizations use file sharing systems to govern who has privileges to directories and files. If you’re attempting to KonMari your files when such systems are in place, it will be necessary to work collaboratively to get access to the files in your control.

When do you actually discard???

One of the key ideas in Marie Kondo’s method is that when you discard, you only discard your own belongings. If you are the owner and CTO of a company, then you have the freedom to discard what no longer sparks joy. In a large company, that question of ownership is far more complex and possibly beyond the reader’s paygrade. It might be beyond the CEO’s paygrade. It is certainly beyond the paygrade of the writer, except with a select few files on a laptop and in a removable storage device used for backups. But the question of ownership can often be established by completing the work recommended in this series of blog posts. And once you’ve established ownership, even complex ownership, you can use metadata to describe ownership and provenance, making it easier to manage that data’s future state, discarded or otherwise.

Futureproofing your Data

Now that we’ve considered the end of the data lifecycle management picture, take a look at the start—data acquisition and creation. If you’ve done the work so far of identifying your business processes and assessed how well your data supports your goals and aligned to your data lifecycle management policy (formal or otherwise), you know how important it is to also consider the introduction of new data. We touched on this in the first two parts, but there’s a subtle difference between considering how data came to be in your collection and considering data that you will include in your collection from this point forward.

This is something you can specify with policy, and it’s something you can anticipate with a robust ontology. However, it’s not as simple as building robust metadata. An ontology that is carefully anchored to your organization’s processes, has sufficient input from the right subject matter experts, and is developed within a hospitable IT infrastructure, is far more likely to be a sound gatekeeper for your incoming data.

In the IT industry, this is referred to as Futureproofing, and is designed to minimize the need for down-stream development to make corrections to work you’re doing now. It’s often a judgment call as to whether the application or system is introducing too much technical debt, but there is no argument that being able to understand each piece of data that goes into your system is critical to avoiding such debt. The way to ensure your data will be understandable downstream is to have adequate metadata. If you want your data to be sophisticated and able to support complex information needs, you need to use semantics.

“The secret to maintaining an uncluttered room is to pursue ultimate simplicity in storage so that you can tell at a glance how much you have.” -Marie Kondo

Read Part 1: Does your Data Spark Joy?

Read Part 2: Setting the Stage for Success

Written by Meika Ungricht

A Data Engineer’s Guide to Semantic Modelling

June 14, 2021November 25, 2020 by Semantic Arts Admin

While on her semantic modelling journey and as a Data Engineer herself, Ilaria Maresi encountered a range of challenges. There was not one deﬁnite source where she could quickly look things up, many of the resources were extremely technical and geared towards a more experienced audience while others were too wishy-washy. Therefore, she decided to compose this 50-page document where she explains semantic modelling and her most important lessons-learned – all in an engaging and down-to-earth writing style.

She starts off with the basics: what is a semantic model and why should you consider building one? Obviously, this is best explained by using a famous rock band as an example. In this way, you learn to draw the basic elements of a semantic model and some fun facts about Led Zeppelin at the same time!

For your model to actually work, it is essential that machines can also understand these fun facts. This might sound challenging if you are not a computer scientist but this guide will walk you through it step-by-step – it even has pictures of baby animals! You will learn how to structure your model in Resource Description Framework (RDF) and give it meaning with the vocabulary extension that wins the prize for cutest acronym: Web Ontology Language (OWL).

All other important aspects of semantic modelling will be discussed. For example, how to make sure we all talk about the same Led Zeppelin by using Uniform Resource Identiﬁers (URIs). Moreover, you are not the first one thinking and learning about knowledge representation: many domain experts have spent serious time and effort in defining the major concepts of their field, called ontologies. To prevent you from re-inventing the wheel, we list the most important resources and explain their origin.

Are you a Data Engineer that has just started with semantic modelling? Want to refresh your memory? Maybe you have no experience with semantic modelling yet but feel it might come in handy? Well, this guide is for you!

Click here to access a data engineer’s guide to semantic modelling

Written by Tess Korthout

A Brief Introduction to the gist Semantic Model

June 14, 2021November 3, 2020 by Semantic Arts Admin

Phil Blackwood, Ph.D.

It’s no secret that most companies have silos of data and continue to create new silos. Data that has the same meaning is often represented hundreds or thousands of different ways as new data models are introduced with every new software application, resulting in a high cost of integration.

By contrast, the data-centric approach starts with the common meaning of the data to address the root cause of data silos:

An enterprise is data-centric to the extent that all application functionality is based on a single, simple, extensible, federate-able data model.

An early step along the way to becoming data-centric is to establish a semantic model of the common concepts used across your business. This might sound like a huge undertaking, and perhaps it will be if you start from scratch. A better option is to adopt an existing core semantic model that has been designed for businesses and has a track record of success, such as gist.

Gist is an open source semantic model created by Semantic Arts. It is the result of more than a decade of refinement based on data-centric projects done with major corporations in a variety of lines of business. Semantic Arts describes gist as “… designed to have the maximum coverage of typical business ontology concepts with the fewest number of primitives and the least amount of ambiguity.” The Wikipedia entry for upper ontologies compares gist to other ontologies, and gives a sense of why it is a match for corporate data management.

This blog post introduces gist by examining how some of the major Classes and Properties can be used. We will not go into much detail; just enough to convey the general idea.

Everyone in your company would probably agree that running the business involves products, services, agreements, and events like payments and deliveries. In turn, agreements and events involve “who, what, where, when, and why”, all of which are included in the gist model. Gist includes about 150 Classes (types of things), and different parts of the business can be often be modeled by adding sub-classes. Here are a few of the major Classes in gist:

Gist also includes about 100 standard ways things can be related to each other (Object Properties), such as:

owns
produces
governs
requires, prevents, or allows
based on
categorized by
part of
triggered by
occurs at (some place)
start time, end time
has physical location
has party (e.g. party to an agreement)

For example, the data representing a contract between a person and your company might include things like:

In gist, a Contract is a legally binding Agreement, and an Agreement is a Commitment involving two or more parties. It’s clear and simple. It’s also expressed in a way that is machine-readable to support automated inferences, Machine Learning, and Artificial Intelligence.

The items and relationships of the contract can be loaded into a knowledge graph, where each “thing” is a node and each relationship is an edge. Existing data can be mapped to this standard representation to make it possible to view all of your contracts through a single lens of terminology. The knowledge graph for an individual contract as sketched out above would look like:

Note that this example is just a starting point. In practice, every node in the diagram would have additional properties (arrows out) providing more detail. For example, the ID would link to a text string and to the party that allocated the ID (e.g. the state government that allocated a driver’s license ID). The CatalogItem would be a detailed Product or Service Specification.

In the knowledge graph, there would be a single Person entry representing a given individual, and if two entries were later discovered to represent the same person, they could be linked with a sameAs relationship.

Relationships in gist (Properties) are first class citizens that have a meaning independent of the things they link, making them highly re-usable. For example, identifiedBy is not limited to contracts, but can be used anywhere something has an ID. Note that the Properties in gist are used to define relationships between instances rather than Classes; there are also a few standard relationships between Classes such as subClassOf and equivalentTo.

The categorizedBy relationship is a powerful one, because it allows the meaning of an item to be specified by linking to a taxonomy rather than by creating new Classes. This pattern contributes to extensibility; adding new characteristics becomes comparable to adding valid values to a relational semantic model instead of adding new attributes.

Unlike traditional data models, the gist semantic model can be loaded into a knowledge graph and then the data is loaded into the same knowledge graph as an extension to the model. There is no separation between the conceptual, logical, and physical models. Similar queries can be used to discover the model or to view the data.

Gist uses the W3C OWL standard (Web Ontology Language), and you will need to understand OWL to get the most value out of gist. To get started with OWL for corporate data management, check out the book Demystifying OWL for the Enterprise, by Michael Uschold. There’s also a brief introduction to OWL and the way it uses set theory here.

The technology stack that supports OWL is well-established and has minimal vendor lock-in because of the simple standard data representation. A semantic model created in one knowledge graph (triple store) can generally be ported to another tool without too much trouble.

To explore gist in more detail, you can download an ontology editor such as Protégé and then select File > Open From URL and enter: https://ontologies.semanticarts.com/o/gistCore9.4.0 Once you have the gist model loaded, select Entities and then review the descriptions of Classes, Object Properties (relationships between things), and Data Properties (which point to string or numeric values with no additional properties). If you want to investigate gist in an orderly sequence, I’d suggest viewing items in groups of “who, what, when, where, and how.”

Take a look at gist. It’s worth your time, because having a standard set f common terms like gist is a significant step toward reversing the trend toward more and more expensive data silos.

Click here to learn more about gist.

Setting the Stage for Success Part 2

June 14, 2021September 28, 2020 by Semantic Arts Admin

Envisioning Your Dream System with the Marie Kondo Method

Before you begin gathering your belongings, discarding, or reorganizing, Marie Kondo asks you to envision your dream lifestyle. She insists that this is the critical first step to ensuring success Envisioning Your Dream System with the Marie Kondo Method with her method, and she provides some guidance on how to do so and examples from her clients. The example Marie Kondo uses in her book is a young woman who lives in a tiny apartment, typical of Japanese cities. Her floor is covered with things and her bed is a storage space when she isn’t sleeping on it. She comes home from work tired and her living space compounds that exhaustion. Maria Kondo has a dream and that dream is simple: to have the space be free from clutter, like a hotel suite, where she can come home and relax with tea and a bath before bed.

While the situation may be different for someone who has responsibility for stores of corporate data and systems, the process of envisioning your ideal environment is not. As you begin to examine your systems, information architecture, data—an information landscape, in general—it’s absolutely critical to have in mind what you want. Having in mind “better” or “new technology” leads you towards trends and vendors with cool product features that may meet your needs, but more likely will end up contributing to the data and system clutter in the long run. It may seem like a simplistic question, “What do you want?” but your efforts in defining that will help you navigate the marketplace of emerging technology. At this point, it is important not to focus on the process or the items in front of you that you may or may not want to keep; rather, envisioning your ideal end-state, be it a living space filled with only things you love or a database filled only with data that supports your business, is what empowers you to move forward.

If you’re a savvy tech professional, you’re already thinking, “This is the requirements gathering process,” and you would be right. There is no shortage of requirements gathering methodologies out there and most of them are pretty good. If it gets you to envision an ideal that is vendor and tool agnostic and is based on the needs and desires of your key stakeholders and end-users, your method is fine. If your requirements include things like, “better search functionality,” or, “more insight into what data we have,” it’s very likely that you’re also in need of some data decluttering.

Get Started by Defining your Categories

The Marie Kondo method requires you to see your belongings in two overarching categories: things that spark joy and everything else. Everything else should be discarded. For our purposes, data that sparks joy is data that serves your business. It is helpful to look at the antithesis of joy to get an idea of what should be kept or discarded. For example, if you are facing an audit, the Get Started by Defining your Categories antithesis of joy is not being able to produce the documentation that the auditor needs to conduct the audit. That could be because you can’t access it, because what you have isn’t what they need, you don’t have what they need, or what they need is too difficult to find amidst data and information that you have. In this example, the information that allows you to have peace of mind during an audit is what you should keep. The bigger pattern here is that it’s important to know what business processes, data flows, decision points, and dependencies are impacting your business, and what the inputs and outputs are to those process steps.

Before you can begin to discard by category, you must know what categories drive your business. Marie Kondo starts by outlining a series of categories that guide her clients through the process of discarding. She starts with clothing, then books, then papers, then everything else. She breaks down these categories even further, allowing people with astonishingly large and complex collections of things to take a systematic approach to decluttering. With organizational data, this approach will work, but the way you define the categories depends on the kind of organization you are.

The categories you need should emerge out of your efforts at process improvement. From Investopedia: “Kaizen is a Japanese term meaning ‘change for the better’ or ‘continuous improvement.’ It is a Japanese business philosophy regarding the processes that continuously improve operations and involve all employees. Kaizen sees improvement in productivity as a gradual and methodical process.”(1) Often, semantic work is done alongside large-scale business process improvement efforts. Businesses want to know what the information inputs and outputs are, and they want to know how that information influences decisions and actions. These efforts are often iterative, and it’s not uncommon to uncover conflicts in how people understand the data, or what they use it for. I remember working with a team of medical experts who all used “normal” as a data point in their diagnostic processes. It took our team years to come up with a good way to encode “normal” because each expert meant something different by the term. There were heated debates about whether or not “normal” meant within the context of a patient who might be legally blind, in which case a low visual acuity score might be considered normal, or if normal was a cohort or population average, in which case that patient’s low score was not normal. These conflicts and pain points are like mismatched socks and poorly-fitting jeans: they’re your clue about where you need to look at your data. This is also the starting point for determining which categories you need to use to evaluate your data. Do not strong-arm your conflicts into silence; use them to light the way ahead.

Building the Categories that Matter to You with the Marie Kondo Method

The Marie Kondo method categories are presented in an order that begins by teaching us what it means to feel that spark of joy (clothing) and works through household items that might be useful but not particularly exciting, and ends with items of sentimental value (photos and heirlooms). One of the big challenges of applying the Marie Kondo method to organizational data is that this rubric and categorization doesn’t easily map to things like clothing and photos. However, the underlying idea of what is essential to our survival and our comfort does easily translate to data. Don’t get Building the Categories that Matter to You with the Marie Kondo Method bogged down in the details too soon. Marie Kondo advises that you create subcategories according to your need.

When I was organizing my miscellaneous items, I uncovered some camping gear I had purchased a couple years ago with the intention of going on a long bike ride that involved camping at night. I was unable to go, so I packed the gear away for another time. As I went through the process of evaluating my belongings using the Marie Kondo method, I decided I’ve always enjoyed camping and I was going to make space in my life for it. I booked a camping trip for a few days, loaded my gear into a rental car, and put my gear to the test.

This camping trip was rich with lessons, pleasant and painful both. I took the gear I had bought for the bike trip, but since I had a car, I also supplemented it with larger and heavier items I knew would be useful now that I had the space. Things I thought would be overkill turned out to be very useful: extra flashlight, large water container, spare book of matches, extra pillow, folding chair, extra plastic tub, etc. Things I was certain I would use ended up coming home unused: pancake mix, spare sleeping bag, two changes of clothes, packets of sample skin and hair products, etc. And I found there were things I needed in the moment that I didn’t have: a lighter, fire starters, strong bug spray, an umbrella, and 4WD. The underlying lesson here is that your gear should enable the activities you want to do. And different types of gear serve different types of experiences, even if they’re categorically similar. If you look at the gear belonging to someone who likes glamping and compare it to someone who likes to through-hike the Appalachian Trail, there may not be a whole lot of overlap in the specifics, even though the categories are the same. This is because your process determines your needs.

Camping gear is often designed to meet basic human needs and provide basic creature comforts. Complex business processes can draw from this analog example, in that your categories are going to appear around the essential tasks of your business. In many of the projects I’ve done in the past, some effort has been made to identify key information areas that need development using Continuous Improvement or Kaizen principles. Information artifacts, key concepts, subject headings, however you choose to refer to them, are the overarching conceptual subjects that drive your business. Using the camping example, this might look like the following: Sleep, Food, Hygiene, Recreation. If you break down sleep, the process could be as simple as laying out a tarp and a blanket and wrapping yourself up in it and going to sleep. Or it might be as complex as building a platform, building a tent, constructing a bed frame, unfolding sheets, pillows, and blankets, securing the tent, and finally going to sleep. In both scenarios, there are categories for sleep surface, shelter, and bedding.

Another key comparison comes up when considering duplication and re-use. Chances are, you aren’t going to need a different sleeping bag for each camping scenario. It’s interesting to note that if you go into an outdoor supply outfitter looking for sleeping bags, you will find a range of options based on very specific situations. If your business is camping, you just might need several different bags! But for most people, this just adds complexity and expense. You do want to make sure the zipper works so you can control the amount of body heat you’re trapping in the bag, and if you’re camping in the cold you might add a blanket. But otherwise, a multi-season sleeping bag that’s comfortable and easy to care for is going to be re-used over and over in many camping scenarios.

For a business, the examples might range from a child’s lemonade stand to Starbucks. The information objects are going to be similar: menu, supplies metrics. Once you’ve established these categories, you can look at your data systematically. Coming up with these key concepts allows you to define the scope of your work and priorities for development.

What’s Next?

Now that you’ve got a sense of how to create a list of categories based on your business processes, you can begin the process of discarding. As with the process so far, it’s not as simple as it is for your possessions at home. Disposition of data within an enterprise, large or small, comes with politics and legal requirements. In part three, you will see some ideas about where to start with data disposition and how to use your company’s data disposition strategies to your advantage.

Click Here to Read Part 1 of this Series

Footnotes:
(1) https://www.investopedia.com/terms/k/kaizen.asp