The Greatest Sin of Tabular Data

We recently came across this great article titled “The greatest sin of tabular data”. It is an excellent summary of the kind of work we do for our clients and how they benefit.

You can read it at The greatest sin of tabular data · A blog @ nonodename.com

The journey of capturing the meaning of data is an elusive one.  If 80% of data science is simply data wrangling, how can we do better at actually providing value by making sense of that data?

With a disciplined approach and by leveraging RDF capabilities, Semantic Arts can help create clear, well-defined data, saving time and money and driving real value instead of getting bogged down in simply trying to understand the data.

As stated by the author, “We can do better!”

Reach out to Semantic Arts today to see how we can help.

Original article by Dan Bennett at nonodename.com, via LinkedIn.

Get the gist: start building simplicity now

While organizing data has always been important, interest in optimizing information models with semantic knowledge graphs has grown markedly.  LinkedIn and Airbnb, along with giants Google and Amazon, use graphs; but without a model connecting concepts with rules for membership, capabilities such as buyer recommendations and enhanced searchability (“follow your nose”) would lack accuracy.
Drum roll, please … introducing the ontology.
An ontology is a model that supports semantic knowledge graph reasoning, inference, and provenance.  Think of an ontology as the brain sending messages to the nervous system (the knowledge graph).  An ontology organizes data into well-defined categories with clearly defined relationships.  This model is a foundational starting point that allows humans and machines to read, understand, and infer knowledge based on its classification.  In short, it automatically figures out what is similar and what is different.
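As a toy illustration of that last point (plain Python rather than RDF tooling, with made-up classes and instances), a machine can infer what two things have in common by walking a class hierarchy:

```python
# A toy sketch of ontology-style inference: instances link to classes,
# classes link to parent classes, and membership is computed by walking up.
subclass_of = {            # hypothetical mini-ontology
    "Cat": "Mammal",
    "Dog": "Mammal",
    "Mammal": "Animal",
}
instance_of = {"felix": "Cat", "rex": "Dog"}

def classes_of(instance):
    """All classes an instance belongs to, direct and inferred."""
    found = []
    cls = instance_of[instance]
    while cls is not None:
        found.append(cls)
        cls = subclass_of.get(cls)
    return found

# felix and rex differ at the leaf level, but the model lets a machine
# infer that both are Mammals (and Animals).
print(classes_of("felix"))  # ['Cat', 'Mammal', 'Animal']
print(classes_of("rex"))    # ['Dog', 'Mammal', 'Animal']
```

Real ontologies express this with OWL axioms and a reasoner, but the shape of the inference is the same.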
We’re asked often, where do I start?
Enter ‘gist’, a minimalist business ontology (model) that springboards the transition from information to knowledge.  With more than a decade of refinement grounded in simplicity, ‘gist’ is designed to have the maximum coverage of typical business ontology concepts with the fewest primitives and the least ambiguity.  ‘gist’ is available for free under a Creative Commons license and is being applied and extended in a range of business use cases across many industries.
Recently, Senior Ontologist Michael Uschold has been sharing an introductory overview of ‘gist’, which is maintained by Semantic Arts.
One compelling difference from most publicly available ontologies is that ‘gist’ has an active governance and best-practices community, the gist Council. The council meets virtually on the first Thursday of every month to discuss how to use ‘gist’ and to make suggestions on its evolution.
See Part I of Michael’s introduction here:

See Part II of Michael’s introduction here:

Stay tuned for the final installment!

Interested in gist? Visit Semantic Arts – gist

See more informative videos on Semantic Arts – YouTube

The Data-Centric Revolution: Headless BI and the Metrics Layer

Read more from Dave McComb in his recent article on The Data Administration Newsletter.

“The data-centric approach to metrics puts the definition of the metrics in the shared data. Not in the BI tool, not in code in an API. It’s in the data, right along with the measurement itself.”

Link: The Data-Centric Revolution: Headless BI and the Metrics Layer – TDAN.com

Read more of Dave’s articles: mccomb – TDAN.com

The 90s Are Over, Let’s Stop Basing Capital Cost Decisions on Lagging Indicators

Remember the good old days of emerging digital technology? Accessing information through a dial-up internet connection. Saving data to floppy discs or CDs. Sending emails to have them printed for storage. Mobile connectivity was new, exciting, and… slow compared to what we have today.

In the energy sector, data access limitations influenced the structure of traditional execution workflows for capital projects. It was common – and still is – for project execution models to focus on document-based deliverables over raw data.

The inherent problem with a document-centric approach is that documents take time to produce. Let’s imagine the workflow for a technology evaluation study that:

  • Begins with initial input from multiple departments.
  • Gets reviewed by 2-3 management layers on the project organizational chart.
  • Finally lands on the desk of a senior decision-maker.

This process could easily take two weeks or longer. But what happens during those two weeks? Work doesn’t get paused. The project continues to progress. The information initially collected for the study no longer represents current project conditions. By the time it gets to the decision-maker, the study is based on two-week-old lagging indicators.

A lot can change on a project in that amount of time. Execution workflows built around lagging indicators tend to:

  • Lead to costly and unnecessary errors caused by decisions based on old information.
  • Stymie innovation with rigid and slow processes that limit experimentation.

Click here to read more. 

Originally posted in: Digital Transformation

Click here to read an advance chapter from The Data-Centric Revolution by Dave McComb

A Data-Centric Approach to Managing Customer Data

by Phil Blackwood, Ph.D.

Without a doubt, every business needs a clear idea of who its customers are and would love to have a 360-degree view of each customer. However, this customer data is typically scattered across hundreds of applications, its meaning is embedded in code written years ago, and much of its value is locked away in silos. Compounding the problem, stakeholders in different parts of the business are likely to have different views of what the word “customer” means, because they support different kinds of interactions with customers.

In this post, we’ll outline how to tackle these issues and unlock the value of customer data. We’ll use semantics to establish simple common terminology, show how a knowledge graph can provide 360 degree views, and explain how to classify data without writing code.

The semantic analysis has three parts: first we consider the simple use case illustrated in the diagram below, then we take a much broader view by looking at Events, and finally we dive deeper into the meaning of the diagram using the concept of Agreements.

[Diagram: an event in which a customer purchases a shirt from a shop]

The diagram shows an event in which a customer purchases a shirt from a shop. Ask stakeholders around your company what types of events customers participate in, and you are likely to get a long list. It might look something like this (the verbs are from the viewpoint of your company):

  • Answer general questions about products and services
  • Create billing account for products and services
  • Create usage account for a product or service
  • Deliver product or service (including right-to-use)
  • Finalize contract for sale of product or service
  • Help a customer use a product or service
  • Identify a visitor to our web site
  • Determine a recommender of a product or service
  • Find a user of a product or service
  • Migrate a customer from one service to another
  • Migrate a service from one customer to another
  • Prepare a proposal for sale of product or service
  • Receive customer agreement to terms and conditions
  • Receive payment for product or service
  • Rent product or service
  • Sell product or service
  • Send bill for product or service
  • Ship product

We can model these events using classes from the gist ontology, with one new class consisting of the categories of events listed above. When we load data into our knowledge graph, we link each item to its class and we relate the items to each other with object properties. For example, an entry for one event might look like:

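The original example entry did not survive formatting, but a minimal stand-in sketch, using plain Python tuples for triples and loosely gist-style names (the ex: identifiers are hypothetical, and the exact gist class and property names may differ from the ontology), could look like:

```python
# One purchase event as subject-predicate-object triples (plain Python
# tuples standing in for RDF). All ex: identifiers are made up, and the
# gist: names are used loosely for illustration.
event_entry = [
    ("ex:event42",  "rdf:type",           "gist:Event"),
    ("ex:event42",  "gist:categorizedBy", "ex:SellProductCategory"),
    ("ex:event42",  "ex:hasBuyer",        "ex:person7"),   # the customer
    ("ex:person7",  "rdf:type",           "gist:Person"),
    ("ex:event42",  "ex:involvesItem",    "ex:shirt123"),
    ("ex:shirt123", "rdf:type",           "ex:Product"),
]
```

The category link is what records *which* of the listed event types this is, without minting a new class for each one.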

By using categories instead of creating 18 new classes of events, we keep the model simple and flexible. We can round out the picture by realizing that the Person could instead be an Organization (company, non-profit, or government entity) and the Product could instead be a Service (e.g. window washing).

In a green-field scenario, the model and the data are seamlessly linked in a knowledge graph and we can answer many different questions about our customers. However, in most companies a considerable amount of customer data exists in application-centric silos. To unlock existing customer data, we have to first understand its meaning and then we can link it into the knowledge graph by using the R2RML data mapping language. This data federation allows us to write queries using the simple, standard semantic model and get results that include the existing data.
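In practice the mapping is declared in R2RML itself; purely to illustrate what such a mapping accomplishes, here is a hypothetical Python sketch that turns legacy relational rows into triples using the shared model’s terms (table, column, and ex: names are all invented):

```python
# Hypothetical rows from a legacy CUST_ORDERS table.
rows = [
    {"cust_id": 7, "cust_name": "Acme Ltd",  "order_id": 101},
    {"cust_id": 8, "cust_name": "Beta Corp", "order_id": 102},
]

def map_row(row):
    """Map one relational row to graph triples. R2RML would declare this
    mapping as data rather than code; this is just the effect it has."""
    cust = f"ex:customer-{row['cust_id']}"
    order = f"ex:order-{row['order_id']}"
    return [
        (cust,  "rdf:type",    "gist:Organization"),  # loose gist-style name
        (cust,  "ex:name",     row["cust_name"]),
        (order, "rdf:type",    "gist:Event"),
        (order, "ex:hasParty", cust),
    ]

mapped_triples = [t for row in rows for t in map_row(row)]
# Once mapped, the same query idioms used on native graph data apply
# to the federated legacy data as well.
```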

For any node in the knowledge graph, we have a 360 degree view of the data about the node and its context. A Person node can be enriched with data from social media. An Organization node can be enriched with data about corporate structure, subsidiaries, or partnerships.

Now let’s pivot from the broad event-based perspective to look more closely at the meaning of the original example. Implicit in the idea of a sale is an agreement between the buyer and the seller; once the agreement is made, the seller is obligated to deliver something, while the buyer must pay for it. The “something” is a product or service. We can model the transaction like this:

This basic pattern of agreement and obligation covers many use cases. The agreement could be the simple act of placing the shirt on the check-out counter, or it could be a contract. Delivery and payment could coincide in time, or not. Payments or deliveries, or both, could be monthly.

If our Contract Administration group wants a simple way to identify all the customers who have a contract, we can create a Class named ContractCustomer and populate it automatically from the data in our knowledge graph. To do this, we would write an expression similar to a query that defines what we mean by ContractCustomer, declare the Class to be equivalent to the expression, and then run an off-the-shelf, standards-based inference engine to populate the new class. With no code needed … it’s model-driven.
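As a toy analogue of that model-driven step (plain Python standing in for an OWL class expression and an inference engine; all ex: names are hypothetical):

```python
# A toy sketch of model-driven class population. In a real system the
# definition below would be an OWL class expression equivalent to
# ContractCustomer, and an off-the-shelf reasoner would populate it.
data = {
    ("ex:alice",     "ex:partyTo", "ex:contract1"),
    ("ex:contract1", "rdf:type",   "gist:Contract"),
    ("ex:bob",       "ex:partyTo", "ex:order9"),
    ("ex:order9",    "rdf:type",   "gist:Event"),
}

def infer_contract_customers(triples):
    """Everyone who is a party to something typed as a Contract."""
    contracts = {s for (s, p, o) in triples
                 if p == "rdf:type" and o == "gist:Contract"}
    return {s for (s, p, o) in triples
            if p == "ex:partyTo" and o in contracts}

print(infer_contract_customers(data))  # {'ex:alice'}
```

The point is that the membership rule lives in the model, not in application code: change the expression and the class repopulates.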

This method of automatically populating classes can be used to support the wide variety of needs of stakeholders in different parts of the company, even though they do not share a single definition of customer. For example, you could provide classes like PayingCustomer and ProductUsers that simplify the way the data is accessed or serve as building blocks in the model. With this approach, there is no need to try to get everyone to agree on a single definition of customer. It lets everyone stay focused on what will help them run their part of the business.

While many refinements are possible, we’ve outlined the core of a data-centric solution to the knotty problem of managing customer data. The semantic analysis reveals a simple way to capture information about customer interactions and agreements. A knowledge graph supports 360 degree views of the data, and an inference engine allows us to populate classes automatically without writing a single line of code.

I hope you can glean some ideas from this discussion to help your business, and that you get a sense of why semantics, knowledge graphs, and model-driven-everything are three keys to data-centric architecture.

Dispose, Delete, and Discard: Keep your Enterprise Data Tidy Part 3

Those who are familiar with Marie Kondo know that she is a ruthless disposer. If you’ve read parts one and two of this series, you know that the process is more nuanced than just “throw it all away,” but we’ve come to the point in the process where it’s important to focus on discarding. If you haven’t read parts one and two of this series, please do so; they provide context for the content of this post.  Armed with categories that work for your organization and a solid set of values that the data you keep must uphold to be useful to your business, this part of the process is primarily about dedicating time to pruning your files, records, and documentation.

Data Lifecycle Policies

“The fact that you possess a surplus of things that you can’t bring yourself to discard doesn’t mean you are taking good care of them.  In fact, it is quite the opposite.” It’s interesting to note that, while there are many book collectors who lament Kondo’s popularity and cry, “You can pry my books out of my cold, dead hands,” there aren’t many librarians who hold this sentiment.  Professionals know that collections must be pruned and managed. In fact, your organization may have one or more policies about managing data and documents.  At a minimum, data lifecycle policies cover three points of a document’s existence within an organization: creation or acquisition, use and storage, and disposition.  These policies may be driven by the systems used to manage your documents (Microsoft SharePoint comes to mind) or they may be driven by government mandates. These should be your guide on what to discard and when.  If your organization has these policies outlined clearly, the hard work is already done, and you can begin using parts one and two as your guide to systematically deleting unneeded data and documentation. It may also be that some of this lifecycle management functionality is encoded in your systems, but it’s important to understand the policies if you’re making the decisions about data disposition. If your organization does not have a data lifecycle policy, you can explore creating one while you work on becoming data centric.

Data Configuration Management

Outside of an overarching strategy or policy for managing your organization’s data and information, your organization may have various configuration management tools in place (e.g., Git or Subversion) to manage drafts and backups. Many large organizations use file sharing systems to govern who has privileges to directories and files.  If you’re attempting to KonMari your files when such systems are in place, it will be necessary to work collaboratively to get access to the files in your control.

When do you actually discard???

One of the key ideas in Marie Kondo’s method is that when you discard, you only discard your own belongings.  If you are the owner and CTO of a company, then you have the freedom to discard what no longer sparks joy.  In a large company, that question of ownership is far more complex and possibly beyond the reader’s paygrade. It might be beyond the CEO’s paygrade. It is certainly beyond the paygrade of the writer, except with a select few files on a laptop and in a removable storage device used for backups.  But the question of ownership can often be established by completing the work recommended in this series of blog posts.  And once you’ve established ownership, even complex ownership, you can use metadata to describe ownership and provenance, making it easier to manage that data’s future state, discarded or otherwise.

Futureproofing your Data

Now that we’ve considered the end of the data lifecycle management picture, let’s take a look at the start—data acquisition and creation.  If you’ve done the work so far of identifying your business processes, assessing how well your data supports your goals, and aligning to your data lifecycle management policy (formal or otherwise), you know how important it is to also consider the introduction of new data.  We touched on this in the first two parts, but there’s a subtle difference between considering how data came to be in your collection and considering data that you will include in your collection from this point forward.

This is something you can specify with policy, and it’s something you can anticipate with a robust ontology. However, it’s not as simple as building robust metadata.  An ontology that is carefully anchored to your organization’s processes, has sufficient input from the right subject matter experts, and is developed within a hospitable IT infrastructure, is far more likely to be a sound gatekeeper for your incoming data.

In the IT industry, this is referred to as futureproofing, and it is designed to minimize the need for downstream development to correct work you’re doing now. It’s often a judgment call whether an application or system is introducing too much technical debt, but there is no argument that being able to understand each piece of data that goes into your system is critical to avoiding such debt. The way to ensure your data will be understandable downstream is to have adequate metadata.  If you want your data to be sophisticated and able to support complex information needs, you need to use semantics.

“The secret to maintaining an uncluttered room is to pursue ultimate simplicity in storage so that you can tell at a glance how much you have.” -Marie Kondo

Read Part 1: Does your Data Spark Joy?

Read Part 2: Setting the Stage for Success

Written by Meika Ungricht

A Data Engineer’s Guide to Semantic Modelling

While on her own semantic modelling journey as a Data Engineer, Ilaria Maresi encountered a range of challenges. There was no single definitive source where she could quickly look things up; many resources were extremely technical and geared towards a more experienced audience, while others were too wishy-washy. So she decided to compose this 50-page document in which she explains semantic modelling and her most important lessons learned, all in an engaging and down-to-earth writing style.

She starts off with the basics: what is a semantic model and why should you consider building one? Obviously, this is best explained by using a famous rock band as an example. In this way, you learn to draw the basic elements of a semantic model and some fun facts about Led Zeppelin at the same time!

For your model to actually work, it is essential that machines can also understand these fun facts. This might sound challenging if you are not a computer scientist, but this guide walks you through it step by step; it even has pictures of baby animals! You will learn how to structure your model in the Resource Description Framework (RDF) and give it meaning with the vocabulary extension that wins the prize for cutest acronym: the Web Ontology Language (OWL).

All the other important aspects of semantic modelling are discussed as well. For example, how to make sure we all talk about the same Led Zeppelin by using Uniform Resource Identifiers (URIs). Moreover, you are not the first one thinking and learning about knowledge representation: many domain experts have spent serious time and effort defining the major concepts of their fields in shared models called ontologies. To save you from re-inventing the wheel, the guide lists the most important resources and explains their origin.

Are you a Data Engineer who has just started with semantic modelling? Want to refresh your memory? Maybe you have no experience with semantic modelling yet but feel it might come in handy? Well, this guide is for you!

Click here to access a data engineer’s guide to semantic modelling

Written by Tess Korthout

A Brief Introduction to the gist Semantic Model

Phil Blackwood, Ph.D.

It’s no secret that most companies have silos of data and continue to create new silos.  Data that has the same meaning is often represented hundreds or thousands of different ways as new data models are introduced with every new software application, resulting in a high cost of integration.

By contrast, the data-centric approach starts with the common meaning of the data to address the root cause of data silos:

An enterprise is data-centric to the extent that all application functionality is based on a single, simple, extensible, federate-able data model.

An early step along the way to becoming data-centric is to establish a semantic model of the common concepts used across your business.  This might sound like a huge undertaking, and perhaps it will be if you start from scratch.  A better option is to adopt an existing core semantic model that has been designed for businesses and has a track record of success, such as gist.


Gist is an open source semantic model created by Semantic Arts.  It is the result of more than a decade of refinement based on data-centric projects done with major corporations in a variety of lines of business.  Semantic Arts describes gist as “… designed to have the maximum coverage of typical business ontology concepts with the fewest number of primitives and the least amount of ambiguity.”  The Wikipedia entry for upper ontologies compares gist to other ontologies, and gives a sense of why it is a match for corporate data management.


This blog post introduces gist by examining how some of the major Classes and Properties can be used.  We will not go into much detail; just enough to convey the general idea.

Everyone in your company would probably agree that running the business involves products, services, agreements, and events like payments and deliveries.  In turn, agreements and events involve “who, what, where, when, and why”, all of which are included in the gist model.  Gist includes about 150 Classes (types of things), and different parts of the business can often be modeled by adding sub-classes.  Here are a few of the major Classes in gist:

Gist also includes about 100 standard ways things can be related to each other (Object Properties), such as:

  • owns
  • produces
  • governs
  • requires, prevents, or allows
  • based on
  • categorized by
  • part of
  • triggered by
  • occurs at (some place)
  • start time, end time
  • has physical location
  • has party (e.g. party to an agreement)

For example, the data representing a contract between a person and your company might include things like:

In gist, a Contract is a legally binding Agreement, and an Agreement is a Commitment involving two or more parties.  It’s clear and simple.  It’s also expressed in a way that is machine-readable to support automated inferences, Machine Learning, and Artificial Intelligence.

The items and relationships of the contract can be loaded into a knowledge graph, where each “thing” is a node and each relationship is an edge.  Existing data can be mapped to this standard representation to make it possible to view all of your contracts through a single lens of terminology.  The knowledge graph for an individual contract as sketched out above would look like:
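The diagram itself is not reproduced here, but a comparable sketch in triple form (plain Python tuples; hypothetical ex: identifiers, gist-style names used loosely) might be:

```python
# An individual contract as nodes and edges, written as triples.
# All ex: identifiers are invented, and the property names are
# gist-style approximations, not exact gist terms.
contract_graph = [
    ("ex:contract1",  "rdf:type",        "gist:Contract"),
    ("ex:contract1",  "ex:hasParty",     "ex:person1"),
    ("ex:contract1",  "ex:hasParty",     "ex:ourCompany"),
    ("ex:contract1",  "ex:identifiedBy", "ex:id-555"),
    ("ex:contract1",  "ex:basedOn",      "ex:catalogItem9"),
    ("ex:contract1",  "ex:startTime",    "2024-01-01"),
    ("ex:person1",    "rdf:type",        "gist:Person"),
    ("ex:ourCompany", "rdf:type",        "gist:Organization"),
]
```

Each tuple is one edge in the knowledge graph; each distinct subject or object is a node.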

Note that this example is just a starting point.  In practice, every node in the diagram would have additional properties (arrows out) providing more detail.  For example, the ID would link to a text string and to the party that allocated the ID (e.g. the state government that allocated a driver’s license ID).  The CatalogItem would be a detailed Product or Service Specification.

In the knowledge graph, there would be a single Person entry representing a given individual, and if two entries were later discovered to represent the same person, they could be linked with a sameAs relationship.

Relationships in gist (Properties) are first class citizens that have a meaning independent of the things they link, making them highly re-usable.  For example, identifiedBy is not limited to contracts, but can be used anywhere something has an ID.  Note that the Properties in gist are used to define relationships between instances rather than Classes; there are also a few standard relationships between Classes such as subClassOf and equivalentTo.

The categorizedBy relationship is a powerful one, because it allows the meaning of an item to be specified by linking to a taxonomy rather than by creating new Classes.  This pattern contributes to extensibility; adding a new characteristic becomes comparable to adding a valid value (a row) in a relational model instead of adding a new attribute (a column).
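A small sketch of why this helps (plain Python triples, hypothetical names throughout): introducing a new kind of contract becomes a data change rather than a model change:

```python
# Categorization by taxonomy: the kinds of contract live as category
# instances in the data, not as new classes in the model.
category_data = [
    ("ex:cat-NDA",   "rdf:type",           "ex:ContractCategory"),
    ("ex:contract1", "gist:categorizedBy", "ex:cat-NDA"),
]

# Supporting a new "Lease" kind later touches only the data:
category_data.append(("ex:cat-Lease", "rdf:type", "ex:ContractCategory"))
category_data.append(("ex:contract2", "gist:categorizedBy", "ex:cat-Lease"))
```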

Unlike traditional data models, the gist semantic model can be loaded into a knowledge graph and then the data is loaded into the same knowledge graph as an extension to the model.  There is no separation between the conceptual, logical, and physical models.  Similar queries can be used to discover the model or to view the data.
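To illustrate, here is a minimal sketch (plain Python, hypothetical names) of one pattern-matching function serving both model discovery and data viewing over the same set of triples:

```python
# Model and data live in one graph, so one query mechanism serves both.
kg = [
    ("gist:Contract", "subClassOf", "gist:Agreement"),   # model triple
    ("ex:contract1",  "rdf:type",   "gist:Contract"),    # data triple
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None is a wildcard
    (the same idea as a SPARQL triple pattern)."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Discover the model:
print(match(kg, p="subClassOf"))
# View the data:
print(match(kg, p="rdf:type"))
```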

Gist uses the W3C OWL standard (Web Ontology Language), and you will need to understand OWL to get the most value out of gist.  To get started with OWL for corporate data management, check out the book Demystifying OWL for the Enterprise, by Michael Uschold.  There’s also a brief introduction to OWL and the way it uses set theory here.

The technology stack that supports OWL is well-established and has minimal vendor lock-in because of the simple standard data representation.  A semantic model created in one knowledge graph (triple store) can generally be ported to another tool without too much trouble.

To explore gist in more detail, you can download an ontology editor such as Protégé and then select File > Open From URL and enter: https://ontologies.semanticarts.com/o/gistCore9.4.0  Once you have the gist model loaded, select Entities and then review the descriptions of Classes, Object Properties (relationships between things), and Data Properties (which point to string or numeric values with no additional properties).  If you want to investigate gist in an orderly sequence, I’d suggest viewing items in groups of “who, what, when, where, and how.”

Take a look at gist.  It’s worth your time, because having a standard set of common terms like gist is a significant step toward reversing the trend toward ever more expensive data silos.

Click here to learn more about gist.

Setting the Stage for Success Part 2

Envisioning Your Dream System with the Marie Kondo Method

Before you begin gathering your belongings, discarding, or reorganizing, Marie Kondo asks you to envision your dream lifestyle.  She insists that this is the critical first step to ensuring success with her method, and she provides some guidance on how to do so, along with examples from her clients.  The example Marie Kondo uses in her book is a young woman who lives in a tiny apartment, typical of Japanese cities.  Her floor is covered with things and her bed is a storage space when she isn’t sleeping on it.  She comes home from work tired, and her living space compounds that exhaustion.  Her dream is simple: to have the space be free from clutter, like a hotel suite, where she can come home and relax with tea and a bath before bed.

While the situation may be different for someone who has responsibility for stores of corporate data and systems, the process of envisioning your ideal environment is not.  As you begin to examine your systems, information architecture, data—an information landscape, in general—it’s absolutely critical to have in mind what you want.  Having in mind “better” or “new technology” leads you towards trends and vendors with cool product features that may meet your needs, but more likely will end up contributing to the data and system clutter in the long run.  It may seem like a simplistic question, “What do you want?” but your efforts in defining that will help you navigate the marketplace of emerging technology.  At this point, it is important not to focus on the process or the items in front of you that you may or may not want to keep; rather, envisioning your ideal end-state, be it a living space filled with only things you love or a database filled only with data that supports your business, is what empowers you to move forward.

If you’re a savvy tech professional, you’re already thinking, “This is the requirements gathering process,” and you would be right.  There is no shortage of requirements gathering methodologies out there and most of them are pretty good.  If it gets you to envision an ideal that is vendor and tool agnostic and is based on the needs and desires of your key stakeholders and end-users, your method is fine.  If your requirements include things like, “better search functionality,” or, “more insight into what data we have,” it’s very likely that you’re also in need of some data decluttering.

Get Started by Defining your Categories

The Marie Kondo method requires you to see your belongings in two overarching categories: things that spark joy and everything else.  Everything else should be discarded.  For our purposes, data that sparks joy is data that serves your business.  It is helpful to look at the antithesis of joy to get an idea of what should be kept or discarded.  For example, if you are facing an audit, the antithesis of joy is not being able to produce the documentation that the auditor needs to conduct the audit.  That could be because you can’t access it, because what you have isn’t what they need, you don’t have what they need, or what they need is too difficult to find amidst data and information that you have.  In this example, the information that allows you to have peace of mind during an audit is what you should keep. The bigger pattern here is that it’s important to know what business processes, data flows, decision points, and dependencies are impacting your business, and what the inputs and outputs are to those process steps.

Before you can begin to discard by category, you must know what categories drive your business.  Marie Kondo starts by outlining a series of categories that guide her clients through the process of discarding.  She starts with clothing, then books, then papers, then everything else.  She breaks down these categories even further, allowing people with astonishingly large and complex collections of things to take a systematic approach to decluttering. With organizational data, this approach will work, but the way you define the categories depends on the kind of organization you are.

The categories you need should emerge out of your efforts at process improvement. From Investopedia: “Kaizen is a Japanese term meaning ‘change for the better’ or ‘continuous improvement.’ It is a Japanese business philosophy regarding the processes that continuously improve operations and involve all employees. Kaizen sees improvement in productivity as a gradual and methodical process.”(1) Often, semantic work is done alongside large-scale business process improvement efforts.  Businesses want to know what the information inputs and outputs are, and they want to know how that information influences decisions and actions.  These efforts are often iterative, and it’s not uncommon to uncover conflicts in how people understand the data, or what they use it for.  I remember working with a team of medical experts who all used “normal” as a data point in their diagnostic processes.  It took our team years to come up with a good way to encode “normal” because each expert meant something different by the term.  There were heated debates about whether or not “normal” meant within the context of a patient who might be legally blind, in which case a low visual acuity score might be considered normal, or if normal was a cohort or population average, in which case that patient’s low score was not normal.  These conflicts and pain points are like mismatched socks and poorly-fitting jeans: they’re your clue about where you need to look at your data. This is also the starting point for determining which categories you need to use to evaluate your data. Do not strong-arm your conflicts into silence; use them to light the way ahead.

Building the Categories that Matter to You with the Marie Kondo Method

The Marie Kondo method categories are presented in an order that begins by teaching us what it means to feel that spark of joy (clothing), works through household items that might be useful but not particularly exciting, and ends with items of sentimental value (photos and heirlooms).  One of the big challenges of applying the Marie Kondo method to organizational data is that categories like clothing and photos don’t map easily onto data.  However, the underlying idea of what is essential to our survival and our comfort does translate.  Don’t get bogged down in the details too soon.  Marie Kondo advises that you create subcategories according to your need.

When I was organizing my miscellaneous items, I uncovered some camping gear I had purchased a couple years ago with the intention of going on a long bike ride that involved camping at night.  I was unable to go, so I packed the gear away for another time.  As I went through the process of evaluating my belongings using the Marie Kondo method, I decided I’ve always enjoyed camping and I was going to make space in my life for it.  I booked a camping trip for a few days, loaded my gear into a rental car, and put my gear to the test.

This camping trip was rich with lessons, pleasant and painful both. I took the gear I had bought for the bike trip, but since I had a car, I also supplemented it with larger and heavier items I knew would be useful now that I had the space.  Things I thought would be overkill turned out to be very useful: extra flashlight, large water container, spare book of matches, extra pillow, folding chair, extra plastic tub, etc.  Things I was certain I would use ended up coming home unused: pancake mix, spare sleeping bag, two changes of clothes, packets of sample skin and hair products, etc. And I found there were things I needed in the moment that I didn’t have: a lighter, fire starters, strong bug spray, an umbrella, and 4WD.  The underlying lesson here is that your gear should enable the activities you want to do.  And different types of gear serve different types of experiences, even if they’re categorically similar. If you look at the gear belonging to someone who likes glamping and compare it to someone who likes to through-hike the Appalachian Trail, there may not be a whole lot of overlap in the specifics, even though the categories are the same.  This is because your process determines your needs.

Camping gear is often designed to meet basic human needs and provide basic creature comforts.  Complex business processes can draw from this analogy, in that your categories are going to emerge around the essential tasks of your business. In many of the projects I’ve done in the past, some effort has been made to identify key information areas that need development using Continuous Improvement or Kaizen principles.  Information artifacts, key concepts, subject headings, however you choose to refer to them, are the overarching conceptual subjects that drive your business.  Using the camping example, this might look like the following: Sleep, Food, Hygiene, Recreation. If you break down sleep, the process could be as simple as laying out a tarp and a blanket, wrapping yourself up in it, and going to sleep.  Or it might be as complex as building a platform, pitching a tent, constructing a bed frame, unfolding sheets, pillows, and blankets, securing the tent, and finally going to sleep. In both scenarios, there are categories for sleep surface, shelter, and bedding.

Another key comparison comes up when considering duplication and re-use.  Chances are, you aren’t going to need a different sleeping bag for each camping scenario.  It’s interesting to note that if you go into an outdoor supply outfitter looking for sleeping bags, you will find a range of options based on very specific situations.  If your business is camping, you just might need several different bags!  But for most people, this just adds complexity and expense.  You do want to make sure the zipper works so you can control the amount of body heat you’re trapping in the bag, and if you’re camping in the cold you might add a blanket. But otherwise, a multi-season sleeping bag that’s comfortable and easy to care for is going to be re-used over and over in many camping scenarios.

For a business, the examples might range from a child’s lemonade stand to Starbucks. The information objects are going to be similar: menu, supplies, metrics. Once you’ve established these categories, you can look at your data systematically.  Coming up with these key concepts allows you to define the scope of your work and priorities for development.

What’s Next?

Now that you’ve got a sense of how to create a list of categories based on your business processes, you can begin the process of discarding.  As with the process so far, it’s not as simple as it is for your possessions at home.  Disposition of data within an enterprise, large or small, comes with politics and legal requirements.  In part three, you will see some ideas about where to start with data disposition and how to use your company’s data disposition strategies to your advantage.

Click Here to Read Part 1 of this Series

Footnotes:
(1) https://www.investopedia.com/terms/k/kaizen.asp

A Mathematician and an Ontologist walk into a bar…

The Ontologist and Mathematician should be able to find common ground because Cantor introduced set theory into the foundation of mathematics, and W3C OWL uses set theory as the foundation for its ontology language.  Let’s listen in as they mash up Cantor and OWL …

Ontologist: What would you like to talk about?

Mathematician: Anything.

Ontologist: Pick a thing. Any. Thing. You. Like.

Mathematician: [looks across the street]

[Photo: a Wells Fargo bank across the street]

Ontologist: Sure, why not?  Wells Fargo it is.  If we wanted to create an ontology for banking, we might need to have a concept of a company being a bank to differentiate it from other types of companies.  We would also want to generalize a bit and include the concept of Organization.

Mathematician: That’s simple in the world of sets.

[Diagram: nested sets, with Banks inside Companies inside Organizations inside Things]

Ontologist: In my world, every item in your diagram is related to every other item.  For example, Wells Fargo is not only a Bank, but it is also an Organization.  Relationships to “Thing” are applied automatically by my ontology editor.  When we build our ontology, we would first enter the relationships in the diagram below (read it from the bottom to the top):

[Diagram: asserted subClassOf relationships, Bank → Company → Organization → Thing]

Then we would run a reasoner to infer other relationships.  The result would look like this:

[Diagram: the asserted relationships plus the relationships inferred by the reasoner]
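The inference step the Ontologist describes can be sketched with plain Python, a toy stand-in for an OWL reasoner (the dictionary below encodes the asserted subClassOf links from the example):

```python
# Asserted subClassOf links, read from the bottom of the diagram to the top:
# Bank -> Company -> Organization -> Thing.
subclass_of = {
    "Bank": "Company",
    "Company": "Organization",
    "Organization": "Thing",
}

def superclasses(cls):
    """Follow subClassOf links upward to collect every inferred superclass."""
    result = []
    while cls in subclass_of:
        cls = subclass_of[cls]
        result.append(cls)
    return result

# Asserted: Wells Fargo is a Bank.
# Inferred: it is also a Company, an Organization, and a Thing.
print(["Bank"] + superclasses("Bank"))  # ['Bank', 'Company', 'Organization', 'Thing']
```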

Mathematician: My picture has “Banks” and yours has “Bank”.  You took off the “s”.

Ontologist: Well, yes, I changed all the set names to make them singular because that’s the convention for Class names.  Sorry.  But now that you mention it … whenever I create a new Class I use a singular name just like everyone else does, but I also check to see if the plural is a good name for the set of things in the Class.  If the plural doesn’t sound like a set, I rethink it.  Try that with “Tom’s Stamp Collection” and see what you get.

Mathematician: I’d say you would have to rethink that Class name if you wanted the members of the Class to be stamps.  Otherwise, people using your model might not understand your intent.  Is a Class more like a set, or more like a template?

Ontologist: Definitely not a template, unlike object-oriented programming.  More like a set where the membership can change over time.

Mathematician: OK.  S or no S, I think we are mostly talking about the same thing.  In fact, your picture showing the Classes separated out instead of nested reminds me of what Georg Cantor said: “A set is a Many that allows itself to be thought of as a One.”

Ontologist: Yes.  You can think of a Class as a set of real world instances of a concept that is used to describe a subject like Banking.  Typically, we can re-use more general Classes and only need to create a subclass to differentiate its members from the other members of the existing Class (like Bank is a special kind of Company).  We create or re-use a Class when we want to give the Things in it meaning and context by relating them to other things.

Mathematician: Like this?

[Diagram: Joan worksFor Wells Fargo]

Ontologist: Exactly.  Now we know more about Joan, and we know more about Wells Fargo.  We call that a triple.

Mathematician: A triple.  How clever.

Ontologist: Actually, that’s the way we store all our data.  The triples form a knowledge graph.

Mathematician: Oh, now that’s interesting …  nice idea. Simple and elegant.  I think I like it.

Ontologist: Good.  Now back to your triple with Joan and Wells Fargo.  How would you generalize it in the world of sets?

Mathematician: Simple.  I call this next diagram a mapping, with Domain defined as the things I’m mapping from and Range defined as the things I’m mapping to.

[Diagram: a mapping, with the Domain on the left and the Range on the right]

Ontologist: I call worksFor an Object Property.  For today only, I’m going to shorten that to just “Property”.  But.  Wait, wait, wait.  Domain and Range?


In my world, I need to be careful about what I include in the Domain and Range, because any time I use worksFor, my reasoner will conclude that the thing on the left is in the Domain and the thing on the right is in the Range.

Ontologist continues: Imagine if I set the Domain to Person and the Range to Company, and then assert that Sparkplug the horse worksFor Tom the farmer.  The reasoner will tell me Sparkplug is a Person and Tom is a Company.  That’s why Domain and Range always raise a big CAUTION sign for me.  I always ask myself if there is anything else that might possibly be in the Domain or Range, ever, especially if the Property gets re-used by someone else.  I need to define the Domain and Range broadly enough for future uses so I won’t end up trying to find the Social Security number of a horse.
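A minimal sketch of why this matters, with Python standing in for the reasoner (the property, individuals, and Classes are the ones from the example above):

```python
# Declared Domain and Range of the worksFor property.
DOMAIN, RANGE = "Person", "Company"

inferred_types = {}  # individual -> set of inferred Classes

def assert_works_for(subject, obj):
    """Any use of worksFor lets the reasoner type both ends of the triple."""
    inferred_types.setdefault(subject, set()).add(DOMAIN)
    inferred_types.setdefault(obj, set()).add(RANGE)

# Assert: Sparkplug the horse worksFor Tom the farmer.
assert_works_for("Sparkplug", "Tom")

# The reasoner now concludes (validly, but wrongly):
print(inferred_types)  # {'Sparkplug': {'Person'}, 'Tom': {'Company'}}
```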

Mathematician: Bummer.  Good luck with that.

Ontologist: Oh, thank you.  Now back to your “mapping”.  I suppose you think of it as a set of arrows and you can have subsets of them.

Mathematician: Yes, pretty much.  If I wanted to be more precise, I would say a mapping is a set of ordered pairs.  I’m going to use an arrow to show what order the things are in; and voila, here is your set diagram for the concept:

[Set diagram: worksFor as a set of arrows from people to companies]

You will notice that there are two different relationships:

[Diagram: the worksFor arrows and the isAManagerAt arrows]

The pair (Joan, Wells Fargo) is in both sets, so it is in both mappings.  Does that make sense to you?

Ontologist: Yes, I think it makes sense.  In my world, if I cared about both of these types of relationships, I would make isAManagerAt a subProperty of worksFor, and enter the assertion that Joan is a manager at Wells Fargo.  My reasoner would add the inferred relationship that Joan worksFor Wells Fargo.
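That subProperty inference can be sketched in Python as a single toy reasoner step (not real OWL machinery; the names come from the dialogue):

```python
# isAManagerAt is declared a subProperty of worksFor.
sub_property_of = {"isAManagerAt": "worksFor"}

# Asserted triples, stored as (subject, property, object).
triples = {("Joan", "isAManagerAt", "Wells Fargo")}

# Inference step: every subProperty assertion also holds for its super-property.
triples |= {(s, sub_property_of[p], o)
            for (s, p, o) in triples if p in sub_property_of}

print(("Joan", "worksFor", "Wells Fargo") in triples)  # True
```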

Mathematician: Of course!  I think I’ve got the basic idea now.  Let me show you what else I can do with sets.  I’ll even throw in some of your terminology.

Ontologist: Oh, by all means. [O is silently thinking, “I bet this is all in OWL, but hey, the OWL specs don’t have pictures of sets.”]

Mathematician: [takes a deep breath so he can go on and on … ]

Let’s start with two sets:

[Diagram: two overlapping sets]

The intersection is a subset of each set, and each of the sets is a subset of the union.  If we want to use the intersection as a Class, we should be able to infer:

[Diagram: the intersection as a Sub Class of each set]

And if we want to use the union as a Class, then each original Class is a Sub Class of the union:

[Diagram: each set as a Sub Class of the union]

If two Classes A and B have no members in common (disjoint), then every Sub Class of A is disjoint from every Sub Class of B:
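These facts about intersection, union, and disjointness fall straight out of Python’s built-in set operations (the member names below are purely illustrative):

```python
# Two overlapping sets (member names are illustrative).
A = {"Chase", "Wells Fargo", "Acme Co"}
B = {"Wells Fargo", "Chase", "Red Cross"}

# The intersection is a subset of each set,
# and each set is a subset of the union.
assert (A & B) <= A and (A & B) <= B
assert A <= (A | B) and B <= (A | B)

# Disjoint sets share no members, so any subset of one
# is disjoint from any subset of the other.
horses, farmers = {"Sparkplug"}, {"Tom"}
assert horses.isdisjoint(farmers)
```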

[Diagram: disjoint Classes, with the Sub Classes of one disjoint from the Sub Classes of the other]

A mapping where there is at most one arrow out from each starting point is called a function.

[Diagram: a function, with at most one arrow out of each starting point]

A mapping where there is at most one arrow into each ending point is called inverse-functional.

[Diagram: an inverse-functional mapping, with at most one arrow into each ending point]

You get the inverse of a mapping by reversing the direction of all the arrows in it.  As the name implies, if a mapping is inverse-functional, it means the inverse is a function.
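Those definitions are easy to check when a mapping is literally a Python set of ordered pairs (a small sketch with illustrative names):

```python
# A mapping as a set of ordered pairs (arrows).
works_for = {("Joan", "Wells Fargo"), ("Tom", "Acme Co")}

def is_function(mapping):
    """At most one arrow out of each starting point."""
    starts = [s for s, _ in mapping]
    return len(starts) == len(set(starts))

def inverse(mapping):
    """Reverse the direction of every arrow."""
    return {(end, start) for start, end in mapping}

# worksFor is a function here, and its inverse is also a function,
# so worksFor is inverse-functional as well.
print(is_function(works_for), is_function(inverse(works_for)))  # True True
```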

Sometimes the inverse mapping ends up looking just like the original (called symmetric), and sometimes it is “totally different” (disjoint or asymmetric).

[Diagram: symmetric and asymmetric mappings]

Sometimes a mapping is transitive, like our diagram of inferences with subClassOf, where a subclass of a subclass is a subclass.  I don’t have a nice simple set diagram for that, but our Class diagram is an easy way to visualize it.  Take two hops using the same relationship and you get another instance of the relationship:

[Diagram: two subClassOf hops yielding an inferred subClassOf]
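One round of that two-hop inference can be sketched in Python (toy pairs; a real reasoner repeats the step until nothing new is added, giving the full transitive closure):

```python
# subClassOf as a set of arrows.
sub_class_of = {("Bank", "Company"), ("Company", "Organization")}

def two_hops(pairs):
    """Add an arrow for every two-hop path (one round of transitive inference)."""
    return pairs | {(a, d) for (a, b) in pairs for (c, d) in pairs if b == c}

# A subclass of a subclass is a subclass:
print(("Bank", "Organization") in two_hops(sub_class_of))  # True
```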

Sets can be defined by combining other sets and mappings, such as the set of all people who work for some bank (any bank).

Ontologist: Not bad.  Here’s what I would add:

Sometimes I define a set by a phrase like you mentioned (worksFor some Bank), and in OWL I can plug that phrase into any expression where a Class name would make sense.  If I want to turn the set into a named Class, I can say the Class is equivalent to the phrase that defines it.  Like this:

BankEmployee is equivalentTo (worksFor some Bank).

The reasoner can often use the phrase to infer things into the Class BankEmployee, or use membership in the Class to infer the conditions in the phrase are true.  A lot of meaning can be added to data this way.  Just as in a dictionary, we define things in terms of other things.
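The equivalence above can be sketched in Python: anyone with a worksFor arrow into the set of banks is inferred into BankEmployee (all names are illustrative):

```python
banks = {"Wells Fargo", "Chase"}
works_for = {("Joan", "Wells Fargo"), ("Tom", "Acme Co")}

# BankEmployee is equivalentTo (worksFor some Bank):
# membership is computed straight from the defining phrase.
bank_employees = {person for (person, employer) in works_for
                  if employer in banks}

print(bank_employees)  # {'Joan'}
```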

When two Classes are disjoint, it means they have very distinct and separate meanings.  It’s a really good thing, especially at more general levels.  When we record disjointness in the ontology, the reasoner can use it to detect errors.

Whenever I create a Property, I always check to see if it is a function.  If so, I record the fact that it is a function in the ontology because it sharpens the meaning.

We never really talked about Data Properties.  Maybe next time.  They’re for simple attributes like “the building is 5 stories tall”.

A lot of times, a high level Property can be used instead of creating a new subProperty.  Whenever I consider creating a new subProperty, I ask myself if my triples will be just as meaningful if I use the original Property.  A lot of times, the answer is yes and I can keep my model simple by not creating a new Property.

An ontology is defined in terms of sets of things in the real world, but our database usually does not have a complete set of records for everything defined in the ontology.  So, we should not try to infer too much from the data that is present.  That kind of caution, known as the open-world assumption, is built into reasoners.

On the flip side, the data can include multiple instances for the same thing, especially when we are linking multiple data sets together.  We can use the sameAs Property to link records that refer to the same real-world thing, or even to link together independently-created graphs.
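That sameAs linking can be sketched as grouping records into connected components with a simple union-find (the record IDs below are made up for illustration):

```python
from collections import defaultdict

# sameAs links between records from different data sets (IDs are made up).
same_as = [("crm:123", "erp:9"), ("erp:9", "web:77")]

parent = {}  # a simple union-find forest

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

for a, b in same_as:
    union(a, b)

# Group records that refer to the same real-world thing.
groups = defaultdict(set)
for record in parent:
    groups[find(record)].add(record)

print(sorted(groups[find("crm:123")]))  # ['crm:123', 'erp:9', 'web:77']
```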

The OWL ontology language is explained well at: https://www.w3.org/TR/owl-primer/

However, even if we understand the theory, there are many choices to be made when creating an ontology.  If you are creating an ontology for a business, a great book that covers the practical aspects is Demystifying OWL for the Enterprise by Michael Uschold.

Mathematician: I want the last word.

Ontologist: OK.

Mathematician:

[Image: the Mathematician’s wordless last word]

Ontologist: I agree, but that wasn’t a word.  🙂

Mathematician: OK.  I think I’m starting to see what you are doing with ontologies.  Here’s what it looks like to me: since it is based on set logic and triples, the OWL ontology language has a rock-solid foundation.

Written By: Phil Blackwood, Ph.D.
