The Data-Centric Revolution: Headless BI and the Metrics Layer

Read more from Dave McComb in his recent article on The Data Administration Newsletter.

“The data-centric approach to metrics puts the definition of the metrics in the shared data. Not in the BI tool, not in code in an API. It’s in the data, right along with the measurement itself.”

Link: The Data-Centric Revolution: Headless BI and the Metrics Layer – TDAN.com

Read more of Dave’s articles: mccomb – TDAN.com

Semantic Arts future-proofs the enterprise with data-centric transformation programs

99% of all enterprises shoot themselves in the foot every time they implement a new information system. The prevailing mindset, the “application-centric” mindset, guarantees that each new system introduces yet another incompatible data model into the firm’s datascape. It doesn’t matter whether the firm is building a custom application, buying an application package, or renting Software as a Service. It doesn’t matter whether the methodology is waterfall or agile. Solving business problems by implementing applications is what creates the dreaded silos.

The compounding problem is that these application systems each have their own completely idiosyncratic data models. In addition to being arbitrarily different, almost every application is at least an order of magnitude more complex than it needs to be. The net result is that, after a few decades, you have hundreds or thousands of applications implemented, each with its own data model. You spend most of your IT budget on systems integration, without really achieving it.

Semantic Arts discovered it’s possible to build an elegant core model of even the most complex enterprise in a limited amount of time (typically less than six months) and with a limited amount of complexity (typically fewer than 500 concepts, classes plus properties). When well implemented, this model can be extended and specialized to handle the specific requirements of subdomains or departments while staying aligned with the enterprise core as the business grows. The model can be directly populated with data from legacy systems, and queries can be federated over the legacy systems through the core model. In the short term, such a model provides a simple integration platform that not only integrates legacy systems but also allows the integration of unstructured and external data.

Meet the leader behind the success of Semantic Arts.

Dave McComb is the President and Co-founder of Semantic Arts. He and his team help organizations uncover the meaning in the data from their information systems. Dave is also the author of “The Data-Centric Revolution”, “Software Wasteland”, and “Semantics in Business Systems”. For 20 years, Semantic Arts has helped firms of all sizes in this endeavor, including Procter & Gamble, Goldman Sachs, Schneider Electric, LexisNexis, Dun & Bradstreet, and Morgan Stanley. Prior to Semantic Arts, Dave co-founded Velocity Healthcare, where he developed and patented the first fully model-driven architecture. Prior to that, he was a part of the problem.

“Semantic Arts exists to help organizations transition to a newly emerging paradigm of information systems based on flexible data structures and deep semantics.”

Semantic Arts is a professional services firm—they do not sell software or hardware—therefore, the company is completely objective and can bring business outcome thinking and considerable experience to every engagement. 

In conversation with Dave McComb, Co-founder and CEO of Semantic Arts

Can you tell us about your services in brief?

Semantic Arts delivers a single offering: we guide clients who are committed to transitioning to a data-centric future, helping them implement this capability through our consulting services. Some come to this need through their digital transformation efforts, which more and more firms are discovering to be far harder than they should be, because they are trying to build on a foundation of brittle complexity inherent in relational and legacy data structures. Some come to it through big data, AI, and ML, when they find that they are spending most of their time and effort wrangling data that should already be organized. And some have come to it from first principles, as we did. For most clients, the first engagement combines what we call the “think big / start small” implementation methodology. We discovered the need to do both in parallel. To do only the think big part (as we did for more than a decade) creates beautiful models that do not get implemented. To only start small (as every agile project does these days) almost guarantees the furtherance of silos. By doing the two in parallel, we can show how the core model (the think big part) guides each “small” (incremental) project to conform to the core.

After that first project, most clients re-engaged us for many more projects, which either tackle additional domain areas, begin building out the information architecture needed to take maximum advantage of this approach, or establish training and governance to make this the new normal. 

A huge amount of data is created every second from sources like online transactions, social media, and customer data. Do you think the existing ways of managing data are enough for exponentially bigger data quantities?

Well, it has to be managed in layers. It’s obvious that we will have even more data, and the Internet of Things is going to swamp the existing data completely. But the traditional approach of just copying all of the data and moving it to a big data platform so that people can do analytics is not going to scale. There is more data than the systems can process. What will happen instead is a lot more edge computing, where the big data, IoT, and clickstream analysis will stay. It’s going to be abstracted, and the essential differences are going to come into an analytic framework.

With the situation today due to coronavirus, everything is going online. How are you helping your clients adjust to the new normal? 

One of the big things that people are doing right now is called digital transformation, and I think that the coronavirus pandemic has only accelerated the process. But as a lot of authors have pointed out, digital transformation projects have about a 70 percent failure rate, and they are incredibly expensive. People are trying to use brute force to change their existing procedures into fully digital-enabled ones, and they are stymied by sheer scale and complexity. Therefore, our approach is to find an elegant and simple model that represents the core of your business and use it to fuel your digital transformation. We’re guiding our clients in implementing this capability to ease information findability, accessibility, interoperability, and reusability.

Normally, reliable and high-quality services come at an expensive price tag. How do you manage to keep your services affordable? 

We primarily work with very large firms and use small, agile teams, so compared to a typical application implementation project, our engagements cost a fraction of the price. Our services are very affordable and, in a large firm’s budget, barely noticeable. In the longer term, the data-centric approach is going to move into the mid-market, where it will become a platform play.

The 90s Are Over, Let’s Stop Basing Capital Cost Decisions on Lagging Indicators

Remember the good old days of emerging digital technology? Accessing information through a dial-up internet connection. Saving data to floppy discs or CDs. Sending emails to have them printed for storage. Mobile connectivity was new, exciting, and… slow compared to what we have today.

In the energy sector, data access limitations influenced the structure of traditional execution workflows for capital projects. It was common – and still is – for project execution models to focus on document-based deliverables over raw data.

The inherent problem with a document-centric approach is that documents take time to produce. Let’s imagine the workflow for a technology evaluation study that:

  • Begins with initial input from multiple departments.
  • Gets reviewed by 2-3 management layers on the project organizational chart.
  • Finally lands on the desk of a senior decision-maker.

This process could easily take two weeks or longer. But what happens during those two weeks? Work doesn’t get paused. The project continues to progress. The information initially collected for the study no longer represents current project conditions. By the time it gets to the decision-maker, the study is based on two-week-old lagging indicators.

A lot can change on a project in that amount of time. Execution workflows built around lagging indicators tend to:

  • Lead to costly and unnecessary errors caused by decisions based on old information.
  • Stymie innovation with rigid and slow processes that limit experimentation.

Click here to read more. 

Originally posted in: Digital Transformation

Click here to Read an Advanced Chapter from the Data-Centric Revolution by Dave McComb

A Data-Centric Approach to Managing Customer Data

by Phil Blackwood, Ph.D.

Without a doubt, every business needs to have a clear idea of who its customers are and would love to have a 360 degree view of each customer. However, this customer data is typically scattered across hundreds of applications, its meaning is embedded in code written years ago, and much of its value is locked away in silos. Compounding the problem, stakeholders in different parts of the business are likely to have different views of what the word “customer” means because they support different kinds of interactions with customers.

In this post, we’ll outline how to tackle these issues and unlock the value of customer data. We’ll use semantics to establish simple common terminology, show how a knowledge graph can provide 360 degree views, and explain how to classify data without writing code.

The semantic analysis will have three parts: first we consider the simple use case illustrated in the diagram below, then we take a much broader view by looking at Events, and finally we dive deeper into the meaning of the diagram by using the concept of Agreements.

[Diagram: a customer purchases a shirt from a shop]

The diagram shows an event in which a customer purchases a shirt from a shop. Ask stakeholders around your company what types of events customers participate in, and you are likely to get a long list. It might look something like this (the verbs are from the viewpoint of your company):

  • Answer general questions about products and services
  • Create billing account for products and services
  • Create usage account for a product or service
  • Deliver product or service (including right-to-use)
  • Finalize contract for sale of product or service
  • Help a customer use a product or service
  • Identify a visitor to our web site
  • Determine a recommender of a product or service
  • Find a user of a product or service
  • Migrate a customer from one service to another
  • Migrate a service from one customer to another
  • Prepare a proposal for sale of product or service
  • Receive customer agreement to terms and conditions
  • Receive payment for product or service
  • Rent product or service
  • Sell product or service
  • Send bill for product or service
  • Ship product

We can model these events using classes from the gist ontology, with one new class consisting of the categories of events listed above. When we load data into our knowledge graph, we link each item to its class and we relate the items to each other with object properties. For example, an entry for one event might look like:

[Diagram: an example event entry in the knowledge graph]

By using categories instead of creating 18 new classes of events, we keep the model simple and flexible. We can round out the picture by realizing that the Person could instead be an Organization (company, non-profit, or government entity) and the Product could instead be a Service (e.g. window washing).
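To make the pattern concrete, here is a minimal sketch in plain Python of events modeled as instances of a single Event class distinguished by category links. The identifiers (the `ex:`, `cat:`, and `event:` names and the participant properties) are illustrative assumptions, not the actual gist terms.

```python
# Sketch: modeling customer events as instances of one Event class,
# distinguished by category, rather than 18 separate classes.

triples = [
    # The event itself is simply an Event...
    ("event:42", "rdf:type", "gist:Event"),
    # ...and its specific kind is a link to a category individual.
    ("event:42", "gist:categorizedBy", "cat:SellProductOrService"),
    ("event:42", "ex:hasParticipant", "person:jane"),
    ("event:42", "ex:involves", "product:shirt-123"),
    ("person:jane", "rdf:type", "gist:Person"),
    ("product:shirt-123", "rdf:type", "gist:Product"),
]

def events_in_category(triples, category):
    """Find every event linked to a given category individual."""
    return [s for (s, p, o) in triples
            if p == "gist:categorizedBy" and o == category]

print(events_in_category(triples, "cat:SellProductOrService"))
# -> ['event:42']
```

Adding a 19th kind of event means adding one new category individual as data, not changing the model.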

In a green-field scenario, the model and the data are seamlessly linked in a knowledge graph and we can answer many different questions about our customers. However, in most companies a considerable amount of customer data exists in application-centric silos. To unlock existing customer data, we have to first understand its meaning and then we can link it into the knowledge graph by using the R2RML data mapping language. This data federation allows us to write queries using the simple, standard semantic model and get results that include the existing data.
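The mapping idea can be sketched without the R2RML machinery itself: each row of a legacy table is translated into triples that use the shared model's vocabulary. R2RML expresses such mappings declaratively; the table, column names, and identifiers below are hypothetical, chosen only for illustration.

```python
# Sketch: translating rows from a hypothetical legacy "orders" table
# into triples that use the shared semantic model's terms.

legacy_orders = [
    {"order_id": 1001, "cust_name": "Jane Doe", "sku": "SHIRT-123"},
]

def map_order(row):
    """Turn one relational row into triples in the common model."""
    order = f"ex:order-{row['order_id']}"
    return [
        (order, "rdf:type", "gist:Event"),
        (order, "gist:categorizedBy", "cat:SellProductOrService"),
        (order, "ex:hasParticipant", f"ex:customer-{row['cust_name'].replace(' ', '')}"),
        (order, "ex:involves", f"ex:product-{row['sku']}"),
    ]

# Once mapped, legacy rows answer the same queries as native graph data.
graph = [t for row in legacy_orders for t in map_order(row)]
```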

For any node in the knowledge graph, we have a 360 degree view of the data about the node and its context. A Person node can be enriched with data from social media. An Organization node can be enriched with data about corporate structure, subsidiaries, or partnerships.

Now let’s pivot from the broad event-based perspective to look more closely at the meaning of the original example. Implicit in the idea of a sale is an agreement between the buyer and the seller; once the agreement is made, the seller is obligated to deliver something, while the buyer must pay for it. The “something” is a product or service. We can model the transaction like this:

[Diagram: the sale modeled as an agreement with delivery and payment obligations]

This basic pattern of agreement and obligation covers many use cases. The agreement could be the simple act of placing the shirt on the check-out counter, or it could be a contract. Delivery and payment could coincide in time, or not. Payments or deliveries, or both, could be monthly.

If our Contract Administration group wants a simple way to identify all the customers who have a contract, we can create a Class named ContractCustomer and populate it automatically from the data in our knowledge graph. To do this, we would write an expression similar to a query that defines what we mean by ContractCustomer, declare the Class to be equivalent to the expression, and then run an off-the-shelf, standards-based inference engine to populate the new class. With no code needed … it’s model-driven.
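A rough sketch of this model-driven classification, with hypothetical identifiers; a real implementation would state the class definition in OWL and let a standards-based reasoner materialize the membership, but the logic amounts to this:

```python
# Sketch: populating a class from a query-like definition rather than
# hand-written membership code. Identifiers are illustrative.

def infer_contract_customers(triples):
    """Materialize ex:ContractCustomer from its definition:
    anything that is party to at least one gist:Contract."""
    contracts = {s for (s, p, o) in triples
                 if p == "rdf:type" and o == "gist:Contract"}
    return {(s, "rdf:type", "ex:ContractCustomer")
            for (s, p, o) in triples
            if p == "ex:partyTo" and o in contracts}

facts = {
    ("ex:jane", "ex:partyTo", "ex:contract-1"),
    ("ex:contract-1", "rdf:type", "gist:Contract"),
    ("ex:bob", "ex:partyTo", "ex:quote-9"),   # a quote is not a contract
}

print(infer_contract_customers(facts))
# -> {('ex:jane', 'rdf:type', 'ex:ContractCustomer')}
```

Changing what counts as a ContractCustomer means changing the definition, not rewriting application code.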

This method of automatically populating classes can be used to support the wide variety of needs of stakeholders in different parts of the company, even though they do not have the same definition of customer. For example, you could provide classes like PayingCustomer and ProductUsers that can be used to simplify the way the data is accessed or to become building blocks in the model to build upon. With this approach, there is no need to try to get everyone to agree on a single definition of customer. It lets everyone stay focused on what will help them run their part of the business.

While many refinements are possible, we’ve outlined the core of a data-centric solution to the knotty problem of managing customer data. The semantic analysis reveals a simple way to capture information about customer interactions and agreements. A knowledge graph supports 360 degree views of the data, and an inference engine allows us to populate classes automatically without writing a single line of code.

I hope you can glean some ideas from this discussion to help your business, and that you get a sense of why semantics, knowledge graphs, and model-driven-everything are three keys to data-centric architecture.

Dispose, Delete, and Discard: Keep your Enterprise Data Tidy Part 3

Those who are familiar with Marie Kondo know that she is a ruthless disposer. If you’ve read parts one and two of this series, you know that the process is more nuanced than just “throw it all away,” but we’ve come to the point in the process where it’s important to focus on discarding. If you haven’t read parts one and two of this series, please do so; they provide context for the content of this post. Armed with categories that work for your organization and a solid set of values that the data you keep must uphold to be useful to your business, this part of the process is primarily about dedicating time to pruning your files, records, and documentation.

Data Lifecycle Policies

“The fact that you possess a surplus of things that you can’t bring yourself to discard doesn’t mean you are taking good care of them.  In fact, it is quite the opposite.” It’s interesting to note that, while there are many book collectors who lament Kondo’s popularity and cry, “You can pry my books out of my cold, dead hands,” there aren’t many librarians who hold this sentiment.  Professionals know that collections must be pruned and managed. In fact, your organization may have one or more policies about managing data and documents.  At a minimum, data lifecycle policies cover three points of a document’s existence within an organization: creation or acquisition, use and storage, and disposition.  These policies may be driven by the systems used to manage your documents (Microsoft SharePoint comes to mind) or they may be driven by government mandates. These should be your guide on what and when you discard.  If your organization has these policies outlined clearly, the hard work is already done, and you can begin using parts one and two as your guide to systematically deleting unneeded data and documentation. It may also be that some of this lifecycle management functionality is encoded in your systems, but it’s important to understand the policies if you’re making the decisions about data disposition. If your organization does not have a data lifecycle policy, you can explore creating one while you work on becoming data centric.

Data Configuration Management

Outside of an overarching strategy or policy for managing your organization’s data and information, your organization may have various configuration management tools in place (e.g., Git or Subversion) to manage drafts and backups. Many large organizations use file sharing systems to govern who has privileges to directories and files.  If you’re attempting to KonMari your files when such systems are in place, it will be necessary to work collaboratively to get access to the files in your control.

When do you actually discard?

One of the key ideas in Marie Kondo’s method is that when you discard, you only discard your own belongings.  If you are the owner and CTO of a company, then you have the freedom to discard what no longer sparks joy.  In a large company, that question of ownership is far more complex and possibly beyond the reader’s paygrade. It might be beyond the CEO’s paygrade. It is certainly beyond the paygrade of the writer, except with a select few files on a laptop and in a removable storage device used for backups.  But the question of ownership can often be established by completing the work recommended in this series of blog posts.  And once you’ve established ownership, even complex ownership, you can use metadata to describe ownership and provenance, making it easier to manage that data’s future state, discarded or otherwise.

Futureproofing your Data

Now that we’ve considered the end of the data lifecycle management picture, let’s take a look at the start—data acquisition and creation.  If you’ve done the work so far of identifying your business processes, assessing how well your data supports your goals, and aligning to your data lifecycle management policy (formal or otherwise), you know how important it is to also consider the introduction of new data.  We touched on this in the first two parts, but there’s a subtle difference between considering how data came to be in your collection and considering data that you will include in your collection from this point forward.

This is something you can specify with policy, and it’s something you can anticipate with a robust ontology. However, it’s not as simple as building robust metadata.  An ontology that is carefully anchored to your organization’s processes, has sufficient input from the right subject matter experts, and is developed within a hospitable IT infrastructure, is far more likely to be a sound gatekeeper for your incoming data.

In the IT industry, this is referred to as futureproofing, and it is designed to minimize the need for downstream development to make corrections to work you’re doing now. It’s often a judgment call as to whether an application or system is introducing too much technical debt, but there is no argument that being able to understand each piece of data that goes into your system is critical to avoiding such debt. The way to ensure your data will be understandable downstream is to have adequate metadata.  If you want your data to be sophisticated and able to support complex information needs, you need to use semantics.

“The secret to maintaining an uncluttered room is to pursue ultimate simplicity in storage so that you can tell at a glance how much you have.” -Marie Kondo

Read Part 1: Does your Data Spark Joy?

Read Part 2: Setting the Stage for Success

Written by Meika Ungricht

A Data Engineer’s Guide to Semantic Modelling

While on her semantic modelling journey, and as a Data Engineer herself, Ilaria Maresi encountered a range of challenges. There was no single definitive source where she could quickly look things up; many of the resources were extremely technical and geared towards a more experienced audience, while others were too wishy-washy. Therefore, she decided to compose this 50-page document in which she explains semantic modelling and her most important lessons learned – all in an engaging and down-to-earth writing style.

She starts off with the basics: what is a semantic model and why should you consider building one? Obviously, this is best explained by using a famous rock band as an example. In this way, you learn to draw the basic elements of a semantic model and some fun facts about Led Zeppelin at the same time!

For your model to actually work, it is essential that machines can also understand these fun facts. This might sound challenging if you are not a computer scientist, but this guide will walk you through it step by step – it even has pictures of baby animals! You will learn how to structure your model in the Resource Description Framework (RDF) and give it meaning with the vocabulary extension that wins the prize for cutest acronym: the Web Ontology Language (OWL).

All other important aspects of semantic modelling will be discussed. For example, how to make sure we all talk about the same Led Zeppelin by using Uniform Resource Identifiers (URIs). Moreover, you are not the first one thinking and learning about knowledge representation: many domain experts have spent serious time and effort in defining the major concepts of their field, called ontologies. To prevent you from re-inventing the wheel, we list the most important resources and explain their origin.
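The URI idea can be sketched in a few lines: two datasets that merely share the string “Led Zeppelin” might mean different things, but two datasets that use the same URI are provably talking about the same resource. The URIs and property names below are made up for illustration.

```python
# Sketch: a shared URI lets facts from independent sources merge cleanly,
# because both sources identify the band with the same resource.

ZEPPELIN = "http://example.org/band/led-zeppelin"

dataset_a = [(ZEPPELIN, "ex:formedIn", "1968")]
dataset_b = [(ZEPPELIN, "ex:member", "ex:jimmy-page")]

merged = dataset_a + dataset_b

# Everything known about one subject, across both sources:
about_zeppelin = {(p, o) for (s, p, o) in merged if s == ZEPPELIN}
print(sorted(about_zeppelin))
```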

Are you a Data Engineer that has just started with semantic modelling? Want to refresh your memory? Maybe you have no experience with semantic modelling yet but feel it might come in handy? Well, this guide is for you!

Click here to access a data engineer’s guide to semantic modelling

Written by Tess Korthout

A Brief Introduction to the gist Semantic Model

Phil Blackwood, Ph.D.

It’s no secret that most companies have silos of data and continue to create new silos.  Data that has the same meaning is often represented hundreds or thousands of different ways as new data models are introduced with every new software application, resulting in a high cost of integration.

By contrast, the data-centric approach starts with the common meaning of the data to address the root cause of data silos:

An enterprise is data-centric to the extent that all application functionality is based on a single, simple, extensible, federate-able data model.

An early step along the way to becoming data-centric is to establish a semantic model of the common concepts used across your business.  This might sound like a huge undertaking, and perhaps it will be if you start from scratch.  A better option is to adopt an existing core semantic model that has been designed for businesses and has a track record of success, such as gist.

Gist is an open source semantic model created by Semantic Arts. 

It is the result of more than a decade of refinement based on data-centric projects done with major corporations in a variety of lines of business.  Semantic Arts describes gist as “… designed to have the maximum coverage of typical business ontology concepts with the fewest number of primitives and the least amount of ambiguity.”  The Wikipedia entry for upper ontologies compares gist to other ontologies, and gives a sense of why it is a match for corporate data management.

 

This blog post introduces gist by examining how some of the major Classes and Properties can be used.  We will not go into much detail; just enough to convey the general idea.

Everyone in your company would probably agree that running the business involves products, services, agreements, and events like payments and deliveries.  In turn, agreements and events involve “who, what, where, when, and why”, all of which are included in the gist model.  Gist includes about 150 Classes (types of things), and different parts of the business can often be modeled by adding sub-classes.  Here are a few of the major Classes in gist:

Gist also includes about 100 standard ways things can be related to each other (Object Properties), such as:

  • owns
  • produces
  • governs
  • requires, prevents, or allows
  • based on
  • categorized by
  • part of
  • triggered by
  • occurs at (some place)
  • start time, end time
  • has physical location
  • has party (e.g. party to an agreement)

For example, the data representing a contract between a person and your company might include things like:

In gist, a Contract is a legally binding Agreement, and an Agreement is a Commitment involving two or more parties.  It’s clear and simple.  It’s also expressed in a way that is machine-readable to support automated inferences, Machine Learning, and Artificial Intelligence.

The items and relationships of the contract can be loaded into a knowledge graph, where each “thing” is a node and each relationship is an edge.  Existing data can be mapped to this standard representation to make it possible to view all of your contracts through a single lens of terminology.  The knowledge graph for an individual contract as sketched out above would look like:
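Since the original diagram is not reproduced here, a minimal sketch of such a contract as subject–predicate–object triples; the property names echo gist's style, but the exact identifiers are assumptions for illustration.

```python
# Sketch of a contract as nodes (things) and edges (relationships)
# in a knowledge graph.

contract_graph = [
    ("ex:contract-77", "rdf:type", "gist:Contract"),
    ("ex:contract-77", "gist:hasParty", "ex:person-jane"),
    ("ex:contract-77", "gist:hasParty", "ex:our-company"),
    ("ex:contract-77", "ex:forItem", "ex:catalog-item-55"),
    ("ex:contract-77", "gist:startTime", "2024-01-01"),
    ("ex:person-jane", "gist:identifiedBy", "ex:id-drivers-license-123"),
]

def edges_out(graph, node):
    """All outgoing edges of a node: a 360 degree view starts here."""
    return [(p, o) for (s, p, o) in graph if s == node]

print(edges_out(contract_graph, "ex:contract-77"))
```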

Note that this example is just a starting point.  In practice, every node in the diagram would have additional properties (arrows out) providing more detail.  For example, the ID would link to a text string and to the party that allocated the ID (e.g. the state government that allocated a driver’s license ID).  The CatalogItem would be a detailed Product or Service Specification.

In the knowledge graph, there would be a single Person entry representing a given individual, and if two entries were later discovered to represent the same person, they could be linked with a sameAs relationship.

Relationships in gist (Properties) are first class citizens that have a meaning independent of the things they link, making them highly re-usable.  For example, identifiedBy is not limited to contracts, but can be used anywhere something has an ID.  Note that the Properties in gist are used to define relationships between instances rather than Classes; there are also a few standard relationships between Classes such as subClassOf and equivalentTo.

The categorizedBy relationship is a powerful one, because it allows the meaning of an item to be specified by linking to a taxonomy rather than by creating new Classes.  This pattern contributes to extensibility; adding new characteristics becomes comparable to adding valid values in a relational data model instead of adding new attributes.

Unlike traditional data models, the gist semantic model can be loaded into a knowledge graph, and then the data is loaded into the same knowledge graph as an extension of the model.  There is no separation between the conceptual, logical, and physical models.  Similar queries can be used to discover the model or to view the data.
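A toy illustration of that point, with a single triple-pattern matcher serving both the model and the data; the identifiers are illustrative.

```python
# Sketch: model triples and data triples live in the same graph, so the
# same query mechanism answers "what does the model say?" and
# "what does the data say?".

graph = [
    # Model triples
    ("gist:Contract", "rdfs:subClassOf", "gist:Agreement"),
    ("gist:Agreement", "rdfs:subClassOf", "gist:Commitment"),
    # Data triples, right alongside the model
    ("ex:contract-77", "rdf:type", "gist:Contract"),
]

def match(graph, s=None, p=None, o=None):
    """Return all triples matching a pattern; None is a wildcard."""
    return [(s2, p2, o2) for (s2, p2, o2) in graph
            if s in (None, s2) and p in (None, p2) and o in (None, o2)]

# Discover the model:
print(match(graph, p="rdfs:subClassOf"))
# View the data:
print(match(graph, p="rdf:type"))
```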

Gist uses the W3C OWL standard (Web Ontology Language), and you will need to understand OWL to get the most value out of gist.  To get started with OWL for corporate data management, check out the book Demystifying OWL for the Enterprise, by Michael Uschold.  There’s also a brief introduction to OWL and the way it uses set theory here.

The technology stack that supports OWL is well-established and has minimal vendor lock-in because of the simple standard data representation.  A semantic model created in one knowledge graph (triple store) can generally be ported to another tool without too much trouble.

To explore gist in more detail, you can download an ontology editor such as Protégé and then select File > Open From URL and enter: https://ontologies.semanticarts.com/o/gistCore9.4.0  Once you have the gist model loaded, select Entities and then review the descriptions of Classes, Object Properties (relationships between things), and Data Properties (which point to string or numeric values with no additional properties).  If you want to investigate gist in an orderly sequence, I’d suggest viewing items in groups of “who, what, when, where, and how.”

Take a look at gist.  It’s worth your time, because having a standard set of common terms like gist is a significant step toward reversing the trend toward more and more expensive data silos.

Click here to learn more about gist.

Setting the Stage for Success Part 2

Envisioning Your Dream System with the Marie Kondo Method

Before you begin gathering your belongings, discarding, or reorganizing, Marie Kondo asks you to envision your dream lifestyle.  She insists that this is the critical first step to ensuring success with her method, and she provides some guidance on how to do so, along with examples from her clients.  The example Marie Kondo uses in her book is a young woman who lives in a tiny apartment, typical of Japanese cities.  Her floor is covered with things, and her bed is a storage space when she isn’t sleeping on it.  She comes home from work tired, and her living space compounds that exhaustion.  She has a dream, and that dream is simple: for the space to be free from clutter, like a hotel suite, where she can come home and relax with tea and a bath before bed.

While the situation may be different for someone who has responsibility for stores of corporate data and systems, the process of envisioning your ideal environment is not.  As you begin to examine your systems, information architecture, data—an information landscape, in general—it’s absolutely critical to have in mind what you want.  Having in mind “better” or “new technology” leads you towards trends and vendors with cool product features that may meet your needs, but more likely will end up contributing to the data and system clutter in the long run.  It may seem like a simplistic question, “What do you want?” but your efforts in defining that will help you navigate the marketplace of emerging technology.  At this point, it is important not to focus on the process or the items in front of you that you may or may not want to keep; rather, envisioning your ideal end-state, be it a living space filled with only things you love or a database filled only with data that supports your business, is what empowers you to move forward.

If you’re a savvy tech professional, you’re already thinking, “This is the requirements gathering process,” and you would be right.  There is no shortage of requirements gathering methodologies out there and most of them are pretty good.  If it gets you to envision an ideal that is vendor and tool agnostic and is based on the needs and desires of your key stakeholders and end-users, your method is fine.  If your requirements include things like, “better search functionality,” or, “more insight into what data we have,” it’s very likely that you’re also in need of some data decluttering.

Get Started by Defining your Categories

The Marie Kondo method requires you to see your belongings in two overarching categories: things that spark joy and everything else.  Everything else should be discarded.  For our purposes, data that sparks joy is data that serves your business.  It is helpful to look at the antithesis of joy to get an idea of what should be kept or discarded.  For example, if you are facing an audit, the antithesis of joy is not being able to produce the documentation that the auditor needs to conduct the audit.  That could be because you can’t access it, because what you have isn’t what they need, you don’t have what they need, or what they need is too difficult to find amidst data and information that you have.  In this example, the information that allows you to have peace of mind during an audit is what you should keep. The bigger pattern here is that it’s important to know what business processes, data flows, decision points, and dependencies are impacting your business, and what the inputs and outputs are to those process steps.

Before you can begin to discard by category, you must know what categories drive your business.  Marie Kondo starts by outlining a series of categories that guide her clients through the process of discarding.  She starts with clothing, then books, then papers, then everything else.  She breaks down these categories even further, allowing people with astonishingly large and complex collections of things to take a systematic approach to decluttering. With organizational data, this approach will work, but the way you define the categories depends on the kind of organization you are.

The categories you need should emerge out of your efforts at process improvement. From Investopedia: “Kaizen is a Japanese term meaning ‘change for the better’ or ‘continuous improvement.’ It is a Japanese business philosophy regarding the processes that continuously improve operations and involve all employees. Kaizen sees improvement in productivity as a gradual and methodical process.”(1) Often, semantic work is done alongside large-scale business process improvement efforts.  Businesses want to know what the information inputs and outputs are, and they want to know how that information influences decisions and actions.  These efforts are often iterative, and it’s not uncommon to uncover conflicts in how people understand the data, or what they use it for.  I remember working with a team of medical experts who all used “normal” as a data point in their diagnostic processes.  It took our team years to come up with a good way to encode “normal” because each expert meant something different by the term.  There were heated debates about whether or not “normal” meant within the context of a patient who might be legally blind, in which case a low visual acuity score might be considered normal, or if normal was a cohort or population average, in which case that patient’s low score was not normal.  These conflicts and pain points are like mismatched socks and poorly-fitting jeans: they’re your clue about where you need to look at your data. This is also the starting point for determining which categories you need to use to evaluate your data. Do not strong-arm your conflicts into silence; use them to light the way ahead.

Building the Categories that Matter to You with the Marie Kondo Method

The Marie Kondo method categories are presented in an order that begins by teaching us what it means to feel that spark of joy (clothing) and works through household items that might be useful but not particularly exciting, and ends with items of sentimental value (photos and heirlooms).  One of the big challenges of applying the Marie Kondo method to organizational data is that this rubric and categorization doesn’t easily map to things like clothing and photos.  However, the underlying idea of what is essential to our survival and our comfort does easily translate to data.  Don’t get bogged down in the details too soon. Marie Kondo advises that you create subcategories according to your need.

When I was organizing my miscellaneous items, I uncovered some camping gear I had purchased a couple years ago with the intention of going on a long bike ride that involved camping at night.  I was unable to go, so I packed the gear away for another time.  As I went through the process of evaluating my belongings using the Marie Kondo method, I decided I’ve always enjoyed camping and I was going to make space in my life for it.  I booked a camping trip for a few days, loaded my gear into a rental car, and put my gear to the test.

This camping trip was rich with lessons, pleasant and painful both. I took the gear I had bought for the bike trip, but since I had a car, I also supplemented it with larger and heavier items I knew would be useful now that I had the space.  Things I thought would be overkill turned out to be very useful: extra flashlight, large water container, spare book of matches, extra pillow, folding chair, extra plastic tub, etc.  Things I was certain I would use ended up coming home unused: pancake mix, spare sleeping bag, two changes of clothes, packets of sample skin and hair products, etc. And I found there were things I needed in the moment that I didn’t have: a lighter, fire starters, strong bug spray, an umbrella, and 4WD.  The underlying lesson here is that your gear should enable the activities you want to do.  And different types of gear serve different types of experiences, even if they’re categorically similar. If you look at the gear belonging to someone who likes glamping and compare it to someone who likes to through-hike the Appalachian Trail, there may not be a whole lot of overlap in the specifics, even though the categories are the same.  This is because your process determines your needs.

Camping gear is often designed to meet basic human needs and provide basic creature comforts.  Complex business processes can draw from this analogy, in that your categories are going to appear around the essential tasks of your business. In many of the projects I’ve done in the past, some effort has been made to identify key information areas that need development using Continuous Improvement or Kaizen principles.  Information artifacts, key concepts, subject headings, however you choose to refer to them, are the overarching conceptual subjects that drive your business.  Using the camping example, this might look like the following: Sleep, Food, Hygiene, Recreation. If you break down sleep, the process could be as simple as laying out a tarp and a blanket and wrapping yourself up in it and going to sleep.  Or it might be as complex as building a platform, building a tent, constructing a bed frame, unfolding sheets, pillows, and blankets, securing the tent, and finally going to sleep. In both scenarios, there are categories for sleep surface, shelter, and bedding.

Another key comparison comes up when considering duplication and re-use.  Chances are, you aren’t going to need a different sleeping bag for each camping scenario.  It’s interesting to note that if you go into an outdoor supply outfitter looking for sleeping bags, you will find a range of options based on very specific situations.  If your business is camping, you just might need several different bags!  But for most people, this just adds complexity and expense.  You do want to make sure the zipper works so you can control the amount of body heat you’re trapping in the bag, and if you’re camping in the cold you might add a blanket. But otherwise, a multi-season sleeping bag that’s comfortable and easy to care for is going to be re-used over and over in many camping scenarios.

For a business, the examples might range from a child’s lemonade stand to Starbucks. The information objects are going to be similar: menu, supplies, metrics. Once you’ve established these categories, you can look at your data systematically.  Coming up with these key concepts allows you to define the scope of your work and priorities for development.

What’s Next?

Now that you’ve got a sense of how to create a list of categories based on your business processes, you can begin the process of discarding.  As with the process so far, it’s not as simple as it is for your possessions at home.  Disposition of data within an enterprise, large or small, comes with politics and legal requirements.  In part three, you will see some ideas about where to start with data disposition and how to use your company’s data disposition strategies to your advantage.

Click Here to Read Part 1 of this Series

Footnotes:
(1) https://www.investopedia.com/terms/k/kaizen.asp

A Mathematician and an Ontologist walk into a bar…

The Ontologist and Mathematician should be able to find common ground because Cantor introduced set theory into the foundation of mathematics, and W3C OWL uses set theory as a foundation for ontology language.  Let’s listen in as they mash up Cantor and OWL …

Ontologist: What would you like to talk about?

Mathematician: Anything.

Ontologist: Pick a thing. Any. Thing. You. Like.

Mathematician: [looks across the street]


Ontologist: Sure, why not?  Wells Fargo it is.  If we wanted to create an ontology for banking, we might need to have a concept of a company being a bank to differentiate it from other types of companies.  We would also want to generalize a bit and include the concept of Organization.

Mathematician: That’s simple in the world of sets.

[diagram]

Ontologist: In my world, every item in your diagram is related to every other item.  For example, Wells Fargo is not only a Bank, but it is also an Organization.  Relationships to “Thing” are applied automatically by my ontology editor.  When we build our ontology, we would first enter the relationships in the diagram below (read it from the bottom to the top):

[diagram]

Then we would run a reasoner to infer other relationships.  The result would look like this:

[diagram]
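What the reasoner does with subClassOf chains can be sketched in plain Python. This is a toy illustration of the idea, not a real OWL reasoner; the class names mirror the diagram:

```python
# Toy sketch of subclass reasoning (illustrative only, not a real OWL reasoner).
# Asserted facts, read bottom-to-top as in the diagram:
subclass_of = {"Bank": "Company", "Company": "Organization", "Organization": "Thing"}
instance_of = {"Wells Fargo": "Bank"}

def all_superclasses(cls):
    """Walk subClassOf links upward, collecting every superclass."""
    found = []
    while cls in subclass_of:
        cls = subclass_of[cls]
        found.append(cls)
    return found

def infer_types(individual):
    """An individual is a member of its asserted class and all its superclasses."""
    direct = instance_of[individual]
    return [direct] + all_superclasses(direct)

print(infer_types("Wells Fargo"))
# → ['Bank', 'Company', 'Organization', 'Thing']
```

The asserted triples say only that Wells Fargo is a Bank; everything else in the output is inferred, which is exactly what the second diagram shows.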

Mathematician: My picture has “Banks” and yours has “Bank”.  You took off the “s”.

Ontologist: Well, yes, I changed all the set names to make them singular because that’s the convention for Class names.  Sorry.  But now that you mention it … whenever I create a new Class I use a singular name just like everyone else does, but I also check to see if the plural is a good name for the set of things in the Class.  If the plural doesn’t sound like a set, I rethink it.  Try that with “Tom’s Stamp Collection” and see what you get.

Mathematician: I’d say you would have to rethink that Class name if you wanted the members of the Class to be stamps.  Otherwise, people using your model might not understand your intent.  Is a Class more like a set, or more like a template?

Ontologist: Definitely not a template, unlike object-oriented programming.  More like a set where the membership can change over time.

Mathematician: OK.  S or no S, I think we are mostly talking about the same thing.  In fact, your picture showing the Classes separated out instead of nested reminds me of what Georg Cantor said: “A set is a Many that allows itself to be thought of as a One.”

Ontologist: Yes.  You can think of a Class as a set of real world instances of a concept that is used to describe a subject like Banking.  Typically, we can re-use more general Classes and only need to create a subclass to differentiate its members from the other members of the existing Class (like Bank is a special kind of Company).  We create or re-use a Class when we want to give the Things in it meaning and context by relating them to other things.

Mathematician: Like this?

[diagram]

Ontologist: Exactly.  Now we know more about Joan, and we know more about Wells Fargo.  We call that a triple.

Mathematician: A triple.  How clever.

Ontologist: Actually, that’s the way we store all our data.  The triples form a knowledge graph.

Mathematician: Oh, now that’s interesting …  nice idea. Simple and elegant.  I think I like it.

Ontologist: Good.  Now back to your triple with Joan and Wells Fargo.  How would you generalize it in the world of sets?

Mathematician: Simple.  I call this next diagram a mapping, with Domain defined as the things I’m mapping from and Range defined as the things I’m mapping to.

[diagram]

Ontologist: I call worksFor an Object Property.  For today only, I’m going to shorten that to just “Property”.  But.  Wait, wait, wait.  Domain and Range?

[diagram]

In my world, I need to be careful about what I include in the Domain and Range, because any time I use worksFor, my reasoner will conclude that the thing on the left is in the Domain and the thing on the right is in the Range.

Ontologist continues: Imagine if I set the Domain to Person and the Range to Company, and then assert that Sparkplug the horse worksFor Tom the farmer.  The reasoner will tell me Sparkplug is a Person and Tom is a Company.  That’s why Domain and Range always raise a big CAUTION sign for me.  I always ask myself if there is anything else that might possibly be in the Domain or Range, ever, especially if the Property gets re-used by someone else.  I need to define the Domain and Range broadly enough for future uses so I won’t end up trying to find the Social Security number of a horse.
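The Sparkplug trap is mechanical enough to sketch in a few lines of Python. A toy illustration (made-up names, not a real reasoner) of how domain/range declarations drive type inference:

```python
# Toy sketch: domain/range declarations drive type inference.
# If worksFor has Domain=Person and Range=Company, a reasoner infers
# types for ANY pair that uses the property -- including the horse.
domain_of = {"worksFor": "Person"}
range_of = {"worksFor": "Company"}

triples = [("Sparkplug", "worksFor", "Tom")]  # a horse "works for" a farmer

def infer_from_domain_range(triples):
    inferred = []
    for subj, prop, obj in triples:
        if prop in domain_of:
            inferred.append((subj, "a", domain_of[prop]))
        if prop in range_of:
            inferred.append((obj, "a", range_of[prop]))
    return inferred

print(infer_from_domain_range(triples))
# → [('Sparkplug', 'a', 'Person'), ('Tom', 'a', 'Company')]
```

Nothing in the mechanism checks whether Sparkplug is plausibly a Person; the declarations alone force the conclusion, which is the Ontologist’s caution.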

Mathematician: Bummer.  Good luck with that.

Ontologist: Oh, thank you.  Now back to your “mapping”.  I suppose you think of it as a set of arrows and you can have subsets of them.

Mathematician: Yes, pretty much.  If I wanted to be more precise, I would say a mapping is a set of ordered pairs.  I’m going to use an arrow to show what order the things are in; and voila, here is your set diagram for the concept:

[diagram]

You will notice that there are two different relationships:

[diagram]

The pair (Joan, Wells Fargo) is in both sets, so it is in both mappings.  Does that make sense to you?

Ontologist: Yes, I think it makes sense.  In my world, if I cared about both of these types of relationships, I would make isAManagerAt a subProperty of worksFor, and enter the assertion that Joan is a manager at Wells Fargo.  My reasoner would add the inferred relationship that Joan worksFor Wells Fargo.
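The subProperty inference works the same way as the subclass case; a minimal Python sketch (toy data, not a real reasoner):

```python
# Toy sketch: subPropertyOf inference. Asserting the more specific
# property lets a reasoner add the more general one.
sub_property_of = {"isAManagerAt": "worksFor"}

asserted = {("Joan", "isAManagerAt", "Wells Fargo")}

def infer_superproperties(triples):
    """Add a triple for every superproperty of each asserted property."""
    inferred = set(triples)
    for subj, prop, obj in triples:
        while prop in sub_property_of:
            prop = sub_property_of[prop]
            inferred.add((subj, prop, obj))
    return inferred

print(sorted(infer_superproperties(asserted)))
# → [('Joan', 'isAManagerAt', 'Wells Fargo'), ('Joan', 'worksFor', 'Wells Fargo')]
```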

Mathematician: Of course!  I think I’ve got the basic idea now.  Let me show you what else I can do with sets.  I’ll even throw in some of your terminology.

Ontologist: Oh, by all means. [O is silently thinking, “I bet this is all in OWL, but hey, the OWL specs don’t have pictures of sets.”]

Mathematician: [takes a deep breath so he can go on and on … ]

Let’s start with two sets:

[diagram]

The intersection is a subset of each set, and each of the sets is a subset of the union.  If we want to use the intersection as a Class, we should be able to infer:

[diagram]

And if we want to use the union as a Class, then each original Class is a subclass of the union:

[diagram]

If two Classes A and B have no members in common (disjoint), then every subclass of A is disjoint from every subclass of B:

[diagram]

A mapping where there is at most one arrow out from each starting point is called a function.

[diagram]

A mapping where there is at most one arrow into each ending point is called inverse-functional.

[diagram]

You get the inverse of a mapping by reversing the direction of all the arrows in it.  As the name implies, if a mapping is inverse-functional, it means the inverse is a function.

Sometimes the inverse mapping ends up looking just like the original (called symmetric), and sometimes it is “totally different” (disjoint or asymmetric).
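These definitions are easy to check directly on a set of ordered pairs. A small Python sketch (toy data, illustrative names only):

```python
# Toy sketch: checking the Mathematician's definitions on a set of ordered pairs.
def is_function(pairs):
    """At most one arrow OUT of each starting point."""
    starts = [a for a, _ in pairs]
    return len(starts) == len(set(starts))

def is_inverse_functional(pairs):
    """At most one arrow INTO each ending point."""
    ends = [b for _, b in pairs]
    return len(ends) == len(set(ends))

def inverse(pairs):
    """Reverse the direction of every arrow."""
    return {(b, a) for a, b in pairs}

works_for = {("Joan", "Wells Fargo"), ("Sam", "Wells Fargo")}
print(is_function(works_for))            # True: each person has one employer here
print(is_inverse_functional(works_for))  # False: two arrows point at Wells Fargo
```

As the prose says, a mapping is inverse-functional exactly when its inverse is a function, and the sketch bears that out: `is_inverse_functional(m)` always equals `is_function(inverse(m))`.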

[diagram]

Sometimes a mapping is transitive, like our diagram of inferences with subClassOf, where a subclass of a subclass is a subclass.  I don’t have a nice simple set diagram for that, but our Class diagram is an easy way to visualize it.  Take two hops using the same relationship and you get another instance of the relationship:

[diagram]

Sets can be defined by combining other sets and mappings, such as the set of all people who work for some bank (any bank).

Ontologist: Not bad.  Here’s what I would add:

Sometimes I define a set by a phrase like you mentioned (worksFor some Bank), and in OWL I can plug that phrase into any expression where a Class name would make sense.  If I want to turn the set into a named Class, I can say the Class is equivalent to the phrase that defines it.  Like this:

BankEmployee is equivalentTo (worksFor some Bank).

The reasoner can often use the phrase to infer things into the Class BankEmployee, or use membership in the Class to infer the conditions in the phrase are true.  A lot of meaning can be added to data this way.  Just as in a dictionary, we define things in terms of other things.
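The classification step can be sketched in Python. A toy illustration with made-up individuals (not a real reasoner): BankEmployee is defined as equivalent to (worksFor some Bank), so anything with at least one worksFor arrow into a Bank is inferred into the Class.

```python
# Toy sketch: classifying individuals into BankEmployee, defined as
# equivalentTo (worksFor some Bank). All names are illustrative.
types = {"Wells Fargo": {"Bank"}, "Acme Farm": {"Company"}}
works_for = [("Joan", "Wells Fargo"), ("Tom", "Acme Farm")]

def members_of_restriction(prop_pairs, filler_class):
    """Everything with at least one arrow into a member of the filler class."""
    return {s for s, o in prop_pairs if filler_class in types.get(o, set())}

bank_employees = members_of_restriction(works_for, "Bank")
print(bank_employees)
# → {'Joan'}
```

Joan is never asserted to be a BankEmployee; her membership follows from the definition, which is how the equivalence adds meaning to the data.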

When two Classes are disjoint, it means they have very distinct and separate meanings.  It’s a really good thing, especially at more general levels.  When we record disjointness in the ontology, the reasoner can use it to detect errors.

Whenever I create a Property, I always check to see if it is a function.  If so, I record the fact that it is a function in the ontology because it sharpens the meaning.

We never really talked about Data Properties.  Maybe next time.  They’re for simple attributes like “the building is 5 stories tall”.

A lot of times, a high level Property can be used instead of creating a new subProperty.  Whenever I consider creating a new subProperty, I ask myself if my triples will be just as meaningful if I use the original Property.  A lot of times, the answer is yes and I can keep my model simple by not creating a new Property.

An ontology is defined in terms of sets of things in the real world, but our database usually does not have a complete set of records for everything defined in the ontology.  So, we should not try to infer too much from the data that is present (this is the open-world assumption).  That kind of logic is built into reasoners.

On the flip side, the data can include multiple instances for the same thing, especially when we are linking multiple data sets together.  We can use the sameAs Property to link records that refer to the same real-world thing, or even to link together independently-created graphs.

The OWL ontology language is explained well at: https://www.w3.org/TR/owl-primer/

However, even if we understand the theory, there are many choices to be made when creating an ontology.  If you are creating an ontology for a business, a great book that covers the practical aspects is Demystifying OWL for the Enterprise by Michael Uschold.

Mathematician: I want the last word.

Ontologist: OK.

Mathematician:

[diagram]

Ontologist: I agree, but that wasn’t a word.  🙂

Mathematician: OK.  I think I’m starting to see what you are doing with ontologies.  Here’s what it looks like to me: since it is based on set logic and triples, the OWL ontology language has a rock-solid foundation.

Written By: Phil Blackwood, Ph.D.

Does your Data Spark Joy? Part 1

Why is Marie Kondo so popular for home organization?

Marie Kondo released her book, “The Life-Changing Magic of Tidying Up,” almost ten years ago and has since become famous for motivating millions of people to de-clutter their homes, offices, and lives. Some people are literally buried in their possessions with no clear way to get from room to room.  Others simply struggle to get out the door in the morning because their keys, wallet, and phone play a daily game of hide-and-seek. Whatever the underlying cause of this overwhelm, Marie Kondo offers a simple, clear method for getting stuff under control. Not only that, but she promises that tidying up will clear the spaces in our lives, leaving room for peace and joy.

Why does this method apply to Data-Centric Architecture?

You might be wondering what this has to do with data-centric architecture.  In many ways the Marie Kondo method is easily extrapolated out of the realm of physical possessions and applied to virtual things: bits of data, documents, data storage containers, etc. In the world of information and data, it’s not surprising that people have seen parallels between belongings and data.  That said, it’s not enough to just say that new applications, storage methods, or business processes will solve the problems of information overload, data silos, or dirty data.  Instead, it’s important to examine your company’s data and the business that data serves.

Overarching Data-Centric Principles

For most businesses and agencies, data is essential to function and is ensconced in legal requirements and data lifecycle policy.  It simply isn’t realistic to say, “Throw it all out!”  Instead, the principles behind acquiring, using, storing, and eventually discarding things must be understood.  And in the virtual space, we can understand “things” to be data, metadata, and systems.

Her Method Starts with “Why?”

In her book, Marie Kondo says, “Before you start, visualize your destination.”  And she expands on this, asking readers to think deeply about the question and visualize the outcome of having a tidy space: “Think in concrete terms so that you can vividly picture what it would be like to live in a clutter-free space.” Our clients will often engage us with some ideal data situation in mind.  It might be expressed in terms of requirements or use cases, but it often has to do with being able to harmonize and align data, do large-scale systems integration, or add more sophisticated querying capabilities to existing databases or tools.  In fact, the first steps of our client engagements have to do with developing these questions into statements of work.

Also, we encourage clients to envision their data and what it can tell them independently of applications, systems, and capabilities precisely to avoid the pitfall of thinking in terms of using new tools to solve undefined problems.  It’s uncanny that this method of interrogation into underlying motivations is common between data-centric development and spark-joy tidying up.

Her Method is About the Psychology of Belongings.

It is important to understand how organizations come to have their data.  In the US Government, entire programs are devoted to managing acquisition. In finance, manufacturing, and other industries, the process of acquiring systems and data is often a business unto itself. It’s not uncommon to hear people working with data refer to “data silos” when talking about partitioned and disconnected collections of data.  Sometimes this data is shuffled into classified folders and proprietary systems unnecessarily, simply because someone wants to retain control of it. In my work at the Federal Government, I found the process of determining the system of record to be intensely political and time-consuming.  It’s not a trivial process and not simple, but it is essential to the effort of tidying your data-centric environment.

Sort your Data by Category.

Marie Kondo recommends going categorically for a reason.  In her book, she talks about her process of evaluating her belongings by location, drawer by drawer, room by room, and discovering that she found herself organizing multiple drawers with the same things repeatedly.  She tells us, “The root of the problem lies in the fact that people often store the same type of item in more than one place.  When we tidy each place separately, we fail to see that we’re repeating the same work in many locations and become locked into a vicious circle of tidying.” If this doesn’t sound familiar, you aren’t even working with data.

For me, this principle became clear when I gathered all my office supplies in one place. I was astounded by the small mountain of binder clips (and Sharpies) that seemed to materialize out of nowhere. I always seem to be looking for binder clips and Sharpies, so I was shocked by how many I had.

I can think of no closer parallel than the proliferation of siloed systems that appear in each department within an agency.  When I worked for a government agency, I was part of a team whose job it was to survey the offices to find out who was using flight data.  There were several billion-dollar systems in development and in maintenance that held flight data. Over the course of a few years, I would hear quotes about the agency-wide number of flight data systems go from 15 systems, to 20, to 30, and beyond.  It literally became an inside-joke with leadership. And at times, we would hear rumors about some small branch office that had their own Microsoft Access database to keep track of their own data, because they couldn’t get what they needed from the systems of record.  Systems are like the binder clips of enterprise data, except that this kind of proliferation is as easy as making a copy. You don’t even need to make a trip to the office supply store to end up with a pile of duplicates.  If you want to understand how much data redundancy you have, search for specific categories of data across all systems.

Does it spark Joy? What does joy mean in the context of systems and data?

How do you know what sparks joy?  First, look at how the principle of looking for joy is applied.  Presumably, you are in your line of business because on some level it brings you joy – joy that derives from fulfilling a purpose.  Remember the first step of understanding why you are embarking on a transformative process and go back to what you envisioned.  Another way that you can look at joy is whether your space and the things in it allow for that spark to happen.  Ideally, you remove the items from your space that hinder that spark, after acknowledging the lessons they’ve taught you.  Do you feel that spark of joy when you grab your keys in the morning on the way out the door? If you’ve ever tried to find misplaced keys while you’re in a rush, you know the antithesis of joy. Having done the work of creating a space where your keys are easy to find is a way of facilitating joy in your morning routine.

One of the supposed failures of the Marie Kondo method as it applies to data clutter is that it is impossible to physically hold, or even look at, every single piece of data in your system.  Again, rely on the principle behind her method, which is that it is important to be thorough and aim for an environment that facilitates ease and joy.  Don’t say, “We can’t delete any personnel data!” and quit.  Commit to taking an inventory of your personnel systems and the systems that use personnel data. If that process reveals that you have ten different personnel systems and personnel data scattered in several other systems, you must take a closer look at your data environment.  At one point in my physical de-cluttering, I found a tin full of paper clips.  I didn’t handle each shiny paper clip individually; rather, I acknowledged the paper clips served me when I printed more documents onto paper, and since I no longer had a printer, I decided to toss them into the recycling bin.

Remember why you’re considering a solution to data problems in the first place and make a commitment to doing the work of determining your real data needs. Purpose is key, because the way data sparks joy is by enabling you to fulfill that purpose.  This can be difficult where the work you do is abstract and somewhat removed from business that is easy to understand.  However, the critical point to knowing whether or not the data in front of you serves its purpose in your business is to fully understand your business.

Discard and Delete your Data

Take a wardrobe full of clothing for example.  Many of Marie Kondo’s clients are surprised when they start organizing their wardrobes. It’s surprising when you can see the amount of clothing that is unserviceable, the number of items that still have tags on them, the number of hand-me-downs or gifts that don’t suit you, etc. These items are sometimes difficult to discard for several reasons:

  • It’s kept out of obligation to the giver.
  • It cost a lot of money to buy it.
  • It’s still in good repair.
  • It might be the perfect thing to wear at an unspecified event in the future.
  • It reminds you of the lovely event at which you wore it.
  • It reminds you of the person who left it with you.

It may seem far-fetched to apply these reasons to data storage, but a quick glance through failed data projects will show you otherwise.  Consider the proprietary data locked in a system owned by a vendor for which your license has lapsed, or the system that’s coded in an outdated language whose expert programmers have to be called out of retirement to access, or that directory of data that doesn’t really match the fields in your database, but you requested through a complex data-sharing agreement with another agency.  If you can’t think of an example of a system that has been paid for but hasn’t been used, just consider that the terms shelfware and vaporware exist. It’s easy to be cynical about data precisely because of the overlaps between why we keep things in our closets and garages, and why we keep systems and data in our repositories. When you consider these parallels and understand the principles behind evaluating the items you keep with the hope that they will make your life better, sparking joy becomes easier.

Storage experts are hoarders.

Marie Kondo says you don’t need more storage.  That new Cloud service that can take all the old databases you have and make them accessible is not going to solve your problem. Data storage is expensive, and you do not need a new data storage solution.  What you need is to understand your business process, the business need for the data you believe you have, and a disposition plan for everything else.

How do you start?

In summary, if you’re looking for smart data-centric solutions to help you manage an overwhelming amount of data, or you’re looking for ways to access your vast stores of data in a way that enables smarter business solutions, your bigger issue might be data hoarding.  Looking at your business needs, closely examining the data you have, and coming up with strategies for aligning your data to a manageable data lifecycle can seem overwhelming.  Using a data-centric approach will bring that dream into focus. Keep an eye out for part two of this series to learn how to get your data to spark joy for you.

Click Here to Read Part 2 of this Series