
DCAF 2021: Third Annual Data-Centric Architecture Forum Re-Cap

Written by Peter Winstanley

The Data-Centric Architecture Forum was a success!

If growth of participants is an indicator of the popularity of an idea, then the Third Annual Data-Centric Architecture Forum is reflecting a strong increase in popularity, for this year over 230 participants joined together for a three-day conversation.  This is a huge increase on the 50 or so who braved the snows of Fort Collins last year.  Perhaps the fact that the meeting was online was a contributing factor, but then spending three days in online meetings with presentations, discussion forums, and discussions with vendors needs a distinct kind of stamina, as it misses out on the usual conviviality of a conference – meals together, deep discussions over beer or coffee, and thoughtful walks and engaging sightseeing trips.  This year’s Data-Centric Architecture Forum was a paradigm shift in itself.  The preparatory work of Matt Faye and colleagues at Authentric provided us with a virtual auditorium, meeting rooms, Q&A sessions, socialization venues, and vendor spaces.  The Semantic Arts team were well-rehearsed with this new environment, but it was reassuring to find that conference attendees soon became acquainted with the layout and, quite quickly, the conference was on a roll with only the very infrequent glitch that was quickly sorted by Matt and team.

Paradigm shift was not only evident in the venue, it was also a central theme of the conference: the idea that we are on the cusp of a broad transformation of practice in informatics, particularly within the enterprise.  Dave McComb placed the Kuhnian idea of revolution squarely on the table as he commenced proceedings.  In many ways this is something we have become all too familiar with, as the internet has given us hospitality companies with no hotels, taxi companies with no cars, and so on.  Here we are moving to applications with no built-in data store.  How can this possibly work?  It flies in the face of decades, perhaps centuries, of system design.  This Forum focused on architecture – the key elements necessary to implement a data-centric approach.

Some of the presentations covered the whole elephant, trunk to tail, whereas others focused on specific aspects.  I’ll take a meander through the key messages for me, but as ever in these sorts of reviews, there is no easy way to do justice to everyone’s contribution, and my focus may not be your focus.  However, given that the Forum was a ‘digital first’ production, you will be able to access the talks, slide decks and discussions yourself to make up your own mind—and I hope that you do.  A full set of the recorded presentations is available for purchase at the same price as admission.  They can be purchased here, or you can inquire further at admin@semanticarts.com

Understanding that “The future is already here — it’s just not evenly distributed” means that we have to disentangle the world around us and sift out the ideas and implementations that show this future, and perhaps recognise early that the future may arise where the technology isn’t the most sophisticated but the marketing is more advanced (think Betamax vs VHS).  As Mark Musen pointed out in “A Data-Centric Architecture for Ensuring the Quality of Scientific Data”, when given free rein people make a mess of adding metadata, and this can be remedied by designing minimal subsets that do a ‘good enough’ job.  Once a community realises that a satisficing minimum metadata set can deliver benefit in a domain, the model can be rolled out with similar good effect to other domains.  We know from Herbert Simon’s work that organisations naturally adopt this ‘satisficing’ mode of operation, and as Alan Morrison and Mark Ouska discussed in their presentation on lowering the barriers to entry, going with the organisational flow – ensuring that there is an organisational language to express the new ideas – is a key element in successful adoption.  How else are we to bring to market the range of technologies presented by the 13 vendors exhibiting at the Forum?  Their benefits need to be describable in user stories that resonate in all enterprises, for this isn’t just a revolution for science or for engineering.  As Berners-Lee tweeted about both the Olympic Games and the World Wide Web at the start of the London Olympics: “This is for everyone”.

Being for everyone requires that the technologies (such as the handling of time and truth, or the responsiveness to events, that modern triplestores make possible) can be populated at scale with soundly created information assets.  Approaches to “SemOps”, an automation ecosystem providing scalable support to people managing enterprise data assets in a data-centric modality, were the focus of the presentation by Wallace and Karii.  Being for everyone also means that information needs to be used across domains, and not just within the highly tailored channels that are typical of current application-centric architectures.  Jay Yu from Intuit and Dan Gshwend from Amgen, among others, showed their organisations’ paths to this generalised, cross-domain use of enterprise information, and the social dimension of this liberation of data across the enterprise was considered by Mike Pool and by Laura Madsen, who both shared their experiences of governance in data-centric worlds.  Security was also covered, albeit later, in a video presentation by Rich Sinnott from Melbourne.

So, where are we?  With attendance from North and South America, Europe, and Oceania, the Forum showed us that the ideas of data-centricity have global appeal.  There is commercial activity by solution vendors and implementing enterprises of various scales.  There is also consideration, both within enterprises and by specialist consultants, of the human factors associated with implementing and managing data-centric architectures.  However, considerable challenges remain in cross-domain implementation of data-centricity, and in simultaneously scaling not only the technical infrastructures and human skills but also the involvement of individuals, at a personal level, in managing their information and actively contributing it to the global web of data.  The news from the BBC on their work with Solid pods and other personal information stores gave the Forum an inkling of the scale of change that is about to hit us.  Let us hope that the Third Data-Centric Architecture Forum has played a catalytic role in this global transformation, and I hope to have many enjoyable discussions with readers as we evaluate progress on our journey at the next Forum in a year’s time.

Click here to purchase DCAF 2021 Presentation recordings.

Telecom Frameworx Model: Simplified with “gist”

We recently recast large portions of the telecom Frameworx Information Model into an Enterprise Ontology using patterns and reusable parts of the gist upper ontology.  We found that extending gist with the information content of the Frameworx model yields a simple telecom model that is easy to manage, federate, and extend, as described below.  Since slow time to market and models too complex for easy cognitive consumption are typical barriers to success in the telecom industry, we’re confident this approach will help overcome a few hurdles to adoption.

The telecommunications industry has made a substantial investment to define the Frameworx Information Model (TMF SID), an enterprise-wide information model commonly implemented in a relational database, as described in the GB922 User’s Guide.

Almost half of the GB922 User’s Guide is dedicated to discussing how to translate the Information Model to a Logical Model, and then translate the Logical Model to a Physical Model. With gist and our semantic knowledge graph approach, these transformations were no longer required. The simple semantic model and the data itself are linked together and co-exist in a triplestore database, no transformations required.
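To make that co-existence concrete, here is a minimal sketch in Python using the rdflib library. The class and property names are hypothetical stand-ins, not actual gist or Frameworx terms; the point is that schema triples and data triples live in the same graph and can be queried together, with no logical-to-physical transformation step.

```python
from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.com/telecom/")  # hypothetical namespace

g = Graph()
g.bind("ex", EX)

# "Schema" triples: a small slice of the model, stored as ordinary triples.
g.add((EX.CustomerAccount, RDF.type, RDFS.Class))
g.add((EX.hasAccount, RDFS.domain, EX.Customer))
g.add((EX.hasAccount, RDFS.range, EX.CustomerAccount))

# Data triples live in the same graph, linked directly to the model.
g.add((EX.acct42, RDF.type, EX.CustomerAccount))
g.add((EX.alice, EX.hasAccount, EX.acct42))

# One SPARQL query can traverse model and data together.
query = """
    SELECT ?customer ?account WHERE {
        ?customer ex:hasAccount ?account .
        ?account a ex:CustomerAccount .
    }
"""
for row in g.query(query, initNs={"ex": EX}):
    print(row.customer, row.account)
```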

Click here to read more.

Semantic Arts, co-produced by Phil Blackwood and Dave McComb

The 90s Are Over, Let’s Stop Basing Capital Cost Decisions on Lagging Indicators

Remember the good old days of emerging digital technology? Accessing information through a dial-up internet connection. Saving data to floppy discs or CDs. Sending emails to have them printed for storage. Mobile connectivity was new, exciting, and… slow compared to what we have today.

In the energy sector, data access limitations influenced the structure of traditional execution workflows for capital projects. It was common – and still is – for project execution models to focus on document-based deliverables over raw data.

The inherent problem with a document-centric approach is that documents take time to produce. Let’s imagine the workflow for a technology evaluation study that:

  • Begins with initial input from multiple departments.
  • Gets reviewed by 2-3 management layers on the project organizational chart.
  • Finally lands on the desk of a senior decision-maker.

This process could easily take two weeks or longer. But what happens during those two weeks? Work doesn’t get paused. The project continues to progress. The information initially collected for the study no longer represents current project conditions. By the time it gets to the decision-maker, the study is based on two-week-old lagging indicators.

A lot can change on a project in that amount of time. Execution workflows built around lagging indicators tend to:

  • Lead to costly and unnecessary errors caused by decisions based on old information.
  • Stymie innovation with rigid and slow processes that limit experimentation.

Click here to read more. 

Originally posted in: Digital Transformation

Click here to Read an Advanced Chapter from the Data-Centric Revolution by Dave McComb

A Data-Centric Approach to Managing Customer Data

by Phil Blackwood, Ph.D.

Without a doubt every business needs to have a clear idea of who its customers are and would love to have a 360 degree view of each customer. However, this customer data is typically scattered across hundreds of applications, its meaning is embedded in code written years ago, and much of its value is locked away in silos. Compounding the problem, stakeholders in different parts of the business are likely to have different views of what the word “customer” means because they support different kinds of interactions with customers.

In this post, we’ll outline how to tackle these issues and unlock the value of customer data. We’ll use semantics to establish simple common terminology, show how a knowledge graph can provide 360 degree views, and explain how to classify data without writing code.

The semantic analysis will have three parts: first we consider the simple use case illustrated in the diagram below, then take a much broader view by looking at Events, and finally we will dive deeper into the meaning of the diagram by using the concept of Agreements.

[Diagram: an event in which a customer purchases a shirt from a shop]

The diagram shows an event in which a customer purchases a shirt from a shop. Ask stakeholders around your company what types of events customers participate in, and you are likely to get a long list. It might look something like this (the verbs are from the viewpoint of your company):

  • Answer general questions about products and services
  • Create billing account for products and services
  • Create usage account for a product or service
  • Deliver product or service (including right-to-use)
  • Finalize contract for sale of product or service
  • Help a customer use a product or service
  • Identify a visitor to our web site
  • Determine a recommender of a product or service
  • Find a user of a product or service
  • Migrate a customer from one service to another
  • Migrate a service from one customer to another
  • Prepare a proposal for sale of product or service
  • Receive customer agreement to terms and conditions
  • Receive payment for product or service
  • Rent product or service
  • Sell product or service
  • Send bill for product or service
  • Ship product

We can model these events using classes from the gist ontology, with one new class consisting of the categories of events listed above. When we load data into our knowledge graph, we link each item to its class and we relate the items to each other with object properties. For example, an entry for one event might look like:

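The example diagram from the original post isn’t reproduced here, but a hedged sketch in Python with rdflib gives the flavor of one such entry. The gist class names and the linking properties below are assumptions for illustration, not verbatim terms from the model:

```python
from rdflib import Graph, Namespace, RDF

GIST = Namespace("https://w3id.org/semanticarts/ns/ontology/gist/")
EX = Namespace("http://example.com/shop/")  # hypothetical data namespace

g = Graph()

# The one new class: categories for the 18 kinds of events listed above.
g.add((EX.SellProductOrService, RDF.type, EX.CustomerEventCategory))

# One event instance, linked to its class, its category, and its participants.
g.add((EX.event123, RDF.type, GIST.Event))                 # assumed gist class
g.add((EX.event123, EX.isCategorizedBy, EX.SellProductOrService))
g.add((EX.event123, EX.hasParticipant, EX.person456))      # the customer
g.add((EX.event123, EX.involvesProduct, EX.shirt789))      # the shirt
g.add((EX.person456, RDF.type, GIST.Person))               # assumed gist class
g.add((EX.shirt789, RDF.type, EX.Product))
```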

By using categories instead of creating 18 new classes of events, we keep the model simple and flexible. We can round out the picture by realizing that the Person could instead be an Organization (company, non-profit, or government entity) and the Product could instead be a Service (e.g. window washing).

In a green-field scenario, the model and the data are seamlessly linked in a knowledge graph and we can answer many different questions about our customers. However, in most companies a considerable amount of customer data exists in application-centric silos. To unlock existing customer data, we have to first understand its meaning and then we can link it into the knowledge graph by using the R2RML data mapping language. This data federation allows us to write queries using the simple, standard semantic model and get results that include the existing data.
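As a rough illustration, the sketch below embeds a hypothetical R2RML mapping (the rr: vocabulary is the W3C R2RML standard, but the table, column, and property names are invented) and parses it with rdflib to check its syntax. An R2RML processor, such as those bundled with many triplestores, would execute the mapping against the relational source:

```python
from rdflib import Graph

mapping_ttl = """
@prefix rr:   <http://www.w3.org/ns/r2rml#> .
@prefix gist: <https://w3id.org/semanticarts/ns/ontology/gist/> .
@prefix ex:   <http://example.com/shop/> .

<#CustomerMap>
    rr:logicalTable [ rr:tableName "CRM_CUSTOMER" ] ;
    rr:subjectMap [
        rr:template "http://example.com/shop/person/{CUST_ID}" ;
        rr:class gist:Person
    ] ;
    rr:predicateObjectMap [
        rr:predicate ex:hasName ;
        rr:objectMap [ rr:column "FULL_NAME" ]
    ] .
"""

# Parsing the mapping checks its syntax; an R2RML processor would run it
# against the database to expose each row as triples in the semantic model.
g = Graph().parse(data=mapping_ttl, format="turtle")
print(len(g), "mapping triples parsed")
```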

For any node in the knowledge graph, we have a 360 degree view of the data about the node and its context. A Person node can be enriched with data from social media. An Organization node can be enriched with data about corporate structure, subsidiaries, or partnerships.

Now let’s pivot from the broad event-based perspective to look more closely at the meaning of the original example. Implicit in the idea of a sale is an agreement between the buyer and the seller; once the agreement is made, the seller is obligated to deliver something, while the buyer must pay for it. The “something” is a product or service. We can model the transaction like this:
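In place of the diagram from the original post, here is a minimal sketch of the agreement-and-obligation pattern in Python with rdflib. gist does include classes along these lines, but the exact class and property names used here are illustrative assumptions:

```python
from rdflib import Graph, Namespace, RDF

GIST = Namespace("https://w3id.org/semanticarts/ns/ontology/gist/")
EX = Namespace("http://example.com/shop/")  # hypothetical data namespace

g = Graph()
g.add((EX.sale42, RDF.type, GIST.Agreement))          # assumed gist class
g.add((EX.sale42, EX.hasParty, EX.person456))         # the buyer
g.add((EX.sale42, EX.hasParty, EX.acmeShop))          # the seller

# The agreement gives rise to two obligations:
g.add((EX.deliverShirt, RDF.type, GIST.Obligation))   # seller must deliver
g.add((EX.payForShirt, RDF.type, GIST.Obligation))    # buyer must pay
g.add((EX.sale42, EX.hasObligation, EX.deliverShirt))
g.add((EX.sale42, EX.hasObligation, EX.payForShirt))
g.add((EX.deliverShirt, EX.obligatedParty, EX.acmeShop))
g.add((EX.payForShirt, EX.obligatedParty, EX.person456))
```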

This basic pattern of agreement and obligation covers many use cases. The agreement could be the simple act of placing the shirt on the check-out counter, or it could be a contract. Delivery and payment could coincide in time, or not. Payments or deliveries, or both, could be monthly.

If our Contract Administration group wants a simple way to identify all the customers who have a contract, we can create a Class named ContractCustomer and populate it automatically from the data in our knowledge graph. To do this, we would write an expression similar to a query that defines what we mean by ContractCustomer, declare the Class to be equivalent to the expression, and then run an off-the-shelf, standards-based inference engine to populate the new class. With no code needed … it’s model-driven.
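A hedged sketch of that model-driven step, using rdflib together with the open-source owlrl OWL RL reasoner; the class and property names are again assumptions for illustration:

```python
from rdflib import Graph, Namespace, RDF
from owlrl import DeductiveClosure, OWLRL_Semantics

EX = Namespace("http://example.com/shop/")

data = """
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex:  <http://example.com/shop/> .

# ContractCustomer is declared equivalent to "anything that is a party
# to some Contract" -- a class-defining expression, not procedural code.
ex:ContractCustomer owl:equivalentClass [
    a owl:Restriction ;
    owl:onProperty ex:isPartyTo ;
    owl:someValuesFrom ex:Contract
] .

ex:contract7  a ex:Contract .
ex:person456  ex:isPartyTo ex:contract7 .
"""

g = Graph().parse(data=data, format="turtle")
DeductiveClosure(OWLRL_Semantics).expand(g)  # run the OWL RL inference engine

# person456 is now classified as a ContractCustomer, with no code written.
print((EX.person456, RDF.type, EX.ContractCustomer) in g)  # True
```

After the closure is expanded, membership in ContractCustomer is derived entirely from the model, which is the "no code needed" point above.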

This method of automatically populating classes can support the wide variety of needs of stakeholders in different parts of the company, even though they do not share a single definition of customer. For example, you could provide classes like PayingCustomer and ProductUsers that simplify the way the data is accessed or serve as building blocks in the model. With this approach, there is no need to try to get everyone to agree on a single definition of customer. It lets everyone stay focused on what will help them run their part of the business.

While many refinements are possible, we’ve outlined the core of a data-centric solution to the knotty problem of managing customer data. The semantic analysis reveals a simple way to capture information about customer interactions and agreements. A knowledge graph supports 360 degree views of the data, and an inference engine allows us to populate classes automatically without writing a single line of code.

I hope you can glean some ideas from this discussion to help your business, and that you get a sense of why semantics, knowledge graphs, and model-driven-everything are three keys to data-centric architecture.

Dispose, Delete, and Discard: Keep your Enterprise Data Tidy Part 3

Those who are familiar with Marie Kondo know that she is a ruthless disposer. If you’ve read parts one and two of this series, you know that the process is more nuanced than just “throw it all away,” but we’ve come to the point in the process where it’s important to focus on discarding. If you haven’t read parts one and two, please do; they provide context for this post.  Armed with categories that work for your organization and a solid set of values that the data you keep must uphold to be useful to your business, this part of the process is primarily about dedicating time to pruning your files, records, and documentation.

Data Lifecycle Policies

“The fact that you possess a surplus of things that you can’t bring yourself to discard doesn’t mean you are taking good care of them.  In fact, it is quite the opposite.” It’s interesting to note that, while there are many book collectors who lament Kondo’s popularity and cry, “You can pry my books out of my cold, dead hands,” there aren’t many librarians who hold this sentiment.  Professionals know that collections must be pruned and managed. In fact, your organization may have one or more policies about managing data and documents.  At a minimum, data lifecycle policies cover three points of a document’s existence within an organization: creation or acquisition, use and storage, and disposition.  These policies may be driven by the systems used to manage your documents (Microsoft SharePoint comes to mind) or they may be driven by government mandates. They should be your guide to what you discard and when.  If your organization has these policies outlined clearly, the hard work is already done, and you can begin using parts one and two as your guide to systematically deleting unneeded data and documentation. It may also be that some of this lifecycle management functionality is encoded in your systems, but it’s important to understand the policies if you’re making the decisions about data disposition. If your organization does not have a data lifecycle policy, you can explore creating one while you work on becoming data centric.

Data Configuration Management

Outside of an overarching strategy or policy for managing your organization’s data and information, your organization may have various configuration management tools in place (e.g., Git or Subversion) to manage drafts and backups. Many large organizations use file sharing systems to govern who has privileges to directories and files.  If you’re attempting to KonMari your files when such systems are in place, it will be necessary to work collaboratively to get access to the files under your control.

When do you actually discard???

One of the key ideas in Marie Kondo’s method is that when you discard, you only discard your own belongings.  If you are the owner and CTO of a company, then you have the freedom to discard what no longer sparks joy.  In a large company, that question of ownership is far more complex and possibly beyond the reader’s paygrade. It might be beyond the CEO’s paygrade. It is certainly beyond the paygrade of the writer, except with a select few files on a laptop and in a removable storage device used for backups.  But the question of ownership can often be established by completing the work recommended in this series of blog posts.  And once you’ve established ownership, even complex ownership, you can use metadata to describe ownership and provenance, making it easier to manage that data’s future state, discarded or otherwise.
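As a small sketch of what that ownership and provenance metadata might look like, here is a hypothetical example using Dublin Core terms via rdflib; the dataset and the parties named are invented for illustration:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS, XSD

EX = Namespace("http://example.com/data/")  # hypothetical dataset namespace

g = Graph()
dataset = EX.payrollExtract2019  # an invented dataset for illustration

# Ownership and provenance recorded as plain metadata triples.
g.add((dataset, DCTERMS.rightsHolder, Literal("HR Department")))
g.add((dataset, DCTERMS.creator, Literal("HR Systems Team")))
g.add((dataset, DCTERMS.created, Literal("2019-03-01", datatype=XSD.date)))
g.add((dataset, DCTERMS.provenance,
       Literal("Extracted from the legacy payroll system before decommissioning")))
```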

Futureproofing your Data

Now that we’ve considered the end of the data lifecycle management picture, take a look at the start—data acquisition and creation.  If you’ve done the work so far of identifying your business processes, assessing how well your data supports your goals, and aligning to your data lifecycle management policy (formal or otherwise), you know how important it is to also consider the introduction of new data.  We touched on this in the first two parts, but there’s a subtle difference between considering how data came to be in your collection and considering data that you will include in your collection from this point forward.

This is something you can specify with policy, and it’s something you can anticipate with a robust ontology. However, it’s not as simple as building robust metadata.  An ontology that is carefully anchored to your organization’s processes, has sufficient input from the right subject matter experts, and is developed within a hospitable IT infrastructure, is far more likely to be a sound gatekeeper for your incoming data.

In the IT industry, this is referred to as futureproofing, and it is designed to minimize the need for downstream development to make corrections to work you’re doing now. It’s often a judgment call whether an application or system is introducing too much technical debt, but there is no argument that being able to understand each piece of data that goes into your system is critical to avoiding such debt. The way to ensure your data will be understandable downstream is to have adequate metadata.  If you want your data to be sophisticated and able to support complex information needs, you need to use semantics.

“The secret to maintaining an uncluttered room is to pursue ultimate simplicity in storage so that you can tell at a glance how much you have.” -Marie Kondo

Read Part 1: Does your Data Spark Joy?

Read Part 2: Setting the Stage for Success

Written by Meika Ungricht

The Data-Centric Revolution: The Role of SemOps (Part 1)

We’ve been working on something we call “SemOps” (like DevOps, but for Semantic Technology + IT Operations).  The basic idea is to create a pipeline that takes proposed enterprise ontology or taxonomy enhancements to “in-production” as frictionlessly as possible.
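To make the pipeline idea concrete, here is a hedged sketch of one possible SemOps gate: a small Python check that could run in CI and fail the build if a proposed ontology change violates a basic hygiene rule. The file name and the specific rule are illustrative assumptions, not a description of an actual pipeline:

```python
import sys
from rdflib import Graph
from rdflib.namespace import OWL, RDF, RDFS

# Load the proposed ontology change (file name is illustrative).
g = Graph().parse("proposed-ontology.ttl", format="turtle")

# Hygiene rule: every declared class must carry an rdfs:label.
unlabeled = [
    cls for cls in g.subjects(RDF.type, OWL.Class)
    if (cls, RDFS.label, None) not in g
]

if unlabeled:
    print("FAIL: classes missing rdfs:label:")
    for cls in unlabeled:
        print("  ", cls)
    sys.exit(1)  # a non-zero exit fails the CI job and blocks promotion

print("OK: ontology passed hygiene checks")
```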

As so often happens, when we shine the Semantic Light on a topic area, we see things anew.  In this very circuitous way, we’ve come to some observations and benefits that we think will be of interest even to those who aren’t on the Semantic path.

DevOps for Data People

If you’re completely on the data side, you may not be aware of what developers are doing these days.  Most mature development teams have deployed some version of DevOps (Software Development + IT Operations) along with CI/CD (Continuous Integration / Continuous Deployment).

To understand what they are doing, it helps to harken back to what preceded DevOps and CI/CD.  Once upon a time, software was delivered via the waterfall methodology.  Months or occasionally years would be spent getting the requirements for a project “just right.” The belief was that if you didn’t get the requirements right up front, adding even a single new feature later would cost 40 times what it would have cost had the requirement been identified up front.  It turns out there was some good data behind this cost factor, and it still casts its shadow any time you try to modify a packaged enterprise application: 40x remains a reasonable benchmark compared to the cost of implementing that feature outside the package.  This, as a side note, is the economics that creates the vast number of “satellite systems” that spring up alongside large packaged applications.

Once the requirements were signed off, the design began (more months or years), then coding (more months or years), and finally systems testing (more months or years).  Then came the big conversion weekend: the system went into production, tee shirts were handed out to the survivors, and the system became IT Operations’ problem.

There really was only ever one “move to production,” and few thought it worthwhile to invest the energy in making it more efficient.  Most sane people, once they’d stayed up all night on a conversion weekend, were loath to sign up for another, and it certainly didn’t occur to them to find a way to make it better.

Then agile came along.  One of the tenets of agile was that you always had a working version that you could, in theory, push to production.  In the early days people weren’t pushing to production on any frequent schedule, but the fact that you always could was a good discipline: it helped teams avoid technical debt and straying off to build hypothetical components.

Over time, the idea that you could push to production became the idea that you should.  As people invested more and more in their unit testing and regression testing, and pipelines to move from dev to QA to production, people became used to the idea of pushing small incremental changes into production systems.  That was the birth of DevOps and CI/CD.  In mature organizations like Google and Amazon, new versions of their software are being pushed to production many times per day (some reports say many times per second, but this may be hyperbole).

The reason I bring it up is that there are some things in there that we expect to duplicate with SemOps, and some that we already have with data (as I was writing this sentence, I was tempted to write “DataOps” and thought: “is there such a thing?”). A nanosecond of googling later, I found this extremely well-written article on the topic from our friends at DataKitchen. They focus more on the data analytics part of the enterprise, which is a hugely important area. The points I was going to make are focused more on the data-source end of the pipeline, but the two ideas tie together nicely.

Click here to read more on TDAN.com

Sharing Ontologies Globally To Speed Science And Healthcare Solutions

The COVID-19 pandemic is a clear example of how healthcare practitioners require swift access to enormous amounts of diverse information to efficaciously treat patients. They must synthesize individual data (vital signs, clinical history, demographics, and more) with rapidly evolving knowledge about COVID-19 and make decisions relevant to the conditions from which specific patients suffer. Practitioners rely on point-of-care decision support systems to accelerate patient-care analysis and to scale treatment to the intake volumes of a global pandemic. These systems analyze a plethora of inputs to produce tailored treatment recommendations, in near real time, which significantly enhance the quality of treatment.

Ontologies Create The Foundation For Complex Data Analysis

The underlying utility of these systems rests largely on the vast quantities of healthcare knowledge they analyze. Such knowledge must be uniformly represented (at scale) with rich, contextualized descriptions of the full scope of clinical trials, pharmaceutical information, and research germane to the biomedical field, which expands daily with each published paper and new finding. This knowledge should be rapidly accessible, reusable, and a sturdy foundation on which to base present and future research in this field, encompassing everything from long-standing maladies like peanut allergies to emergent ones like COVID-19.

Ontologies, evolving conceptual data models with standardized concepts, uniquely fulfill each of these requirements to fuel healthcare research and point-of-care decision support systems, helping save lives when they need saving most.

International Ontology Sharing Is Becoming A Reality

A consortium of researchers recently formed an organization dedicated to standardizing how scientists define their ontologies, which are essential for retrieving datasets as well as understanding and reproducing research. The group called OntoPortal Alliance is creating a public repository of internationally shared domain-specific ontologies. All the repositories will be managed with a common OntoPortal appliance that has been tested with AllegroGraph Semantic Knowledge Graph software. This enables any OntoPortal adopter to get all the power, features, maintainability, and support benefits that come from using a widely adopted, state-of-the-art semantic knowledge graph database.

The first set of ontology repositories making up the OntoPortal Alliance includes BioPortal (biomedical and other ontologies used internationally), SIFR (biomedical ontologies in the French language), BMICC MedPortal (biomedical ontologies focused on Chinese users), AgroPortal (ontologies focused on agronomy and related sciences), and EcoPortal (ontologies focused on environmental science). The OntoPortal Alliance will be adding more ontology repositories and is open to working with researchers in other domains who want to offer ontologies publicly.

Click here to read the full article at HealthITOutcomes.com

Setting the Stage for Success Part 2

Envisioning Your Dream System with the Marie Kondo Method

Before you begin gathering your belongings, discarding, or reorganizing, Marie Kondo asks you to envision your dream lifestyle.  She insists that this is the critical first step to ensuring success with her method, and she provides some guidance on how to do so, along with examples from her clients.  The example Marie Kondo uses in her book is a young woman who lives in a tiny apartment, typical of Japanese cities.  Her floor is covered with things and her bed is a storage space when she isn’t sleeping on it.  She comes home from work tired, and her living space compounds that exhaustion.  Her dream is simple: a space free from clutter, like a hotel suite, where she can come home and relax with tea and a bath before bed.

While the situation may be different for someone who has responsibility for stores of corporate data and systems, the process of envisioning your ideal environment is not.  As you begin to examine your systems, information architecture, data—an information landscape, in general—it’s absolutely critical to have in mind what you want.  Having in mind “better” or “new technology” leads you towards trends and vendors with cool product features that may meet your needs, but more likely will end up contributing to the data and system clutter in the long run.  It may seem like a simplistic question, “What do you want?” but your efforts in defining that will help you navigate the marketplace of emerging technology.  At this point, it is important not to focus on the process or the items in front of you that you may or may not want to keep; rather, envisioning your ideal end-state, be it a living space filled with only things you love or a database filled only with data that supports your business, is what empowers you to move forward.

If you’re a savvy tech professional, you’re already thinking, “This is the requirements gathering process,” and you would be right.  There is no shortage of requirements gathering methodologies out there and most of them are pretty good.  If it gets you to envision an ideal that is vendor and tool agnostic and is based on the needs and desires of your key stakeholders and end-users, your method is fine.  If your requirements include things like, “better search functionality,” or, “more insight into what data we have,” it’s very likely that you’re also in need of some data decluttering.

Get Started by Defining your Categories

The Marie Kondo method requires you to see your belongings in two overarching categories: things that spark joy and everything else.  Everything else should be discarded.  For our purposes, data that sparks joy is data that serves your business.  It is helpful to look at the antithesis of joy to get an idea of what should be kept or discarded.  For example, if you are facing an audit, the antithesis of joy is not being able to produce the documentation that the auditor needs to conduct the audit.  That could be because you can’t access it, because what you have isn’t what they need, because you don’t have what they need, or because what they need is too difficult to find amidst the data and information that you have.  In this example, the information that allows you to have peace of mind during an audit is what you should keep. The bigger pattern here is that it’s important to know what business processes, data flows, decision points, and dependencies are impacting your business, and what the inputs and outputs are at each of those process steps.

Before you can begin to discard by category, you must know what categories drive your business.  Marie Kondo starts by outlining a series of categories that guide her clients through the process of discarding.  She starts with clothing, then books, then papers, then everything else.  She breaks down these categories even further, allowing people with astonishingly large and complex collections of things to take a systematic approach to decluttering. With organizational data, this approach will work, but the way you define the categories depends on the kind of organization you are.

The categories you need should emerge out of your efforts at process improvement. From Investopedia: “Kaizen is a Japanese term meaning ‘change for the better’ or ‘continuous improvement.’ It is a Japanese business philosophy regarding the processes that continuously improve operations and involve all employees. Kaizen sees improvement in productivity as a gradual and methodical process.”(1) Often, semantic work is done alongside large-scale business process improvement efforts.  Businesses want to know what the information inputs and outputs are, and they want to know how that information influences decisions and actions.  These efforts are often iterative, and it’s not uncommon to uncover conflicts in how people understand the data, or what they use it for.  I remember working with a team of medical experts who all used “normal” as a data point in their diagnostic processes.  It took our team years to come up with a good way to encode “normal” because each expert meant something different by the term.  There were heated debates about whether or not “normal” meant within the context of a patient who might be legally blind, in which case a low visual acuity score might be considered normal, or if normal was a cohort or population average, in which case that patient’s low score was not normal.  These conflicts and pain points are like mismatched socks and poorly-fitting jeans: they’re your clue about where you need to look at your data. This is also the starting point for determining which categories you need to use to evaluate your data. Do not strong-arm your conflicts into silence; use them to light the way ahead.

Building the Categories that Matter to You with the Marie Kondo Method

The Marie Kondo method categories are presented in an order that begins by teaching us what it means to feel that spark of joy (clothing), works through household items that might be useful but not particularly exciting, and ends with items of sentimental value (photos and heirlooms).  One of the big challenges of applying the Marie Kondo method to organizational data is that this rubric doesn’t easily map to things like clothing and photos.  However, the underlying idea of what is essential to our survival and our comfort does easily translate to data.  Don’t get bogged down in the details too soon; Marie Kondo advises that you create subcategories according to your need.

When I was organizing my miscellaneous items, I uncovered some camping gear I had purchased a couple years ago with the intention of going on a long bike ride that involved camping at night.  I was unable to go, so I packed the gear away for another time.  As I went through the process of evaluating my belongings using the Marie Kondo method, I decided I’ve always enjoyed camping and I was going to make space in my life for it.  I booked a camping trip for a few days, loaded my gear into a rental car, and put my gear to the test.

This camping trip was rich with lessons, pleasant and painful both. I took the gear I had bought for the bike trip, but since I had a car, I also supplemented it with larger and heavier items I knew would be useful now that I had the space.  Things I thought would be overkill turned out to be very useful: extra flashlight, large water container, spare book of matches, extra pillow, folding chair, extra plastic tub, etc.  Things I was certain I would use ended up coming home unused: pancake mix, spare sleeping bag, two changes of clothes, packets of sample skin and hair products, etc. And I found there were things I needed in the moment that I didn’t have: a lighter, fire starters, strong bug spray, an umbrella, and 4WD.  The underlying lesson here is that your gear should enable the activities you want to do.  And different types of gear serve different types of experiences, even if they’re categorically similar. If you look at the gear belonging to someone who likes glamping and compare it to someone who likes to through-hike the Appalachian Trail, there may not be a whole lot of overlap in the specifics, even though the categories are the same.  This is because your process determines your needs.

Camping gear is often designed to meet basic human needs and provide basic creature comforts.  Complex business processes can draw from this analog example, in that your categories are going to appear around the essential tasks of your business. In many of the projects I’ve done in the past, some effort has been made to identify key information areas that need development using Continuous Improvement or Kaizen principles.  Information artifacts, key concepts, subject headings, however you choose to refer to them, are the overarching conceptual subjects that drive your business.  Using the camping example, this might look like the following: Sleep, Food, Hygiene, Recreation. If you break down sleep, the process could be as simple as laying out a tarp and a blanket and wrapping yourself up in it and going to sleep.  Or it might be as complex as building a platform, building a tent, constructing a bed frame, unfolding sheets, pillows, and blankets, securing the tent, and finally going to sleep. In both scenarios, there are categories for sleep surface, shelter, and bedding.

Another key comparison comes up when considering duplication and re-use.  Chances are, you aren’t going to need a different sleeping bag for each camping scenario.  It’s interesting to note that if you go into an outdoor supply outfitter looking for sleeping bags, you will find a range of options based on very specific situations.  If your business is camping, you just might need several different bags!  But for most people, this just adds complexity and expense.  You do want to make sure the zipper works so you can control the amount of body heat you’re trapping in the bag, and if you’re camping in the cold you might add a blanket. But otherwise, a multi-season sleeping bag that’s comfortable and easy to care for is going to be re-used over and over in many camping scenarios.

For a business, the examples might range from a child’s lemonade stand to Starbucks. The information objects are going to be similar: menu, supplies, metrics. Once you’ve established these categories, you can look at your data systematically.  Coming up with these key concepts allows you to define the scope of your work and your priorities for development.

What’s Next?

Now that you’ve got a sense of how to create a list of categories based on your business processes, you can begin the process of discarding.  As with the process so far, it’s not as simple as it is for your possessions at home.  Disposition of data within an enterprise, large or small, comes with politics and legal requirements.  In part three, you will see some ideas about where to start with data disposition and how to use your company’s data disposition strategies to your advantage.

Click Here to Read Part 1 of this Series

Footnotes:
(1) https://www.investopedia.com/terms/k/kaizen.asp

The Data-Centric Revolution: Data-Centric vs. Centralization

We just finished a conversation with a client who was justifiably proud of having centralized what had previously been a very decentralized business function (in this case, it was HR, but it could have been any of a number of functions). They had seemingly achieved many of the benefits of becoming data-centric through centralization: all their data in one place, a single schema (data model) to describe the data, and dozens of decommissioned legacy systems.

We decided to explore whether this was data-centric and the desirable endgame for all their business functions.

A quick review. This is what a typical application looks like:

The metadata is the key. The application, the business logic and the UI are coded to the metadata (Schema), and the data is accessed through and understood by the metadata. What happens in every large enterprise (and most small ones) is that different departments or divisions implement their own applications.


Many of the applications were purchased, and today, some are SaaS (Software as a Service) or built in-house. What they all fail to share is a common schema. The metadata is arbitrarily different and, as such, the code base on top of the metadata is different, so there is no possibility of sharing between departments. Systems integrators try to work out what the data means and piece it together behind the scenes. This is where silos come from. Most large firms don’t have just four silos, they have thousands of them.

One response to this is “centralization.” If you discover that you have implemented, let’s say, dozens of HR systems, you may think it’s time to replace them with one single centralized HR system. And you might think this will make you Data-Centric. And you would be, at least, partially right.

Recall one of the litmus tests for Data-Centricity:

Let’s take a deeper look at the centralization example.


Centralization replaces a lot of siloed systems with one centralized one. This achieves several things. It gets all the data in one place, which makes querying easier. All the data conforms to the same schema (a single shared model). Typically, if this is done with traditional technology, it is not a simple model, nor is it extensible or federate-able, though there is some progress.

The downside is that everyone now must use the same UI and conform to the same model, and that’s the tradeoff.


The tradeoff works pretty well for business domains where the functional variety from division to division is slight, or where the benefit to integration exceeds the loss due to local variation.  For many companies, centralization will work for back office functions like HR, Legal, and some aspects of Accounting.

However, in areas where local differences are what drive effectiveness and efficiency (sales, production, customer service, or supply chain management), the lack of flexibility that comes with centralization may be too high a price to pay.

Let’s look at how Data-Centricity changes the tradeoffs.

Click here to read more on TDAN.com

Does your Data Spark Joy? Part 1

Why is Marie Kondo so popular for home organization?

Marie Kondo released her book, “The Life-Changing Magic of Tidying Up,” almost ten years ago and has since gained renown for motivating millions of people to de-clutter their homes, offices, and lives. Some people are literally buried in their possessions with no clear way to get from room to room.  Others simply struggle to get out the door in the morning because their keys, wallet, and phone play a daily game of hide-and-seek. Whatever the underlying cause of the overwhelm, Marie Kondo offers a simple, clear method for getting stuff under control. Not only that, but she promises that tidying up will clear the spaces in our lives, leaving room for peace and joy.

Why does this method apply to Data-Centric Architecture?

You might be wondering what this has to do with data-centric architecture.  In many ways the Marie Kondo method is easily extrapolated out of the realm of physical possessions and applied to virtual things: bits of data, documents, data storage containers, etc. In the world of information and data, it’s not surprising that people have seen parallels between belongings and data.  That said, it’s not enough to just say that new applications, storage methods, or business processes will solve the problems of information overload, data silos, or dirty data.  Instead, it’s important to examine your company’s data and the business that data serves.

Overarching Data-Centric Principles

For most businesses and agencies, data is essential to function and is ensconced in legal requirements and data lifecycle policy.  It simply isn’t realistic to say, “Throw it all out!”  Instead, the principles behind acquiring, using, storing, and eventually discarding things must be understood.  And in the virtual space, we can understand “things” to be data, metadata, and systems.

Her Method Starts with “Why?”

In her book, Marie Kondo says, “Before you start, visualize your destination.”  And she expands on this, asking readers to think deeply about the question and visualize the outcome of having a tidy space: “Think in concrete terms so that you can vividly picture what it would be like to live in a clutter-free space.” Our clients will often engage us with some ideal data situation in mind.  It might be expressed in terms of requirements or use cases, but it often has to do with being able to harmonize and align data, do large-scale systems integration, or add more sophisticated querying capabilities to existing databases or tools.  In fact, the first steps of our client engagements have to do with developing these questions into statements of work.

Also, we encourage clients to envision their data and what it can tell them independently of applications, systems, and capabilities precisely to avoid the pitfall of thinking in terms of using new tools to solve undefined problems.  It’s uncanny that this method of interrogation into underlying motivations is common between data-centric development and spark-joy tidying up.

Her Method is About the Psychology of Belongings.

It is important to understand how organizations come to have their data.  In the US Government, entire programs are devoted to managing acquisition. In finance, manufacturing, and other industries, the process of acquiring systems and data is often a business unto itself. It’s not uncommon to hear people working with data refer to “data silos” when talking about partitioned and disconnected collections of data.  Sometimes this data is shuffled into classified folders and proprietary systems unnecessarily, simply because someone wants to retain control of it. In my work at the Federal Government, I found the process of determining the system of record to be intensely political and time-consuming.  It’s neither trivial nor simple, but it is essential to the effort of tidying your data-centric environment.

Sort your Data by Category.

Marie Kondo recommends going categorically for a reason.  In her book, she talks about her process of evaluating her belongings by location, drawer by drawer, room by room, and discovering that she found herself organizing multiple drawers with the same things repeatedly.  She tells us, “The root of the problem lies in the fact that people often store the same type of item in more than one place.  When we tidy each place separately, we fail to see that we’re repeating the same work in many locations and become locked into a vicious circle of tidying.” If this doesn’t sound familiar, you aren’t even working with data.

For me, this principle became clear when I gathered all my office supplies in one place. I was astounded by the small mountain of binder clips (and Sharpies) that seemed to materialize out of nowhere. I always seem to be looking for binder clips and Sharpies, so I was shocked by how many I had.

I can think of no closer parallel than the proliferation of siloed systems that appear in each department within an agency.  When I worked for a government agency, I was part of a team whose job it was to survey the offices to find out who was using flight data.  There were several billion-dollar systems in development and in maintenance that held flight data. Over the course of a few years, I would hear quotes about the agency-wide number of flight data systems go from 15, to 20, to 30, and beyond.  It literally became an inside joke with leadership. And at times, we would hear rumors about some small branch office that had their own Microsoft Access database to keep track of their own data, because they couldn’t get what they needed from the systems of record.  Systems are like the binder clips of enterprise data, except that this kind of proliferation is as easy as making a copy. You don’t even need to make a trip to the office supply store to end up with a pile of duplicates.  If you want to understand how much data redundancy you have, search for specific categories of data across all systems.

Does it spark Joy? What does joy mean in the context of systems and data?

How do you know what sparks joy?  First, look at how the principle of looking for joy is applied.  Presumably, you are in your line of business because on some level it brings you joy – joy that derives from fulfilling a purpose.  Remember the first step of understanding why you are embarking on a transformative process and go back to what you envisioned.  Another way that you can look at joy is whether your space and the things in it allow for that spark to happen.  Ideally, you remove the items from your space that hinder that spark, after acknowledging the lessons they’ve taught you.  Do you feel that spark of joy when you grab your keys in the morning on the way out the door? If you’ve ever tried to find misplaced keys while you’re in a rush, you know the antithesis of joy. Having done the work of creating a space where your keys are easy to find is a way of facilitating joy in your morning routine.

One of the supposed failures of the Marie Kondo method as it applies to data clutter is that it is impossible to physically hold, or even look at, every single piece of data in your system.  Again, rely on the principle behind her method, which is that it is important to be thorough and aim for an environment that facilitates ease and joy.  Don’t say, “We can’t delete any personnel data!” and quit.  Commit to taking an inventory of your personnel systems and the systems that use personnel data. If that process reveals that you have ten different personnel systems and personnel data scattered in several other systems, you must take a closer look at your data environment.  At one point in my physical de-cluttering, I found a tin full of paper clips.  I didn’t handle each shiny paper clip individually; rather, I acknowledged the paper clips served me when I printed more documents onto paper, and since I no longer had a printer, I decided to toss them into the recycling bin.

Remember why you’re considering a solution to data problems in the first place and make a commitment to doing the work of determining your real data needs. Purpose is key, because the way data sparks joy is by enabling you to fulfil that purpose.  This can be difficult where the work you do is abstract and somewhat removed from business that is easy to understand.  However, the critical point to knowing whether or not the data in front of you serves its purpose in your business is to fully understand your business.

Discard and Delete your Data

Take a wardrobe full of clothing, for example.  Many of Marie Kondo’s clients are surprised when they start organizing their wardrobes: the amount of clothing that is unserviceable, the number of items that still have tags on them, the number of hand-me-downs or gifts that don’t suit you, and so on. These items are sometimes difficult to discard for several reasons:

  • It’s kept out of obligation to the giver.
  • It cost a lot of money to buy it.
  • It’s still in good repair.
  • It might be the perfect thing to wear at an unspecified event in the future.
  • It reminds you of the lovely event at which you wore it.
  • It reminds you of the person who left it with you.

It may seem far-fetched to apply these reasons to data storage, but a quick glance through failed data projects will show you otherwise.  Consider the proprietary data locked in a system owned by a vendor for which your license has lapsed, or the system that’s coded in an outdated language whose expert programmers have to be called out of retirement to access, or that directory of data that doesn’t really match the fields in your database but that you requested through a complex data-sharing agreement with another agency.  If you can’t think of an example of a system that has been paid for but hasn’t been used, just consider that the terms shelfware and vaporware exist. It’s easy to be cynical about data precisely because of the overlaps between why we keep things in our closets and garages, and why we keep systems and data in our repositories. When you consider these parallels and understand the principles behind evaluating the items you keep with the hope that they will make your life better, sparking joy becomes easier.

Storage experts are hoarders.

Marie Kondo says you don’t need more storage.  That new Cloud service that can take all the old databases you have and make them accessible is not going to solve your problem. Data storage is expensive, and you do not need a new data storage solution.  What you need is to understand your business process, the business need for the data you believe you have, and a disposition plan for everything else.

How do you start?

In summary, if you’re looking for smart data-centric solutions to help you manage an overwhelming amount of data, or you’re looking for ways to access your vast stores of data in a way that enables smarter business solutions, your bigger issue might be data hoarding.  Looking at your business needs, closely examining the data you have, and coming up with strategies for aligning your data to a manageable data lifecycle can seem overwhelming.  Using a data-centric approach will bring that dream into focus. Keep an eye out for part two of this series to learn how to get your data to spark joy for you.

Click Here to Read Part 2 of this Series