Ontology-based Applications

Once you have your ontology, you want to put it to use. We will describe a common scenario in which data is extracted from various sources, including relational databases, and then used by an application in place of a traditional relational database. Things have advanced from just a few years ago, when the main technologies covered representing the schema (RDF, RDFS), representing the data (RDF), and querying it (SPARQL). Two important new standards have since emerged, one for extracting data from relational databases and one for specifying constraints that are not available in OWL.

One good way to go about building an ontology-based application is as follows:

  1. Create ontology
  2. Create SHACL constraints
  3. Create triples
  4. Build program logic and user interface

This parallels how you would build a traditional application. The main difference is that you will use a triple store to answer SPARQL queries instead of posing SQL queries to a relational database. Instead of creating conceptual, logical, and physical data models along with various integrity constraints, you will be building an ontology and SHACL constraints. And instead of having just one database and one data model per application, you can reuse either or both for multiple applications around the enterprise.

Create Ontology

Create the ontology for the chosen subject matter. Start with a core ontology that can be extended and used in a variety of applications across the enterprise.  This is similar to an agile approach, in that you start small and extend.  From the start, think about the medium and long term so that additions are natural extensions of the core ontology, which should be relatively stable.
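
To make this concrete, here is a minimal sketch of what a couple of core classes and properties might look like in Turtle. The example.com namespace and the exact terms are illustrative assumptions, not a prescribed core model; the same terms are reused in the SHACL and R2RML examples that follow.

    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix :     <http://example.com/ontology/> .

    # A small, stable core that later applications extend rather than replace.
    :Corporation    a owl:Class ;
                    rdfs:label "Corporation" .
    :Person         a owl:Class ;
                    rdfs:label "Person" .

    :isSubsidiaryOf a owl:ObjectProperty ;
                    rdfs:domain :Corporation ;
                    rdfs:range  :Corporation .
    :hasCEO         a owl:ObjectProperty ;
                    rdfs:domain :Corporation ;
                    rdfs:range  :Person .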

Create SHACL Constraints

The ontology is modeling the real world, independently from any particular application. To build a specific application, you will be choosing a subset of the ontology classes and properties to use. Many but not all of the properties that are optional in the real world will remain optional in your application. Some properties that necessarily hold in the real world as reflected in the ontology will be of no interest for a particular application.

SHACL is a rich and complex standard with many intended uses. Three key ones are:

  1. Communicate what part of the ontology is to be used in the application.
  2. Communicate exactly what the triples need to look like that will be created and loaded into the triple store.
  3. Communicate to a SHACL engine exactly what integrity constraints are to be respected.
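
For example, here is a hedged sketch of a SHACL shape for such an application, reusing the illustrative ontology terms from above (the specific cardinalities are assumptions made for the example, not requirements of any real application):

    @prefix sh: <http://www.w3.org/ns/shacl#> .
    @prefix :   <http://example.com/ontology/> .

    # In this application, every Corporation we load must have exactly one CEO,
    # and anything it is a subsidiary of must itself be a Corporation.
    :CorporationShape
        a sh:NodeShape ;
        sh:targetClass :Corporation ;
        sh:property [
            sh:path     :hasCEO ;
            sh:class    :Person ;
            sh:minCount 1 ;
            sh:maxCount 1
        ] ;
        sh:property [
            sh:path  :isSubsidiaryOf ;
            sh:class :Corporation
        ] .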

This process also forces you to examine all the aspects of the ontology that are needed for the application. It usually uncovers mistakes or gaps in the ontology. See Figure 1.

Figure 1: Creating Ontology, Constraints, and Triples

 

Create Triples

Triples can come from many sources, including text documents, web pages, XML documents, spreadsheets, and relational databases. The latter two are the most common, and vendors have supplied tools to support this process. The W3C has also created a standard for mapping a relational schema to an ontology so that triples may be extracted directly from a relational database. That standard is called R2RML[1]. Figure 2 shows how this works. An R2RML specification for this simple example would indicate the following:

  1. Each row in the corporation table will be an instance of the class :Corporation.
  2. The IRI for each instance of :Corporation will use the myd: namespace, and the local name (after the colon) is to be an underscore followed by the value in the ‘CorporationID’ column.
  3. The ‘Subsidiary Of’ column corresponds to the :isSubsidiaryOf property.
  4. The ‘CEO’ column corresponds to the :hasCEO property.
  5. There is a foreign key connecting values of the ‘CEO’ column to a Person table.

With this information, the R2RML engine can reach into the relational database table and extract triples as indicated in Figure 2. Importantly, at most one triple results from each cell in the table; if a cell is NULL, no triple is created.
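
A minimal sketch of such an R2RML mapping for the corporation table might look like the following (the table name, column spellings, and IRI template are assumptions based on the description above):

    @prefix rr: <http://www.w3.org/ns/r2rml#> .
    @prefix :   <http://example.com/ontology/> .

    <#CorporationMap>
        a rr:TriplesMap ;
        rr:logicalTable [ rr:tableName "CORPORATION" ] ;
        # Each row becomes an instance of :Corporation; the IRI is built in the
        # myd: namespace from an underscore plus the CorporationID value.
        rr:subjectMap [
            rr:template "http://example.com/data/_{CorporationID}" ;
            rr:class    :Corporation
        ] ;
        # The 'Subsidiary Of' column corresponds to :isSubsidiaryOf.
        rr:predicateObjectMap [
            rr:predicate :isSubsidiaryOf ;
            rr:objectMap [ rr:template "http://example.com/data/_{SubsidiaryOf}" ]
        ] ;
        # The 'CEO' column corresponds to :hasCEO, pointing at the person's IRI.
        rr:predicateObjectMap [
            rr:predicate :hasCEO ;
            rr:objectMap [ rr:template "http://example.com/data/_{CEO}" ]
        ] .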

If you need to create triples from spreadsheets, you can use vendor tools, create your own tool, or write ad hoc scripts. There are not as many out-of-the-box standards and tools for extracting triples from web pages, XML documents, and text documents, though specialized scraping and natural language processing tools may be available.

Figure 2: Tables to Triples

 

Build Program Logic & User Interface

This phase works much like the development of any other application. The main difference is that instead of querying a relational store using SQL, you are using SPARQL to query a triple store. See Figure 3.
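
For instance, the application logic might fetch corporations, their CEOs, and (optionally) their parent companies with a SPARQL query along these lines, reusing the illustrative terms from the earlier examples:

    PREFIX : <http://example.com/ontology/>

    # Fetch each corporation, its CEO, and, where present, its parent company.
    SELECT ?corporation ?ceo ?parent
    WHERE {
        ?corporation a :Corporation ;
                     :hasCEO ?ceo .
        OPTIONAL { ?corporation :isSubsidiaryOf ?parent . }
    }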

Figure 3: Semantic Application Architecture

 

[1] https://www.w3.org/TR/r2rml/

Data-Centric’s Role in the Reduction of Complexity

Complexity Drives Cost in Information Systems

A system with twice the number of lines of code will typically cost more than twice as much to build and maintain.

There is no economy of scale in enterprise applications; there is a diseconomy of scale. In manufacturing, every doubling of output results in a predictable reduction in the cost per unit. This is often called a learning curve or an experience curve.

Just the opposite happens with enterprise applications. Every doubling of code size means that additional code is added at ever lower productivity. This is because of complex dependency. When you manufacture widgets, each widget has no relationship to, or dependency on, any of the other widgets. With code, it is just the opposite: each line must fit in with all those that preceded it. We can reduce the dependency, with discipline, but we cannot eliminate it.

If you are interested in reducing the cost of building, maintaining, and integrating systems, you need to tackle the complexity issue head on.

The first stopping point on this journey is recognizing the role that schema plays in the proliferation of code. Study software estimating methodologies, such as function point analysis, and you will quickly see the central role that schema size has in code bloat. Function point analysis estimates effort based on inputs such as the number of fields on a form, the elements in a transaction, or the columns in a report. Each of these is directly driven by the size of the schema. If you add attributes to your schema, they must show up in forms, transactions, and reports; otherwise, what was the point?

I recently did a bit of forensics on a popular, well-known, high-quality application that I think is representative: QuickBooks. The QuickBooks code base is 10 million lines of code. The schema consists of 150 tables and 7,500 attributes (7,650 schema concepts in total). That means each schema concept, on average, contributed another 1,300 lines of code to the solution. Given that most studies have placed the cost to build and deploy software at between $10 and $100 per line of code (an admittedly large range, but you have to start somewhere), each attribute added to the schema commits the enterprise to somewhere between $13K and $130K of expense just to deploy, and probably an equal amount over the life of the product for maintenance.
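
The back-of-the-envelope arithmetic behind those figures, using the numbers above:

    \[
      \frac{10{,}000{,}000\ \text{LOC}}{7{,}650\ \text{concepts}} \approx 1{,}300\ \text{LOC per concept}
    \]
    \[
      1{,}300 \times \$10 \approx \$13\text{K}
      \qquad\qquad
      1{,}300 \times \$100 \approx \$130\text{K}
    \]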

I'm hoping this gives data modelers a bit of pause. It is so easy to add another column, let alone another table, to a design; it is sobering to consider the economic impact.

But that’s not what this article is about.  This article is about the insidious multiplier effect that not following the data centric approach is having on enterprises these days.

Let us summarize what is happening in enterprise applications:

  • The size of each application’s schema is driving the cost of building, implementing, and maintaining it (even if the application is purchased).
  • The number of applications drives the cost of systems integration (which is now 30-60% of all IT costs).
  • The overlap, without alignment, is the main driver of integration costs (if the fields are identical from application to application, integration is easy; if the applications have no overlap, integration is unnecessary).

We now know that most applications can be reduced in complexity by a factor of 10-100. That is pretty good, but the systems-of-systems potential is even greater. We now know that even very complex enterprises have a core model with just a few hundred concepts. Most of the remaining distinctions can be made taxonomically, without programming changes.

When each sub domain directly extends the core model, instead of the complexity being multiplicative, it is only incrementally additive.

We worked with a manufacturing company whose core product management system had 700 tables and 7,000 attributes (7,700 concepts). Our replacement system had 46 classes and 36 attributes (82 concepts) – almost a 100-fold reduction in complexity. They acquired another company that had its own systems, completely and arbitrarily different, though smaller and simpler at 60 tables and 1,000 attributes (1,060 concepts in total). To accommodate the differences in the acquired company, we had to add 2 concepts to the core model, or about 3%.

Normally, trying to integrate 7700 concepts with 1060 concepts would require a very complex systems integration project.  But once the problem is reduced to its essence, we realize that there is a 3% increment, which is easily managed.

What does this have to do with data centricity?

Until you embrace data centricity, you think that the 7700 concepts and the 1060 concepts are valid and necessary.  You’d be willing to spend considerable money to integrate them (it is worth mentioning that in this case the client we were working with had acquired the other company ten years ago and had not integrated their systems, mostly due to the “complexity” of doing so).

Once you embrace data centricity, you begin to see the incredible opportunities.

You don't need data centricity to fix one application. You merely need elegance: a discipline that guides you to the simplest design that solves the problem. You may have thought you were doing that already. What is interesting is that real creativity comes with constraints. And when you constrain your design choices to be in alignment with a firm's "core model," it is surprising how rapidly the complexity drops. More importantly for the long-term economics, the divergence of the overlapping bits drops even faster.

When you step back and look at the economics though, there is a bigger story:

The total cost of enterprise applications is roughly proportional to:

[Formula image: mccomb01]

These items are multiplicative (except for the last, which is a divisor). This means that if you drop any one of them in half, the overall result drops in half. If you drop two of them in half, the result drops by a factor of four, and if you drop all of them in half, the result is an eight-fold reduction in cost.

Dropping any of these in half is not that hard. If you drop them all by a factor of ten (very doable), the result is a 1,000-fold reduction in cost. That sounds too incredible to believe, but let's take a closer look at what it would take to reduce each in half or by a factor of ten.
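
The arithmetic of multiplicative reduction, assuming for illustration that there are three multiplicative drivers:

    \[
      \left(\tfrac{1}{2}\right)^{3} = \tfrac{1}{8}
      \qquad\qquad
      \left(\tfrac{1}{10}\right)^{3} = \tfrac{1}{1000}
    \]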

Click here to read more on TDAN.com

The Core Model at the Heart of Your Architecture

We have taken the position that a core model is an essential part of your data-centric architecture. In this article, we will review what a core model is, how to go about building one, and how to apply it both to analytics as well as new application development.

What is a Core Model?

A core model is an elegant, high-fidelity, computable, conceptual, and physical data model for your enterprise.

Let’s break that down a bit.

Elegant

By elegant we mean appropriately simple, but not so simple as to impair usefulness. All enterprise applications have data models. Many of them are documented and up to date. Data models come with packaged software, and often these models are either intentionally or unintentionally hidden from the data consumer. Even hidden, their presence is felt through the myriad of screens and reports they create. These models are the antithesis of elegant. We routinely see data models meant to solve simple problems with thousands of tables and tens of thousands of columns. Most large enterprises have hundreds to thousands of these data models, and are therefore attempting to manage their datascape with over a million bits of metadata.

No one can understand or apply one million distinctions. There are limits to our cognitive functioning. Most of us have vocabularies in the range of 40,000-60,000 words, which suggests the upper limit of a domain that people are willing to spend years to master.

Our experience tells us that at the heart of most large enterprises lies a core model that consists of fewer than 500 concepts, qualified by a few thousand taxonomic modifiers. When we use the term "concept" we mean a class (e.g., set, entity, table, etc.) or property (e.g., attribute, column, element, etc.). An elegant core model is typically 10 times simpler than the application it's modeling, 100 times simpler than a sub-domain of an enterprise, and at least 1,000 times simpler than the datascape of a firm.

Click here to continue reading on TDAN.com

The Data-Centric Revolution: Gaining Traction

There is a movement afoot. I’m seeing it all around me. Let me outline some of the early outposts.

Data-Centric Manifesto

We put out the data-centric manifesto on datacentricmanifesto.org over two years ago now. I continue to be impressed with the depth of thought that the signers have put into their comments. When you read the signatory page (and I encourage you to do so now), I think you'll be struck. A few, selected at random, give you the flavor:

This is the single most critical change that enterprise architects can advocate – it will dwarf the level of transformation seen from the creation of the Internet. – Susan Bright, Johnson & Johnson

Back in "the day" when I started my career we weren't called IT, we were called Data Processing. The harsh reality is that the application isn't the asset and never has been. What good is the application that your organization just spent north of $300K to license without the data?  Time to get real, time to get back to basics. Time for a reboot! –  Kevin Chandos

This seems a mundane item to most leaders, but if they knew its significance, they would ask why we are already not using a data-centric approach. I would perhaps even broaden the name to a knowledge-centric approach and leverage the modern knowledge management and representation technologies that we have and are currently emerging. But the principles stand either way. – David Chasteen, Enterprise Ecologist

Because I’ve encountered the decades of inertia and want to be an instrument of change and evolution. – Vince Marinelli, Medidata Solutions Worldwide

And I love this one for its simple frustration:

In my life I try to fight with silos – Enn Õunapuu, Tallinn University of Technology

Click here to continue reading on TDAN.com

A Semantic Bank

What does it mean to be a “Semantic Bank”?

 

In the last two months I’ve heard at least 6 financial institutions declare that they intended to become “A Semantic Bank.”  We still haven’t seen even the slightest glimmer as to what any of them mean by that.

Allow me to step into that breach.

What follows is our take on what it would mean to be a “Semantic Bank.”

The End Game

I'm reluctant to start with the end state, because pretty much anyone reading this, including those who aspire to be semantic banks, will find it a "bridge too far."  Bear with me.  I know this will take at least a decade, perhaps longer, to achieve.  However, having the end in mind allows us to understand, with a clarity few currently have, exactly where we are wasting our money now.

If we had the benefit of time and could look back from 2026 and ask ourselves "which of our investments in 2016 were really investments, and which were wastes of money?", how would we handicap the projects we are now funding?  To be clear, not all expenditures need to lead to the semantic future.  There are tactical projects worth so much in the short term that we can overlook the fact that we are anti-investing in the future.  But we should be aware of when we are doing this, and it should be an exception. The semantic bank of the future will be the organization that can intentionally divert the greatest percentage of its current IT capital spend toward its semantic future.

A Semantic Bank will be known by the extent to which its information systems are mediated by a single (potentially fractal, but with a single simple core) conceptual model.  Unlike conceptual models of the past, this one will be directly implemented.  That is, a query to the conceptual model will return production data, and a transaction expressed in conceptual model terms will be committed, subject to permissions and constraints which will also be semantically described.

Semantics?

For those who just wandered into this conversation: semantics is the study of meaning.  Semantic technology allows us to implement systems, and to integrate systems, at the level of conceptual meaning, rather than at the level of structural description (which is what traditional technology relies on).

It may sound like a bit of hair splitting, but the hair splitting is very significant in this case.  This technology allows practitioners to drop the costs of development, integration, and change by over an order of magnitude, and allows incorporation of data types (unstructured, semi-structured, and social media, for instance) that hitherto were difficult or impossible to integrate.

It accomplishes this through a couple of interesting departures from traditional development:

  • All data is represented in a single format (the triple). There aren’t hundreds or thousands of different tables, there is just the triple.
  • Each triple is an assertion, a mini sentence composed of a subject, predicate and object. All data can be reduced to a set of triples.
  • All the subjects, all the predicates, and most of the objects are identified with globally unique identifiers (URIs, which are analogous to URLs)
  • Because the identifiers are globally unique, the system can join records without an analyst or programmer having to write explicit joins.
  • A database that assembles triples like this is called a “triple store” and is in the family of “graph databases.” A semantic triple store is different from a non-semantic database in that it is standards compliant and supports a very rich schema (even though it is not dependent on having a schema).
  • Every individually identifiable thing (whether a person, a bank account, or even the concept of “Bank Account”) is given a URI. Wherever the URI is stored or used, it always means exactly the same thing.  Meaning is not dependent on context or location.
  • New concepts can be formed by combining existing concepts.
  • The schema can evolve in place, even in the presence of a very large database dependent on it.

A set of concepts so defined is called an “Ontology” (loosely an organized body of knowledge). When the definitions are shared at the level of the firm, this is called an “Enterprise Ontology.”
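
As a hedged illustration, a few such triples in Turtle might look like this (the IRIs and the terms :BankAccount, :heldBy, and :hasBalance are invented for the example):

    @prefix :    <http://example.com/ontology/> .
    @prefix myd: <http://example.com/data/> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    # Three mini sentences: subject, predicate, object. Because every identifier
    # is a globally unique URI, any system that loads these triples can join
    # them with others without hand-written join logic.
    myd:_account-1001  a            :BankAccount ;
                       :heldBy      myd:_person-42 ;
                       :hasBalance  "1250.00"^^xsd:decimal .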

Our experience has been that using these semantic approaches an ontology can be radically simpler, and at the same time more precise and more complete, than traditional application databases.  When the semantics are done at the firm level the benefits are even greater, because each additional application is benefiting from the concepts shared with the others.

Business Value

What is the business value of rethinking information systems?  The benefits come in two main varieties: generic and specific.

Generic Value

Dropping the cost of change by a factor of 10 has all sorts of positive value.  Systems that were too difficult to change become malleable.

The integration story is even better: once all the similar concepts are expressed in a way that makes their similarity obvious and baked into their identity, systems integration, currently one of the largest costs in IT, will become almost free.

Back to the End Game

In the end game, a semantic bank will have all its systems directly implemented on a shared semantic model.  The scary thing is: who has a better shot at this, the established oligarchy (the “too big to fail”) or FinTech?  Each has about half the advantages.  Cue Clayton Christensen's “Innovator's Dilemma”: in some situations a new upstart enters a market with superior technology and the incumbents crush the upstart.  In other situations, the upstart creates a beachhead in an underserved market and continually walks its way up the value chain until the incumbents are on the ropes.  What makes the difference, and how it will play out with the “Semantic Banks,” is the ultimate question.

A Bit More on the Target

Most vendors have a tendency to see the future in terms of the next version of their offering.

In the future, a progressive firm will have an “enterprise ontology” that represents the key concepts it deals with.  Currently, firms have thousands of application systems, each of which has thousands of tables and tens of thousands of columns to manage.  In aggregate, they are managing millions of concepts.

But really, there are a few hundred concepts on which everything they deal with is based.  When we figure out what those few hundred concepts are, we have started down the road of profound simplicity.

Once you have this model (the “core ontology”) you are armed with a weapon that delivers on three fronts:

  • All integration can be mediated through this model. By mapping into and out of the shared model, all integration becomes easier.
  • New development can be made incredibly simpler. Building an app on a model that is 10 times simpler than normal and 100 times simpler than the collective model of the firm economizes the application development and integration process.
  • The economics of change become manageable. Currently there is such a penalty for changing an information system that we spend an inordinate amount of energy staving off changes. In the semantic future, change is economical (not free, but far, far less costly than it is now). Once we get to that point, the low cost of change translates into rapidly evolvable systems.

What Will Distinguish the Leaders in the Next Five Years?

Only the smallest start up will be completely semantic within the next five years.  If they develop a semantic core, their challenge will be growing out to overtake the incumbents.

This white paper is mostly written for the incumbents (by the way, we are happy to help FinTech startups, but our core market is established players dealing with legacy issues).

Most financial services companies right now are executing “proof of concept” projects.  Those that do this may well be the losers.  NASA has a concept called “TRL” (Technology Readiness Level), a scale of 1-9, where levels 1-3 are wacky ideas that no one yet knows can be implemented and levels 7-9 are technology that has already been commercialized, with no risk left in implementation.  Experiments are typically done at levels 1-3, to learn what else we need to know to make the technology real.  Proofs of concept are typically done at levels 4-6, to narrow down some implementation parameters.  The issue is that all the important semantic technology is at level 8 or 9.  Everyone knows it works and knows how it works.  The companies doing “proof of concept” projects in semantic technology at this point are vamping[1] and will ultimately be eclipsed by companies that can commit when appropriate.

What are the benefits of becoming semantic?

The benefits of adopting this approach are so favorable that many people would challenge our credibility for suggesting them (it sounds like hype), but they are really true, so we won't shrink from our responsibility for the sake of credibility.

Integration

When you map your existing (and especially future) systems to a simple, shared model, the cost of integration plummets.  Currently integration consumes 30-60% of the already bloated IT budget because analysts are essentially negotiating agreement between systems that each have tens of thousands of concepts.  That’s hard.

What's easy (well, relatively easy) is to map a complex system to a simple model.  Once you've done this, it is integrated with all the other systems that have also been mapped to that model.  It becomes the network effect of knowledge.

Application Development

A great deal of the cost of application development is the cost of programming to a complex data model.  Semantic technology helps at two levels.  The first is that by reducing the complexity of the model, any code dependent on the model is reduced proportionately.  The second is that semantic technology is very compatible with RESTful development.  The RESTful approach encourages a style of development that is less dependent on, and less coupled to, the existing schema. We have found that a semantic-based system using RESTful APIs is amazingly resilient to changes in the model (other than those that introduce major structural changes, but that is a commercial for getting your core ontology right to start with).

New Data Types

Many leading-edge projects are predicated on being able to incorporate data that was hitherto unrealistic to incorporate.  This might be unstructured data, it might be open data, it might be social media.  All of these are difficult for traditional technology, but semantic technology takes them in stride.

Observations from other industries

Here are our observations about what has worked in other industries (which, by the way, have also only minimally converted to semantic technology, but whose early adopters provide some important signposts for what works and what doesn't).

Vision and Constancy Trump Moon Shots

What we have seen from the firms that have implemented impressive architectures based on semantics is that a small team with continual funding vastly outperforms attempts to catch up with huge projects.  The most impressive firms have had a core of 3-8 people who were at it continually for 2-4 years.  Once such a team reaches critical mass with the capability it creates, putting 50-100 people on a catch-up project will never catch them.  The lead that can be established now with a small, focused team will be insurmountable 3-5 years from now, when this movement becomes obvious.

The Semantic Bank Maturity Model

Eventually we will come to the point where we will want to know “how semantic are you?” Click here to take an assessment to discover the answer to this question.

We will take this up in a separate white paper, with considerably more detail, but the central concept is: what percent of your data stores are semantically enabled and how semantic are they really?

Getting Started

Let’s assume you want to take this on and become a “Semantic Bank”.  How do you go about it?

What we know from other industries is that the winner is not the firm that spends the most, or even the one that starts first (although at some point failing to start is going to be starting to fail).  The issue is who can sustain a modest but continual initiative.  This means that the winner will be the firm that can finance a continual improvement project over several years.  While you might make a bit of incremental progress through a series of tactical projects, the big wins will come to the companies that can set up an initiative and stick with it.  We have seen this in healthcare, manufacturing, and publishing, and we expect it to be true in financial services as well.

Often this means that the sponsor must be in a position to dedicate a continual (but not very large) budget to achieving this goal.  If that is not you, you may want to start the conversation with the person who can make a difference.  If it is you, what are you waiting for?

[1] Vamping is a term professional jugglers use to refer to what you do when you drop a juggling club: you continue the cadence with an imaginary club until you can find a moment to lift the dropped club back into the rotation.

The Data-Centric Revolution: Integration Debt

Integration Debt is a Form of Technical Debt

As with so many things, we owe the coining of the metaphor “technical debt” to Ward Cunningham and the agile community. It is the confluence of several interesting conclusions the community has come to. The first was that being agile means being able to make a simple change to a system in a limited amount of time, and being able to test it easily. That sounds like a goal anyone could get behind, and yet it is nearly impossible in a legacy environment. Agile proponents know that any well-intentioned agile system is only six months' worth of entropy away from devolving into that same sad state where small changes take big effort.

One of the tenets of agile is that patterns of code architecture exist that are conducive to making changes. While these patterns are known in general (there is a whole pattern-languages movement to keep refining the knowledge and use of these patterns), how they will play out on any given project is emergent. Once you have a starting structure for a system, a given change often perturbs that structure. Usually not a lot. But changes add up and, over time, can greatly impede progress.

One school of thought is to be continually refactoring your code, such that, at all times, it is in its optimal structure to receive new changes. The more pragmatic approach favored by many is that for any given sprint or set of sprints, it is preferable to just accept the fact that the changes are making things architecturally worse; as a result, you set aside a specific sprint every 2-5 sprints to address the accumulated “technical debt” that these un-refactored changes have added to the system. Like financial debt, technical debt accrues compounding interest, and if you let it grow, it gets worse—eventually, exponentially worse, as debt accrues upon debt.

Integration Debt

I’d like to coin a new term: “integration debt.” In some ways it is a type of technical debt, but as we will see here, it is broader, more pervasive, and probably more costly.

Integration debt occurs when we take on a new project that, by its existence, is likely to lead someone at some later point to incur additional work to integrate it with the rest of the enterprise. While technical debt tends to occur within a project or application, integration debt takes place across projects or applications. While technical debt creeps in one change at a time, integration debt tends to come in large leaps.

Here’s how it works: let’s say you’ve been tasked with creating a system to track the effectiveness of direct mail campaigns. It’s pretty simple – you implement these campaigns as some form of project and their results as some form of outcomes. As the system becomes more successful, you add in more information on the total cost of the campaign, perhaps more granular success criteria. Maybe you want to know which prospects and clients were touched by each campaign.

Gradually, it dawns on you that in order to get this additional information (and especially in order to get it without incurring more research time and re-entry of data), you will need to integrate with other systems within the firm: the accounting system to get the true costs, the customer service systems to get customer contact information, the marketing systems to get the overlapping target groups, and so on. At this point, you recognize that the firm is going to consume a great deal of resources to get a complete data picture. Yet this could have been known and dealt with at project launch time. It even could have been prevented.

Click here to read more on TDAN.com

Greatest hits from the Data-Centric Manifesto

I was just reading through what some folks have written on the Data-Centric Manifesto web site. I thought I'd capture some of the more poignant comments:

“I believe [Linked] Data Centric approach is the way of the future. I am committing my company to assisting enterprises in their quest to Data-Centric transformation.” -Alex Jouravlev

 

“I have experienced first-hand in my former company the ravages of application-centric architectures. Development teams have rejected SQL-based solutions that performed 10 to 100 times better with less code and fewer resources, all because of application-centric dogma. Databases provide functional services, not just technical services – otherwise they’re not worth the money.” – Stew Ashton

 

“I use THE DATA-CENTRIC MANIFESTO as a mantra, a guide-line, a framework, an approach and a method, with which to add value as a consultant to large enterprises.” -Mark Besaans

 

“A data-centric approach will finally allow IT to really support the way we think and work instead of forcing us to think in capabilities of an application.” -Mark Schenk

 

“The principles of a data-centric approach would seem obvious, but the proliferation of application-centric implementations continues. Recognizing the difference is critical to positive change, and the benefits organizations want and need.” -Kim L Hoover

Data-centric is a major departure from the current application-centric approach to systems development and management. Migration to the data-centric approach will not happen by itself. It needs champions. If you’re ready to consider the possibility that systems could be more than an order of magnitude cheaper and more flexible, then become a signatory of the Data-Centric Manifesto.

Read more here.

Do Data Lakes Make My Enterprise Look Data-Centric?

Dave McComb discusses data lakes, schema, and data-centricity in his latest post on the Data-Centric Revolution for The Data Administration Newsletter. Here's a brief excerpt to pique your interest:

“I think it is safe to say that there will be declared successes in the Data Lake movement. A clever data scientist, given petabytes of data to troll through, will find insights that will be of use to the enterprise. The more enterprising will use machine learning techniques to speed up their exploration and will uncover additional insights.

But in the broader sense, we think the Data Lake movement will not succeed in changing the economics or overall architecture of the enterprise. In a way, the Data Lake is something to do instead of dealing with the very significant problems of legacy ecosystems and dis-economics of change.

Even at the analytics level, where the Data Lake has the most promise, we think it will fall short…

Conceptually, the Data Lake is not far off from the Data Centric Revolution. The data does have a more central position. However, there are three things that a Data Lake needs in order to be Data Centric…”

Click here to read the entire article.

 

Data-Centric vs. Data-Driven

In this column, I am making the case for data-centric architectures for enterprises.  There is a huge economic advantage to converting to the data-centric approach, but curiously few companies are making the transition. One reason may be the confusion of data-centric with data-driven, and the belief that you are already on the road to data-centric nirvana when in fact you are nowhere near it.

Data-Centric

Data-centric refers to an architecture where data is the primary and permanent asset, and applications come and go.  In the data-centric architecture, the data model precedes the implementation of any given application and will be around and valid long after it is gone.

Many people may think this is what happens now or what should happen.  But it very rarely happens this way.  Businesses want functionality, and they purchase or build application systems.  Each application system has its own data model, and its code is inextricably tied to this data model.  It is extremely difficult to change the data model of an implemented application system, as there may be millions of lines of code dependent on the existing model.

Of course, this application is only one of hundreds or thousands of such systems in an enterprise.  Each application on its own has hundreds to thousands of tables and tens of thousands of attributes. These applications are very partially and very unstably “interfaced” to one another through some middleware that periodically schleps data from one database to another.

The data-centric approach turns all this on its head. There is a data model—a semantic data model (more on that in a subsequent white paper)—and each bit of application functionality reads and writes through the shared model.  If there is application functionality that calculates suggested reorder quantities for widgets, it will make its suggestion and add it to the shared database, using the common core terms.  Any other system can access the suggestions and know what they mean.  If the reordering functionality goes away tomorrow, the suggestions will still be there.
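
As a hedged sketch, the reordering functionality just described might write its suggestion through the shared model with a SPARQL update like this (the class and property names are invented for the example, not an actual core model):

    PREFIX :    <http://example.com/ontology/>
    PREFIX myd: <http://example.com/data/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

    # Record a suggested reorder quantity for a widget in the shared store,
    # using common core terms that any other application can read.
    INSERT DATA {
        myd:_suggestion-42  a                  :ReorderSuggestion ;
                            :forProduct        myd:_widget-7 ;
                            :suggestedQuantity "150"^^xsd:integer .
    }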

Click here to read more on TDAN.com

Evolve your Non-Temporal Database in Place

At Semantic Arts, we recently decided to upgrade our internal system to turn something that was not temporal (our billing rates) into something that was. Normally, that would be a pretty big change.  As it turned out, it was pretty straightforward and could be done as an in-place update.  It turned out to be a good mini case study of how using semantics and a graph database can make these kinds of changes far less painful.
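
One common way to make such a property temporal in a graph (not necessarily the exact modeling shown in the video) is to promote the single value into a small node that carries the rate together with its effective date, and migrate the existing data to it in place. All the names below are invented for the illustration:

    @prefix :    <http://example.com/ontology/> .
    @prefix myd: <http://example.com/data/> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    # Before: a single, non-temporal value.
    #   myd:_consultant-1  :hasBillingRate  "200.00"^^xsd:decimal .

    # After: the rate becomes a node with an effective date, so new rates can be
    # added over time without disturbing the rest of the graph.
    myd:_consultant-1  :hasBillingRate  myd:_rate-1 .

    myd:_rate-1  a               :BillingRate ;
                 :ratePerHour    "200.00"^^xsd:decimal ;
                 :effectiveFrom  "2016-01-01"^^xsd:date .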

So, Dave McComb documented it in a YouTube video.

 

Click here to view: Upgrade a non Temporal Database in Place
