Data-Centric Explained

Data-centric is the opposite of application-centric. At Semantic Arts we put data and the model at the center as the most important part of the system. Applications conform to the data, not the other way around. Instead of transforming the data for each new viewpoint, keep the data in its pure form so that it can be reused for other purposes.

For years, information systems developers have been application-centric. They copy data over and over for each of the thousands of applications across their enterprise. Each application has its own data model and consists of hundreds or often thousands of tables. Each table has dozens of columns – many with conflicting column names. One application is complex; thousands of them compound that complexity.

To combine data from these systems, organizations employ armies of data analysts to unravel the data models and reconstruct them as yet one more island of redundant information, adding more ‘integration debt’ for the organization to address. The reality is that a fairly simple model covers virtually all the information that is being managed by your enterprise.

Our approach to data-centric architecture has demonstrated that leveraging semantic standards, graph technology and a commitment to elegant simplicity is the key to long-term system and data improvement.

Watch Dave McComb, CEO of Semantic Arts, provide a more in-depth overview of Data-Centricity in his 2021 presentation at the Data-Centric Architecture Forum, titled “What is Data-Centric? Ideals, Goals, and Concepts.”

Understanding the Data Dilemma

Data represents our customers, products, people and processes and is an essential input into every aspect of our operations. The primary goal of good data management is to ensure that the meaning of the data is consistent and precise as it flows across processes. This is not currently the case. We’ve allowed data to become out of sync because of technology fragmentation, rigid database structures and the proliferation of application systems. These problems are compounded because data is often tied to proprietary data models that are managed independently and are not reusable.

To address these issues, firms copy, move, transform and rename data many times over, just to keep the whole chain working: the glossaries drive the software, the software feeds the applications, and the applications propel our business processes. The result is data incongruence, where the meaning in one repository is not the same as the meaning in another, particularly as we seek to link processes across lines of business. In addition to data incongruence, firms suffer from the limitations of proprietary technology that was state-of-the-art two generations ago, in which data is organized into columns and stored in tables linked together using internal keys. The reality is that big firms support many thousands of tables, some with conflicting column names and all with relationships that must be explicitly structured.

The result is that we have allowed data to become isolated, mismatched and inflexible. We spend significant effort moving data and reconciling its meaning.

We refer to this problem as the “bad data tax” and it is a serious liability to organizations. It costs firms somewhere between 40% and 60% of their total IT budgets just to reconcile meaning and map data into their applications. The bad data tax diverts resources from business goals, extends time-to-value and leads to business frustration. It doesn’t have to be this way.

The solution to this data dilemma is actually quite simple and can be accomplished at a fraction of the cost of what organizations spend each year supporting the cottage industry of data integration workarounds.

The pathway forward doesn’t require ripping everything out but rather the construction of a semantic “graph” layer across data to connect the dots and restore context. This is the future of modern data and analytics and a critical enabler for you to get more value and insight out of your data.

The Data-Centric Paradigm Shift

What if your data is just making things worse?
This short video introduces a different way to think about enterprise data—and why a shift to a data-centric approach changes everything.

Becoming data-centric isn’t a single project. It’s a journey.
The good news: it can start small, deliver value quickly, and build momentum over time.

Demystifying Data Standards

Data-centric uses the power of semantic technology to provide foundational capabilities that work together to create business value. We assign a unique identifier to every data concept. This enables firms to link data wherever it resides to one master ID – eliminating the need to continually move and map data across the enterprise. We are specialists at building ontologies to ensure a shared understanding of requirements between business stakeholders and application developers. These two standards expressed in the language of the Web provide a cost-efficient, non-intrusive breakthrough for data management.
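
As a rough illustration of linking data to one master ID, the sketch below maps rows from two hypothetical applications onto the same identifier and stores them as subject-predicate-object statements. The `ex:` identifiers, column names and mapping tables are assumptions made up for this example; they are not a Semantic Arts ontology or any product's API.

```python
# Hypothetical rows from two applications that name the same customer differently.
crm_row = {"cust_no": "C-1001", "cust_nm": "Acme Corp"}
billing_row = {"customer_id": "C-1001", "balance_due": 250.0}

# Assumed mappings from each application's column names to shared concepts.
CRM_MAP = {"cust_nm": "ex:name"}
BILLING_MAP = {"balance_due": "ex:balanceDue"}


def to_triples(row, key_field, mapping):
    """Turn one application row into (subject, predicate, object) triples."""
    subject = f"ex:customer/{row[key_field]}"  # one identifier per real-world thing
    return {(subject, mapping[col], val) for col, val in row.items() if col in mapping}


# Both sources land in one graph because they share the same subject identifier.
graph = to_triples(crm_row, "cust_no", CRM_MAP) | to_triples(
    billing_row, "customer_id", BILLING_MAP
)


def describe(graph, subject):
    """Collect everything known about one identifier, whatever system it came from."""
    return {p: o for s, p, o in graph if s == subject}


print(describe(graph, "ex:customer/C-1001"))
```

Neither source row is copied into a new application-specific schema here; each one is only mapped once to the shared identifier and concepts.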

THESE BREAKTHROUGHS INCLUDE

Quality by Math

Linking your data to the ontology ensures precision of meaning about concepts, systems, people and processes. This means that errors and definitional conflicts are caught before they are introduced into operational systems.
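
A minimal sketch of this idea, assuming a toy "ontology" that only records the expected value type for each property: incoming statements are checked against it before they are loaded. The property names and the `validate` helper are hypothetical. Real deployments would typically express such constraints in a semantic standard such as SHACL or OWL rather than in a Python dictionary.

```python
# Assumed miniature "ontology": each property declares the type its values must have.
ONTOLOGY = {
    "ex:name": str,
    "ex:balanceDue": float,
}


def validate(triples, ontology):
    """Return a list of problems; an empty list means the data conforms."""
    problems = []
    for subject, prop, value in triples:
        if prop not in ontology:
            problems.append(f"{subject}: unknown property {prop}")
        elif not isinstance(value, ontology[prop]):
            expected = ontology[prop].__name__
            problems.append(f"{subject}: {prop} expects {expected}, got {value!r}")
    return problems


incoming = [
    ("ex:customer/C-1001", "ex:name", "Acme Corp"),
    ("ex:customer/C-1001", "ex:balanceDue", "250 USD"),  # wrong value type
    ("ex:customer/C-1001", "ex:colour", "blue"),  # property not in the ontology
]

for problem in validate(incoming, ONTOLOGY):
    print(problem)
```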

Access Control

Security is embedded into the design of the data and not constrained by either systems or administrative complexity. Business rules can be modeled for all circumstances and controlled at the applications and data levels.

Continuous Testing

Use cases are linked to automated testing procedures. If there are changes to authoritative sources, the downstream implications and dependencies are tracked and tested.
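
One way to picture the tracking of downstream dependencies is a walk over a dependency graph. The sketch below is only an assumption about how this could look: the source and consumer names are invented, and a real system would derive the map from its metadata rather than hard-code it.

```python
from collections import deque

# Hypothetical dependency map: each source lists what consumes it directly.
CONSUMERS = {
    "customer_master": ["billing_report", "risk_model"],
    "risk_model": ["regulatory_filing"],
    "billing_report": [],
    "regulatory_filing": [],
}


def downstream(changed, consumers):
    """Breadth-first walk returning everything affected by a change, in visit order."""
    seen, queue, order = {changed}, deque([changed]), []
    while queue:
        for dep in consumers.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


# Everything in this list would need its linked tests re-run after the change.
print(downstream("customer_master", CONSUMERS))
```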

Concept Reuse

Using semantic standards eliminates the problem of doing the same thing in slightly different ways because the standards focus on concepts, not specific applications. Users always understand what the data represents, which enables efficient reuse of important concepts across systems and processes.

Governance

Semantic standards use the capabilities of resolvable identity, precise meaning and structural validation to shift the governance focus from people-intensive data reconciliation to more automated data applications.

Machine Actionable

Semantic technologies empower machines to interpret and reason over data by establishing a shared, explicit representation of meaning across disparate sources. Crucially, knowledge graphs provide essential guardrails for reliable and higher-performing Generative AI implementations.

The Value Proposition

There are three standard KPIs that are meaningful to executive stakeholders and help explain “why” data-centric translates into good policy.

Operational Efficiency

This is about cost containment and process automation. Firms can take back the 40% or more of the IT budget that is wasted on data integration by standardizing meaning, reducing the need to move and reconcile data and eliminating redundant systems.

Enhancing Capability

Take advantage of flexible querying for better customer profiling and targeted selling. By eliminating the rigid schemas of relational technology, analysts have the tools they need to ask questions of the data instead of spending time as ‘data janitors’ restructuring it and reconciling its meaning.

Aggregate Data

Combine data across lines of business to mitigate operational risk and support compliance with regulatory requirements. With a data-centric approach, you can control access at a data level to trace the flow of data, protect intellectual property and secure sensitive data from falling into the wrong hands.

No matter how you examine it, the value proposition is overwhelming. By adopting data-centric architecture, firms take a giant step forward in solving the problems of technology fragmentation plus add a whole suite of operational capabilities that were not previously possible. Data-centric should be viewed as the infrastructure for the digital world.

Why Data-Centric is Unique

Most enterprises are ‘application-centric’, where data is thought of as secondary. We advocate the opposite approach, where data is at the center of your architecture (i.e., stored only once and then retrieved when needed). Instead of having to transform the data for each new viewpoint, keep the data in its pure form so that it can be reused for other purposes. What follows are enterprise systems that you might think will make you data-centric, but that fall short of the core goal.

Data-Centric is Different From Other Enterprise Systems

Data Warehouse

A data warehouse is a centralized repository for storing data from various systems and conforming it to a common model for reporting. This is similar to data-centric but with several limitations. The extract, transform and load (ETL) process of getting data from source systems to the warehouse is complex and time-consuming. It requires the firm to harmonize the format of the data and define the schema before analysis. The biggest challenge is the transformation step because it involves understanding what the data means as well as conforming it to the one central model. Adopting the data-centric approach achieves the benefits of a data warehouse but with shared meaning.

Data Lake

The data lake is a centralized repository for storing data of any type. The schema is applied when the data is accessed, which means that raw data can be ingested without upfront structuring. The allure of a data lake was to postpone the challenges of data transformation and just focus on the extract and load requirements. And while data lakes were able to handle diverse data types, they suffered from weak data quality controls and the complexity of data integration, leading to ‘data swamps’ where the data becomes disorganized and unreliable. The data-centric approach gives assurance of shared meaning to both facilitate interoperability and provide contextual understanding of the content.

Data Fabric

The basic idea with data fabric was to create a single interface (i.e., an API) that would abstract all the differences from various databases, so consumers of data could consume from a unified view. This sounds good in theory, and data fabrics do provide seamless access, but they don’t harmonize the meaning (or context) of the data. By creating a unified (and simple) model of all the data of a firm, the data-centric approach can achieve the goal of data fabric while ensuring integration of disparate data sources.

Data Mesh

The data mesh, as popularized by Zhamak Dehghani, was a new approach to data architecture that distributes data ownership across different domains. It was motivated by the bottlenecks caused by the centralization of data management as well as the rise of “data as a product”, which enabled teams to manage their own data pipelines. The challenge was that this decentralized approach made it harder to integrate data. Data-centric minimizes the centralization burden and adds a semantic layer to give users the best of both worlds – agility and autonomy as well as assurance that these data products don’t become data silos.

The Data-Centric Manifesto

Dave McComb proposed the Data-Centric Manifesto as part of his ongoing work on data-centric approaches to enterprise architecture and information management. The manifesto advocates the core principles of data-centricity and promotes the use of semantic capabilities and ontologies to create a shared understanding of data across an enterprise. Dave’s approach argues for simplifying data structures and reducing complexity in data management to make data more adaptable to changing business needs and technological landscapes. Sign the manifesto at http://datacentricmanifesto.org/
