Screwdrivers & Properties

Screwdrivers generally have only a small set of head configurations (flat, Phillips, hex) because the intention is to make accessing contents or securing parts easy (or at least uniform).

Properties & Proliferations

Now imagine how frustrating it would be if every screw and bolt in your house or car required a unique screwdriver head. They might be grouped together (for example, a bunch of different sized hex heads), but each one is slightly different. Any maintenance task would take much longer, and the amount of time spent just organizing the screwdrivers would be inordinate.

Yet that is precisely the approach that most OWL modelers take when they over-specify their ontology’s properties.

“Avoiding Property Proliferations – Part 1” discusses the pitfalls of habitually applying domains and ranges to properties.

Click here to download the whitepaper.

Greatest hits from the Data-Centric Manifesto

I was just reading through what some folks have written on the Data-Centric Manifesto web site. Thought I’d capture some of the more poignant comments:

“I believe [Linked] Data Centric approach is the way of the future. I am committing my company to assisting enterprises in their quest to Data-Centric transformation.” -Alex Jouravlev

 

“I have experienced first-hand in my former company the ravages of application-centric architectures. Development teams have rejected SQL-based solutions that performed 10 to 100 times better with less code and fewer resources, all because of application-centric dogma. Databases provide functional services, not just technical services – otherwise they’re not worth the money.” – Stew Ashton

 

“I use THE DATA-CENTRIC MANIFESTO as a mantra, a guide-line, a framework, an approach and a method, with which to add value as a consultant to large enterprises.” -Mark Besaans

 

“A data-centric approach will finally allow IT to really support the way we think and work instead of forcing us to think in capabilities of an application.” -Mark Schenk

 

“The principles of a data-centric approach would seem obvious, but the proliferation of application-centric implementations continues. Recognizing the difference is critical to positive change, and the benefits organizations want and need.” -Kim L Hoover

Data-centric is a major departure from the current application-centric approach to systems development and management. Migration to the data-centric approach will not happen by itself. It needs champions. If you’re ready to consider the possibility that systems could be more than an order of magnitude cheaper and more flexible, then become a signatory of the Data-Centric Manifesto.

Read more here.

Do Data Lakes Make My Enterprise Look Data-Centric?

Dave McComb discusses data lakes, schema, and data-centricity in his latest post on the Data Centric Revolution for The Data Administration Newsletter. Here’s a brief excerpt to pique your interest:

The Data-Centric Revolution: Implementing a Data-Centric Architecture

“I think it is safe to say that there will be declared successes in the Data Lake movement. A clever data scientist, given petabytes of data to troll through, will find insights that will be of use to the enterprise. The more enterprising will use machine learning techniques to speed up their exploration and will uncover additional insights.

But in the broader sense, we think the Data Lake movement will not succeed in changing the economics or overall architecture of the enterprise. In a way, the Data Lake is something to do instead of dealing with the very significant problems of legacy ecosystems and dis-economics of change.

Even at the analytics level, where the Data Lake has the most promise, we think it will fall short…

Conceptually, the Data Lake is not far off from the Data Centric Revolution. The data does have a more central position. However, there are three things that a Data Lake needs in order to be Data Centric…”

Click here to read the entire article.

 

Debugging Enterprise Ontologies

Michael Uschold gave a talk at the International Workshop on Completing and Debugging the Semantic Web held in Crete on May 30, 2016.   Here is a preview of the white paper, “Finding and Avoiding Bugs in Enterprise Ontologies” by Michael Uschold:

Finding and Avoiding Bugs in Enterprise Ontologies

Abstract: We report on ten years of experience building enterprise ontologies for commercial clients. We describe key properties that an enterprise ontology should have, and illustrate them with many real world examples. They are: correctness, understandability, usability, and completeness. We give tips and guidelines for how best to use inference and explanations to identify and track down problems. We describe a variety of techniques that catch bugs that an inference engine will not find, at least not on its own. We describe the importance of populating the ontology with data to drive out more bugs. We point out some common ontology design practices in the community that lead to bugs in ontologies and in downstream semantic web applications based on the ontologies. These include proliferation of namespaces, proliferation of properties and inappropriate use of domain and range. We recommend doing things differently to prevent bugs from arising.

Introduction
In a manner analogous to software debugging, ontologies need to be rid of their flaws. The types of flaws to be found in an ontology are slightly different than those found in software, and revolve around the ideas of correctness, understandability, usability and completeness. We report on our experience (spanning more than a decade) in building and debugging enterprise ontologies for large companies in a wide variety of industries including: finance, healthcare, legal research, consumer products, electrical devices, manufacturing and digital assets.

For the growing number of companies starting to use ontologies, the norm is to build a single ontology for a point solution in one corner of the business. For large companies, this leads to any number of independently developed ontologies resulting in many of the same heterogeneity problems that ontologies are supposed to solve. It would help if they all used the same upper ontology, but most upper ontologies are unsuitable for enterprise use. They are hard to understand and use because they are large and complex, containing much more than is necessary, or the focus is too academic to be of use in a business setting.

So the first step is to start with a small, upper, enterprise ontology such as gist [McComb 2006], which includes core concepts relevant to almost any enterprise. The resulting enterprise ontology itself will consist of a mixture of concepts that are important to any enterprise in a given industry, and those that are important to a particular enterprise. An enterprise ontology plays the role of an upper ontology for all the ontologies in a company (Fig. 1). Major divisions will import and extend it. Ontologies that are specific to particular applications will, in turn, import and extend those. The enterprise ontology evolves to be the semantic foundation for all major software systems and databases that are core to the enterprise.

Click here to download the white paper.

Click here to download the presentation.

Evolve your Non-Temporal Database in Place

At Semantic Arts, we recently decided to upgrade our internal system to turn something that was not temporal (our billing rates) into something that was. Normally, that would be a pretty big change. As it turned out, it was straightforward and could be done as an in-place update, making it a good mini case study in how using semantics and a graph database can make these kinds of changes far less painful.

So, Dave McComb documented it in a YouTube video.

 

Click here to view: Upgrade a non Temporal Database in Place

Introduction to FIBO Quick Start

We have just launched our “FIBO Quick Start” offering.  If you are in the financial industry you have likely heard about the Financial Industry Business Ontology, which has been championed by the EDM Council, a consortium of virtually the entire who’s who of the financial industry. We’ve been helping with FIBO almost since its inception, and more recently Michael Uschold has been co-leading the mortgage and loan ontology development effort.  Along the way we’ve done several major projects for financial clients, and have reduced what we know to a safe and quick approach to adopting semantics in the financial sector. We have the capacity to take on one more client in the financial space, so if you’re interested, by all means contact us.

FIBO Quick Start: Developing Business Value Rapidly with Semantics

The Financial Industry Business Ontology is nearing completion. As of June 2016, nine major financial institutions have joined the early adopter program. It is reasonable to expect that in the future all Financial Industry participants will have aligned some of their systems with FIBO. Most have focused their initial projects on incorporating the FIBO vocabulary. This is a good first step and can jump start a lot of compliance work.

But the huge winners, in our opinion, will be the few institutions that see the potential and go all-in with this approach. For sixteen years, we have been working with large enterprises who are interested in adopting semantic technology. Initially, our work focused on architecture and design as firms experimented with ways to incorporate these new approaches. More recently, we have been implementing what we call the “data-centric approach” to building semantically-centered systems in an agile fashion.

Click here to read more. 

Naming an Inverse Property: Yay or Nay?


Figure 1: Quantum Entanglement

 

For a fuller treatment of this topic, see the whitepaper:  Quantum Entanglement, Flipping Out and Inverse Properties.

An OWL object property is a way that two individuals can be related to each other. Direction is important. For example, consider the two relationships:

  1. being a parent: Michael has Joan as a parent, but Joan has Michael as a child.
  2. guaranteeing a loan: the US government guarantees a loan, but the loan is guaranteed by the US government.

The direction corresponds to which party’s perspective you are taking: the parent or the child, the guarantor or the thing being guaranteed.  From the perspective of the child we might assert the triple: Michael :hasParent Joan.  Note that if Michael has Joan as a parent, then it is necessarily true that Joan has Michael as a child – and vice versa.  So asserting any triple results in the implicit assertion of an inverse triple.  It’s a bit like quantum-entangled particles: you cannot make a change to one without immediately affecting the other.

The property from the perspective of the other individual is called the inverse property. OWL provides a way to refer to it in a triple.  For example, Joan :inverse(hasParent) Jennifer uses the hasParent property from Joan’s perspective to directly assert she has another child.

Figure 2: Property with anonymous inverse

 

If we wish, we can give the inverse property a name. Two good candidates are hasChild and parentOf.

Figure 3: Property with named inverse
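In Turtle, the named inverse of Figure 3 can be sketched as follows (the namespace and the :Michael/:Joan individuals are illustrative, not from any particular ontology):

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix :    <http://example.com/ns#> .

# Declare hasChild as the named inverse of hasParent.
:hasParent  a owl:ObjectProperty .
:hasChild   a owl:ObjectProperty ;
    owl:inverseOf :hasParent .

# Asserting either triple entitles a reasoner to infer the other:
:Michael :hasParent :Joan .    # a reasoner infers  :Joan :hasChild :Michael .
```

Note that if inferred triples are materialized, every :hasParent assertion now produces a matching :hasChild assertion, which is the doubling effect discussed below.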

The question naturally arises: when should you create an explicit named inverse property? There is no universal agreement on this issue, and at Semantic Arts, we have gone back and forth. Initially, we created them as a general rule, but then we noticed some downsides, so now we are more careful.  Below are four downsides of using named inverses (roughly in order of growing importance).  The first two relate to ease of learning and understanding the ontology; the last two relate to inference and triple stores.

  1. Names: It can be difficult to think of a good name for the inverse, so you might as well just use the syntax that explicitly says it is the inverse. It will likely be easier to understand.
  2. Cluttered property hierarchy: Too many inverses can significantly clutter up the property hierarchy, making it difficult to find the property you need, and more generally, to learn and understand what properties there are in the ontology, and what they mean.
  3. Slower inference: Too many named inverses can significantly slow down inference.
  4. More space: If you run inference and materialize the triples, a named inverse will double the number of triples that use a given property.

So our current practice is to not create inverses unless we see a compelling reason to do so, and it is clear that those benefits outweigh the downsides.

Semantic Modeling: Getting to the Core

Most large organizations have a lot of data and very little useful information. The reason is that every time they encounter a problem, they build (or more often buy) another computer application system. Each application has its own completely arbitrary data model, designed for the task at hand at that time, using whatever simplification seemed appropriate in that instance.

The net result, depending on the size of the organization, is hundreds or thousands of applications (occasionally tens of thousands), each with its own data model. Each data model has hundreds to thousands of tables (occasionally tens of thousands; the average SAP install has 95,000 tables), and each table has dozens of columns. You end up trying to run your company using upwards of millions of distinct data types. For all practical purposes, this is impossible.

Most companies spend most of their (very high) IT budget on maintaining these systems (as they are very complex) or attempting to integrate them (and doing a very partial job of it).

This seems pretty bleak and makes it hard to see a way out. What will drop the scales from your eyes is seeing a model that covers all the concepts you use to run your business with just a few hundred concepts and a web of relationships between them. Typically, this core is then augmented by thousands of “taxonomic” distinctions; however, these thousands of distinctions can be organized and put into their place for much better management and understanding.


Once you have this core model (or ontology, as we call it, just to be fancy), everything becomes simpler: integration, because you map the complex systems to the simple core and not to each other; and application development, because you build on a smaller footprint. It now becomes possible to incorporate types of data previously thought un-integrate-able, such as unstructured, semi-structured, and/or social media data.

Semantic Arts has built these types of core data models for over a dozen very large firms, in almost as many industries, and helped to leverage them for their future information systems.  We now can do this in a very predictable and short period of time.  We’d be happy to discuss the possibilities with you.

Feel free to send us a note at [email protected].

Written by Dave McComb

The Evolution of the Data-Centric Revolution Part One

We have been portraying the move to a Data-Centric paradigm as a “Revolution” because of the major mental and cultural shifts that are prerequisites to making this shift. In another sense, the shift is the result of a long, gradual process; one which would have to be characterized as “evolutionary.”

This column is going to review some of the key missing links in the evolutionary history of the movement.

(For more on the Data Centric Revolution, see The Data Centric Revolution. In the likely event that you’re not already data-centric, see The Seven Warning Signs of Appliosclerosis.)

Applications as Decks of Cards

In the 50’s and 60’s, many computer applications made very little distinction between data and programs. A program was often punched out on thin cardboard “computer cards.” The data was punched out on the same kind of cards. The two decks of cards were put in the hopper together, and voila, output came out the other end. Payroll was a classic example of applications in this era. There was a card for each employee with their Social Security Number, rate of pay, current regular hours, overtime hours, and a few other essential bits of data. The program referred to data by the “column” numbers on the card where the data was found. Often people didn’t think of the data as separate from the program, as the two were intimately connected.

Click here to view on TDAN.com

What’s exciting about SHACL: RDF Data Shapes

An exciting new standard is under development at the W3C to add some much-needed functionality to OWL. The main goals are to provide a concise, uniform syntax (presently called SHACL, for Shapes Constraint Language) for both describing and constraining the contents of an RDF graph.  This dual purpose is what makes this such an exciting and useful technology.

RDF Data Shapes

What is an RDF Data Shape?

An RDF shape is a formal syntax for describing how data is, how data should be, or how data must be.

For example:

@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix ex:   <http://example.com/ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

ex:ProductShape 
	a sh:Shape ;
	sh:scopeClass ex:Product ;
	sh:property [
		sh:predicate rdfs:label ;
		sh:dataType xsd:string;
		sh:minCount 1;
		sh:maxCount 1;
	];
	sh:property [
		sh:predicate ex:soldBy;
		sh:valueShape ex:SalesOrganizationShape ;
		sh:minCount 1;
	].

ex:SalesOrganizationShape
	a sh:Shape ;
	sh:scopeClass ex:SalesOrganization ;
	sh:property [
		sh:predicate rdfs:label ;
		sh:dataType xsd:string;
		sh:minCount 1;
		sh:maxCount 1;
	] .

This can be interpreted as a description of what is (“Products have one label and are sold by at least one sales organization”), as a constraint (“Products must have exactly one label and must be sold by at least one sales organization”), or as a description of how data should be even if nonconforming data is still accepted by the system.  In the next sections I’d like to comment on a number of use cases for data shapes.

RDF Shapes as constraints

The primary use case for RDF data shapes is to constrain data coming into a system.  This is a non-trivial achievement for graph-based systems, and I think that the SHACL specification is a much better solution for achieving this than most.  Each of the SHACL atoms can, in principle, be expressed as an ASK query to evaluate the soundness of a repository.
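For example, the minCount 1 constraint on rdfs:label in the earlier ex:ProductShape could, in principle, be checked with an ASK query along these lines (the ex: prefix IRI is illustrative):

```sparql
PREFIX ex:   <http://example.com/ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Returns true if the repository violates the constraint,
# i.e. if some ex:Product has no rdfs:label at all.
ASK WHERE {
  ?product a ex:Product .
  FILTER NOT EXISTS { ?product rdfs:label ?label }
}
```

Running one such query per constraint gives a crude but effective soundness check over an existing repository.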

RDF Shapes as a tool for describing existing data

OWL ontologies are good for describing the terms and how they can be used but lack a mechanism for describing what kinds of things have been said with those terms.  Data shapes fulfill this need nicely, which can make it significantly easier to perform systems integration work than simple diagrams or other informal tools.

Often in the course of building applications, the model is extended in ways that may be perfectly valid but otherwise undocumented.  Describing the data in RDF shapes provides a way to “pave the cow paths”, so to speak.

A benefit of this usage is that you get the advantages of being schema-less (since you may want to incorporate data even if it doesn’t conform) while still maintaining a model of how data can conform.

Another use case for this is when you are providing data to others.  In this case, you can provide a concise description of what data exists and how to put it together, which leads us to…

RDF Shapes as an outline for SELECT queries

A nice side-effect of RDF shapes that we’ve found is that once you’ve defined an object in terms of a shape, you’ve also essentially outlined how to query for it.

Given the example provided earlier, it’s easy to come up with:

SELECT ?product ?productLabel ?orgLabel WHERE {
	?product 
		a ex:Product ;
		rdfs:label ?productLabel ; 
		ex:soldBy ?salesOrg .
	?salesOrg
		a ex:SalesOrganization ;
		rdfs:label ?orgLabel .
}

None of this is made explicit by the OWL ontology—we need either something informal (e.g., diagrams and prose) or formal (e.g., the RDF shapes) to tell us how these objects relate in ways beyond disjointness, domain/range, etc.

RDF Shapes as a mapping tool

I’ve found RDF shapes to be tremendously valuable as a tool for specifying how very different data sources map together.  For several months now we’ve been performing data conversion using R2RML.  While R2RML expresses how to map the relational DB to an RDF graph, it’s still extremely useful to have something like an RDF data shapes document to outline what data needs to be mapped.
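As a rough illustration of how the two fit together, a hypothetical R2RML mapping feeding the earlier ex:ProductShape might look like this (the PRODUCT table and its columns are invented for the example):

```turtle
@prefix rr:   <http://www.w3.org/ns/r2rml#> .
@prefix ex:   <http://example.com/ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Map each row of a relational PRODUCT table to an ex:Product,
# populating the rdfs:label that the shape says is required.
<#ProductMap>
    a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "PRODUCT" ] ;
    rr:subjectMap [
        rr:template "http://example.com/product/{ID}" ;
        rr:class ex:Product
    ] ;
    rr:predicateObjectMap [
        rr:predicate rdfs:label ;
        rr:objectMap [ rr:column "NAME" ]
    ] .
```

Checking each property constraint in the shape against a mapping like this is a quick way to confirm that nothing required has been left unmapped.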

I think there’s a lot of possibility for making these two specifications more symbiotic. For example, I could imagine combining the two (since it is all just RDF, after all) to specify in one pass what shape the data will take and how to map it from a relational database.

The future – RDF Shapes as UI specification

Our medium-term goal for RDF shapes is to generate a basic UI from a shapes specification. While this obviously wouldn’t work in 100% of use cases, there are a lot of instances where a barebones form UI would be fine, at least at first.  There are actually some interesting advantages to this; for instance, validation can be declared right in the model.

For further reading, see the W3C’s SHACL Use Cases and Requirements paper.  It touches on these use cases and many others.  One very interesting use case suggested in this paper is as a tool for data interoperability for loose-knit communities of practice (say, specific academic disciplines or industries lacking data consortia).  Rather than completely go without models, these communities can adopt guidelines in the form of RDF shapes documents.  I can see this being extremely useful for researchers working in disciplines lacking a comprehensive formal model (e.g., the social sciences); one researcher could simply share a set of RDF shapes with others to achieve a baseline level of data interoperability.