Decades, Planets and Marriage

Google ontologist Denny Vrandečić started a vigorous thread on the question of what constitutes a decade. See, for example, the article “People Can’t Even Agree On When The Decade Ends”. This is a re-emergence of the question from 20 years ago of whether the new millennium would start (or did start) on January 1 of 2000 or 2001. It is often posed as a mathematical conundrum, and math certainly plays a role here, but I think it’s more about terminology than it is about math. It reminds me of the question of whether Pluto is a planet. It is also relevant to ontologists.

The decade question is whether the 2020s did start on January 1, 2020 or will start on January 1, 2021. Denny noted that: “The job of an ontologist is to define concepts”. This is true, but ontologists often have to perform careful analysis to identify what the concepts are that really matter. Denny continued: “There are two ways to count calendar decades…”. I would put it differently and say: “The term ‘calendar decade’ is used to refer to at least two different concepts.”

At last count, there were 72 comments arguing exactly why one way or the other is correct. The useful and interesting part of that discussion centers on identifying the nuanced differences between those two different concepts. The much less interesting part is arguing over which of these concepts deserves to be blessed with the term ‘calendar decade’. The latter is a social question, not an ontological question.

This brings us to Pluto. The interesting thing from an ontology perspective is to identify the various characteristics of bodies revolving around the sun, and then to identify which sets of characteristics correspond to important concepts that are worthy of having names. Finally, names have to be assigned: e.g. asteroid, moon, planet. The problem is that the term, ‘planet’, was initially used fairly informally to refer to one set of characteristics and it was later determined that it should be assigned to a different set of more precisely defined characteristics that scientists deemed to be more useful than the first. And so the term ‘planet’ now refers to a slightly different concept than it did before. The social uproar happened because the new concept no longer included Pluto.

A more massive social as well as political uproar arose in the past couple of decades around the term, ‘marriage’. The underlying ontological issues are similar. What are the key characteristics that constitute a useful concept that deserves a name? It used to be generally understood that a marriage was between a man and a woman, just like it used to be generally understood what a planet was. But our understanding and recognition of what is, should or could be, changes over time and so do the sets of characteristics that we think are deserving of a name.

The term planet was given a more restricted meaning, which excluded Pluto. The opposite was being argued in the case of marriage. People wanted a gender-neutral concept for a committed relationship; it was less restrictive. The term ‘marriage’ began to be used to include same-gender relationships.

I am aware that there are important differences between decades, planets and marriages – but in all three cases, there are arguments about what the term should mean. Ironically and misnomeristically (if that’s a word), we refer to the worrying about what to call things as “just semantics”. Use of this phrase implies a terms-first perspective, i.e. you have a term, and you want to decide what it should mean. As an ontologist, I find it much more useful to identify the concepts first, and think of good terms afterwards. I wrote a series of blogs about this a few years ago.

What is my position on the decade question? If I were King, I would use the term ‘decade’ to refer to the set of years that start with the same 3 digits. Why? Maybe for the same reason that watching my car odometer change from 199999 to 200000 is more fun than watching it change from 200000 to 200001. The other proposed meaning for ‘calendar decade’ is not very interesting to me, so I would not bother to give it any name. But your mileage may vary.
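To make the concepts-first point concrete, here is a minimal Python sketch of the two concepts that compete for the term ‘calendar decade’. The function names are my own hypothetical labels, chosen so that neither concept has to win the term, and the sketch assumes ordinary positive four-digit years.

```python
def decade_by_leading_digits(year):
    """Concept A: the ten years that share all but their last digit (2020..2029)."""
    start = year - year % 10
    return range(start, start + 10)


def decade_by_ordinal_count(year):
    """Concept B: the ten years counted off in tens from year 1 (2021..2030)."""
    start = (year - 1) // 10 * 10 + 1
    return range(start, start + 10)


# The same year belongs to a different set of ten years under each concept.
for concept in (decade_by_leading_digits, decade_by_ordinal_count):
    years = concept(2025)
    print(concept.__name__, years[0], years[-1])
```

Both functions are perfectly well defined. The only open question is which one, if either, gets to be called a decade.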

Enterprise Ontology, Semantic Silos, and Cowpaths

Paving Cow Paths

Numerous modern-day streets in downtown Boston defy logic – until you realize that the city fathers literally paved over the transit system created and used by cows.* This gave the immediate benefit of getting places faster, while losing out on the longer-term gains that a purpose-built street plan could have yielded. This type of thing is pervasive in today’s enterprise, ranging from computerizing paper forms to the plethora of information silos requiring an enterprise ontology, the subject of today’s blog.

Figure 1: Paving the cowpaths in Boston**

Semantic Technology

Semantic Arts works with a wide variety of companies and, unlike just a few years ago, it is now common for our new clients to already have a number of efforts and groups exploring semantic technology in-house. Gone is the fear of the ‘O word’. In its place is a range of projects and activities such as creating ontologies, working with triple stores, and creating proofs of concept. Unfortunately, what we see very little of is coordination of these efforts throughout the enterprise.

It would be mistaken to regard this as a misuse of the technology, because point solutions will often result in significant benefits locally – just like paving cow paths gave immediate gains. It’s more a missed opportunity in the form of a great irony. The very technology designed to break down silos gets used to build yet more silos – Semantic Silos.

Figure 2: Avoid Semantic Silos

Building semantic silos is an easy trap to fall into, because it takes a while to fully comprehend the power of semantic technology (or any other disruptive technology).  Information silos arise for many reasons, both technological and organizational.  Key technological factors include the inability of relational databases to (1) reuse schema and (2) uniquely identify data elements globally across databases.  That’s where the URI and RDF triples come in. It is hard to overstate the power of the URI in semantic technology. URIs uniquely identify not only data elements but also the schema elements. The former eliminates the need for joins, and the coordination of URIs makes the snapping together of disparate databases, well, a snap.  The latter enables something entirely foreign to relational technology: the ability to easily share and reuse all or parts of existing schema.
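A minimal sketch of the “snapping together” idea, in plain Python rather than a real triple store. Everything here is an assumption for illustration: the `https://example.com/` namespace, the two stores and their contents are hypothetical, and triples are modelled simply as 3-tuples of strings.

```python
EX = "https://example.com/"  # hypothetical namespace, for illustration only

# Two independently maintained stores, each a set of (subject, predicate, object) triples.
hr_store = {
    (EX + "person/42", EX + "ont/worksFor", EX + "org/7"),
    (EX + "person/42", EX + "ont/name", "Ada"),
}
crm_store = {
    (EX + "org/7", EX + "ont/name", "Acme"),
    (EX + "person/42", EX + "ont/purchased", EX + "product/9"),
}

# Because both stores use the same URIs for the same things,
# combining them is a set union: no key mapping and no join tables.
graph = hr_store | crm_store


def describe(subject, triples):
    """Return every (predicate, object) pair known about a subject."""
    return {(p, o) for s, p, o in triples if s == subject}


for predicate, obj in sorted(describe(EX + "person/42", graph)):
    print(predicate, obj)
```

Note that the predicates (`ont/worksFor`, `ont/name`) are URIs too, which is what lets the schema be shared in the same way as the data. If the two stores had minted different URIs for the same person, the union would still work but nothing would line up, which is the semantic silo in miniature.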

Enterprise Ontology

The key to avoiding semantic silos is to use an enterprise ontology, which is a small and elegant representation of the core concepts and relationships in your enterprise that are stable over time. It is at the same time both a conceptual model and a computable artifact that plays the role of a physical data schema. The enterprise ontology is a foundation for building more specialized ontologies that are loaded into dozens, hundreds or thousands of graph databases, called triple stores, that are populated with data. Data elements are also shared across multiple databases. This is depicted in figure 3.

These stores can be used by many applications, not just one or two, as is common in today’s siloed, application-centric enterprise.  Collectively, these ontologies and their data form an enterprise knowledge graph. Such graphs are hugely important for modern companies such as Google, Facebook and LinkedIn.


Figure 3: The triple stores depicted in the top row are not silos. Globally unique URIs snap together to form a single enterprise knowledge graph that is accessible using federated SPARQL queries.  Letters denote ontology URIs and numbers denote data URIs.

Having built enterprise ontologies now in a variety of industries, we are confident in stating the surprising result that there are only a few hundred such concepts that form this core for any given enterprise.  This is what makes it possible to build an enterprise ontology, where building enterprise-wide data models has failed for decades. There is no need to have millions of attributes in the core model.

Summary and Conclusion

  1. It is entirely possible to use semantic technology to develop point solutions around your enterprise and unwittingly end up recreating the very silos that semantic technology aims to get rid of.
  2. We see this happening in organizations that are using semantic technology.
  3. You don’t want to do that: you will miss out on some of the main benefits of the technology. The data will not snap together if there is no coordination.
  4. The answer is to use an enterprise ontology as a core data model that is shared among all the applications and data stores that collectively make up your enterprise knowledge graph.
  5. The URI is the hero: URIs are globally unique identifiers that allow seamless sharing of data and schema. Joins are history.

Keep in mind that technology as enabler is only part of the story. To get real traction in breaking up silos also requires meeting plenty of social and organizational challenges and putting governance policies into place.  But that’s another topic for another day.

Don’t fall into the trap of paving the cow paths to semantic silos. Use an enterprise ontology to create the beginning of an integrated enterprise.

Afterword

See also the delightful and well-known poem by S.W. Foss called, “The Calf Path”.***

* Change Management: Paving the Cowpaths
https://www.fastcompany.com/1769710/change-management-paving-cowpaths

** Picture credit:
http://bostonography.com/2011/cartographic-greetings-from-boston/bostontownoldrenown/

*** https://m.poets.org/poetsorg/poem/calf-path

The Inelegance of having Brothers and Sisters

This blog follows from a recent blog by Dan Carey called Screwdrivers and Properties. It points to a longer whitepaper on the topic of avoiding property proliferation.

One way we keep the number of primitives small is to avoid creating a subproperty if its meaning is essentially the same as the superproperty, but has a more restricted domain or range. We illustrate this with an example in the genealogy domain. Suppose we have the property myeo:hasSibling and we want to model brothers and sisters. One way would be to create two subproperties, myeo:hasBrother and myeo:hasSister, whose ranges are myeo:Male and myeo:Female respectively, and define the class myeo:Brother as a property restriction class that means “any individual that is the brother of some person”.  In Manchester syntax, this looks like: “myeo:brotherOf some myeo:Person” where myeo:brotherOf is the inverse of myeo:hasBrother. Similarly we can define myeo:Sister as “myeo:sisterOf some myeo:Person”. This introduces two new classes and two new properties.

However, we can easily capture the semantics of brother and sister without introducing any new properties. We define the class myeo:Brother as “myeo:Male and myeo:siblingOf some myeo:Person” and myeo:Sister is defined as “myeo:Female and myeo:siblingOf some myeo:Person”. This way we can define the brother and sister concepts entirely in terms of existing primitives with the same number of classes and without creating any new properties.
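As a rough illustration of the second approach, here is a small Python sketch. It is not OWL reasoning: it is a closed-world check over hand-written data, and the individuals (`alice`, `bob`, `carol`) are hypothetical. It only shows that the brother and sister concepts can be computed from the existing primitives (Male, Female, Person and siblingOf) with no hasBrother or hasSister property in sight.

```python
# Asserted class memberships for a few hypothetical individuals.
types = {
    "alice": {"Person", "Female"},
    "bob": {"Person", "Male"},
    "carol": {"Person", "Female"},
}

# Asserted siblingOf statements as (subject, object) pairs.
sibling_of = {("alice", "bob"), ("bob", "alice")}


def has_type(individual, cls):
    return cls in types.get(individual, set())


def sibling_of_some_person(individual):
    """Mirrors the restriction 'siblingOf some Person'."""
    return any(s == individual and has_type(o, "Person") for s, o in sibling_of)


def is_brother(individual):
    """Mirrors 'Male and siblingOf some Person'."""
    return has_type(individual, "Male") and sibling_of_some_person(individual)


def is_sister(individual):
    """Mirrors 'Female and siblingOf some Person'."""
    return has_type(individual, "Female") and sibling_of_some_person(individual)


print([name for name in sorted(types) if is_brother(name)])
print([name for name in sorted(types) if is_sister(name)])
```

Carol is female and a person, but has no sibling statements, so she is not classified as a sister. The gender test and the sibling test stay as two separate, reusable pieces.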

The only thing that differs about myeo:hasBrother and myeo:hasSister compared to myeo:hasSibling is that the former two properties have more restricted ranges (myeo:Male & myeo:Female vs. myeo:Person). Otherwise the meaning is identical. We have essentially moved the semantics of brother and sister from the ranges of two new properties into the class expressions that define the classes myeo:Brother and myeo:Sister (see figure below).

Keeping the number of primitives low is not only more elegant, but it has practical value.  The fewer things you have, the easier it is to find what you need. Not only does it help during ontology development, it also helps downstream when others evolve and apply the ontology.

property

Binary Instances

Sometimes when we’re designing ontologies we’re faced with design choices that would lead us to create what we call “binary instances”: situations where it takes the instantiation of two instances (often of different classes) to capture one concept. For instance, we may be considering creating a patient instance that is different from the corresponding person instance.

In an effort to move this design decision from the realm of arbitrary designer’s choice to something more principled, in this article we explore the factors that go into a decision that leads to binary instances.

Some Examples

This section will outline some examples that we have come across, as it is often easier to work from a large palette of examples than from abstractions. Some of these examples may seem odd, and for some you may be surprised that anyone would consider them either one way or the other (binary or unary), but we have seen these at various times.

My guess is your background and predisposition will cause you to look at each one of these and say, either “obviously one instance” or “obviously two instances” but we suggest that any of these could go either way (a few are a bit of a stretch, but bear with it, we’re trying to make a point).  After the examples we introduce some principles that we think will lead to reasonably consistent decisions in this arena.

Statue v. Bronze

This is a classic philosophical argument. What is the difference between the statue and the clay, or bronze? The knee-jerk reaction is to think they are two things, but consider: if you have a 10-pound statue made out of 10 pounds of bronze, when you go to ship it will you be charged for 20 pounds of freight or 10?

Person v. Employee

When you take on a job, are you two things (person and employee) or one thing (person who is an employee)? Hint: your employer and the Unemployment Insurance Agency are likely to come up with different answers for this one.

The Restrictions of Law v. The Text of Statute

If a lawmaker writes a law that says “it is illegal to turn right on a red light” and we model this, what do we end up with? Semantically the law is a restriction on behavior. There is a behavior (turning on the red) that the law intends to reduce the incidence of, either through cooperation or through punishment. The question is: is the text of the law (the literal words) its own object, separate from the meaning of the words? If we are writing a text management system, or even a statute management system, there probably is only the text object (the system doesn’t care much about what the words mean). However, if we attempt to manage meaning, we need to consider that there are objects that represent the behavior we are interested in reducing, such that we could detect (via cameras, say) behavior in the world that was in violation. The question then becomes: is there one object that represents the restriction and a second that holds the text of the law, or is there just the restriction with a data type property that is the text?

A Creative Work v. A Document

We know that there is a particular rendition of Moby Dick (in English or the Portuguese translation). Certainly the English and Portuguese documents are different instances. The real question is: is the recognition of the “work” (Moby Dick in the slightly abstract) a different instance, and do we need it dragging around with each rendition (i.e. the Portuguese Moby Dick is a derivative of the creative work)?

Government Organization v. Region Governed

When we speak of Ukraine, are we referring to the governing body, which is an organization, or the region (recently diminished) that the government holds sway over? Should we have one instance that represents the government and the region, or two that are linked?

Specification v. Model

When companies design and build products they often create specifications (it has 8 GB of memory, is 8 inches wide and 2 inches tall, etc.) and they also create “models”, which they usually name (iPhone 6 for instance). Is the specification a separate object from the model, or is there just one object?

Position v. Incumbent

Barack Obama is the President of the United States.  Is that two instances or one?

Actor v. Role

When Val Kilmer played Doc Holliday in Tombstone, was there one instance (Val Kilmer) who was a Person and was a role, or are there two instances, the role and the person?

Event v. Time Interval

We say an event is something that happened over a particular time interval.  So a particular concert, your attendance at the staff meeting Tuesday morning or World War II would all be considered events.  Each of course has a beginning and ending date and time.  The question is: is the time interval (May 22 from 9:00 AM to 10:00 AM) a separate instance from the staff meeting that occurred over that interval?

Diagnosis v. Disease

Up until the moment we are diagnosed with cancer, or diabetes, or even toenail fungus, we were unaware of our having the disease. The diagnosis and the disease seem to coexist in most cases. Are they two things or one?

Person v. Legal Person

We’ve seen systems that focus on the distinction between the flesh and blood person and the social artifact that is allowed to enter into contracts. Two instances or one?

Organization v. Organization in Role

In some systems we’ve seen recently there is a distinction between an Organization (say Goldman Sachs) and an Organization in a Role (Goldman Sachs as an Underwriter v. Goldman Sachs as a Trader).

Contract Document v. Financial Agreement

Two parties agree to a complex financial transaction.  They paper it up with a contract that they sign.  If we model the essence of their agreement is it a separate instance from the written contract?  If not, how?

Person v. Patient

As a matter of history, your medical record is attached to your patient ID. If you’ve been to many medical institutions you have many patient IDs.  The question is, at any one of them are there two instances (Person and Patient) or one instance who is both Person and Patient?

Person v. Address

This one is hilarious. Of course a person is separate from his or her address. Except in almost every system ever built, where a person’s addresses are merely attributes attached to the Person record. When should we make the two distinct instances?

Planned Task v. Completed Task

If we plan a vacation, that is what we would call a Planned Event. We can book flights, hotels and the like and continue to add to this instance.  When we finally go on the vacation, we’ve created an actual or historical event.  Is there one event that changed state from planned to actual, or two events?

Person v. Sole Proprietor

Many independent contractors file tax returns as “Sole Proprietors”. Should we consider the person as a separate entity from the Sole Proprietor?

Part v. Catalog Item

Our definition of a Catalog Item is the description of parts to a sufficient level of detail that a buyer would accept any item offered that met the description. The Catalog Item typically has a part number, in retail a UPC. The physical part also has the same UPC. Is the part a different item from the Catalog Item?

Customer v. (Person or Organization)

Is your customer, the person or organization that purchased your product or received your services, your customer, or is there another instance that represents your relationship with that entity?  Norms in your industry or limitations of your development environment probably color your answer here more than you think.

Relational technology makes it a relatively unnatural act to have say a Person table and an Organization table and then an order table with a foreign key to one or the other.  It’s far more “natural” in relational to have another table that represents the role of the customer.  Even if you have a “party” table, (which both the Person and the Organization extend) you have created another instance.  There is an id for each entry in the Party table, an id for each entry in the organization table (with a foreign key to the party) and an id for each entry in the person table (with a foreign key to the party).  Even without the role concept, there is an extra instance there.

Having a technology that allows us to have a single id to represent either a Person or Organization (Object Oriented or Semantic Technology) doesn’t get us completely out of the woods.  Now we could have the order refer directly to the Person or Organization.  Now the question becomes: should we?

I have been told by a data modeler from an Australian airline, that many of the people riding in an airplane are not customers.  The only ones they consider to be customers are those that belong to their frequent flyer program.  This makes some sense: they need to keep track of the miles and segments flown and accumulate them, only for the frequent flyers.  Additionally they incur obligations (to redeem balances for flights) but again only for the frequent flyers.

Pictorially

What we’re talking about is: are there two different things, that each have their own identity and properties, but that occur as a pair:

binary instances

Or is there really just one thing, and it is the conventions of our speech that make us think there are two things when really all the properties are on the one thing?

Historical Perspective

Very often design decisions are influenced by the tools that we use to implement solutions. We protest that our designs are independent of target architectures but years of designing databases and then converting them to relational DBMSs lead to thinking in design terms that more easily translate.

One implication is that relational DBMSs (and most Object Oriented languages) tend to see a class as a template for instances. This has a tendency to suggest that instances that have properties not shared by most of the other instances should be shuttled off to another table. This almost always ends up creating additional primary keys in other tables and therefore binary instances for anything that is in both tables. Designers brought up on relational will be inclined to think of the Person and the Patient as two different instances (this isn’t wrong as much as it is an indication of how our experience shapes our design choices).

In an analogous fashion, Object Oriented developers often invoke the Decorator Pattern (from the Gang of Four’s Design Patterns). In the decorator pattern, some functionality is shuffled off to a companion object. People from this background will tend to see the decorator as a separate individual.

Principles

Our starting point is ten principles: the first principle is: if at all possible have one instance.  The next eight principles suggest circumstances where one instance is not appropriate.  The last one, we call the ambiguity trump, says even if the principles suggest two instances are needed to model the concept in question, you have a final override to say: in this domain we don’t care enough about the distinction and are willing to live with the ambiguity.

Principle 1 – Ockham’s Razor – “Entities should not be multiplied needlessly” The first principle here says the benefit of the doubt goes to simplicity.  If you can represent the concept adequately with one instance, then by all means do so.  This should be the starting point.  Start by imagining one instance.

A second consideration for sticking with one, even if you are tempted by previous designs, habits, industry norms etc., is: with a binary set of objects, each property (predicate) that is to be attached to the concept, must be attached to one or the other.  If you find it difficult to decide which of the two the property belongs on, and you end up making arbitrary choices, you should really consider sticking with one.

Principle 2 – Cardinality – There are two aspects of the concept, and you’re considering whether to devote an instance to each. One of the trump concepts is: can you have more than one of one aspect for each one of the other? This is trickier than it first sounds, because we have fooled ourselves a lot over time with the way we couch the question. One of the clearer cases is Person and Sole Proprietor. Normally “Joe Jones, the plumber” is “Joe Jones”, and when he files his taxes as a Sole Proprietor, the proprietorship is Joe. Certainly he doesn’t have the firewall that he would have had, had he incorporated. “Joe Jones, LLC” is recognized as a separate entity, can contract on its own behalf, and can, at least in theory, declare bankruptcy without bankrupting Joe. So the corporate case is clearly two or more instances. At first it would seem that the sole proprietor should fall back to principle 1. However, it turns out that Joe can have multiple Sole Proprietorships. It doesn’t happen often, but the existence of this case makes the case that there must be something different between Joe and his Sole Proprietorship.
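A minimal sketch of the cardinality test in Python. The class names and the two trade names are hypothetical, made up for illustration. The point is only that once one Joe can own more than one proprietorship, the proprietorship needs an identity of its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class SoleProprietorship:
    trade_name: str
    owner: Person  # each proprietorship points back at exactly one person


joe = Person("Joe Jones")

# Hypothetical data: the same person running two separate proprietorships.
businesses = [
    SoleProprietorship("Joe Jones Plumbing", joe),
    SoleProprietorship("Joe Jones Snow Removal", joe),
]


def proprietorships_of(person, all_businesses):
    """Return every proprietorship owned by the given person."""
    return [b for b in all_businesses if b.owner == person]


# More than one result for a single person is the signal for two instances.
print(len(proprietorships_of(joe, businesses)))
```

If the answer could never exceed one, the trade name could simply be a property on the Person and principle 1 would win.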

Principle 3 – Potential Instance Separation – Is it possible to separate the two aspects that are being potentially represented by two instances? Can you have the statue without the bronze or vice versa? (Probably not, and this argues for one.) Can you have a waterway without the river? (It seems a dry riverbed would satisfy the waterway without being a river, which argues for potential separation.) Can some properties only logically apply to one of the pair and not the other?

Principle 4 – Separate properties – are there properties that would apply only to one of the instances?  For instance a property like “annual rainfall” would apply to a country region but not to the country government.   Often the different properties are shining a light on something deeper: that there are really two different types of things yearning to be separated.  In the case of the customer v Person or Organization, when you start entertaining adding properties (number of segments flown, miles about to expire etc.) you may realize that the entity with the balances is actually an agreement.

Principle 5 – Behavioral Impact – do most (all?) real world behaviors that apply to one also apply to the other? If we end an employee (employment really), have we ended (killed) the person? (No wonder so many people cringe at the thought of termination.)

Principle 6 – Inference from Definition – if we have formal definitions for the classes that make sense and an inference engine infers one to be a subclass of the other, that makes a case for one instance.  If the formal definitions put the two in disjoint classes, that is a strong argument for two instances.

Principle 7 – Identity Function – is the way we establish whether we already have an instance different in one or the other of these? The identity function is a set of properties that we use to figure out whether we already have a particular instance in our database. For instance, if the identity function for Person is SSN + Date of Birth, and so is the identity function for Employee, then it argues for one instance (it may be that the identity functions are wrong, but it should at least have us pause to reflect).
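Here is a small Python sketch of an identity function used this way. The record layout, the field names and the placeholder SSN are assumptions for illustration. The store is just a dictionary keyed by the identity function’s result.

```python
def identity_key(record):
    """Identity function from the example: SSN + date of birth."""
    return (record["ssn"], record["date_of_birth"])


def find_or_add(store, record):
    """Return (instance, created). Reuse the existing instance when the key matches."""
    key = identity_key(record)
    existing = store.get(key)
    if existing is None:
        store[key] = dict(record)
        return store[key], True
    existing.update(record)  # same identity: the new properties land on the one instance
    return existing, False


store = {}

# Hypothetical records arriving from a person-oriented and an employee-oriented source.
person_record = {"ssn": "000-00-0000", "date_of_birth": "1980-01-01", "name": "Pat"}
employee_record = {"ssn": "000-00-0000", "date_of_birth": "1980-01-01", "salary": 50000}

find_or_add(store, person_record)
instance, created = find_or_add(store, employee_record)
print(created, sorted(instance))
```

Because both sources identify their records the same way, the second call finds the first instance rather than creating a new one. If Employee were instead identified by something like an employee number, the two keys would never collide, and that difference would be evidence for two instances.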

Principle 8 – Granularity – Sometimes the two instances are trying to represent different levels of specificity. For instance, the difference between a Product Model and a Catalog Item may be level of detail. If there are so many Product Models (or so little variation offered) that the Product Model and Catalog Item are at the same granularity, they could be considered one instance. If however they are at different levels of detail, it makes the case for two instances.

Principle 9 – Temporal Difference – if one instance can end independent of the other, that is if they have different lifetimes, it suggests two instances.

Principle 10 – Tolerating Ambiguity – there are cases where the above analysis suggests that there should be (semantically there are) two instances, but in our domain we really don’t care. For instance, we may be convinced that the GeoRegion of a country is different from the organization that governs it, but for our application or domain, which will not exercise any of the properties that would highlight that difference, we may say we really don’t care. In this case we would suggest creating a supertype of the two classes, and instantiating the supertype. So for instance you may create the class of GeoPoliticalEntities as the union of GeoRegion and Government Organization. Make your instances of the supertype. What this does is twofold:

  • If you later decide that you do need to make a distinction, very few things you’ve built to date will be adversely affected. Anything that didn’t care whether you were talking about a region or a government will still not care after you make that distinction.
  • If you have to interface with applications or domains that do make the distinction you will have what you need to incorporate their distinctions without upsetting your part of the system.
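A rough sketch of the ambiguity trump in Python. Python subclassing is only an approximation of an OWL union class, and the `label` function and the rainfall figure are hypothetical, but it shows why code written against the supertype keeps working after the distinction is introduced.

```python
class GeoPoliticalEntity:
    """Supertype: we do not say whether we mean the region or the government."""

    def __init__(self, name):
        self.name = name


class GeoRegion(GeoPoliticalEntity):
    """Refinement added later, carrying properties that only make sense for land."""

    def __init__(self, name, annual_rainfall_mm=None):
        super().__init__(name)
        self.annual_rainfall_mm = annual_rainfall_mm


class GovernmentOrganization(GeoPoliticalEntity):
    """Refinement added later for the governing body."""


def label(entity):
    """Written against the supertype, so it never needs to know about the split."""
    return "Entity: " + entity.name


ambiguous = GeoPoliticalEntity("France")    # today: we don't care which one we mean
region = GeoRegion("France", 800)           # later: a domain that does care (made-up value)
government = GovernmentOrganization("France")

for entity in (ambiguous, region, government):
    print(label(entity))
```

Anything like `label` that was built before the distinction existed is unaffected by it, which is the first bullet above. The subclasses are where an interfacing domain’s extra properties would go, which is the second.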

Re-examining the examples in light of the principles

Let’s return to the examples we introduced in the beginning and see if the principles shine any light on them. Note: there will still be situations and domains that come to different conclusions, but we think these are the conclusions informed by the above principles.

Design Example Proposal (one instance or two) Principled Evidence
Statue v. Bronze 1 Principle 1, if you steal the statue you’ve stolen the bronze.  They’re really inseparable.  Also principle 7, how we establish the identity of the item (say we have an RFID tag on the statute it is also identifying the bronze)
Person v. Employee 2 for employers, 1 for unemployment Principle 2 (you can have two jobs at a time) and principle 4 (your employee(ment) has a salary and seniority, you don’t, you have a birthday, your employee role doesn’t) and principle 9 (your job can end before you do) argue for 2 . However the Unemployment Division point of view argues for one.  A formal definition of someone who is employed (has at least one job) argues by principle 6 and the cardinality argument works the other way (your second job doesn’t alter the unemployment rate)
The Restrictions of Law v. The Text of Statute 2, and will have drug / drug interactions regardless of which patients you give the drugs.ed for one. ense tually an agreement. Principle 8, granularity, and principle 2 cardinality.  When we start to interpret the law and get it to the point that we can begin having systems make at least some initial determination of the legality of an action, we find that a given law is many restrictions and at many levels of detail.
A Creative Work v A Document 2 Principle 2 (many derivatives from a single work)
Government Organization v. Region Governed 2 Principle 3 (we can separate the government from the land, and the land area can change without changing the government (sorry Ukraine) and principle 4 there are properties (rainfall) that apply to one and not the other
Specification v Model 2 Principle 8 in most cases the specification is at a lower level of detail than the product model (color is typically not part of the product model, but is typically in the specification, and most product domains different colors of the same product are not equally interchangeable)
Position v. Incumbent 2 Principle 9 (the position usually outlives the incumbent) and also occasionally principle 2 (can have co-presidents, two people in one position)
Actor v. Role 2 Principle 2 (Greater Tuna where two actors played all the roles)
Event v. Time Interval 1 Principle 6 (if a time interval is defined as having a start and an end, and so is an event, the event is a time interval)
Diagnosis v. Disease 2 Even though they initially co-exist, they soon develop their own time lines (principle 9) and properties
Person v. Legal Person 1 Principle 1 the person is the legal person, there isn’t another entity to hide behind.  None of the other principles argues for 2.  Legal Person is a type of Person, except in the case where it means Organization and in that case they are separate because of principle 6, they are disjoint and can’t be the same.
Organization v. Organization in Role 1, unless there is something formal set up to establish the extra role Even though there is a bit of temptation from principle 9, it isn’t convincing.  If you participate as a buyer in one transaction and a seller in another, are you three entities (yourself, you the buyer and you the seller)? No, not really.  Only if there is something formal set up.  In the airline industry the difference between a customer (has a role and therefore 2 entities) and a passenger (doesn’t, and therefore one entity) is the frequent flyer agreement, where they are accumulating miles, getting various metal colors etc.
Contract Document v. Financial Agreement 1 Principle 1: the document is a representation of the agreement.  Where there are cardinality issues (the contract/ agreement contains many obligations) the cardinality is true of both, in the same way (if the contract has 6 obligations so does the agreement).
Person v. Patient 1 Principle 1. Unlike the cat with nine lives, the person that has 9 patient identities will die if any of them die, and will have drug / drug interactions regardless of which patients you give the drugs.
Person v. Address 2 Principle 1 Addresses are not attributes of people.  Addresses are attributes of buildings that people live in and work in which are obviously separate entities.
Planned Task v. Completed Task 1 for personal, 2 for hospital, project management Principle 2 (cardinality) trumps for any organization that has to keep track of either multiple appointments for one visit, or multiple reschedulings for the same task.  Where that doesn’t apply (say your vacation plan or personal to-dos) you can just have one task that transitions from planned to actual by merely being done. In a way that is principle 10: there may be a difference in personal task management, but we just don’t care.
Person v. Sole Proprietor 2 Principle 2 cardinality, since we can have multiple sole proprietorships, we need to allow for two.
Part v. Catalog Item 2 Principle 4: while they both appear to have some of the same characteristics (weight for instance) they aren’t really the same.  That is a structural similarity, not a semantic similarity.  A catalog with parts that weigh thousands of pounds can be picked up with a single hand.
Customer v (Person or Organization) 1 unless there is a separate agreement, then 2 Principle 4: it is the existence of a separate agreement (separate from the individual order) that is the second instance.  Really the second instance isn’t “customer” but “customer agreement.”  In the absence of a second agreement (Master agreement, frequent shopper agreement etc.) there is only need for one.

Debugging Enterprise Ontologies

Michael Uschold gave a talk at the International Workshop on Completing and Debugging the Semantic Web held in Crete on May 30, 2016.   Here is a preview of the white paper, “Finding and Avoiding Bugs in Enterprise Ontologies” by Michael Uschold:

Finding and Avoiding Bugs in Enterprise Ontologies

Abstract: We report on ten years of experience building enterprise ontologies for commercial clients. We describe key properties that an enterprise ontology should have, and illustrate them with many real world examples. They are: correctness, understandability, usability, and completeness. We give tips and guidelines for how best to use inference and explanations to identify and track down problems. We describe a variety of techniques that catch bugs that an inference engine will not find, at least not on its own. We describe the importance of populating the ontology with data to drive out more bugs. We point out some common ontology design practices in the community that lead to bugs in ontologies and in downstream semantic web applications based on the ontologies. These include proliferation of namespaces, proliferation of properties and inappropriate use of domain and range. We recommend doing things differently to prevent bugs from arising.

Introduction
In a manner analogous to software debugging, ontologies need to be rid of their flaws. The types of flaws to be found in an ontology are slightly different than those found in software, and revolve around the ideas of correctness, understandability, usability and completeness. We report on our experience (spanning more than a decade) in building and debugging enterprise ontologies for large companies in a wide variety of industries including: finance, healthcare, legal research, consumer products, electrical devices, manufacturing and digital assets. For the growing number of companies starting to use ontologies, the norm is to build a single ontology for a point solution in one corner of the business. For large companies, this leads to any number of independently developed ontologies resulting in many of the same heterogeneity problems that ontologies are supposed to solve. It would help if they all used the same upper ontology, but most upper ontologies are unsuitable for enterprise use. They are hard to understand and use because they are large and complex, containing much more than is necessary, or the focus is too academic to be of use in a business setting. So the first step is to start with a small, upper, enterprise ontology such as gist [McComb 2006], which includes core concepts relevant to almost any enterprise. The resulting enterprise ontology itself will consist of a mixture of concepts that are important to any enterprise in a given industry, and those that are important to a particular enterprise. An enterprise ontology plays the role of an upper ontology for all the ontologies in a company (Fig. 1). Major divisions will import and extend it. Ontologies that are specific to particular applications will, in turn, import and extend those. The enterprise ontology evolves to be the semantic foundation for all major software systems and databases that are core to the enterprise.

Click here to download the white paper.

Click here to download the presentation.

Whitepaper: Quantum Entanglement, Flipping Out and Inverse Properties

We take a deep dive into the pragmatic issues regarding the use of inverse properties when creating OWL ontologies.

Property Inverses and Perspectives

It is important to understand that logically, both perspectives always exist; they are joined at the hip. If Michael has Joan as a parent, then it is necessarily true that Joan has Michael as a child – and vice versa. If from one perspective, a new relationship link is created or an existing one is broken, then that change is immediately reflected when viewed from the other perspective. This is a bit like two quantumly entangled particles. The change in one is instantly reflected in the other, even if they are separated by millions of light years. Inverse properties and entangled particles are more like two sides of the same coin, than two different coins.

Figure 2: Two sides of the same coin.

 

In OWL we call the property that is from the other perspective the inverse property. Given that a property and its inverse are inseparable, technically, you cannot create or use one without [implicitly] creating or using the other. If you create a property hasParent, there is an OWL syntax that lets you refer to and use that property’s inverse. In Manchester syntax you would write: “inverse(hasParent)”. The term ‘inverse’ is a function that takes an object property as an argument and returns the inverse of that property. If you assert that Michael hasParent Joan, then the inverse assertion, Joan inverse(hasParent) Michael, is inferred to hold. If you decide to give the inverse property the name parentOf, then the inverse assertion is that Joan parentOf Michael. This is summarized in Figure 3 and the table below.

Click here to read more and download the White-paper

Written by Michael Uschold

Naming an Inverse Property: Yay or Nay?

Inverse Property

Figure 1: Quantum Entanglement

 

For a fuller treatment of this topic, see the whitepaper:  Quantum Entanglement, Flipping Out and Inverse Properties.

An OWL object property is a way that two individuals can be related to each other. Direction is important. For example, consider the two relationships:

  1. being a parent: Michael has Joan as a parent, but Joan has Michael as a child.
  2. guaranteeing a loan: the US government guarantees a loan, but the loan is guaranteed by the US government.

The direction corresponds to which party you are taking the perspective of: the parent or the child, the guarantor or the thing being guaranteed.  From the perspective of the child we might assert the triple: Michael :hasParent Joan.  Note that if Michael has Joan as a parent, then it is necessarily true that Joan has Michael as a child – and vice versa.  So asserting any triple results in the implicit assertion of an inverse triple.  It’s a bit like quantumly entangled particles: you cannot make a change to one without immediately affecting the other.

The property from the perspective of the other individual is called the inverse property. OWL provides a way to refer to it in a triple.  For example, Joan :inverse(hasParent) Jennifer uses the hasParent property from Joan’s perspective to directly assert she has another child.
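
The same idea can be sketched on the query side. SPARQL 1.1 property paths include an inverse operator (^) that traverses a property from the other individual’s perspective, so no named inverse is needed to ask for Joan’s children. This is only a minimal sketch: the ex: namespace and the instance URI ex:_Joan are assumptions made for illustration.

```sparql
# Hypothetical namespace, for illustration only
PREFIX ex: <http://example.org/family#>

# Find Joan's children by walking ex:hasParent backwards.
# "ex:_Joan ^ex:hasParent ?child" matches the same data as
# "?child ex:hasParent ex:_Joan".
SELECT ?child
WHERE {
  ex:_Joan ^ex:hasParent ?child .
}
```

Because the path operator works on the asserted hasParent triples, a query like this does not depend on inference or on materialized inverse triples.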

Figure 2: Property with anonymous inverse

 

If we wish, we can give the inverse property a name. Two good candidates are: hasChild, and parentOf.

Figure 3: Property with named inverse

The question naturally arises: when should you create an explicit named inverse property? There is no universal agreement on this issue, and at Semantic Arts, we have gone back and forth. Initially, we created them as a general rule, but then we noticed some downsides, so now we are more careful.   Below are four downsides of using named inverses (roughly in order of growing importance).  The first two relate to ease of learning and understanding the ontology. The last two relate to inference and triple stores.

  1. Names: It can be difficult to think of a good name for the inverse, so you might as well just use the syntax that explicitly says it is the inverse. It will likely be easier to understand.
  2. Cluttered property hierarchy: Too many inverses can significantly clutter up the property hierarchy, making it difficult to find the property you need, and more generally, to learn and understand what properties there are in the ontology, and what they mean.
  3. Slower Inference: Too many named inverses can significantly slow down inference.
  4. More Space: If you run inference and materialize the triples, a named inverse will double the number of triples that use a given property.

So our current practice is to not create inverses unless we see a compelling reason to do so, and it is clear that those benefits outweigh the downsides.

Quantities, Number Units and Counting in gist

We have a simple and effective way in gist to represent a wide range of physical quantities such as ‘82 kg’, ‘3 meters’ and ‘20 minutes’.  Each quantity has a number and a unit, such as ‘meter’ or ‘second’.  In addition to these simple units, we have unit multiplication and division to represent more complex units, e.g. for speed and acceleration. A standard speed unit is meters per second [m/s] and a standard acceleration unit is meters per second per second [(m/s)/s] or simply  [m/s^2].

Physicists as well as business people like to avoid the inconvenience of working with very large or very small numbers like 1000 meters, or .000000000001 meters (a trillionth of a meter).  If you counted to see if the number of zeros was correct, you understand the problem.  So we create units like kilometer and picometer and give them conversion factors.   This works for any kind of unit (time, electric current, mass).  Note that the standard units have a conversion of 1 (which, in normal parlance, means there is no conversion necessary). See figure 1 for some examples.

Figure 1: Example Quantities

We have also found a need for counting units like dozen or gross. For example, a wine merchant stocks and sells cases of 12 bottles of wine, so counting in dozens is more convenient than counting single bottles of wine.  What is interesting is that we can use the exact same structure for representing ‘4 dozen’ or  ‘7 gross’ as we do for representing things like ‘82 kg’ and ‘20 minutes’.   Take ‘4 dozen’: the number is 4, the unit is ‘dozen’ and the conversion is 12.

In gist there is also a way to represent percentages, which we have always treated as a ratio. After all, when speaking of a percentage, there is always an explicit or implicit ratio somewhere.  For example:

  1. “Shipment A has only 65% as much oil as shipment B” corresponds to the ratio:
    (No. of barrels in shipment A) / (No of barrels in shipment B) = .65
  2. “There are 20% more grams of chocolate in the new package size” corresponds to the ratio:
    (NewQuantity – OldQuantity) / (OldQuantity) = .20

The units for the first example are barrels/barrels which cancel out leaving a pure number. Similarly, the units for the second example are grams/grams which again cancel out. In fact, every ratio unit that corresponds to a percentage will cancel out and leave a pure number. This means that although it may be useful to do so, we don’t need to represent gist:Percentage using a ratio unit.

Another thing that we never realized before is that, being a pure number,  a percentage can be represented in the same way we represent dozen or gross. The only difference is the conversion (12 vs. .01).  We can use this same structure to represent:

  • parts per million (ppm), used by toxicologists say to measure amounts of mercury in tuna
  • basis points (used by the Fed for describing interest rates)
    Investopedia defines a basis point as “a unit that is equal to 1/100 of 1%”

See figure 2 for the representational structures.

 

Figure 2: One structure for number units and ordinary units

 

Notice how ‘ 4 cm’ is very similar to ‘4 percent’:

  • to convert 4 cm to its standard unit, we multiply 4 by the conversion factor of .01 resulting in .04 meters
  • to convert 4 percent to its standard unit, we multiply 4 by the conversion factor of .01 resulting in .04 ??.

This means we can use the same computational mechanism to perform units conversion for pure numbers like 4 dozen and 4% as we do for ordinary physical quantities like 4 cm or 82 kg.
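
A minimal sketch of that shared mechanism, written as a SPARQL query over inline sample rows so that it needs no data in the store. The rows and variable names are illustrative assumptions; no actual gist property names are used or implied. Each row is converted by the same expression, number times conversion factor, whether the unit is a physical unit or a number unit.

```sparql
# Convert each quantity to its standard unit: number x conversion factor.
# The rows are inline sample data chosen for illustration.
SELECT ?quantity ?number ?conversion
       (?number * ?conversion AS ?inStandardUnit)
WHERE {
  VALUES (?quantity ?number ?conversion) {
    ("4 dozen"          4   12)       # counting unit
    ("4 percent"        4   0.01)     # number unit smaller than one
    ("25 basis points"  25  0.0001)   # 1/100 of 1%
    ("4 cm"             4   0.01)     # ordinary physical unit
    ("3 meters"         3   1)        # standard unit, conversion of 1
  }
}
```

The expression in the SELECT clause is identical for every row; only the conversion factor differs.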

One question remains. Whereas we can readily see that the conversion factor for kilometer is based on the standard unit of meter, and the conversion factor for hour is based on the standard unit of second, what are the conversion factors of 12, .01 and .0001 (for dozen, percent and basis point) based on? What does it mean to have a standard unit for these pure numbers with a conversion of 1?

Let’s look to see how gist represents dozen and kilometer to see if that gives us any insight.

  1. gist:kilometer is an instance of gist:DistanceUnit,
    ‘3 meters’ is an instance of gist:Extent &
    the base unit is gist:meter
  2. Analogously, gist:dozen is an instance of gist:CountingUnit,
    ‘4 dozen’ is an instance of gist:Count &
    the base unit is gist:each

Curiously, while ‘meter’ actually means something to us, and we know what it means to say ‘3 meters’, it is strange to think what ‘3 eaches’ could possibly mean.  I invite you to stare at the following table for a while and see some analogies.

Figure 3: Standard Unit for Pure Number Quantities

Then notice that:

  1. 4 dozen = 48 eaches
  2. 4 dozen = 48 (just a simple number)
  3. Therefore, 48 must equal 48 eaches (because both are equal to 4 dozen).

But what is it, such that if you have 48 of them gives you the number 48?  The answer is the number one:  48 x 1 = 48.  So the meaning of gist:each is the number one acting as a unit. This is a mathematical abstraction. The ??’s in figure 2 stand for ‘each’ which is the standard number unit. So when you say ‘3 eaches’ it is just 3 of the number one which is just the pure number 3.  As an aside, we can also say that ‘each’ is the identity element for unit multiplication and division. This is analogous to the number 1 being the identity element for multiplication and division of numbers.

  • You can multiply or divide any number by 1 and you get that number back.
  • You can multiply or divide any unit by each (which means one) and get that unit back.

Note that while conceptually they mean the same thing, syntactically gist:each is very different from the number one as a number whose datatype is say integer, or float.

Notice that for these pure numbers in convenient sized units, we are usually counting things: how many dozens, how many basis or percentage points, or how many parts per million.  We refer to ‘each’ thing as ‘one’ thing being counted.  So that links gist:each to the number one.  Thus, despite the awkwardness of speaking of ‘3 eaches’ the names ‘Count’, ‘CountingUnit’ and ‘each’ are quite reasonable.

Finally, insofar as all instances of CountingUnit are based on the number one, and all instances of Count represent pure numbers, we can think of every CountingUnit as a degenerate unit, and we can think of gist:Count as a degenerate quantity.  A ‘real’ quantity is not just a number; it has a number and a non-numeric unit.

So in conclusion:

  1. We have extended the notion of gist:Count and gist:CountingUnit to apply to pure numbers that are less than one as well as those that are greater than one.
  2. We can represent pure numbers expressed in dozens, percentages, basis points and ppm just like we express the more usual quantities: ‘82 kg’, ‘3 meters’ and ‘20 minutes’.
  3. We can use the same computational mechanism to do units conversions on pure numbers as we can for ordinary physical quantities.
  4. We can represent gist:Percentage using a new unit called gist:percent with a conversion of .01 instead of using a ratio unit, making a more uniform representation.
  5. It will often be helpful to represent a gist:Percentage using a ratio, but it is no longer required.
  6. gist:Count could meaningfully and accurately be called gist:PureNumber since every instance of gist:Count (e.g. ‘4 dozen’, ‘65%’) is a pure number (e.g. 48, .65)
  7. gist:CountingUnit could meaningfully and accurately be called gist:PureNumberUnit because every instance of gist:CountingUnit is used to express pure numbers.
  8. gist:each corresponds to the number one
  9. We can think of Counts (pure numbers) and CountingUnits (number units) as degenerate cases of ordinary quantities and units like ’82 kg’ and ‘kg’

Written by Michael Uschold

SPARQL: Changing Instance URIs

In a prior blog (SPARQL: Updating the URI of an owl:Class in place) we looked into how to use SPARQL to rename a class in a triple store.  The main steps are below. We showed how to do this for the example of renaming the class veh:Auto to veh:Car.

  1. change the instances of the old class to be instances of the new class
  2. replace the triples where the class is used in either the subject or object of the triple
  3. look around for anywhere else the old class name is used, and change accordingly.

The last step addresses the fact that there are a few other things that you might need to do to address all the consequences of renaming a class.   Today we will see how to handle the situation where your instances use a naming convention that includes the name of the class.  Let’s say the instances of Car (formerly Auto) are all like this:  veh:_Auto_234 and veh:_Auto_12. We will want to change them to be like: veh:_Car_234.

The main steps are:

  1. Figure out how you are going to use SPARQL string operations to create the new URI given an old URI.
  2. Replace triples using the oldURI in the object of a triple.
    1. Determine where the oldURI is used as the object in a triple, and use CONSTRUCT to preview the new triples using the results of step 1.
    2. Use DELETE and INSERT to swap out the old triples with the new URI in the object.
  3. Replace triples using the oldURI in the subject of a triple.
    1. Determine where the oldURI is used as the subject in a triple, and use CONSTRUCT to preview the new triples using the results of step 1.
    2. Use DELETE and INSERT to swap out the old triples with the new URI in the subject

In practice, we do step 1 and step 2a at the same time.  We find a specific instance, and filter on just that one (e.g. veh:_Auto_234) to keep things simple. Because we will be using strings to create URIs, we have to spell the namespaces out in full, or else the URI will incorrectly contain the string “veh:” instead of the expanded form, which is: “http://ontologies.myorg.com/vehicles#”.

CONSTRUCT {?s ?p ?newURI}
WHERE {?oldURI rdf:type veh:Car .
       ?s ?p ?oldURI.
       FILTER (?oldURI in (veh:_Auto_234))
       BIND (URI(CONCAT ("http://ontologies.myorg.com/vehicles#_Car_",
                         STRAFTER (STR(?oldURI),"_Auto_")))
             AS ?newURI)
       }

This should return a table something like this:

Subject                 Predicate      Object
veh:_TomJones           gist:owns      veh:_Car_234
veh:_JaneWrenchTurner   veh:repaired   veh:_Car_234
veh:_PeterSeller        veh:sold       veh:_Car_234

This tells you there are exactly three triples with veh:_Auto_234 in the object, and shows you what the new triples will be when you replace the old ones.   After this, you might want to remove the FILTER and see a wider range of triples, setting a LIMIT as needed. Now you are ready to do the actual replacement (step 2b).   This is what you do:

  1. Add a DELETE statement to remove the triple that will be replaced.
  2. Replace the “CONSTRUCT” with “INSERT” leaving alone what is in the brackets.
  3. Leave the WHERE clause as it is, except to remove the FILTER statement, if it is still there (or just comment it out).


Sample Graph of Triples

This will do the change in place for all affected triples. Note that we have constructed the URI from scratch, when all we really needed to do was do a string replace.  The latter is simpler and more robust.  Using CONCAT and STRAFTER gives the wrong answer if the string “_Auto_” does not appear in the URI. Here is the query to execute, with the simpler string operation:

DELETE {?s ?p ?oldURI}
INSERT {?s ?p ?newURI }
WHERE {?oldURI rdf:type veh:Car .
       ?s ?p ?oldURI .
       BIND (URI(REPLACE(STR(?oldURI), "_Auto_", "_Car_")) AS ?newURI)
       }
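
The difference between the two string operations can be checked without touching the store. The sketch below feeds two URIs through both expressions using inline VALUES; the second URI (veh:_Truck_7 in expanded form) is a hypothetical example of an instance whose name does not contain “_Auto_”.

```sparql
# Compare the two ways of building the new URI, side by side.
# Nothing is read from or written to the triple store.
SELECT ?oldURI ?viaConcat ?viaReplace
WHERE {
  VALUES ?oldURI {
    <http://ontologies.myorg.com/vehicles#_Auto_234>
    <http://ontologies.myorg.com/vehicles#_Truck_7>   # hypothetical, no "_Auto_"
  }
  # STRAFTER returns an empty string when "_Auto_" is absent,
  # so this yields a URI ending in "_Car_" with nothing after it.
  BIND (URI(CONCAT("http://ontologies.myorg.com/vehicles#_Car_",
                   STRAFTER(STR(?oldURI), "_Auto_")))
        AS ?viaConcat)
  # REPLACE leaves the string unchanged when "_Auto_" is absent.
  BIND (URI(REPLACE(STR(?oldURI), "_Auto_", "_Car_")) AS ?viaReplace)
}
```

For the first URI both columns should agree; for the second, only the REPLACE column keeps the original URI intact.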

Step 3 is pretty much identical, except flip the subject and object.  In fact, you can combine steps 2 and 3 into a single query.  There are a few things to watch out for:

  1. VERY IMPORTANT: make sure you do steps 2 and 3 in order.  If you do step 3 first, you will blow away the rdf:type statements that are needed to do step 2.
  2. It is easy to make mistakes, so back up the store and work on a copy.
  3. When creating URIs from strings, use full namespaces rather than the abbreviated qname format.
  4. Check the count of all the triples before and after each time you make a change, track down any differences.
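
For reference, step 3 might look something like the sketch below, which is the step 2b query with the subject and object flipped. The FILTER is an addition that is not in the step 2 query; it is an assumption on our part that restricting the update to URIs that actually contain “_Auto_” is desirable, since it keeps already-renamed instances out of the update.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX veh: <http://ontologies.myorg.com/vehicles#>

# Step 3: replace triples that have the old URI in the subject.
# Run this only after step 2, because it rewrites the rdf:type
# triples that step 2 relies on to find the instances.
DELETE { ?oldURI ?p ?o }
INSERT { ?newURI ?p ?o }
WHERE {
  ?oldURI rdf:type veh:Car .
  ?oldURI ?p ?o .
  FILTER (CONTAINS(STR(?oldURI), "_Auto_"))
  BIND (URI(REPLACE(STR(?oldURI), "_Auto_", "_Car_")) AS ?newURI)
}
```

As with step 2, swapping the DELETE and INSERT lines for CONSTRUCT { ?newURI ?p ?o } gives a preview of the new triples before anything is changed.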

SPARQL: Updating the URI of an owl:Class in place

Background

We have been developing solutions for our clients lately that involve loading an ontology into a triple store, and building a UI for data entry. One of the challenges is how to handle renaming things.  If you want to change the URI of a class or property in Protégé, you load all the ontologies and datasets that use the old URI and use the rename entity command in the Refactor menu.  Like magic, all references to the URI are changed with the press of the Enter key.   In a triple store, it is not so easy. You have to track down and change all the triples that refer to the old URI. This means writing and executing SPARQL queries using INSERT and DELETE to make changes in place.  Below is an outline of how to rename a class.

Steps of Change

Let ?oldClass and ?newClass be variables bound to the URI for the old and new classes respectively – e.g. ?oldClass might be veh:Auto and ?newClass might be veh:Car.   The class rename operation involves the following steps:

  1. Change all instances of ?oldClass to be instances of ?newClass instead. e.g.
    veh:myTeslaS   rdf:type   veh:Auto is replaced with
    veh:myTeslaS   rdf:type   veh:Car
  2. Find and examine all the triples using ?oldClass as the object.  It may occur in triples where the subject is a blank node and the predicate is one of the several used for defining OWL restrictions, e.g. _:123B456x78  owl:someValuesFrom   veh:Auto
    Replace triples with the old class URI in the object with new triples using the  new URI. Note, you might want to do the first part of the next step before doing the replace.
  3. Find and examine all the triples using ?oldClass as the subject. It may occur in triples for declaring subclass relationships, comments as well as the triple creating the class in the first place. e.g. veh:Auto   rdf:type   owl:Class
    Replace triples with the old class URI in the subject with new triples using the  new URI.
  4. Look around for anywhere else that the old name may be used.  Possibilities include:
    1. If your instances use a naming convention that includes the name of the class (e.g. veh:_Auto_234), then you will have to find all the URIs that start with veh:_Auto_ and use veh:_Car_ instead.  We will look into this in a future blog.
    2. The class name may occur in comment strings and other documentation.
    3. It may also be used in SPARQL queries that are programmatically called.

Here is some SPARQL for how to do the first step.

# Rename class veh:Auto to veh:Car
# For each ?instance of ?oldClass
# Replace the triple <?instance rdf:type ?oldClass>
#               with <?instance rdf:type ?newClass>
DELETE {?instance rdf:type ?oldClass}
INSERT {?instance rdf:type ?newClass}
WHERE  {BIND (veh:Auto as ?oldClass)
        BIND (veh:Car as  ?newClass)
        ?instance rdf:type ?oldClass . }

Gotchas

There are many ways to make mistakes here. Watch for the following:

  • Having the DELETE before the INSERT seems wrong, but fear not: it is just an oddity in the SPARQL syntax.
  • Save out a copy of the triple store, in case things go wrong that are hard to undo.  One way to do this is to make all the changes to a copy of the triple store before making them in the production one. Do all the steps, make sure things worked.
  • Make sure your namespaces are defined.
  • Before you make a change in place using INSERT and DELETE, always use CONSTRUCT to see what new triples will be created.
  • Think about the order in which you replace triples.  You can easily end up replacing triples in one step, that you needed to find the triples to replace in the next step.
  • Always check the total count of triples before and after an operation that replaces triples. Generally it should be the same; track down any exceptions.  The count may be less due to duplicate triples that may occur in different named graphs.
  • A cautious approach would be to first insert the new triples and on a second step remove the old ones.  I tried this and it did not work; it seems like a bug.  Throw caution to the wind, and do the delete and insert at once. You have a backup, and once you get the hang of it, the extra step will just be extra work.
  • It may not be possible to fully automate changes in comments and SPARQL queries that are used programmatically.  Check to see what needs to change, and what doesn’t.
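
For the before-and-after check, a simple count query along these lines is usually enough. This is a sketch: it counts the triples visible in the default graph, and how named graphs contribute to that count varies between triple stores, so treat the exact scope as an assumption to verify for your store.

```sparql
# Count the triples visible to the query.
# Run once before and once after each replacement and compare.
SELECT (COUNT(*) AS ?tripleCount)
WHERE {
  ?s ?p ?o .
}
```

If your data lives in named graphs, wrapping the pattern as GRAPH ?g { ?s ?p ?o } counts per-graph occurrences instead, which can help explain a drop caused by duplicates across graphs.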

What Next?

After you get step 1 working, try out steps 2 and 3 on your own. All you need to do is make some straightforward modifications to the above example.  Step 4 involves more exploration and custom changes.
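
As a starting point for step 2, the step 1 query might be adapted as in the sketch below. It follows the advice in the Gotchas section and uses CONSTRUCT to preview the new triples first. The PREFIX line assumes the veh: namespace is http://ontologies.myorg.com/vehicles#; substitute your own.

```sparql
PREFIX veh: <http://ontologies.myorg.com/vehicles#>

# Step 2 preview: show the triples that would replace every triple
# currently using the old class URI in the object position.
CONSTRUCT { ?s ?p ?newClass }
WHERE  {BIND (veh:Auto as ?oldClass)
        BIND (veh:Car as  ?newClass)
        ?s ?p ?oldClass . }
```

Once the preview looks right, replacing the CONSTRUCT line with DELETE {?s ?p ?oldClass} followed by INSERT {?s ?p ?newClass} makes the change in place. Step 3 is the same pattern with ?oldClass and ?newClass in the subject position.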

In an upcoming blog, we explore one of those changes.  Specifically, if your naming convention for instances uses the class name in the URI then those instance URIs will have to change (e.g. from veh:_Auto_2421 to veh:_Car_2421).
