This blog is wholly inspired by an observation made by Dave McComb that collections and queries have an interesting relationship.
Collections frequently arise when creating enterprise ontologies. In manufacturing, there are lists of approved suppliers and lists of ingredients. In finance, there are baskets of stocks making up a portfolio, and a 30-year mortgage corresponds to a collection of 360 monthly payments. In healthcare, there are lists of side effects for a given drug, and a patient bill is essentially a collection of line items, each with an associated cost. We will look at two ways to model collections like these and consider the pros and cons of each. As a running example, we will consider a list of approved suppliers for salt.
Represent the list as an explicit collection
The first is to create an explicit collection called, say, _ApprovedSaltSuppliers. The members of this collection are each suppliers of salt, say _Supplier1, _Supplier14 and _Supplier23. We can link each of the suppliers to the collection using the object property gist:memberOf. So far, we have four instances, one property, three triples and no classes.
Figure 1: Simple Collection
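To make the counts above concrete, here is a minimal sketch that holds the three triples as plain Python tuples rather than in a real triple store. The `ex:` prefix and the helper `members_of` are assumptions made for illustration; only gist:memberOf and the instance names come from the text.

```python
# Each triple is (subject, predicate, object).
# "ex:" is a placeholder namespace, not something defined in the post.
triples = {
    ("ex:_Supplier1", "gist:memberOf", "ex:_ApprovedSaltSuppliers"),
    ("ex:_Supplier14", "gist:memberOf", "ex:_ApprovedSaltSuppliers"),
    ("ex:_Supplier23", "gist:memberOf", "ex:_ApprovedSaltSuppliers"),
}


def members_of(collection, triples):
    """Return the sorted subjects linked to `collection` via gist:memberOf."""
    return sorted(
        s for s, p, o in triples if p == "gist:memberOf" and o == collection
    )


# Every subject or object is an instance; every predicate is a property.
instances = {s for s, _, _ in triples} | {o for _, _, o in triples}
properties = {p for _, p, _ in triples}

print(len(instances), len(properties), len(triples))
print(members_of("ex:_ApprovedSaltSuppliers", triples))
```

Listing the members of the collection is then a single pattern match on gist:memberOf.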
It is always good practice to say what class an instance belongs to. What kind of a thing is the instance _ApprovedSaltSuppliers? First, it is a collection. We have a class for that called gist:Collection, so we will want _ApprovedSaltSuppliers to be an instance of gist:Collection. However, it is more than just any collection: it is not a jury, and it is not a deck of cards. More specifically, it is a list of approved suppliers. So we create a class called ListOfApprovedSuppliers and declare _ApprovedSaltSuppliers to be an instance of that class. We also make ListOfApprovedSuppliers a subclass of gist:Collection, ensuring that _ApprovedSaltSuppliers is also an instance of gist:Collection.
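The subclass step can be sketched as follows. This is a hand-rolled illustration of the usual subclass inference (an instance of a class is also an instance of its superclasses), not a real reasoner. The `ex:` prefix and the function `inferred_types` are hypothetical names introduced here.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# "ex:" is a placeholder namespace assumed for this sketch.
triples = {
    ("ex:_ApprovedSaltSuppliers", RDF_TYPE, "ex:ListOfApprovedSuppliers"),
    ("ex:ListOfApprovedSuppliers", SUBCLASS_OF, "gist:Collection"),
}


def inferred_types(instance, triples):
    """Return asserted classes plus all superclasses reachable via rdfs:subClassOf."""
    found = set()
    pending = [o for s, p, o in triples if s == instance and p == RDF_TYPE]
    while pending:
        cls = pending.pop()
        if cls in found:
            continue  # already visited; also guards against subclass cycles
        found.add(cls)
        pending.extend(
            o for s, p, o in triples if s == cls and p == SUBCLASS_OF
        )
    return found


print(inferred_types("ex:_ApprovedSaltSuppliers", triples))
```

Only the more specific class is asserted directly. Membership in gist:Collection follows from the subclass link.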
Using a SPARQL query to get the list
The second approach is inspired by the fact that a SPARQL query returns a report that is a collection of items from a triple store based on some specific criteria. Instead of having an explicit collection in the triple store for approved suppliers of a given substance, you could simply link that substance to each of the approved suppliers and then write a SPARQL query to find them in a triple store of past, present and potential future suppliers. See Figure 2 for how this would be done.
Note, we have added some contextual information. First, we indicated the kind of substance they are being approved to supply. We also have a link to a person in charge of maintaining that list. Can you think of what other information you might want to associate with the list of approved salt suppliers if you were modeling this for a specific organization?
Figure 2: Comparing two approaches
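A rough sketch of the query-based approach, again using plain Python tuples in place of a triple store. The property `ex:hasApprovedSupplier`, the instances `ex:_Salt` and `ex:_Supplier7`, and the class `ex:Supplier` are all hypothetical names chosen for illustration; the post does not say what the linking property is actually called.

```python
# Hypothetical names throughout; only the overall shape comes from the text.
triples = {
    ("ex:_Salt", "ex:hasApprovedSupplier", "ex:_Supplier1"),
    ("ex:_Salt", "ex:hasApprovedSupplier", "ex:_Supplier14"),
    ("ex:_Salt", "ex:hasApprovedSupplier", "ex:_Supplier23"),
    # A supplier that is in the store but is not approved for salt.
    ("ex:_Supplier7", "rdf:type", "ex:Supplier"),
}

# Under the same naming assumptions, and with suitable PREFIX declarations,
# the corresponding SPARQL query would look roughly like:
#   SELECT ?supplier WHERE { ex:_Salt ex:hasApprovedSupplier ?supplier }


def approved_suppliers(substance, triples):
    """Return the sorted suppliers linked to `substance` as approved."""
    return sorted(
        o
        for s, p, o in triples
        if s == substance and p == "ex:hasApprovedSupplier"
    )


print(approved_suppliers("ex:_Salt", triples))
```

Here the list is never stored as a thing in its own right. It exists only as the result of the query, so there is no collection instance to hang attributes on, such as the person who maintains the list.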
How should we skin this cat?
Which approach is best under what circumstances and why? Have another look at the examples above in finance, manufacturing and healthcare. For each, consider the following questions:
- can you think of an easy, natural and obvious way to represent the information and write a query that can generate the list?
- are the items in the list likely to change with some degree of frequency?
- is the collection mainly ‘just a collection’, i.e., does it have little meaning on its own and few if any attributes or relationships linking it to other things?
For the list of approved suppliers, we already saw that the answer to the first question is yes. The answer to the second question is also yes because suppliers come and go, so there will likely be a moderate amount of change. The third question is less clear. It could be that an approved supplier list is just connected to a given substance, and nothing else. In this case, the answer to the third question would be yes. However, the way we have modeled it includes a named individual responsible for creating and maintaining it. The list might also be part of a larger body of documentation. Or, if it is a larger organization where different divisions have their own approved supplier lists, you would need to indicate which organization is being supplied. With this extra information, the answer to the third question would be no.
Consider a patient bill. It is not that obvious how to represent the information so that a simple query can give an answer. First, a given patient might have many bills over time, which would connect them to many different line items, so a query would need some awkward way to tell which line items belong to which bill. Second, the items on a bill will never change. Finally, while it makes sense to represent a patient bill as being, in essence, a list of line items, it is much more than that. It is connected not only to the patient, but also to the hospital, to the provider, and possibly to an insurance company.
To the extent that the answer to the above three questions is yes, you are probably better off just writing an on-demand query. Conversely, if the answers to the above questions tend to be no, then you probably do want to represent and manage an explicit collection.