Introduction to FIBO Quick Start

We have just launched our “FIBO Quick Start” offering. If you are in the financial industry, you have likely heard about the Financial Industry Business Ontology (FIBO), which has been championed by the EDM Council, a consortium that includes virtually the entire who’s who of the financial industry. We’ve been helping with FIBO almost since its inception, and more recently Michael Uschold has been co-leading the mortgage and loan ontology development effort. Along the way we’ve done several major projects for financial clients, and have distilled what we know into a safe and quick approach to adopting semantics in the financial sector. We have the capacity to take on one more client in the financial space, so if you’re interested, by all means contact us.

FIBO Quick Start: Developing Business Value Rapidly with Semantics

The Financial Industry Business Ontology is nearing completion. As of June 2016, nine major financial institutions have joined the early adopter program. It is reasonable to expect that in the future all Financial Industry participants will have aligned some of their systems with FIBO. Most have focused their initial projects on incorporating the FIBO vocabulary. This is a good first step and can jump start a lot of compliance work.

But the huge winners, in our opinion, will be the few institutions that see the potential and go all-in with this approach. For sixteen years, we have been working with large enterprises who are interested in adopting semantic technology. Initially, our work focused on architecture and design as firms experimented with ways to incorporate these new approaches. More recently, we have been implementing what we call the “data-centric approach” to building semantically-centered systems in an agile fashion.

Click here to read more. 

Naming an Inverse Property: Yay or Nay?


Figure 1: Quantum Entanglement

 

For a fuller treatment of this topic, see the whitepaper:  Quantum Entanglement, Flipping Out and Inverse Properties.

An OWL object property is a way that two individuals can be related to each other. Direction is important. For example, consider the two relationships:

  1. being a parent: Michael has Joan as a parent, but Joan has Michael as a child.
  2. guaranteeing a loan: the US government guarantees a loan, but the loan is guaranteed by the US government.

The direction corresponds to which party's perspective you are taking: the parent or the child, the guarantor or the thing being guaranteed. From the perspective of the child we might assert the triple: Michael :hasParent Joan. Note that if Michael has Joan as a parent, then it is necessarily true that Joan has Michael as a child – and vice versa. So asserting any triple results in the implicit assertion of an inverse triple. It’s a bit like quantum-entangled particles: you cannot make a change to one without immediately affecting the other.

The property from the perspective of the other individual is called the inverse property. OWL provides a way to refer to it in a triple. For example, Joan :inverse(hasParent) Jennifer uses the hasParent property from Joan’s perspective to directly assert she has another child.

Figure 2: Property with anonymous inverse

 

If we wish, we can give the inverse property a name. Two good candidates are: hasChild, and parentOf.

Figure 3: Property with named inverse
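In Turtle, a named inverse and the inference it licenses look roughly like this (a minimal sketch; the family IRIs are illustrative):

@prefix :    <http://example.com/family/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

:hasChild owl:inverseOf :hasParent .

:Michael :hasParent :Joan .
# A reasoner can now infer the flipped triple:
#   :Joan :hasChild :Michael .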

The question naturally arises: when should you create an explicit named inverse property? There is no universal agreement on this issue, and at Semantic Arts, we have gone back and forth. Initially, we created them as a general rule, but then we noticed some downsides, so now we are more careful. Below are four downsides of using named inverses (roughly in order of growing importance). The first two relate to ease of learning and understanding the ontology; the last two relate to inference and triple stores.

  1. Names: It can be difficult to think of a good name for the inverse, so you might as well just use the syntax that explicitly says it is the inverse. It will likely be easier to understand.
  2. Cluttered property hierarchy: Too many inverses can significantly clutter up the property hierarchy, making it difficult to find the property you need, and more generally, to learn and understand what properties there are in the ontology, and what they mean.
  3. Slower Inference: Too many named inverses can significantly slow down inference.
  4. More Space: If you run inference and materialize the triples, a named inverse will double the number of triples that use a given property.

So our current practice is to not create inverses unless we see a compelling reason to do so, and it is clear that those benefits outweigh the downsides.

Semantic Modeling: Getting to the Core

Most large organizations have a lot of data and very little useful information. The reason: every time they encounter a problem, they build (or more often buy) another application system. Each application has its own completely arbitrary data model, designed for the task at hand at that time, using whatever simplification seemed appropriate in that instance.

The net result, depending on the size of the organization, is hundreds or thousands of applications (occasionally tens of thousands), each with its own data model. Each data model has hundreds to thousands of tables, occasionally tens of thousands (the average SAP install has 95,000 tables), and each table has dozens of columns. In sum, you are trying to run your company using upwards of millions of distinct data types. For all practical purposes, this is impossible.

Most companies spend most of their (very high) IT budget on maintaining these systems (as they are very complex) or attempting to integrate them (and doing a very partial job of it).

This seems pretty bleak and makes it hard to see a way out. What will drop the scales from your eyes is seeing a model that covers all the concepts you use to run your business with just a few hundred concepts, connected by a web of relationships. Typically, this core is then augmented by thousands of “taxonomic” distinctions; however, these thousands of distinctions can be organized and put into their place for much better management and understanding.


Once you have this core model (or ontology, as we call it, just to be fancy), everything becomes simpler: integration, because you map the complex systems to the simple core and not to each other; and application development, because you build on a smaller footprint. It also becomes possible to incorporate types of data previously thought un-integrate-able, such as unstructured, semi-structured, and social media data.

Semantic Arts has built these types of core data models for over a dozen very large firms, in almost as many industries, and helped to leverage them for their future information systems.  We now can do this in a very predictable and short period of time.  We’d be happy to discuss the possibilities with you.

Feel free to send us a note at [email protected].

Written by Dave McComb

The Evolution of the Data-Centric Revolution Part One

We have been portraying the move to a Data-Centric paradigm as a “Revolution” because of the major mental and cultural shifts that are prerequisites to making this shift. In another sense, the shift is the result of a long, gradual process; one which would have to be characterized as “evolutionary.”

This column is going to review some of the key missing links in the evolutionary history of the movement.

(For more on the Data Centric Revolution, see The Data Centric Revolution. In the likely event that you’re not already data-centric, see The Seven Warning Signs of Appliosclerosis.)

Applications as Decks of Cards

In the 50’s and 60’s, many computer applications made very little distinction between data and programs. A program was often punched out on thin cardboard “computer cards.” The data was punched out on the same kind of cards. The two decks of cards were put in the hopper together, and voila, output came out the other end. Payroll was a classic example of applications in this era. There was a card for each employee with their Social Security Number, rate of pay, current regular hours, overtime hours, and a few other essential bits of data. The program referred to data by the “column” numbers on the card where the data was found. Often people didn’t think of the data as separate from the program, as the two were intimately connected.

Click here to view on TDAN.com

What’s exciting about SHACL: RDF Data Shapes

An exciting new standard is under development at the W3C to add some much needed functionality to OWL. The main goals are to provide a concise, uniform syntax (presently called SHACL for Shapes Constraint Language) for both describing and constraining the contents of an RDF graph.  This dual purpose is what makes this such an exciting and useful technology.

RDF Data Shapes

What is an RDF Data Shape?

An RDF shape is a formal syntax for describing how data is, how data should be, or how data must be.

For example:

ex:ProductShape 
	a sh:Shape ;
	sh:scopeClass ex:Product ;
	sh:property [
		sh:predicate rdfs:label ;
		sh:dataType xsd:string;
		sh:minCount 1;
		sh:maxCount 1;
	];
	sh:property [
		sh:predicate ex:soldBy;
		sh:valueShape ex:SalesOrganizationShape ;
		sh:minCount 1;
	].

ex:SalesOrganizationShape
	a sh:Shape ;
	sh:scopeClass ex:SalesOrganization ;
	sh:property [
		sh:predicate rdfs:label ;
		sh:dataType xsd:string;
		sh:minCount 1;
		sh:maxCount 1;
	] .

This can be interpreted as a description of what is (“Products have one label and are sold by at least one sales organization”), as a constraint (“Products must have exactly one label and must be sold by at least one sales organization”), or as a description of how data should be even if nonconforming data is still accepted by the system.  In the next sections I’d like to comment on a number of use cases for data shapes.

RDF Shapes as constraints

The primary use case for RDF data shapes is to constrain data coming into a system.  This is a non-trivial achievement for graph-based systems, and I think that the SHACL specification is a much better solution for achieving this than most.  Each of the SHACL atoms can, in principle, be expressed as an ASK query to evaluate the soundness of a repository.
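For example, the minCount constraint on rdfs:label in ex:ProductShape above could be checked with something along these lines (a sketch, not the official SHACL-to-SPARQL mapping):

# Does any Product lack an rdfs:label (violating minCount 1)?
ASK WHERE {
	?product a ex:Product .
	FILTER NOT EXISTS { ?product rdfs:label ?label }
}

A result of true signals that the repository contains data violating the shape.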

RDF Shapes as a tool for describing existing data

OWL ontologies are good for describing the terms and how they can be used but lack a mechanism for describing what kinds of things have been said with those terms.  Data shapes fulfill this need nicely, which can make it significantly easier to perform systems integration work than simple diagrams or other informal tools.

Often in the course of building applications, the model is extended in ways that may be perfectly valid but otherwise undocumented.  Describing the data in RDF shapes provides a way to “pave the cow paths”, so to speak.

A benefit of this usage is that you get the advantages of being schema-less (since you may want to incorporate data even if it doesn’t conform) while still maintaining a model of how data can conform.

Another use case for this is when you are providing data to others.  In this case, you can provide a concise description of what data exists and how to put it together, which leads us to…

RDF Shapes as an outline for SELECT queries

A nice side-effect of RDF shapes that we’ve found is that once you’ve defined an object in terms of a shape, you’ve also essentially outlined how to query for it.

Given the example provided earlier, it’s easy to come up with:

SELECT ?product ?productLabel ?orgLabel WHERE {
	?product 
		a ex:Product ;
		rdfs:label ?productLabel ; 
		ex:soldBy ?salesOrg .
	?salesOrg
		a ex:SalesOrganization ;
		rdfs:label ?orgLabel .
}

None of this is made explicit by the OWL ontology; we need either something informal (e.g., diagrams and prose) or formal (e.g., the RDF shapes) to tell us how these objects relate in ways beyond disjointness, domain/range, etc.

RDF Shapes as a mapping tool

I’ve found RDF shapes to be tremendously valuable as a tool for specifying how very different data sources map together.  For several months now we’ve been performing data conversion using R2RML.  While R2RML expresses how to map the relational DB to an RDF graph, it’s still extremely useful to have something like an RDF data shapes document to outline what data needs to be mapped.

I think there’s a lot of possibility for making these two specifications more symbiotic. For example, I could imagine combining the two (since it is all just RDF, after all) to specify in one pass what shape the data will take and how to map it from a relational database.
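As a rough sketch of how the two could sit side by side, here is a minimal R2RML TriplesMap for a hypothetical PRODUCT table (the table and column names are invented for this example):

@prefix rr: <http://www.w3.org/ns/r2rml#> .

ex:ProductTableMap
	a rr:TriplesMap ;
	rr:logicalTable [ rr:tableName "PRODUCT" ] ;
	rr:subjectMap [
		rr:template "http://example.com/product/{PRODUCT_ID}" ;
		rr:class ex:Product
	] ;
	rr:predicateObjectMap [
		rr:predicate rdfs:label ;
		rr:objectMap [ rr:column "PRODUCT_NAME" ]
	] .

Read alongside ex:ProductShape, this describes both what shape a Product takes in the graph and where each piece of it comes from in the relational source.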

The future – RDF Shapes as UI specification

Our medium-term goal for RDF shapes is to generate a basic UI from a shapes specification. While this obviously wouldn’t work in 100% of use cases, there are a lot of instances where a barebones form UI would be fine, at least at first.  There are actually some interesting advantages to this; for instance, validation can be declared right in the model.

For further reading, see the W3C’s SHACL Use Cases and Requirements paper.  It touches on these use cases and many others.  One very interesting use case suggested in this paper is as a tool for data interoperability for loose-knit communities of practice (say, specific academic disciplines or industries lacking data consortia).  Rather than completely go without models, these communities can adopt guidelines in the form of RDF shapes documents.  I can see this being extremely useful for researchers working in disciplines lacking a comprehensive formal model (e.g., the social sciences); one researcher could simply share a set of RDF shapes with others to achieve a baseline level of data interoperability.

Governance in a Data-Centric Environment

How a Data-Centric Environment Becomes Harder to Govern

A traditional data landscape has the advantage of being extremely silo-ed.  By taking your entire data landscape and dividing it into thousands of databases, there is the potential that each database is small enough to be manageable.

As it turns out this is more potential than actuality.  Many of the individual application data models that we look at are individually more complex than the entire enterprise model should be.  However, that doesn’t help anyone trying to govern.  It is what it is.

What is helpful about all this silo-ization is that each silo has a smaller community of interest.  When you cut through all the procedures, maturity models and the like, governance is a social problem.  Social problems, such as “agreement,” get harder the more people you get involved.

From this standpoint, the status quo has a huge advantage, and a Data-Centric firm has a big challenge: there are far more people whose agreement one needs to solicit and obtain.

The other problem that Data-Centric brings to the table is the ease of change.  Data Governance likes things that change slower than the process can manage.  Often this is a toss-up.  Most systems are hard to change and most data governance processes are slow.  They are pretty much made for each other.

I remember when we built our first model driven application environment (unfortunately we chose health care for our first vertical).  We showed how you could change the UI, API, Schema, Constraints, etc.  in real time.  This freaked our sponsors out.  They couldn’t imagine how they would manage [govern] this kind of environment.  In retrospect, they were right.  They would not have been able to manage it.

This doesn’t mean the approach isn’t valid— it means we need to spend a lot more time on the approach to governance. We have two huge things working against us: we are taking the scope from tribal silos to the entire firm and we are increasing the tempo of change.

How a Data-Centric Environment Becomes Easier to Govern

A traditional data landscape has the disadvantage of being extremely silo-ed. You get some local governance from being silo-ed, but you have almost no hope of enterprise governance. This is why it's high-fives all around for local governance, while little progress is made on firm-wide governance.

One thing that data-centric provides that makes the data governance issues tractable is an incredible reduction in complexity. Because governance is a human activity, getting down to human scales of complexity is a huge advantage.

Furthermore, to enjoy the benefits of data-centric you have to be prepared to share.  A traditional environment encourages copying of enterprise data to restructure it and adapt it to your own local needs.  Pretty much all enterprises have data on their employees.  Lots of data actually.  A large percentage of applications also have data on employees.  Some merely have “users” (most of whom are employees) and their entitlements, but many have considerably more.  Inventory systems have cycle counters, procurement systems have purchasing agents, incident systems have reporters, you get the pattern.

Each system is dealing with another copy (maybe manually re-entered, maybe from a feed) of the core employee data.  Each system has structured the local representation differently and of course named all the fields differently.  Some of this is human nature, or maybe data modeler nature, that they want to put their own stamp on things, but some of it is inevitable.  When you buy a package, all the fields have names.  Few, if any of them, are the names you would have chosen, or the names in your enterprise model, if you have one.

With the most mature form of data-centric, you would have one set of enterprise employee data. You can extend it, but the un-extended parts are used just as they are. For most developers, this idea sounds either too good to be true or too bad to be true. Most developers are comfortable with a world they control. This is a world of tables within their database. They can manage referential integrity within that world. They can predict performance within that world. They don't like to think about a world where they have to accept someone else's names and structures, and defer to other groups' decision making.

But once you overcome developer inertia on this topic and you are actually re-using data as it is, you have opened up a channel of communication that naturally leads to shared governance. Imagine a dozen departments consuming the exact same set of employee data. Not local derivations of the HR golden record, or the LDAP files, but an actual shared data set. They are incented to work together on the common data. The natural thing to happen, and we have seen this in mature organizations, is that the focus shifts to the smallest, realest, most common data elements. This social movement, and this focus on what is key and what is real, actually makes it easier to have common governance. You aren’t trying to foist one application's view of the world on the rest of the firm; you are trying to get the firm to understand and communicate what it cares about and what it shares.

And this creates a natural basis for governance despite the fact that the scope became considerably larger.

Click here to read more on TDAN.com

Linked Data Platform

The Linked Data Platform (LDP) has achieved W3C Recommendation status, which is pretty much acceptance as a standard. There are some good hints in the LDP Primer and LDP Best Practices documents.

This is the executive two paragraph treatment, to get you at least conversant with the topic.

Basically, LDP says that if you treat everything like a container, and use the ldp:contains relationship to the things in the container, then the platform can treat everything consistently. This gives us a RESTful interface onto an RDF database. You can read from it and write to it, as long as there is a way to map your ontology to Containers and ldp:contains relationships.

Say you have a bunch of inventory-related data. You could declare that there is an Inventory container, and the connection between the Inventory container and the Warehouses might be based on a hasStockkeepingLocation relationship. Each Warehouse in turn could be cast as a Container whose contains relationship points to the CatalogItems.
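In Turtle, such a container arrangement might look roughly like this (the resource names are illustrative, not from an actual system):

@prefix ex:  <http://example.com/inventory/> .
@prefix ldp: <http://www.w3.org/ns/ldp#> .

ex:Inventory
	a ldp:BasicContainer ;
	ldp:contains ex:Warehouse1, ex:Warehouse2 .

ex:Warehouse1
	a ldp:BasicContainer ;
	ldp:contains ex:CatalogItem17, ex:CatalogItem42 .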

A promising way of getting a RESTful interface on a triple store.

Quantities, Number Units and Counting in gist

We have a simple and effective way in gist to represent a wide range of physical quantities such as ‘82 kg’, ‘3 meters’ and ‘20 minutes’.  Each quantity has a number and a unit, such as ‘meter’ or ‘second’.  In addition to these simple units, we have unit multiplication and division to represent more complex units, e.g. for speed and acceleration. A standard speed unit is meters per second [m/s] and a standard acceleration unit is meters per second per second [(m/s)/s] or simply  [m/s^2].

Physicists as well as business people like to avoid the inconvenience of working with very large or very small numbers like 1000 meters, or .000000000001 meters (a trillionth of a meter). If you counted to see whether the number of zeros was correct, you understand the problem. So we create units like kilometer and picometer and give them conversion factors. This works for any kind of unit (time, electric current, mass). Note that the standard units have a conversion factor of 1 (which, in normal parlance, means no conversion is necessary). See figure 1 for some examples.

Figure 1: Example Quantities
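In rough Turtle (the property names and instance IRIs here are illustrative placeholders, not the exact gist IRIs), the basic pattern is a number plus a unit, where each unit carries a conversion factor to its standard unit:

ex:_Extent_3_kilometers
	ex:numericValue 3.0 ;
	ex:hasUnit ex:_kilometer .

ex:_meter
	ex:conversionFactor 1.0 .       # standard unit: conversion factor of 1

ex:_kilometer
	ex:conversionFactor 1000.0 .    # 1 kilometer = 1000 meters

Converting to the standard unit is then just multiplying the number by the unit's conversion factor: 3 x 1000 = 3000 meters.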

We have also found a need for counting units like dozen or gross. For example, a wine merchant stocks and sells cases of 12 bottles of wine, so counting in dozens is more convenient than counting single bottles. What is interesting is that we can use the exact same structure for representing ‘4 dozen’ or ‘7 gross’ as we do for representing things like ‘82 kg’ and ‘20 minutes’. Take ‘4 dozen’: the number is 4, the unit is ‘dozen’, and the conversion is 12.

In gist there is also a way to represent percentages, which we have always treated as a ratio. After all, when speaking of a percentage, there is always an explicit or implicit ratio somewhere.  For example:

  1. “Shipment A has only 65% as much oil as shipment B” corresponds to the ratio:
    (No. of barrels in shipment A) / (No of barrels in shipment B) = .65
  2. “There are 20% more grams of chocolate in the new package size” corresponds to the ratio:
    (NewQuantity – OldQuantity) / (OldQuantity) = .20

The units for the first example are barrels/barrels which cancel out leaving a pure number. Similarly, the units for the second example are grams/grams which again cancel out. In fact, every ratio unit that corresponds to a percentage will cancel out and leave a pure number. This means that although it may be useful to do so, we don’t need to represent gist:Percentage using a ratio unit.

Another thing that we never realized before is that, being a pure number,  a percentage can be represented in the same way we represent dozen or gross. The only difference is the conversion (12 vs. .01).  We can use this same structure to represent:

  • parts per million (ppm), used by toxicologists, say, to measure amounts of mercury in tuna
  • basis points (used by the Fed for describing interest rates)
    Investopedia defines a basis point as “a unit that is equal to 1/100 of 1%”

See figure 2 for the representational structures.

 

Figure 2: One structure for number units and ordinary units

 

Notice how ‘4 cm’ is very similar to ‘4 percent’:

  • to convert 4 cm to its standard unit, we multiply 4 by the conversion factor of .01 resulting in .04 meters
  • to convert 4 percent to its standard unit, we multiply 4 by the conversion factor of .01 resulting in .04 ??.

This means we can use the same computational mechanism to perform units conversion for pure numbers like 4 dozen and 4% as we do for ordinary physical quantities like 4 cm or 82 kg.
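Continuing the illustrative Turtle pattern from above (again, placeholder names rather than the exact gist IRIs), a counting unit sits in exactly the same structure as an ordinary unit:

ex:_centimeter  ex:conversionFactor 0.01 .      # standard unit: meter
ex:_percent     ex:conversionFactor 0.01 .
ex:_dozen       ex:conversionFactor 12.0 .
ex:_basisPoint  ex:conversionFactor 0.0001 .    # 1/100 of 1%

ex:_Count_4_dozen
	ex:numericValue 4.0 ;
	ex:hasUnit ex:_dozen .                  # converts to 4 x 12 = 48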

One question remains. Whereas we can readily see that the conversion factor for kilometer is based on the standard unit of meter, and the conversion factor for hour is based on the standard unit of second, what are the conversion factors of 12, .01 and .0001 (for dozen, percent and basis point) based on? What does it mean to have a standard unit for these pure numbers with a conversion of 1?

Let’s look to see how gist represents dozen and kilometer to see if that gives us any insight.

  1. gist:kilometer is an instance of gist:DistanceUnit &
    ‘3 meters’ is an instance of gist:Extent &
    the base unit is gist:meter.
    Analogously:
  2. gist:dozen is an instance of gist:CountingUnit,
    ‘4 dozen’ is an instance of gist:Count &
    the base unit is gist:each

Curiously, while ‘meter’ actually means something to us, and we know what it means to say ‘3 meters’, it is strange to think what ‘3 eaches’ could possibly mean. I invite you to stare at the following table for a while and see some analogies.

Figure 3: Standard Unit for Pure Number Quantities

Then notice that:

  1. 4 dozen = 48 eaches
  2. 4 dozen = 48 (just a simple number)
  3. Therefore, 48 must equal 48 eaches (because both are equal to 4 dozen).

But what is it, such that if you have 48 of them you get the number 48? The answer is the number one: 48 x 1 = 48. So the meaning of gist:each is the number one acting as a unit. This is a mathematical abstraction. The ??’s in figure 2 stand for ‘each’, which is the standard number unit. So when you say ‘3 eaches’ it is just 3 of the number one, which is just the pure number 3. As an aside, we can also say that ‘each’ is the identity element for unit multiplication and division. This is analogous to the number 1 being the identity element for multiplication and division of numbers.

  • You can multiply or divide any number by 1 and you get that number back.
  • You can multiply or divide any unit by each (which means one) and get that unit back.

Note that while conceptually they mean the same thing, syntactically gist:each is very different from the number one represented as a number whose datatype is, say, integer or float.

Notice that for these pure numbers in convenient sized units, we are usually counting things: how many dozens, how many basis or percentage points, or how many parts per million.  We refer to ‘each’ thing as ‘one’ thing being counted.  So that links gist:each to the number one.  Thus, despite the awkwardness of speaking of ‘3 eaches’ the names ‘Count’, ‘CountingUnit’ and ‘each’ are quite reasonable.

Finally, insofar as all instances of CountingUnit are based on the number one, and all instances of Count represent pure numbers, we can think of every CountingUnit as a degenerate unit, and we can think of gist:Count as a degenerate quantity. A ‘real’ quantity is not just a number; it has a number and a non-numeric unit.

So in conclusion:

  1. We have extended the notion of gist:Count and gist:CountingUnit to apply to pure numbers that are less than one as well as those that are greater than one.
  2. We can represent pure numbers expressed in dozens, percentages, basis points and ppm just like we express the more usual quantities: ‘82 kg’, ‘3 meters’ and ‘20 minutes’.
  3. We can use the same computational mechanism to do units conversions on pure numbers as we can for ordinary physical quantities.
  4. We can represent gist:Percentage using a new unit called gist:percent with a conversion of .01 instead of using a ratio unit, making a more uniform representation.
  5. It will often be helpful to represent a gist:Percentage using a ratio, but it is no longer required.
  6. gist:Count could meaningfully and accurately be called gist:PureNumber since every instance of gist:Count (e.g. ‘4 dozen’, ‘65%’) is a pure number (e.g. 48, .65)
  7. gist:CountingUnit could meaningfully and accurately be called gist:PureNumberUnit because every instance of gist:CountingUnit is used to express pure numbers.
  8. gist:each corresponds to the number one
  9. We can think of Counts (pure numbers) and CountingUnits (number units) as degenerate cases of ordinary quantities and units like ’82 kg’ and ‘kg’

Written by Michael Uschold

Collections and Queries: Two Sides of the Same Coin?

This blog is wholly  inspired by an observation made by Dave McComb that collections and queries have an interesting relationship.

Collections frequently arise when creating enterprise ontologies. In manufacturing, there are lists of approved suppliers and lists of ingredients. In finance there are baskets of stocks making up a portfolio, and a 30-year mortgage corresponds to a collection of 360 monthly payments. In healthcare, there are lists of side effects for a given drug, and a patient bill is essentially a collection of line items, each with an associated cost. We will look at two ways to model collections and consider the pros and cons of each, using a list of approved suppliers for salt as our running example.

Represent the list as an explicit collection

The first is to create an explicit collection called, say, _ApprovedSaltSuppliers. The members of this collection are each suppliers of salt, say _Supplier1, _Supplier14 and _Supplier23. We can link each of the suppliers to the collection using the object property gist:memberOf. So far, we have 4 instances, one property, three triples and no classes.


Figure 1: Simple Collection

It is always good practice to say what class an instance belongs to. What kind of a thing is the instance _ApprovedSaltSuppliers? First, it is a collection. We have a class for that called gist:Collection, so we will want _ApprovedSaltSuppliers to be an instance of gist:Collection. However, it is more than just any collection: it is not a jury, it is not a deck of cards. More specifically, it is a list of approved suppliers. So we create a class called ListOfApprovedSuppliers and declare _ApprovedSaltSuppliers to be an instance of that class. We also make ListOfApprovedSuppliers a subclass of gist:Collection, ensuring that _ApprovedSaltSuppliers is also an instance of gist:Collection.
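In Turtle, the explicit-collection approach might look like this (gist:Collection and gist:memberOf come from the discussion above; the ex: names are illustrative, and prefix declarations are omitted as in the earlier examples):

ex:ListOfApprovedSuppliers rdfs:subClassOf gist:Collection .

ex:_ApprovedSaltSuppliers a ex:ListOfApprovedSuppliers .

ex:_Supplier1  gist:memberOf ex:_ApprovedSaltSuppliers .
ex:_Supplier14 gist:memberOf ex:_ApprovedSaltSuppliers .
ex:_Supplier23 gist:memberOf ex:_ApprovedSaltSuppliers .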

Using a SPARQL query to get the list

The second approach is inspired by the fact that a SPARQL query returns a report that is a collection of items from a triple store based on some specific criteria. Instead of having an explicit collection in the triple store for approved suppliers of a given substance, you could simply link that substance to each of the approved suppliers and then write a SPARQL query to find them from a triple store of past, present and potential future suppliers. See figure 2 for how this would be done.

Note, we have added some contextual information. First, we indicated the kind of substance they are being approved to supply. We also have a link to a person in charge of maintaining that list.  Can you think of what other information you might want to associate with the list of approved salt suppliers if you were modeling this for a specific organization?


Figure 2: Comparing two approaches
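A sketch of the corresponding query for the second approach (the property linking a substance to its approved suppliers is not named above, so ex:hasApprovedSupplier below is an invented placeholder):

SELECT ?supplier WHERE {
	ex:_Salt ex:hasApprovedSupplier ?supplier .
}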

How should we skin this cat?

Which approach is best under what circumstances and why?  Have another look at the examples above in finance, manufacturing and healthcare.  For each, consider the following questions:

  1. can you think of an easy, natural and obvious way to represent the information and write a query that can generate the list?
  2. are the items in the list likely to change with some degree of frequency?
  3. is the collection mainly ‘just a collection’ i.e. does it have little meaning on its own and few if any attributes or relationships linking it to other things?

For the list of approved suppliers, we already saw that the answer to the first question is yes. The answer to the second question is also yes, because suppliers come and go; there will likely be a moderate amount of change. The third question is less clear. It could be that an approved supplier list is just connected to a given substance, and nothing else. In that case, the answer to the third question would be yes. The way we have modeled it includes a named individual responsible for creating and maintaining it. The list might also be part of a larger body of documentation. Or, in a larger organization where different divisions have their own approved supplier lists, you would need to indicate which organization is being supplied. With this extra information, the answer to the third question would be no.

Consider a patient bill.  It is not that obvious how to represent the information so a simple query can give an answer. First, a given patient might have many bills over time, which would connect them to many different line items. It is a bit awkward.  Second, these items will never change. Finally, while it makes sense to represent a patient bill as being in essence, a list of line items, it is much more than that. It is not only connected to the patient, but also to the hospital, to the provider, and possibly to an insurance company.

To the extent that the answer to the above three questions is yes, you are probably better off just writing an on-demand query.  Conversely, if the answers to the above questions tend to be no, then you probably do want to represent and manage an explicit collection.
