Semantic technology can be a ‘lingua franca’ to connect disparate information silos.
Several speakers and writers have referred to the problem of systems integration as one of getting systems written in different languages to communicate. Whenever I’ve heard or seen this, the speaker or writer goes on to say that getting COBOL to speak to Java is like translating Spanish to Chinese. (Each speaker puts different languages in there.) I like an analogy as much as the next person, but this one stretches the comparison a bit too far.
Semantically, COBOL and Java (or any two procedural languages) aren’t that different. If you look at their BNF (Backus–Naur Form, a standard way of expressing the grammar of a programming language), you’ll find only a few dozen keywords in each language. Yes, there are grammatical differences, and some things are harder to express in one language than in another.
But this isn’t where the problems come from. More modern languages (C++, Java, C# and the like) have very large frameworks and libraries, each consisting of tens of thousands of methods. If you wanted to make a case that the semantic differences in languages made integration difficult, this would be the place to make it.
Certainly it is these libraries that make learning a new language environment difficult, and it is these libraries that provide most of the behavior of a system, and therefore most of its semantics. This is what makes systems conversion difficult: in converting a system to an equivalent in another environment, all of that behavior must be reimplemented in an equivalent way in the new environment, and depending on how much of the environment was used, this can be a large task. But this still isn’t where the integration problem lies.
Most integration is performed at the data level, either through extract files or some sort of messages. What makes integration difficult is that each system to be integrated has developed its own language about what each data attribute “means.” This is obvious with custom systems: analysts interview users, gather their requirements, and turn them into data designs. Each table and attribute means something, and this meaning is conveyed partly in the name of the item and partly in its documentation. Developers build the system, procedure writers develop workflow and procedures, and end users develop conventions.
Eventually each item means something, often close to what was intended, but rarely exactly the same. The same thing happens more subtly with packaged software. First, most packages have far more complex schemas than would be developed in a comparable custom system (the package vendors have to accommodate all possible uses of their package). Second, package vendors have tended to make their field definitions and validation more generic, as this is the low road to flexibility.
The process of implementing a package is really one of finalizing the definitions of many of the attributes, some of which is done through parameters and some through procedure and convention. The net result is that a large organization will have many applications (classically, each in its own “silo,” or vertical slice of the organization), each of which has its data encoded in its own schema, essentially its own language. I sometimes call these their own idiolects (a dialect for one).
In my opinion it is these “silos of Babel” that create most of the difficulty and expense in integration. Luckily, we have a solution to this: the application of semantic technology as an intermediary, a lingua franca if you will, not of computer languages or libraries, but of metadata.
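To make the idea concrete, here is a minimal sketch of the lingua franca pattern in plain Python. The record shapes, field names, and mapping functions are all hypothetical, invented for illustration; the point is only that each silo translates its own idiolect into one shared vocabulary, rather than every pair of systems translating directly to each other.

```python
# Two silos encode the same business fact in their own schemas
# (all field names here are hypothetical).
billing_record = {"CUST_NM": "Acme Corp", "CR_LIM": "50000", "STAT_CD": "A"}
crm_record = {"accountName": "Acme Corp", "creditLimit": 50000, "isActive": True}

# The shared vocabulary: the "lingua franca" every silo maps onto.
SHARED_TERMS = ("customer_name", "credit_limit", "active")

# Each silo supplies one mapping from its local encoding to the shared terms.
billing_mapping = {
    "customer_name": lambda r: r["CUST_NM"],
    "credit_limit": lambda r: int(r["CR_LIM"]),       # stored as a string here
    "active": lambda r: r["STAT_CD"] == "A",          # status code convention
}
crm_mapping = {
    "customer_name": lambda r: r["accountName"],
    "credit_limit": lambda r: r["creditLimit"],
    "active": lambda r: r["isActive"],
}

def to_shared(record, mapping):
    """Translate a silo-local record into the shared vocabulary."""
    return {term: mapping[term](record) for term in SHARED_TERMS}

# Once expressed in the common terms, the two silos agree.
print(to_shared(billing_record, billing_mapping) ==
      to_shared(crm_record, crm_mapping))
```

The design payoff is the usual hub-and-spoke economy: with a shared vocabulary, n systems need n mappings, where point-to-point translation needs on the order of n² translators. Real semantic technology does this with formal ontologies and RDF rather than Python dictionaries, but the shape of the solution is the same.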
Written by Dave McComb