Web 3.0

I’m weighing in in favor of Web 3.0 as an alias for the Semantic Web.

I’m weighing in in favor of Web 3.0 as an alias for the Semantic Web. I know there are a lot of people who will roll their eyes and initiate some anti-hype exorcism, but let’s have a sober look at the pluses and minuses here.

Web 3.0 is not without its problems. The first problem is that everyone is defining it to their own ends. As Christian Montoya summed it up at http://www.christianmontoya.com/2007/10/08/web-30-i-about-money/, Web 3.0 is essentially whatever each of the companies using the term is working on next. The second problem is that it does pander to the hypemeisters. But the very people who decry hype the loudest are often those who benefit from it the most. (Who can argue that the hype of the Web and Web 2.0 didn’t advance the careers and opportunities of the very people who now think Web 3.0 is hype?)

A lot of people seem to be comfortable with Web 2.0 now, despite the fact that it has no real unifying principle. Web 2.0 is blogs and wikis and Facebook and MySpace (user-generated content) and AJAX and Rich Internet Applications for a richer user experience in a browser, but really there isn’t anything holding it together or giving it a defined shape.

Maybe we don’t need to call the Semantic Web “Web 3.0.” But if we don’t, some other marginal improvement in an existing technology will claim the moniker. In other words, there will be a Web 3.0, and we will find ourselves explaining to people: “well yes, but that is just a part of the vision…”

Isn’t the term “Semantic Web” good enough? It’s good for the population that is already “in the tent,” but it suffers from having been the next big thing for too long for many others. Many people have discounted what they believe the Semantic Web to be (often by making up things that it isn’t and then objecting to that straw man).
Web services suffered a similar fate for a long time, as thought leaders confused them with services delivered over the web (Software as a Service, for instance), which they have some things in common with, but the two aren’t the same. For some, calling the Semantic Web “Web 3.0” gives an opportunity to take another look. So, I’m coming down in favor of “Web 3.0 = Semantic Web.” What do you think?

 

Silos of Babel

Semantic technology can be a ‘lingua franca’ to connect disparate information silos.

Several speakers and writers have referred to the problem of systems integration as being one of getting systems written in different languages to communicate. Whenever I’ve heard or seen this, the speaker/writer goes on to say that the difficulty is that getting COBOL to speak to Java is like translating Spanish to Chinese. (Each speaker puts different languages in there.) I like an analogy as much as the next person, but this one strains it a bit too much.

Semantically, COBOL and Java (or any two procedural languages) aren’t that different. If you look in their BNF (Backus Naur Form – a standard way of expressing the grammar of a programming language) you’ll find only a few dozen key words in each language. Yes, there are grammatical differences and some things are harder to express in some languages than others.

But this isn’t where the problems come from. More modern languages (C++, Java, C# and the like) have very large frameworks and libraries, each consisting of tens of thousands of methods. If you wanted to make a case that the semantic differences in languages made integration difficult, this would be the place to make it.

Certainly it is these libraries that make learning a new language environment difficult, and it is these libraries that provide most of the behavior of a system, and therefore most of its semantics. This is what makes systems conversion difficult, for in converting a system to an equivalent in another environment, all the behavior must be reimplemented in an equivalent way in the new environment, and depending on how much of the environment was used this can be a large task. But this still isn’t where the integration problem lies.

Most integration is performed at the data level, either through extract files or some sort of messages. What makes integration difficult is that each system to be interfaced has developed its own language about what each data attribute “means.” This is obvious with custom systems: analysts interview users, get their requirements and turn them into data designs. Each table and attribute means something, and this meaning is conveyed partially in the name of the item and partially in its documentation. Developers develop the system, procedure writers develop flow and procedures, and end users develop conventions.

Eventually each item means something, often close to what was intended, but rarely exactly the same. The same thing happens more subtly with packaged software. First, most packages have far more complex schemas than would be developed in a comparable custom system (the package vendors have to accommodate all possible uses of their package). Second, package vendors have tended to make their field definitions and validation more generic, as this is the low road to flexibility.

The process of implementing a package is really one of finalizing the definition of many of the attributes, some of which is done through parameters and some of which is procedural and convention. The net result is that a large organization will have many applications (classically each in its own “silo” or vertical slice of the organization) each of which has its data encoded in its own schema, essentially its own language. I sometimes call these their own idiolects (a dialect for one).

In my opinion it is these “silos of Babel” that create most of the difficulty and cost in integrating. Luckily we have a solution to this: the application of semantic technology as an intermediary, a lingua franca if you will, not of computer languages or libraries, but of metadata.
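As a toy illustration of that lingua franca idea, the sketch below maps two silo “idiolects” to one shared vocabulary, so each silo needs only a single mapping rather than a mapping to every other silo. All field and record names here are invented for illustration.

```python
# Two hypothetical silo schemas for the same customer record, each mapped
# once to a shared vocabulary rather than directly to each other.
CRM_TO_SHARED = {"fname": "given_name", "lname": "family_name", "cust_no": "customer_id"}
BILLING_TO_SHARED = {"first": "given_name", "surname": "family_name", "acct": "customer_id"}

def to_shared(record, mapping):
    # Translate a silo record into the shared vocabulary.
    return {mapping[k]: v for k, v in record.items() if k in mapping}

def from_shared(record, mapping):
    # Translate a shared-vocabulary record into a silo's own terms.
    inverse = {shared: local for local, shared in mapping.items()}
    return {inverse[k]: v for k, v in record.items() if k in inverse}

crm_record = {"fname": "Ada", "lname": "Lovelace", "cust_no": "C42"}
billing_record = from_shared(to_shared(crm_record, CRM_TO_SHARED), BILLING_TO_SHARED)
print(billing_record)  # {'first': 'Ada', 'surname': 'Lovelace', 'acct': 'C42'}
```

With n silos this needs n mappings to the shared model instead of n(n-1)/2 pairwise translations, which is the economic argument for the intermediary.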

Written by Dave McComb

Schools of Enterprise Application Architecture

An Enterprise Application Architecture is the coordinating paradigm through which an organization’s IT assets inter-relate to form the computing infrastructure and support the business goals.

The choice of architecture impacts a range of issues including the cost of maintenance, the cost of development, security, and the availability of timely information.

Introduction

Architecture in the physical world often conforms to a style. For the most part we know what to expect when we hear a building described as ‘Tudor’ or ‘Victorian.’ All this really means is that the builder and architect have agreed to incorporate certain features into their design which are representative of a given school of thought for which we have a label. Similarly, there are schools of thought in Enterprise Architecture which when followed produce equally distinctive architectural results. This paper is an overview of the more prominent of these schools.

The Default Architecture

Imagine an enterprise – and for many of us this might be quite easy – in which no unifying discipline is applied to the application development and design process. Applications are created as needs are discovered, their implementation is directed by that part of the organization providing the budget, and their scope is constrained by the organizational unit within which they are conceived. The applications themselves perhaps go through a rigorous and well understood development process which ends, often as an afterthought, with an integration task in which the shiny new application is plugged into the rest of the enterprise.

Imagine, further, that we are now looking at this enterprise after this approach to application development has been practiced for years, and perhaps decades. What we will see is an evolutionary enterprise architecture in which the impact of non-technical issues, such as the personalities of the managers and the distribution of budgets between lines of business, is clearly visible in the legacy fossil record. The architecture will be a collection of seemingly arbitrarily defined and scoped applications tightly coupled to each other through hand crafted point-to-point interfaces. The problems with this architecture become evident as the enterprise grows. In theory, if every application has to share data with every other application, then the number of interfaces will approximate n(n-1)/2, where n is the number of applications.

This is an O(n²) growth in complexity, and is consequently a point of failure as the enterprise infrastructure grows and n becomes large. In reality, of course, all of the applications do not share data with all the other applications, so the growth in complexity does not reach this quadratic worst case, but it is nonetheless a problem. John Zachman, creator of the Zachman Framework, describes what occurs when we create applications in this manner as ‘post integration.’ The difficulty with post-integration, as he points out, is with semantic consistency. It becomes increasingly difficult to make sure that what we mean by a piece of data in one application is what we mean by that same piece of data in a different system which has received the data through one or more interfaces.
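The interface arithmetic is easy to make concrete; a quick sketch (the function names are mine) compares the n(n-1)/2 pairwise interfaces with the n interfaces a central broker would need:

```python
def point_to_point_interfaces(n: int) -> int:
    # Worst case: every pair of applications gets its own hand-crafted interface.
    return n * (n - 1) // 2

def brokered_interfaces(n: int) -> int:
    # With a central intermediary, each application interfaces only to the broker.
    return n

for n in (5, 20, 100):
    print(n, point_to_point_interfaces(n), brokered_interfaces(n))
```

At 5 applications the difference (10 vs. 5) is tolerable; at 100 applications (4,950 vs. 100) it is not.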

Controlling, and indeed just discovering, the semantics of the data is a difficult undertaking with this architecture, and without a clear definition of the semantics virtually nothing can be done with confidence. Poorly controlled semantics, of course, is exacerbated by another characteristic of the default architecture, which is uncontrolled data replication. Our multiple applications each have their own database, with their own copies of data, which they interpret in their own way. The replicated copies of data, in turn, rapidly become out of synch with each other, leading to an environment in which data meaning, currency and validity are all uncertain.

A fundamental problem with the default architecture is application coupling, by which we mean that a change to any application will have a scope of effect beyond that application. The enterprise applications are all tangled together, a bit like a ball of string. This means that changes that should logically be simple, localized and cheap end up as complex, broad ranging and expensive. The problems with the default architecture are manageable in small enterprises. It is only as the enterprise becomes larger that they become impossible to control. This paper is an overview of the successive approaches which enterprises have employed in response to this issue.

The Integrated Database Architecture

With the advent of the scalable relational database in the mid-1980s, enterprises saw a possible solution to the complexity created by the default architecture. The theory was quite compelling: the enterprise should have a single, large database which implements an enterprise-wide conceptual data model. The semantics of the data would be centrally defined and under tight control. All of the application logic would operate off the single data store and its central definition.

Consequently there would be no growth in the number of interfaces, because there would be no need to have interfaces. There would be no data consistency or replication problems, because there would only be one copy of the data. Constraints in the database would require data to be collected completely by all parts of the application logic, and to be consistent. This was a seemingly perfect solution which most enterprises embraced enthusiastically.

The problems with the integrated database architecture did not appear immediately. The primary problem, in fact, is one of change over time. The integrated database is a great point in time solution. The problem is that once we have all of our applications based on this single database we have created, in programming terminology, a single giant ‘global variable,’ which, if changed in any way, has a potential effect everywhere. In other words the integrated database gives us tremendous data integration at the cost of extremely tight application coupling.

If, for example, we wish to change the logic of an application, perhaps to send our customers birthday cards, we are changing the same data structure – the database – which we are using to run our mission critical systems, and we are potentially also having to change those mission critical systems even though they do not care at all about birthday cards. So, at the end of the day, we are more likely than not to decide that we won’t change the mission critical systems, for reasons of risk and cost, and that we will rather forgo the birthday card function. The tightly coupled integrated database, then, has a flexibility problem.

Business processes change over time, and each change potentially impacts all of our applications. This means that making any single change is disproportionately expensive, and tends to be resisted, producing a non-responsive IT support infrastructure. When the business cannot change its core systems cost effectively, enterprising business users and IT managers will typically conclude that the obvious answer is to build ‘their own’ little system in parallel to the integrated database and then build an interface. This can often look like a constellation of Microsoft Access databases circling a mainframe, and performing both pre and post processing to support the business’ actual processes. In due course the peripheral applications become as important as the integrated database applications, and eventually the integrated database architecture begins to look like the default architecture, with one or more anomalously large applications. In the end, the integrated database architecture fails because it cannot inexpensively accommodate rapidly changing business processes, which are a hallmark of the modern enterprise.

The Distributed Object Architecture

The arrival of Object Orientation in the early 1990s heralded yet another approach to enterprise architecture. This approach said, in effect, that the problem with the integrated database architecture was one of programming. In order to create an application a developer would have to understand this large complex schema, and would then create logic to manipulate it. This logic, or behavior, defined for the data does, in effect, define the semantics of that data. Having developers re-define logic each time for core bits of functionality creates a ‘semantic drift,’ where the actually implemented behavior, from application to application, is inconsistent.

The distributed object architecture is a discipline which requires the enterprise to create an object representation of its core concepts, such as Customer, Order, and so on. When developers create an application they do so by invoking this predefined behavior, thus ensuring semantic equivalence between applications. The distributed object architecture is attractive insofar as the object analysis process extends logically from the Semantic Model, and leads to a centrally defined and controlled definition of data semantics and process. It is clearly an improvement over straight procedural logic sitting on top of a global database schema, as in the Integrated Database architecture, but it is only an incremental improvement.

The core limitations of the Integrated Database architecture, namely tight coupling and inflexibility, live on in the Distributed Object Architecture, which is not a surprise given that this approach is really nothing more than an object veneer over the integrated database. The distributed object architecture is implemented in a variety of technologies, including Enterprise JavaBeans, Microsoft DCOM, and the Common Object Request Broker Architecture (CORBA).

The Message Bus Architecture

Flexibility has become one of the most important qualities of enterprise application architectures. ‘Flexibility’ is the capacity to change elements of the architecture at acceptable cost. The key to creating a flexible architecture is to decouple the independent pieces from one another, such that a change to one of the pieces does not unnecessarily require a change in any of the other independent pieces.

This capability is what has been missing from the prior architectures, and this is the primary contribution of the Message Bus Architecture. The message bus architecture returns us to an environment of independent applications maintaining their own databases. We add to this (typically) an ‘integration broker’, which is broadly responsible for communicating data between applications. The data communicated in this way is referred to as a message. By introducing the message broker as an intermediary, we are able to decouple applications from one another. Semantic consistency is enforced by representing the enterprise conceptual data model as a message model, or a centrally controlled message schema.

The n(n-1)/2 point-to-point interfaces are replaced by n interfaces – one from each application to the broker. We necessarily introduce a degree of data replication, but we control the replication through a change notification mechanism provided by the broker, typically in the form of publish/subscribe messages.

This controlled replication manages the data consistency issues, while at the same time creating a degree of ‘runtime decoupling,’ which allows the independent applications to operate even though other parts of the infrastructure may be unavailable. In this environment applications can be implemented in any technology and using whatever database schema they choose.

Their obligation to the enterprise is to generate a set of defined messages conforming to the message model, and to process incoming messages. They are free to change as and when they wish, as long as they continue to support their message contract. This is what is meant by decoupling, and this is where flexibility originates.
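The publish/subscribe decoupling described above can be caricatured in a few lines: the broker is the only thing either side knows about, so a publisher and its consumers can change independently. Class, topic, and payload names below are invented for illustration.

```python
from collections import defaultdict

class MessageBroker:
    """Minimal publish/subscribe broker sketch: applications register
    interest in message types and never reference each other directly."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, message_type, handler):
        # An application declares which messages it wants to consume.
        self._subscribers[message_type].append(handler)

    def publish(self, message_type, payload):
        # The publisher does not know (or care) who is listening.
        for handler in self._subscribers[message_type]:
            handler(payload)

broker = MessageBroker()
received = []
broker.subscribe("customer.updated", received.append)  # e.g. the billing app
broker.publish("customer.updated", {"customer_id": "C42", "status": "gold"})
print(received)  # [{'customer_id': 'C42', 'status': 'gold'}]
```

A production broker adds queuing, guaranteed delivery and a governed message schema, but the contract is the same: each application's only obligation is the messages it emits and consumes.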

In a Message Bus architecture the message routing function, that is to say the logic which controls where a message is delivered, can be centralized in the broker without a loss of decoupling. When this is done, it becomes possible to see the message broker as a business process management (BPM) tool, and as a means of implementing enterprise wide workflow through the addition of rules.  When fully supported by the applications, the Message Bus Architecture allows the implementation of the ‘real time enterprise,’ in which all business events, regardless of origin, appear on the message bus and can be consumed by any interested application.

This can become especially interesting when events generated by external business partners reach the internal message bus, and vice-versa. The Message Bus Architecture requires careful implementation to provide true decoupling and flexibility. It is quite possible to create a network of point-to-point logical interfaces over the single technology interface to the broker. This occurs when applications create ad-hoc messages for every integration case; the solution here is to proactively architect the message model and ensure that it is not circumvented at the application level. The Message Bus Architecture is not, however, a complete solution.

At the technology level it is usually implemented with proprietary technology for the message broker, which is expensive to buy, and requires scarce and equally expensive personnel to use. The distributed nature of the solution necessarily creates multiple points of failure – which can be mitigated through careful design to maximize runtime decoupling – and one central point of failure in the integration broker. Performance issues are also a potential problem; poor application partitioning can create excessively high volumes of messages, and some use cases can be impacted through high network latency. The Message Bus Architecture is a viable solution, but it is not a trivial implementation.

The Service Oriented Architecture

The Service Oriented Architecture is a refinement of the Message Bus Architecture. The advance with this architecture is the realization that many large-granularity functions are automated in the enterprise in multiple places.

Many of our applications will do reporting, most applications will implement a user interface, most will concern themselves with security, and most will implement some form of business logic, and so on. The Service Oriented Architecture posits that the applications should be refactored and these pieces of functionality should be removed from the applications and implemented as a single ‘service’ which can be invoked at runtime. So, for example, reporting becomes a responsibility of the Information Delivery service, which might be implemented through a data warehouse, the user interface might be delegated to a portal service, the security functions will be implemented by an authentication and authorization service, and business logic perhaps by a business rules service.

The likely candidates for service orientation tend to be business neutral, in part because these functions appear repeatedly across the application inventory. The trade off of creating a service is that we have potentially created runtime-coupling between the service and the invoking application, and consequently created a point of failure. The benefit is the reduction of redundant functionality, and its central control and unification.

Taken to a logical extreme Service Orientation will allow applications to divest themselves of the responsibility for security, business logic, workflow and presentation, leaving very little beyond data store and configuration. Service Orientation can be implemented in a messaging environment using a broker, however this is not a requirement. Much of the current literature confuses the implementation technology with the concept, especially where the implementation technology is Web services.

The Orchestrated Web-Service Architecture

The latest technology trend is Web services. Web services are positioned to become the open standards based implementation of the Message Bus Architecture. Where applications currently communicate with the Message Bus using a vendor proprietary adaptor we will have a standard Web service interface instead.

Where the Message Bus Architecture performs message routing using a proprietary extensional routing – or orchestration – tool, or using intentional publish/subscribe logic, the Web service architecture will use the corresponding Web service standard, at present BPEL4WS. Where the Message Bus Architecture implements guaranteed delivery through proprietary queuing mechanisms such as IBM MQSeries, the Web service architecture will use upcoming standards such as HTTP-R or Web Services Reliable Messaging.

The Web services standards are currently incomplete, and don’t fully overlap the proprietary products’ offerings; however, the promise is clearly that in the near future Web services will offer an open standards alternative. Web services are by nature point-to-point connections. Used naively, this will create a technically state-of-the-art implementation of the Default Architecture, with applications tightly bound to each other through many uncontrolled interfaces. The Orchestrated Web service architecture, consequently, introduces a broker to which all Web service calls are made, and which is responsible for forwarding those requests to the applications providing the service.

This centralized orchestration is what allows the Web service approach to remain decoupled. Similarly, by implementing asynchronous request/reply logic – which is to say the requestor does not block waiting for the reply – and by supplementing the standard Web service call over HTTP with guaranteed delivery, the broker is able to create an environment which is similar to that of the Message Bus Architecture. The Web service architecture is practical today, supplemented with various proprietary technologies. It represents an improvement over the Message Bus Architecture by being based on open standards and consequently reducing vendor lock-in.

The Just-in-time Integration Architecture

One of the interesting capabilities which the Web service technologies introduced is the concept of runtime discovery. The UDDI Web service specification allows an application to find a service at runtime, to bind to it, and invoke it. The client application searches for the service based on service categorization and conformance to an interface specification – in this case a WSDL document.

This capability allows us to conceive of an architecture in which applications and services create Web service interfaces and place their WSDL descriptions in the enterprise UDDI repository. When an application wishes to invoke a service it looks it up in the repository and invokes it. The key benefit of this approach is that the inter-application binding is entirely dynamic and consequently decoupled; we can replace the service provider at any time simply by changing its entry in the UDDI repository. With this approach there is no broker, and consequently there are no centrally provided management and control functions. However, in a decentralized, internet-based situation this may be an appropriate architectural choice.
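Stripped of the UDDI and WSDL machinery, the repository reduces to a lookup table and the late-binding idea reduces to this sketch: the client holds only a service category, never a provider. All names and the toy tax rate are invented for illustration.

```python
class ServiceRegistry:
    """Toy stand-in for a UDDI-style repository: providers register under
    a category; clients look up and bind at call time, not at build time."""

    def __init__(self):
        self._services = {}

    def register(self, category, provider):
        # Replacing a provider is just a re-register under the same category.
        self._services[category] = provider

    def lookup(self, category):
        return self._services[category]

registry = ServiceRegistry()
registry.register("tax-calculation", lambda amount: round(amount * 0.08, 2))

# The client discovers and invokes at runtime; it never names the provider.
tax = registry.lookup("tax-calculation")(100.00)
print(tax)  # 8.0
```

Swapping the tax provider requires one `register` call and no change to any client, which is the decoupling the paragraph above describes.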

Conclusion

The choice of Enterprise Application architecture is critical to creating a successful IT infrastructure which is responsive to the business needs and which reinforces the qualities which are of value to the organization. All of the schools of architecture which are described here can be valid choices, just as building a Victorian style house is as legitimate a decision as building in a Tudor style. It is the responsibility of the architect, however, to ensure that the chosen architecture is appropriate for its environment. Although these architectural schools are evolving, and new ones are being created, most enterprises are clearly in a position to benefit from the adoption of a defined enterprise application architecture.

Written by Dave McComb

Role-Based Security

Role-based security is a means of implementing an authorization mechanism which has the potential to substantially reduce administrative cost and reduce vulnerability. Enterprise role-based security addresses the problem of maintaining authorizations within large IT environments; it is perhaps more accurately described as ‘Role-Based Access Control,’ or RBAC.

Application-Based Authorization

Authorization is the process through which a person is granted permission to invoke behavior, see, create, delete or update data, for one or more systems. The system is responsible for enforcing the permissions granted in the authorization process, and consequently, systems use a variety of techniques to support the authorization process. The most basic approach is to represent people in the system as ‘users,’ and then to enforce which users can perform what functions in the code.

So we end up with logic which looks like: ‘if user is Simon then deny…,’ and so on. The problem with this approach is that the ‘user’ concept really becomes a proxy for whichever employee is performing the function, and not an actual person. The user, in other words, is shared, with a corresponding security risk. Additionally, what the user can do is hard-coded in the system, and difficult to change.

Introducing Roles

At the application level, role-based security addresses this problem by accepting that the ‘shared user,’ is really a role in the enterprise which multiple employees will perform. So, for example, a role might be ‘sales person,’ or ‘accountant.’ With roles made explicit in this manner we can then use the idea of a ‘User’ to be a proxy for an individual, with security information such as passwords specified for that person. The authorization process now becomes a matter of assigning one or more roles to a given user.

If the person that user represents is replaced by a different employee, then we delete the original user – or remove his or her roles – and create a new user for the new person. Many implementations of role-based security will choose to express in the system’s code the authority which has been granted to a specific role. This approach is supported directly in Microsoft’s .NET framework, where metadata can be used to flag functions with the required roles. This approach has the downside of being quite inflexible. Once the system has been created it is no longer a cheap proposition to change what your sales person role is permitted to see or do, which, given a dynamic business process, is not ideal.

The alternative to this approach is to represent the results of the authorization process as data in the system. This data is known as authorizations, permissions, entitlements, or provisions. A user can now perform a function if he possesses the required permission, through his role memberships. With this approach we have the ability to dynamically redefine the semantics of role membership as our business evolves.
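A minimal sketch of that permissions-as-data approach, with all role and permission names invented for illustration. The access check never names an individual; it asks whether any of the user's roles carries the required permission, and both tables can be changed without touching code:

```python
# Authorization expressed as data: roles map to permissions, users map to roles.
ROLE_PERMISSIONS = {
    "sales_person": {"view_customer", "create_order"},
    "accountant":   {"view_invoice", "authorize_payment"},
}
USER_ROLES = {"simon": {"sales_person"}}

def is_authorized(user: str, permission: str) -> bool:
    # A user holds a permission only through membership in some role.
    return any(permission in ROLE_PERMISSIONS.get(role, set())
               for role in USER_ROLES.get(user, set()))

print(is_authorized("simon", "create_order"))       # True
print(is_authorized("simon", "authorize_payment"))  # False
```

Redefining what a sales person may do is now an edit to `ROLE_PERMISSIONS`, which is the flexibility the data-driven approach buys over roles flagged in code.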

Enterprise Authorization

Although application level role-based security is certainly a useful technique, the full benefit of this concept only appears in the enterprise-wide context. As things stand today most systems – whether they implement a role-based approach or not – contain their users’ authorizations in their own table structures, and have their own user interfaces for creating and removing users, and for defining authorizations. Consequently setting up new users and assigning them their authorizations is a system by system administrative task.

This is clearly a problem as the number of systems and users becomes large, or the turnover in users becomes high. Furthermore, there is an inherent security risk in this approach; since the authorizations for a given user are dispersed across multiple systems it becomes difficult to determine which authorizations a given user actually has. Diagrammatically, the current situation looks as depicted in figure 1, where we have three users in four systems, each with multiple entitlements (not drawn). If we wish, for example, to prevent our accountants from authorizing payments we would have to remove that entitlement in each system for each user, giving a total of 12 changes. Imagine a more representative environment of 50-100 systems with thousands of users. And if we miss one of those changes we have a security failure.

Enterprise Role-Based Authorization

Enterprise role-based security is a solution to this problem. It introduces the idea of an ‘enterprise role,’ which, like the application role, represents a responsibility an employee might have in an enterprise, or, more broadly, a relationship a person has with the enterprise. A position in the organization’s hierarchy will have some set of responsibilities, and can consequently be described with a collection of enterprise roles. Defining the roles in an organization is an exercise in analysis which we call ‘Role Engineering,’ and is perhaps the most significant aspect of implementing role-based security.

Once we have the enterprise roles defined we can then associate our users with their roles, and the roles with authorizations, as depicted in Figure 2. Conceptually this approach allows us to add and remove authorizations from users with much more rigor; we no longer make security changes to individuals but to roles. By doing so we can be clear as to what people can do across our systems by virtue of their membership in certain roles. We would prefer not to associate authorizations with users directly; direct assignment should be considered an exception mechanism, since it complicates the overall picture of our granted authorizations.

Role-Based Security and the Enterprise Architecture

The structure described here can be implemented centrally and interfaced to the systems in the environment using, for example, a messaging infrastructure. This approach creates a true enterprise-wide role-based security implementation, and creates tremendous savings in administrative effort. Now, using our previous accountant example, the revocation of an authorization requires a single change to one role, rather than 12 changes across four systems. Similarly, hiring employees is a simple matter of associating a new user with the roles defined for the job position, and firing employees is a matter of deleting a user. We now have a single, explicit definition of what a user can do in our systems; there is no longer redundant administration, and consequently the possibility of security failures in this area is reduced.

When we have an enterprise role-based security service, as described here, we have to interface that service to the systems which are going to use it. The client systems should ideally receive from the service an abstract description of the authorizations a user has. The natural means of doing this is to create a model for authorizations, and express that model using a standard XML dialect. Alternatively, the service can provide the client systems with the users’ roles, and the system will then maintain the authorizations for each role itself. This is clearly a less useful approach, since changing the authorizations for a given role can no longer be done centrally.

Role Constraints

Once we have rationalized our authorizations through the introduction of roles we are in a position to enforce a ‘least privilege’ policy, in which roles are granted only sufficient authorizations to perform their function. Similarly, we can now identify and enforce ‘separation of incompatible duties,’ where a user should not be a member of two incompatible roles simultaneously. A single user should not, for example, have the role for authorizing a payment and the role for submitting a payment. An elegant role-based security service will provide a means of expressing constraints on role membership, and will allow roles to be described hierarchically, such that new roles can be created by the extension or restriction of existing roles.

An interesting consequence of expressing entitlements as data – rather than as static rules to be coded for each role – is the idea of delegation. Delegation is the process through which a user transfers his or her entitlements to another user. Delegation of responsibility – and thus entitlement – is quite common in business; people routinely delegate their authority to legal representatives, for example. Delegation is quite simple in a role-based authorization system, although a consequence is that the applications receiving entitlements have to expect dynamically changing entitlement sets for any user, and create user interfaces accordingly.

The role constraint mechanism also allows us to address a problem in delegation, where we want a person to be able to delegate an entitlement but not to possess that entitlement themselves. This can occur where we want a supervisor to grant entitlements to his subordinates but not to perform the function himself, for separation of duties reasons.
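A separation-of-duties constraint can itself be expressed as data; the sketch below (role names invented for illustration) declares incompatible role pairs and refuses any assignment that would complete one:

```python
# Hypothetical separation-of-duties rules: sets of mutually exclusive roles.
INCOMPATIBLE = [{"payment_authorizer", "payment_submitter"}]

def assign_role(user_roles: set, new_role: str) -> set:
    """Return the user's roles with new_role added, unless that would
    violate a separation-of-duties constraint."""
    candidate = user_roles | {new_role}
    for pair in INCOMPATIBLE:
        if pair <= candidate:  # both halves of an incompatible pair present
            raise ValueError(f"separation of duties violated: {sorted(pair)}")
    return candidate

roles = assign_role(set(), "payment_authorizer")
try:
    assign_role(roles, "payment_submitter")
except ValueError as err:
    print("rejected:", err)
```

Because the constraint is checked at assignment time, no administrative mistake can quietly grant one person both halves of an incompatible pair.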

Conclusion

Role-based security, when implemented for the enterprise, is a means of reducing administrative cost while at the same time enhancing security by removing uncontrolled redundancy and enforcing role membership constraints. For a detailed discussion of the potential cost savings of enterprise role-based security see “The Economic Impact of Role-Based Access Control,” NIST, U.S. Department of Commerce.

First published October 2003 by Dave McComb