Incremental Stealth Legacy Modernization

I’m reading the book Kill it with Fire by Marianne Bellotti. It is a delightful book. Plenty of pragmatic advice, both on the architectural side (how to think through whether and when to break up that monolith) and the organizational side (how to get and maintain momentum for what are often long, drawn-out projects). So far in my reading she seems to advocate incremental improvement over rip and replace, which is sensible, given the terrible track record with rip and replace. Recommended reading for anyone who deals with legacy systems (which is to say anyone who deals with enterprise systems, because a majority are or will be legacy systems).

But there is a better way to modernize legacy systems. Let me spoil the suspense: it is Data-Centric. We are calling it Incremental Stealth Legacy Modernization because no one is going to get the green light to take this on directly. This article is for those playing the long game.

Legacy Systems

Legacy is the covering concept for a wide range of activities involving aging enterprise systems. I had the misfortune of working in Enterprise IT just as the term “Legacy” became pejorative. In the early 1990s, we were just completing a long-term strategic plan for Johns Manville. We decided to call it the “Legacy Plan,” as we thought those involved with it would leave a legacy to those who came after. The ink had barely dried when “legacy” acquired a negative connotation. (While writing this I looked it up, and Wikipedia thinks the term had already acquired its negative connotation in the 1980s. It seems to me that if that usage were widespread, someone would have mentioned it before we published that report.)

There are multiple definitions of what makes something a legacy system. Generally, it refers to older technology that is still in place and operating. What tends to keep legacy systems in place are networks of complex dependencies. A simple stand-alone program does not become a legacy system, because when the time comes, it can easily be rewritten and replaced. Legacy systems have hundreds or thousands of external dependencies that often are not documented. Removing, replacing, or even updating legacy systems runs the risk of violating some of those dependencies. It is the fear of this disruption that keeps most legacy systems in place. And the longer a system stays in place, the more dependencies it accretes.

If these were the only forces affecting legacy systems, they would stay in place forever. The countervailing forces are obsolescence, dis-economy, and risk. While many parts of the enterprise depend on the legacy system, the legacy system itself has dependencies. The system depends on operating systems, programming languages, middleware, and computer hardware. Any of these dependencies can and do become obsolescent and eventually obsolete. Obsolete components are no longer supported and therefore represent a high risk of total failure of the system. The two main dimensions of dis-economy are operations and change. A modern system can typically run at a small fraction of the operating cost of a legacy system, especially when you tally up all the licenses for application systems, operating systems, and middleware, and add in the salary costs of the operators and administrators who support them. The dis-economy of change is well known and comes in the form of integration debt. Legacy systems are complex and brittle, which makes change hard. The cost to make even the smallest change to a legacy system is orders of magnitude more than the cost to make a similar change to a modern, well-designed system. Legacy systems are often written in obscure languages. One of my first legacy modernization projects involved replacing a payroll system written in assembler language with one that was to be written in “ADPAC.” You can be forgiven for thinking it is insane to have written a payroll system in assembler language, and even more so to replace it with a system written in a language that no one in the 21st century has heard of, but this was a long time ago, and it is indicative of where legacy systems come from.

Legacy Modernization

Eventually the pressure to change overwhelms the inertia to leave things as they are. This usually does not end well, for several reasons. Legacy modernization is usually long delayed. There is no compelling need to change, and as a result, for most of the life of a legacy system, resources have been assigned to other projects that promise short-term net positive returns. Upgrading the legacy system offers little upside. The new system will do the same thing the old one did, perhaps a bit cheaper or a bit better, but not fundamentally differently. Your old payroll system is paying everyone, and so will a new one.

As a result, the legacy modernization project is delayed as long as possible. When the inevitable precipitating event occurs, the replacement becomes urgent. People are frustrated with the old system. Replacing the legacy system with some more modern system seems like a desirable thing to do. Usually this involves replacing an application system with a package, as this is the easiest project to get approved. These projects were called “Rip and Replace” until the success rate of this approach plummeted. It is remarkable how expensive these projects are and how frequently they fail. Each failure further entrenches the legacy system and raises the stakes for the next project.

As Ms. Bellotti points out in Kill it with Fire, many times the way to go is incremental improvement. By skillfully understanding the dependencies and engineering decoupling techniques, such as APIs and intermediary data sets, it is possible to stave off some of the highest-risk aspects of the legacy system. This is preferable to massive modernization projects that fail but, interestingly, has its own downsides: major portions of the legacy system continue to persist, and, as she points out, few developers want to sign on to this type of work.

We want to outline a third way.

The Lost Opportunity

After a presentation on Data-Centricity, someone in the audience pointed out that data warehousing represents a form of Data-Centricity. Yes, in a way it does. With Data Warehouses, and more recently Data Lakes and Data Lakehouses, you have taken a subset of the data from numerous data silos and put it in one place for easier reporting. Yes, this captures a few of the data-centric tenets.

But what a lost opportunity. Think about it, we have spent the last 30 years setting up ETL pipelines and gone through several generations of data warehouses (from Kimball / Inmon roll your own to Teradata, Netezza to Snowflake and dozens more along the way) but have not gotten one inch closer to replacing any legacy systems. Indeed, the data warehouse is entrenching the legacy systems deeper by being dependent on them for their source of data. The industry has easily spent hundreds of billions of dollars, maybe even trillions of dollars over the last several decades, on warehouses and their ecosystems, but rather than getting us closer to legacy modernization we have gotten further from it.

Why no one will take you seriously

If you propose replacing a legacy system with a Knowledge Graph you will get laughed out of the room. Believe me, I’ve tried. They will point out that the legacy systems are vastly complex (which they are), have unknowable numbers of dependent systems (they do), the enterprise depends on their continued operation for its very existence (it does) and there are few if any reference sites of firms that have done this (also true). Yet, this is exactly what needs to be done, and at this point, it is the only real viable approach to legacy modernization.

So, if no one will take you seriously, and therefore no one will fund you for this route to legacy modernization, what are you to do? Go into stealth mode.

Think about it: if you did manage to get funded for a $100 million legacy replacement project, and it failed, what do you have? The company is out $100 million, and your reputation sinks with it. If instead you get approval for a $1 million Knowledge Graph based project that delivers $2 million in value, they will encourage you to keep going. Nobody but you cares what the end game is.

The answer then, is incremental stealth.

Tacking

At its core, it is much like sailing into the wind. You cannot sail directly into the wind. You must tack, and sail as close into the wind as you can, even though you are not headed directly towards your target. At some point, you will have gone far to the left of the direct line to your target, and you need to tack to starboard (boat speak for “right”). After a long starboard tack, it is time to tack to port.

In our analogy, taking on legacy modernization directly is sailing directly into the wind. It does not work. Incremental stealth is tacking. Keep in mind though, just incremental improvement without a strategy is like sailing with the wind (downwind): it’s fun and easy, but it takes you further from your goal, not closer.

The rest of this article lays out what we think the tacking strategy should be for a firm that wants to take the Data-Centric route to legacy modernization. We have several clients that are on the second and third tack in this series.

I’m going to use a hypothetical HR / Payroll legacy domain for my examples here, but they apply to any domain.

Leg 1 – ETL to a Graph

The first tack is the simplest. Just extract some data from legacy systems and load it into a graph database. You will not get a lot of resistance to this, as it looks familiar; it looks like yet another data warehouse project. The only trick is getting sponsors to go this route instead of the tried-and-true data warehouse route. The key enablers here are problems well suited to graph structures, such as those that rely on graph analytics or shortest-path algorithms. Find data that is hard to integrate in a data warehouse; a classic example is integrating structured data with unstructured data, which is nearly impossible in traditional warehouses and merely tricky in graph environments.
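To make this first leg concrete, here is a minimal sketch, assuming a Python pipeline built on rdflib. The employee rows, column names, and the example.com namespaces are hypothetical stand-ins; a real pipeline would pull from the legacy database and map to your enterprise ontology.

```python
# Leg 1 sketch: extract rows from a legacy HR source and load them as triples.
from rdflib import Graph, Literal, Namespace, RDF, URIRef, XSD

EX = Namespace("https://example.com/ont/")            # assumed ontology namespace
DATA = Namespace("https://example.com/id/employee/")  # assumed instance namespace

# Stand-in for rows extracted from the legacy HR system (ODBC pull, flat file, etc.).
legacy_rows = [
    {"emp_id": "1001", "name": "Pat Jones", "hire_dt": "2015-03-02", "dept": "PAYROLL"},
    {"emp_id": "1002", "name": "Sam Lee",   "hire_dt": "2019-11-18", "dept": "HR"},
]

g = Graph()
g.bind("ex", EX)

for row in legacy_rows:
    emp = URIRef(DATA + row["emp_id"])
    g.add((emp, RDF.type, EX.Employee))
    g.add((emp, EX.name, Literal(row["name"])))
    g.add((emp, EX.hireDate, Literal(row["hire_dt"], datatype=XSD.date)))
    g.add((emp, EX.memberOf, URIRef(EX + "dept/" + row["dept"])))

g.serialize(destination="employees.ttl", format="turtle")
print(f"{len(g)} triples loaded")
```

The point is not the mapping itself but that the result lands in one graph, in one vocabulary, rather than in yet another warehouse schema.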

The only difficulty is deciding how long to stay on this tack. As long as each project is adding benefit, it is tempting to stay on this tack for a long, long time. We recommend staying this course at least until you have a large subset of the data in at least one domain in the graph, refreshed frequently.

Let’s say that after being on this tack for a long while, you have all the key data on all your employees in the graph, and it is being updated frequently.

Leg 2 – Architecture MVP

On the first leg of the journey, there are no updates being made directly to the graph, just as in a data warehouse: no one makes updates in place in the data warehouse. It is not designed to handle that, and it would mess with everyone’s audit trails.

But a graph database does not have the limitations of a warehouse. It is possible to have ACID transactions directly in the graph, but you need a bit of architecture to do so. The challenge here is creating just enough architecture to get through your next tack. Where you start depends a lot on what you think your next tack will be. You will need constraint management to make sure your early projects are not loading invalid data back into your graph. Depending on the next tack, you may need to implement fine-grained security.
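As one sketch of what that constraint layer can look like, here is a Python example using pySHACL to validate a proposed update against a SHACL shape before it is committed. The shape, namespaces, and the rejected example are all hypothetical.

```python
# Leg 2 sketch: gate updates to the graph behind a SHACL validation step.
from rdflib import Graph
from pyshacl import validate

shapes_ttl = """
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <https://example.com/ont/> .

ex:EmployeeShape a sh:NodeShape ;
    sh:targetClass ex:Employee ;
    sh:property [ sh:path ex:name ; sh:minCount 1 ; sh:datatype xsd:string ] ;
    sh:property [ sh:path ex:hireDate ; sh:maxCount 1 ; sh:datatype xsd:date ] .
"""

update_ttl = """
@prefix ex:  <https://example.com/ont/> .
@prefix emp: <https://example.com/id/employee/> .

emp:1003 a ex:Employee .   # missing ex:name, so this update should be rejected
"""

shapes = Graph().parse(data=shapes_ttl, format="turtle")
proposed_update = Graph().parse(data=update_ttl, format="turtle")

conforms, _, report_text = validate(proposed_update, shacl_graph=shapes)
if not conforms:
    print("Rejecting update:\n", report_text)  # surface this to the calling application
```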

Whatever you choose, you will need to build or buy enough architecture to get your first update-in-place functionality going.

Leg 3 – Simple New Functionality in the Graph

In this leg we begin building update-in-place business use cases. We recommend not trying to replace anything yet. Concentrate on net new functionality. Some of the best places to start are maintaining reference data (common shared data such as country codes, currencies, and taxonomies) and/or some metadata management. Everyone seems to be doing data cataloging projects these days; they could just as well be done in the graph, and they give you experience working through this new paradigm.

The objective here is to spend enough time on this tack that developers become comfortable with the new development paradigm. Coding directly to the graph involves new libraries and new patterns.
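A minimal sketch of what one of those new patterns looks like, assuming Python and rdflib with SPARQL Update: maintaining country-code reference data directly in the graph. In production this would run against the graph store’s endpoint, behind the constraint and security layers from Leg 2; the vocabulary here is hypothetical.

```python
# Leg 3 sketch: update-in-place maintenance of reference data with SPARQL Update.
from rdflib import Graph

g = Graph()

# Add a new reference entry.
g.update("""
PREFIX ex:   <https://example.com/ont/>
PREFIX ref:  <https://example.com/id/country/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

INSERT DATA {
    ref:NZ a ex:Country ;
        skos:prefLabel "New Zealand" ;
        ex:isoAlpha2 "NZ" .
}
""")

# The same pattern handles corrections: delete the old label, insert the new one.
g.update("""
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ref:  <https://example.com/id/country/>

DELETE { ref:NZ skos:prefLabel ?old }
INSERT { ref:NZ skos:prefLabel "Aotearoa New Zealand" }
WHERE  { ref:NZ skos:prefLabel ?old }
""")

for s, p, o in g:
    print(s, p, o)
```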

Optionally, you may want to stay on this tack long enough to build “model driven development” (low code / no code, in Gartner speak) capability into the architecture. The objective of this effort is to drastically reduce the cost of implementing new functionality in future tacks. Comparing before-and-after metrics on code development, code testing, and code defects will make a striking case for the new approach. Or you could leave model driven development to a future tack.

Using the payroll / HR example, this leg adds new functionality that depends on HR data but that nothing else depends on yet. Maybe you build a skills database, or a learning management system. It depends on what is not yet in place that can be purely additive. These are the good places to start demonstrating business value.

Leg 4 – Understand the Legacy System and its Environment

Eventually you will get good at this and want to replace some legacy functionality. Before you do, it will behoove you to do a bunch of deep research. Many legacy modernization attempts have run aground from not knowing what they did not know.

There are three things that you don’t fully know at this point:

• What data is the legacy system managing
• What business logic is the legacy system delivering
• What systems are dependent on the legacy system, and what is the nature of those dependencies.
If you have done the first three tacks well, you will have all the important data from the domain in the graph. But you will not have all the data. In fact, at the metadata level, it will appear that you have the tiniest fraction of the data. In your Knowledge Graph you may have populated a few hundred classes and used a few hundred properties, but your legacy system has tens of thousands of columns. By appearances you are missing a lot. What we have discovered anecdotally, but have not proven yet, is that legacy systems are full of redundancy and emptiness. You will find that you do have most of the data you need, but before you proceed you need to prove this.

We recommend data profiling using software from a company such as GlobalIDs, IoTahoe, or BigID. This software reads all the data in the legacy system and profiles it. It discovers patterns and creates histograms, which reveal where the redundancy is. More importantly, you can find data that is not in the graph and have a conversation about whether it is needed. A lot of the data in legacy systems consists of accumulators (YTD, MTD, etc.) that can easily be replaced by aggregation functions, processing flags that are no longer needed, and a vast number of fields that are no longer used but that both business and IT are afraid to let go of. This profiling will provide that certainty.
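The commercial tools do this at scale and across systems, but a rough sketch of the idea in Python with pandas shows the shape of the exercise: for each column in a legacy extract, measure emptiness and redundancy so you can decide what actually needs to move to the graph. The column names here are hypothetical.

```python
# Profiling sketch: find empty, constant, and derivable columns in a legacy extract.
import pandas as pd

legacy = pd.DataFrame({
    "emp_id":     ["1001", "1002", "1003", "1004"],
    "ytd_gross":  [52000.0, 48100.0, 0.0, 61500.0],   # accumulator, derivable by aggregation
    "rehire_flg": [None, None, None, None],            # never populated
    "dept":       ["PAYROLL", "HR", "HR", "PAYROLL"],
})

profile = pd.DataFrame({
    "pct_null":     legacy.isna().mean().round(2),
    "distinct":     legacy.nunique(),
    "sample_value": legacy.apply(lambda col: col.dropna().iloc[0] if col.notna().any() else None),
})
print(profile)

# Columns that are entirely empty or constant are candidates to leave behind
# rather than migrate; accumulators get replaced by aggregation in the graph.
droppable = profile[(profile["pct_null"] == 1.0) | (profile["distinct"] <= 1)]
print("Candidates to leave behind:", list(droppable.index))
```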

Another source of fear is the “business logic” hidden in the legacy system. People fear that we do not know all of what the legacy system is doing and that turning it off will break something. There are millions of lines of code in that legacy system; surely it is doing something useful. Actually, it is not. There is remarkably little essential business logic in most legacy systems. I know, because I’ve built complex ERP systems and implemented many packages. Most of this code is just moving data from the database to an API, or to a transaction to another API, or into a conversational control record, or to the DOM if it is a more modern legacy system, onto the screen and back again. There is a bit of validation sprinkled throughout, which some people call “business logic,” but that is a stretch; it is just validation. There is some mapping (when the user selects “Male” in the drop-down, put “1” in the gender field). And occasionally there is a bit of bona fide business logic. Calculating economic order quantities, critical paths, or gross-to-net payroll amounts is genuine business logic. But such code represents far less than 1% of the code base. The value of this research is to be sure you have found these pieces and inserted them into the graph.
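To give a sense of how small that genuine core can be, here is a sketch in Python of the kind of logic worth extracting: a gross-to-net calculation. The deduction rules and the flat tax rate are illustrative assumptions, not anyone’s actual payroll rules.

```python
# Sketch of "genuine business logic" once pulled out of the legacy code:
# a simplified gross-to-net calculation (illustrative rules only).
def gross_to_net(gross: float, pre_tax_deductions: float,
                 post_tax_deductions: float, tax_rate: float = 0.24) -> float:
    """Return net pay: tax applies to gross minus pre-tax deductions."""
    taxable = max(gross - pre_tax_deductions, 0.0)
    tax = taxable * tax_rate
    return round(taxable - tax - post_tax_deductions, 2)

print(gross_to_net(gross=5000.00, pre_tax_deductions=400.00, post_tax_deductions=150.00))
# -> 3346.0  (4600 taxable, 1104 tax, minus 150 post-tax)
```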

This is where reverse engineering or legacy understanding software plays a vital role. Ms. Bellotti is 100% correct on this point as well. If you think these reverse engineering systems are going to automate your legacy conversion, you are in for a world of hurt. But what they can do is help you find the genuine business logic and provide some comfort to the sponsors that there is not something important the legacy system is doing that no one knows about.

The final bit of understanding is the dependencies. This is the hardest one to get complete. The profiling software can help. Some of it can detect when the histogram of social security numbers in system A changes and the next day the same change is seen in system B; therefore, there must be an interface. But beyond this, the best you can do is catalog all the known data feeds and APIs. These are the major mechanisms that other systems use to become dependent on the legacy system. You will need strategies to mimic these dependencies to begin the migration.
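A rough sketch of that dependency-detection idea, in Python with hypothetical data: if the identifiers appearing in System A keep showing up a day later in System B, there is probably a feed between them. Real profiling tools do this with histograms and fuzzier matching.

```python
# Dependency-detection sketch: flag a likely interface from value-set overlap.
def overlap_ratio(source_values, target_values) -> float:
    source, target = set(source_values), set(target_values)
    return len(source & target) / len(source) if source else 0.0

system_a_ssn_monday  = {"123-44-5555", "222-33-4444", "987-65-4321"}
system_b_ssn_tuesday = {"123-44-5555", "222-33-4444", "987-65-4321", "111-22-3333"}

ratio = overlap_ratio(system_a_ssn_monday, system_b_ssn_tuesday)
if ratio > 0.9:
    print(f"Likely feed from System A to System B (overlap {ratio:.0%})")
```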

This tack is purely research, and therefore does not deliver any perceived immediate gain. You may need to bundle it with some other project that is providing incremental value to get it funded, or you may fund it from a contingency budget.

Leg 5 – Become the System of Record for some subset

Up to this point, data has been flowing into the graph from the legacy system or originating directly in the graph.

Now it is time to begin the reverse flow. We need to find an area where we can begin the flow going in the other direction. We now have enough architecture to build and answer use cases in the graph; it is time to start publishing rather than subscribing.

It is tempting to want to feed all the data back to the legacy system, but the legacy system has lots of data we do not want to source. Furthermore, doing so entrenches the legacy system even deeper. We need to pick off small areas where we can decommission part of the legacy system.

Let’s say there was a certificate management system in the legacy system. We replace this with a better one in the graph and quit using the legacy one. But from our investigation above, we realize that the legacy certificate management system was feeding some points to the compensation management system. We just make sure the new system can feed the compensation system those points.
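A minimal sketch of that reverse flow, assuming Python and rdflib: the graph is now the system of record for certificates, but the legacy compensation system still expects a points feed in its own layout. The ontology, query, and CSV layout below are hypothetical stand-ins for whatever the real interface requires.

```python
# Leg 5 sketch: query the graph and emit the feed the legacy system expects.
import csv
from rdflib import Graph

certs_ttl = """
@prefix ex:  <https://example.com/ont/> .
@prefix emp: <https://example.com/id/employee/> .

[] a ex:Certificate ; ex:awardedTo emp:1001 ; ex:compensationPoints 15 .
[] a ex:Certificate ; ex:awardedTo emp:1002 ; ex:compensationPoints 40 .

emp:1001 ex:employeeId "1001" .
emp:1002 ex:employeeId "1002" .
"""
g = Graph().parse(data=certs_ttl, format="turtle")

POINTS_QUERY = """
PREFIX ex: <https://example.com/ont/>
SELECT ?empId ?points WHERE {
    ?cert a ex:Certificate ; ex:awardedTo ?emp ; ex:compensationPoints ?points .
    ?emp  ex:employeeId ?empId .
}
"""

with open("comp_points_feed.csv", "w", newline="") as feed:
    writer = csv.writer(feed)
    writer.writerow(["EMP_ID", "POINTS"])          # header the legacy loader expects
    for emp_id, points in g.query(POINTS_QUERY):
        writer.writerow([str(emp_id), str(points)])
```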

Leg 6 – Replace the dependencies incrementally

Now the flywheel is starting to turn. Encouraged by the early success of the reverse flow, the next step is to work out the data dependencies in the legacy system and a sequence for replacing them.

The legacy payroll system is dependent on the benefit elections system. You now have two choices. You could replace the benefits system in the Graph. Now you will need to feed the results of the benefit elections (how much to deduct for the health care options etc.) to the legacy system. This might be the easier of the two options.

But the option with the most impact is the other: replace the payroll system. You have the benefits data feeding into the legacy system. If you replace the payroll system, there is nothing else (in HR) you need to feed. A feed to the financial system and the government reporting system will be necessary, but you will have taken a much bigger leap in the legacy modernization effort.

Leg 7 – Lather, Rinse, Repeat

Once you have worked through a few of those, you can safely decommission the legacy system a bit at a time. Each time, pick off an area that can be isolated. Replace the functionality and feed the remaining bits of the legacy infrastructure if necessary. Just stop using that portion of the legacy system. The system will gradually atrophy. No need for any big bang replacement. The risk is incremental and can be rolled back and retried at any point.

Conclusion

We do not go into our clients claiming to be doing legacy modernization, but it is our intent to put them in a position where they can realize it over time by applying Knowledge Graph capabilities.

We all know that at some point all legacy systems will have to be retired. At the moment, the state of the art seems to be either “rip and replace,” usually putting in a packaged application to replace the incumbent legacy system, or incrementally improving the legacy system in place.

We think there is a risk-averse, predictable, and self-funding route to legacy modernization, and it runs through Data-Centric implementation.

Human Scale Software Architecture

In the physical built world there is the concept of “human scale” architecture; in other words, architecture that has been designed explicitly with the needs and constraints of humans in mind: humans who are typically between a few feet and 7 ft. tall and will only climb a few sets of stairs at a time, etc.

What’s been discovered in the physical construction of human scale architecture is that it is possible to build buildings that are more livable and more desirable to be lived in, which are more maintainable, can be evolved and turned into different uses over time, and need not be torn down far short of their potential useful life. We bring this concept to the world of software and software architecture because we feel that some of the great tragedies of the last 10 or 15 years have been the attempts to build and implement systems that are far beyond human scale.

Non-human scale software systems

There have been many reported instances of “runaway” projects: mega-projects and projects that collapse under their own weight. The much-quoted Standish Group reports that projects over $10 million in total cost have close to a 0% chance of finishing successfully, with success defined as delivering most of the promised functions within some reasonable percentage of the original budget.

James Gosling, father of Java, recently reported that most Java projects have difficulty scaling beyond one million lines of code. Our own observations of such mega projects as the Taligent Project, the San Francisco project, and various others, find that tens of thousands or in some cases hundreds of thousands of classes in a class library are not only unwieldy for any human to comprehend and manage but are dysfunctional in and of themselves.

Where does the “scale” kick in?

What is it about systems that exceeds the reach of humans? Unlike buildings where the scale is proportional to the size of our physical frames, information systems have no such boundary or constraint. What we have are cognitive limits. George Miller famously pointed out in the mid-fifties that the human mind could only retain in its immediate short-term memory seven, plus or minus two, objects. That is a very limited range of cognitive ability to hold in one’s short-term memory. We have discovered that the short-term memory can be greatly aided by visual aids and the like (see our paper, “The Magic Number 200+/- 50”), but even then there are some definite limits in the realm of visual acuity and field of vision.

Leveling the playing field

What data modelers found a long time ago, although in practice they had a difficult time disciplining themselves to implement it, was that complex systems needed to be “leveled,” i.e., partitioned into levels of detail such that at each level a human could comprehend the whole. We need this for our enterprise systems now. The complexity of existing systems is vast, and in many cases there is no leveling mechanism.

The Enterprise Data Model: Not Human Scale

Take, for instance, the corporate data model. Many corporations constructed a corporate data model in the 1980s or 1990s. Very often they started with a conceptual data model, which was then transformed into a logical data model and eventually found its way to a physical data model: an actual implemented set of tables, columns, and relationships in databases. And while there may have been some leveling or abstraction in the conceptual and logical models, there is virtually none in the physical implementation. There is merely a partitioning, which has usually occurred either by the accidental selection of projects or by the accidental selection of packages to acquire and implement.

As a result, we very often have the very same concept implemented in different applications with different names, or sometimes similar concepts with different names. In any case, what is implemented or purchased very often is a large flat model consisting of thousands, and usually tens of thousands, of attributes. Any programmer, and many users, must understand what all or many of these attributes are, how they are used, and how they relate to each other in order to safely use the system or make modifications to it. Understanding thousands or tens of thousands of attributes is at the edge of human cognitive ability, and generally it is only done by a handful of people who devote themselves to it full time.

Three approaches to taming the complexity

Divide and Conquer

One of the simplest ways of reducing complexity is to divide the problem. This only works if, after you have made the division, you no longer need to understand the other parts in detail. Merely dividing an ERP system into modules generally does not reduce the scope of the complexity that needs to be understood.

Useful Abstraction

By abstracting we can gain two benefits. First, there are fewer things to know and deal with, and second, we can concentrate on the behavior and rules that apply to the abstraction. Rather than deal separately with twenty types of licenses and permits (as one of our clients was doing), it is possible to treat all of them as special cases of a single abstraction. For this to be useful, two more things are needed: there must be a way to distinguish the variations without having to deal with the differences all the time, and it must be possible to deal with the abstraction without invoking all the detail.
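A minimal sketch of the idea in Python: one Permit abstraction carries the shared behavior, the variation is carried as data, and the detail is consulted only when needed. The types and fields are illustrative, not a real regulatory model.

```python
# Useful-abstraction sketch: twenty permit types collapse into one abstraction.
from dataclasses import dataclass
from datetime import date

@dataclass
class Permit:
    permit_type: str    # "building", "liquor", "signage", ... distinguishes the variants
    holder: str
    issued: date
    expires: date
    attributes: dict    # variant-specific detail, consulted only when needed

    def is_active(self, today: date) -> bool:
        # One rule, written once, applies to every variant.
        return self.issued <= today <= self.expires

permits = [
    Permit("liquor",   "Main St Tavern", date(2024, 1, 1), date(2024, 12, 31), {"class": "B"}),
    Permit("building", "Acme Builders",  date(2023, 6, 1), date(2025, 6, 1),   {"parcel": "12-404"}),
]
print([p.permit_type for p in permits if p.is_active(date(2024, 7, 1))])
```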

Just in Time Knowledge

Instead of learning everything about a data model, with proper tools we can defer our learning about part of the model until we need it. This requires an active metadata repository that can explain the parts of the model we don’t yet know in terms that we do know.
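A minimal sketch of what that lookup can feel like, assuming Python and rdflib: when a developer meets an unfamiliar class, the metadata repository (here, simply an ontology graph) explains it in terms already known. The ontology content and the URI being looked up are illustrative.

```python
# Just-in-time knowledge sketch: ask the ontology to explain an unfamiliar class.
from rdflib import Graph, URIRef
from rdflib.namespace import RDFS

ontology_ttl = """
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <https://example.com/ont/> .

ex:Certificate rdfs:label "Certificate" ;
    rdfs:comment "A credential awarded to an employee on completing a course." ;
    rdfs:subClassOf ex:Credential .
ex:Credential rdfs:label "Credential" .
"""
ontology = Graph().parse(data=ontology_ttl, format="turtle")

def explain(term_uri: str) -> None:
    """Describe an unfamiliar term using what the reader already knows."""
    term = URIRef(term_uri)
    label = ontology.value(term, RDFS.label)
    comment = ontology.value(term, RDFS.comment)
    parent = ontology.value(term, RDFS.subClassOf)
    print(f"{label or term}: {comment or 'no definition recorded'}")
    if parent is not None:
        print(f"  specializes: {ontology.value(parent, RDFS.label) or parent}")

explain("https://example.com/ont/Certificate")
```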

Written by Dave McComb

How to Run a Project Over Budget by 300-500%

A Playbook you Don’t want to Follow

A while back, I was working for a large consulting firm. When I was returning to the US from an overseas assignment, I was allowed to select the city I would return to. I told my boss, who was on the board of this firm, my choice. He counseled against it as apparently the office was being hollowed out, having just hosted the largest project write-off in the history of the firm. (This was a while ago so these numbers will seem like rounding errors to today’s consultants, but I think the lessons remain the same.)

When I found out that we had written off something like $30 million, I asked how anyone could possibly run a project over budget by that much. He said, it’s pretty hard, but you have to follow this specific playbook:

  • The consulting firm starts the project, creates an estimate, staffs up the project and goes to work (so far this is pretty much like any other project).
  • At about the time they’ve burned through most of the budget, they realize they’re not done, and not likely to finish anytime soon. At this point they declare a change in scope and convince the client to accept most of the projected overrun. Typically at this point it’s projected to be about 50%.
  • As they near the end of the extension it becomes obvious that they won’t hit the extended budget either. Senior management in the consulting firm recognizes this as well and sacrifices the project manager, and brings in Project Manager #2.
  • PM #2 has a very standard script (I don’t know if there is a school for this or if they all work it out on their own): “This is way worse than we thought. It’s not 90% complete (as the outgoing Project Manager had said). It’s not even 50% complete.” New estimates and budgets are drawn up, and the client is apprised of the situation. The client has a lot invested in the project at this point but is also very reluctant to pay for all this mismanagement. Eventually both parties agree to split the cost of the overrun. The total budget is now between 250% and 300% of the original.
  • In order to spend all this extra budget, and to get some much-needed new talent on the team, PM #2 brings in more staff. If the project completes within the new (third) budget (and sometimes it does), you have a reluctantly satisfied client (at least they got their system) and satisfied consultants (even at half price for the last portion, they were making money).
  • Alas, sometimes even that doesn’t work. And when it doesn’t, back to the playbook. Bring in PM #3. PM #3 has to be very senior. This has to work. PM #3, in order to maintain his or her reputation, has to make this troubled project succeed.
  • PM #3 doesn’t miss a beat: “This is way worse than we thought…” (almost any number can be inserted at this point, but 400 to 500% of the original is not out of range). At this point there is no more going back to the client. The consulting firm will eat the rest of the overrun. PM #3 will make sure the new number absolutely assures success. The consulting firm accepts the write-off and finishes the project.

That is pretty much the playbook for how to run a project over budget by that amount. You might well ask, how did they manage to run over in the first place?

Tolstoy said, “Happy families are all alike; every unhappy family is unhappy in its own way.” And so it is with software projects. Each seems to go bad for a different reason. And if you do enough, the odds will catch up to you. But that will be a subject for another article.

Written by Dave McComb

The Magic Number in Complex Systems

Somewhere around 200 items seems to be the optimum number of interrelated things we can deal with at one time, when dealing with complex systems such as computer software.

In 1956 George Miller wrote an article for the Psychological Review called “The magic number seven plus or minus two: some limits on our capacity to process information”. In the article Miller cites many different studies that demonstrate the human mind’s capacity to hold only seven objects in short-term memory simultaneously. Much has been made of the studies in this article, especially in the field of structured design methodologies. The article was a seminal work in laying out the limits of the mind’s ability to handle complexity.

What we’ve discovered, admittedly with far fewer scientific studies to back us up, is something that we believe has been known to architects and engineers for generations. Specifically, that the human mind coupled with external aids is capable of dealing with several hundred items simultaneously in its “near-term” memory.

Architects and engineers have been the major users of oversized sheets of paper for decades. What they realize, and those of us in this software development business have been slow to realize, is that having a great deal of information in your field of view allows you to anchor some information in visible space while you confidently scan other areas of the design.

This is both obvious and puzzling. It is obvious because software designers and developers deal with extreme degrees of complexity, and any aids that help understand and manage that complexity would be very well received. It is puzzling because, for a $3,000 investment in a plotter and Visio, pretty much any developer could gain the ability to deal with greater amounts of complexity.

Once a design is complete it can certainly be factored into smaller pieces (subsystems, modules, views, etc.) and rendered onto 8 1/2 by 11 sheets of paper. However, we’ve discovered that there are two key times when the design in its entirety needs to be reviewed: during the design process itself and any time you need to communicate that design to other parties. Of course, those two cases cover almost all needs for documentation, so one wonders how the 8 1/2 by 11 format has persisted.

We also wondered whether the 36″ by 44″ (E6) paper size that we’ve been dealing with is really optimal. While we again have only empirical evidence, we believe that it is close to optimal. At 36″ by 44″, a page can be surveyed at arm’s length and you can review the entire surface without moving. Beyond that size, you must physically move in order to see the entire surface. Alternatively, you can increase the font size; however, by doing so you have not increased the potential number of interacting items in the design.

At this size we’ve found that it is possible to get approximately 200 to 250 individual related design elements on a single sheet of paper. We’ve been using this approach for complex semantic models, database designs, application interface diagrams, and proposed bus-style architectures. We believe it would also be useful for documenting large, complex class structures; however, tool support has been hard to find. Some evidence can be had in the popularity of wall-sized charts for popular class libraries, but think how much more useful this would be for designs in progress.

In conclusion, we suspect that there is another level of human cognition aided by spatial diagrams where the mind is able to deal with hundreds of interrelated articles in nearly real-time. As they say in the academic business, “More research is needed.” As we say in our more pragmatic business, “It’s obvious enough the first time you try it.”

Written by Dave McComb

System Project Failure: The Heuristics of Risk

Information Systems Project Failure: The Heuristics of Risk

An Evaluation of Risk Factors in Large Systems Engineering Projects

This article was originally published in the Journal of Information Systems Management, Volume 8, Number 1, Winter 1991. It is reprinted here by permission of the publisher: www.crcpress.com

System project failures are a well-known part of systems development; however, all the potential risks of planning and executing a project effort are not. This article offers heuristic guidelines to help Information Systems managers assess these inherent risk factors before initiating and while executing a project. Case examples illustrate how a particular risk factor can result in a failed systems development effort. Most Information Systems managers who have been responsible for any type of systems development effort are familiar with project failure. Although publicity is rare, a few project failures have received coverage in the trade press. Some project failures have been labeled runaways, which is “a system that is millions [of dollars] over budget, years behind schedule, and—if ever completed—less effective than promised.”

Several years ago, another runaway project disaster received notoriety because the failure affected five million New Jersey state drivers.

The fourth-generation language used to develop the system was not equipped to handle the daily volume of car registrations and license renewals. According to one accounting firm’s study, the problem is widespread: runaway projects accounted for some 35% of the total projects under development in its 600 largest client organizations. Federal government officials have become wary of a similar phenomenon known as grand design approaches to systems development.

A grand design, based on traditional systems development principles, dictates specification of all requirements up front. Problems surface in the federal government bureaucracies because projects are usually large in size and complexity. A large project scope means that costs escalate, time schedules lag, and user, project, and vendor personnel change frequently. Furthermore, Congress balks at the estimated costs, forcing compromise solutions that are prone to implementation failure.

This article attempts to determine the patterns that exist within systems projects that begin well and, for whatever reasons, finish less successfully. To focus the initial work, each project selected used a traditional systems development methodology. Each project failed to meet user expectations. The following observations tie in well with the authors’ experiences with systems development projects. More importantly, these observations may be useful in helping project managers assess the impact of changes to their projects relative to their future success or failure. Furthermore, senior managers and systems sponsors may find help with their decisions regarding the continuation or cancellation of troubled systems projects.

Framework to Identify System Failure Factors

The factors are organized along two dimensions: planning versus executing and technical versus human. The placement of the factors defies pigeonholing; as the framework suggests, few factors are purely classified in one dimension. Broken lines in the exhibit indicate dimension zones. For the purposes of this article, a clockwise approach explains the framework, beginning with the factors that are most planning related and continuing clockwise through planning-human, executing-human, executing, executing-technical, and planning-technical.

Planning Factors

Although most project failures surface in late execution, the problems often originate during planning and can occur irrespective of the planning approach being used. Such problems involve mistakes made in time and budget estimates as well as in compression (i.e., project scheduling and management activities).

Estimating

Independent of specific tools or techniques, project managers or project sponsors generally use one of five estimating approaches: available budget, level of effort, bottom-up, comparable case, and executive mandate. Note that the approach taken appears to correlate with prospects for project success.

Available Budget

Information Systems managers accustomed to operating their departments in a stable environment usually believe that projects will continue within the allotted budget, plus or minus a small percentage. The threat to their projects’ success is that they fail to recognize that taking on a project adds many tasks, and virtually none of these tasks can be shortchanged without jeopardizing the entire project.

Level of Effort

The level-of-effort estimate relies on a judgment call that a certain number of employees should be able to complete each task in a certain amount of time. For example, task one should take three people two months to complete. This approach at least recognizes that there is a relationship between the effort and the tasks initiated. However, the approach fails because there is no inherent feedback loop. Because the estimate is judgmental and no two tasks are the same, the next project is always slightly different. Thus, this type of project estimating does not improve with project manager experience.

Bottom-Up Approach

With this approach, planners ascertain individual tasks (e.g., number of pages of procedures to be written, number of employees to be trained, number of screens to be designed). Then, they assign resources and time blocks to accomplish the tasks. Because these estimating parameters are reasonably constant, they provide a common denominator from project to project. Through project time control systems, Information Systems managers know whether or not their estimates are correct. Therefore, the bottom-up method has a self-correcting mechanism as the project proceeds and a learning component to use for each project.
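To make the approach concrete, here is a small illustrative sketch in Python of a bottom-up estimate built from countable work products and per-unit rates; the tasks and rates are invented for the example, not taken from the article.

```python
# Bottom-up estimating sketch: countable work products times historical unit rates.
unit_rates_days = {              # assumed historical effort per unit of work product
    "screen designed": 3.0,
    "procedure page written": 0.5,
    "employee trained": 0.25,
}
work_products = {                # counts ascertained during planning
    "screen designed": 40,
    "procedure page written": 120,
    "employee trained": 300,
}

estimate = sum(unit_rates_days[k] * work_products[k] for k in work_products)
print(f"Bottom-up estimate: {estimate:.0f} workdays")   # 40*3 + 120*0.5 + 300*0.25 = 255

# When actuals come in, the rates are recomputed; that feedback loop is what
# the level-of-effort approach lacks.
```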

Comparable Case

Very often, especially in long-term planning, estimates must be made for projects before the detail required for bottom-up estimating is known. Usually, there are a handful of parameters that determine the size of the project and allow it to be compared to similar projects in similar-size organizations in the same industry. For example, although accounts receivable projects may be very different between a construction company and a hospital, they should generally be comparable between hospitals of approximately the same size. This method is particularly useful for project prioritization and rough-cut decisions regarding the necessary worker hours. However, statistics from similar projects may be difficult to obtain. Once the project begins, planners may convert to the bottom-up method.

Executive Mandate

At times, an organization’s senior executives dictate a project completion date on the basis of political, environmental, and market conditions. For example, one organization’s president pushed up a target completion date from three years to one year for a $25 million project involving major technological change and data conversion. Estimates crunched data conversion to two weeks and allowed only one week for a systems test. The reason: that president wanted to implement the system before his retirement.

As a general rule, and regardless of the estimating method used, the more detail included in estimating project time and budget, the more accurate the estimate. In practice, planners do not include enough detail, and a primary source of systems failures is a predilection toward gross underestimation. Gross underestimation does not mean 20% to 30% underestimation, but rather 100% to 1,000% (based on field observations). Furthermore, an acceptable estimate with sufficient detail does not guarantee that the project will be allocated the appropriate resources. The following example is a case in point.

Example 1

A client of one of the authors was, for reasons outside his control, required to replace all of the organization’s information systems. He had a staff of eight and a 1 1/2-year timetable to complete the task. His own level-of-effort estimate suggested that this time was sufficient. An outside group performed a comparable-case estimate indicating that the implementation would require approximately 25,000 workdays, or almost 20 times what he had available. Senior management was shocked, agreed in principle with the external estimate, but did nothing other than transfer responsibility for the project to another department.

The project, still staffed at eight but gradually growing, was allowed to drift. Two years later, there were more than 100 employees on the project with a considerable amount of work yet to be done. Had the organization acted on the estimates at the time, the project probably would have essentially been complete at the end of the two-year period and certainly the team members would have felt much better about their involvement.

Compression

This is the act of speeding up tasks. There are two types of compression: planned (i.e., the fast-track approach) and unplanned (i.e., the catch-up approach). Fast-track compression management is an art requiring the ability to begin tasks in parallel that are usually conducted sequentially. Managers must rely on their own judgment to predict enough of the outcomes of the tasks usually done first to successfully mesh succeeding tasks and compress the schedule for the final outcome.

Catch-up compression management is probably more common than fast-track management in systems development projects. The problem is that certain tasks do not compress well. Studies have shown that in many cases assigning more staff to a project will not serve to compress the project completion timetable. Instead, more staff usually delays further the completion of the project.

Planning-Human Factors

The planning-human factors include planning aspects that deal predominantly with human communication and scheduling. These factors are bid strategy, staffing, and scheduling.

Bid Strategy

Almost all contemporary systems projects use outside vendors for hardware, software, or other services.

These products and services are often put out to bid. The bidding strategy, whether for the entire project or for subcontracted portions of the project, has a major impact on the project’s success. Unfortunately, there is no single best bidding methodology. However, most project managers or organizations have a favorite approach. Examples include a fixed-price strategy (i.e., bidding to a set price) and always subcontracting software development to someone other than its designer to prevent conflicts of interest. Among the most popular bidding strategies is the lowest-bidder strategy.

The inherent risk in selecting the lowest bidder is the magnitude of the differences in productivity between programmers and designers, often a factor of 10:1. The impact of this productivity differential is even greater for the individuals who have a major role in the project (e.g., systems architects and project managers). With that much individual variability in quality and productivity, selecting the low-bid vendor would seem to be an almost certain prescription for failure, and it usually is. Government agencies in particular are forced by regulation to accept the lowest bidder.

As a result, they go to great lengths to disqualify vendors that private businesses would have eliminated on the basis of subjective evaluation. Instead, they redefine their requirements to be strict enough to attempt to eliminate the unqualified on the basis of submitting nonresponsive bids.

Staffing

There are two facets to the staffing problem: incorrect staffing and staff turnover.

Incorrect Staffing

The most serious aspect of the incorrect staffing problem stems directly from the estimating problem. That is, inadequate total staff is due to a shortsighted estimate. Other incorrect staffing problems are retaining project members who are not results oriented, or who lack the ability to cooperate as team members, or who do not have the skills called for in the work plan (e.g., failure to include staff with end-user knowledge, or with sufficient technical knowledge of the architecture, or with systems development experience). The following example illustrates project difficulties when there is a lack of end-user knowledge.

Example 2

An agency was processing several million dollars’ worth of revenue checks per month. The processing procedure was very complex and required some extensive reconciliation. Many of the checks were for large amounts of money. In the current procedure, the check accompanied the reconciling paperwork until the reconciliation was complete. At that point the check was forwarded for deposit. As long as the reconciliation was done on the same day, this was not a major problem.

However, the reconciliation process had become very involved, and as a result, checks were being delayed or misplaced. The proposed and designed solution was to create a system that would log the movement of the check so that at any given time anyone could tell the exact location of a check.

When it was suggested that a more straightforward solution would be to deposit the checks directly into the bank and do the reconciliation later, the suggestion was strongly resisted. A significant problem in this example was that no one on the system project team was familiar with current practices in revenue and collection systems.

Staff Turnover

This is a two-pronged problem. One prong is ensuring sufficient continuity throughout the life of the project; the other is recognizing that there will be major changes in the composition of the team as the project progresses. It is difficult to ensure absolute continuity of staff on a project team because employees are always free to quit, or they may get reassigned. However, it is possible to avoid planned changes in the team. The classic waterfall life cycle methodology provides the opportunity, and in some cases the requirement, to reconsider and rebid every project at the end of every systems development phase.

Besides the effect of losing the skills and knowledge of the outgoing team members, there is a much more important and subtle factor at work: the new team may feel no obligation to the design or the solution proposed in the previous phase. Particularly tenuous are decisions related to project scope and expense. As a result, there is very often a subtle rescoping of the project every time the membership of the project team changes.

Scheduling

Still in the realm of planning-human, although now at a more tactical level, are those factors related to scheduling (including the sequencing of activities). The sequence and scheduling of project activities will vary by project type. For example, transaction processing systems usually follow the traditional systems development life cycle (SDLC) approach whereas decision support systems may use applications prototyping. Whatever methodology is used, the sequence and scheduling of project activities should follow a logical, additive progression toward defined goals.

Executing-Human Factors

Project execution, from the human (or personal) side, may be stalled by the lack of feedback, user involvement, and motivation. Each factor inhibits smooth execution of the project.

Feedback

The lack of unbiased feedback surfaces when project managers and systems personnel believe that they can move forward and meet impossible deadlines. Part of the problem is that progress in software or systems development is mostly invisible. Programmers may deceive both themselves and their managers about the extent of their progress. For example, a programmer believed to be 90% complete on a module may actually have 50% of the work yet to do.

User Involvement

This factor includes both the importance of user participation in the design and adequate user training to use the system. The time allotted to procedure development and user training is often too short. User management usually regards systems training as an addition to regular job requirements rather than giving users sufficient time away from their job responsibilities to learn the system. One of the keys to a system project’s success is establishing ownership among the system’s users. An effective method of accomplishing user ownership is to let users change requirements during the development process or tailor requirements afterwards. However, this is a two-edged sword if change is not controlled. The importance of user ownership is illustrated in the following example.

Example 3

Because of regulatory changes, a company needed to change both its cost accounting system and its allocation methods. Users initiated an internal project that defined some unnecessarily complex algorithms and required a major mainframe development effort. A senior vice-president, realizing that the project would not be completed by the established deadline, authorized a second, external team to create a backup system. Within three weeks, the external team completed the second system as a fully functional prototype, at about one-tenth the development cost and one-tenth the operating cost of the internal project.

By this point, the users had such a strong personal involvement with the system they were developing that they rejected the new system despite its advantages. From the external team’s perspective, the project was a failure in spite of its overwhelming cost, schedule, and functional advantages. This failure occurred because the users were not involved in the development of the second system.

Motivation

Motivation is a universal personnel problem. With respect to systems personnel, standard time allotments for a given unit of work are not always valid because the motivation of systems personnel varies greatly; that is, the level of motivation of systems staff members determines how quickly (or slowly) they complete the work. Two aspects of motivating systems personnel are project reward structures and the project manager’s success in motivating the team members.

The reward structure can have a significant effect on a project’s outcome. For example, one project team was staffed by employees who performed this systems implementation in addition to their full-time jobs. They were told that when the system converted, they would have the new positions that this new system implied.

As a result, team members worked a lot of overtime to complete the project. Another project effort involved the same arrangement except for the reward structure. Team members had to develop the system in their spare time without the incentive of building future jobs. As a result, the project failed.

Executing Factors

Change management and subsequent workarounds and catch-ups are execution problems stemming directly from the lack of unbiased feedback. When people set impossible deadlines, they act frantically to get the work done rather than admit that they are behind schedule.

Change Management

Once a design specification is finalized, the order goes out to freeze the spec. Although no specifications are ever absolutely frozen, too many changes after finalization may create havoc. One change made to a program module or other design feature often creates a domino effect. Actual changes are often on a larger scale than originally intended.

The extent and impact of subsequent changes to design specifications are products of design quality, project team-user relations, and the project team’s attitude toward change. Analysts who want to please everyone may bring about a project’s demise. Every suggestion brought forward by a user becomes a change. Changes beget other changes until the project dies of accumulated entropy, as in the following example.

Example 4

One project exploded with accumulated changes (see Exhibit 2). Suggested changes (e.g., an inventory aging report) were documented on functional investigation memos (FIMs). A FIM is a one-page narrative describing the change from a user’s perspective. The team analyzed each FIM to determine the affected system components, such as database layout changes or program changes. The change to each affected component was documented on what was called a systems investigation memo (SIM).

Once authorized, SIMs became technical change requests (TCRs), which consisted of module-level detail from which programmers could code. The idea was to accumulate all the changes and then open each module only once. This philosophy missed an important point: it is a better strategy to focus on the impact of a single change throughout the entire system than to focus on testing individual modules. In any case, requested changes kept coming in and, as the exhibit suggests, complexity multiplied. After nearly a year of implementing changes, the change backlog was larger than the original backlog.

After a change in management, the team eliminated all of the unnecessary changes, backed out of some of the coding changes that had already been made, and established a completed version of the system within five months.

Workarounds

Some project managers will continue to find more ingenious ways to live with a problem rather than solve it. For example, one organization implemented an MRP II system (Manufacturing Resource Planning) that was inconsistent with its previous business practices, which were already identified on PC software and procedures. Rather than reconciling the inconsistencies and opting in favor of one of the two systems, employees looked for ways to work around the inconsistencies.

For example, the data element due-date on the informal PC system was the most recent estimate as to when the materials would arrive. In contrast, due-date on the formal MRP II system was based on calculated need dates from the current engineering design. Instead of recognizing the problem that they were trying to pursue two ways to conduct business, the employees attempted to reconcile the two due dates. Eventually, the workarounds became intolerable, and within nine months, the PC system and old procedures were abandoned.

Executing-Technical Factors

Although there are human aspects to vendor, control, and performance factors, the authors view these factors as predominantly technical issues that can compromise project execution.

Vendor

As outsiders, vendors usually get the blame for many project problems, and often, they deserve it. Depending on the particular role they play, vendors may be part of other failure factors discussed in this article. Here, the concentration is on vendor problems in conjunction with the request for proposal (RFP). Ironically, the RFP process may result in the very problems that the process was designed to avoid. That is, the RFP may lead to the implementation of inappropriate products or solutions. Through RFPs, vendors are often requested to respond to user requirements through a checklist evaluation. The evaluation will be weighted by the customer to determine the vendor with the highest score.

This method, however, has several flaws. First, vendors are trusted to make the evaluation of themselves, and they usually take considerable license with their answers. Customers often try to address this by making the vendor's response an addendum to the eventual contract; experience suggests that this approach is not very effective.

Second, implicit in the checklist evaluation approach is the assumption that a series of very minute and discrete requirements will actually result in a comprehensive, high-quality system. It is often surprising how vendors will continue to add features to their systems with very little regard for the impact on the integrated whole, as in the following example.

Example 5

A medium-sized construction company performed a very detailed RFP with checklists.

A relatively unknown software vendor came in first on the functional requirements checklist evaluation even though its solution was built on an exotic operating system and programming language. Management ignored a recommendation to give weight to a more established vendor's credibility and installed base. Problems occurred immediately.

The package promised a multiplicity of features (e.g., the A/R system supported either open-item or balance-forward processing, selectable by customer). However, these features constantly interfered with one another; the total number of options far exceeded any possibility of testing all combinations and permutations. The implementation team knew that the vendor had performed little integration testing. As a result, they had to do the vendor's debugging as well as a considerable amount of design. The only way the team could make the system workable was to remove large portions of code to reduce testing requirements to a manageable level.

Control

Controls are a vital part of information systems, particularly those that handle financial transactions. Control code in these systems may exceed the application code proper, and adding controls to a system after the fact may cost as much as the original system itself. One organization experienced these costs with a general ledger accounting system, as shown in this example.

Example 6

The literature promoting a system advertised its ability to post transactions into any period: past, present, or future. “What if,” stated the literature, “you find an invoice in the bottom of the drawer that is more than a year old? It does happen.”

However, the controls for this complexity were not built into the system. Transactions could be posted in previously closed months without a corresponding transaction to change the current month's balance. Miskeying a date could post a transaction into the distant future, where it would be omitted from all financial reports until that date arrived. The solution was straightforward but not easy. The application needed a new control record to indicate which months and years were open for posting. Unfortunately, nearly 100 transactions in nine applications had to be changed to check for an open accounting period prior to creating a transaction, and all programs had to be reviewed to check for the possibility of creating unbalanced general ledger transactions.
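The control the example calls for is easy to sketch. The following Python fragment is a hypothetical illustration of an open-period check and a balance check, not the original application's code.

```python
from datetime import date

# Sketch of the open-period control described above. The set of open periods,
# the error types, and the posting routine are hypothetical illustrations.

open_periods = {(2024, 11), (2024, 12)}   # (year, month) combinations open for posting

class ClosedPeriodError(Exception):
    pass

def post_transaction(posting_date: date, debits: float, credits: float) -> None:
    # Control 1: the target accounting period must be open for posting.
    if (posting_date.year, posting_date.month) not in open_periods:
        raise ClosedPeriodError(f"{posting_date:%Y-%m} is not open for posting")
    # Control 2: the entry must balance, so no unbalanced G/L transactions are created.
    if round(debits - credits, 2) != 0:
        raise ValueError("debits and credits do not balance")
    # ... create the general ledger transaction here ...
```

The expensive part in the example was not this logic; it was retrofitting the check into every transaction-creating program after the fact.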

Performance

Almost every meaningful systems development project has performance problems. If not solved during design or conversion, performance problems may never be resolved. Some examples include a minicomputer-based system with an interface file of seven records that took more than 12 hours to process, and an online shop floor control system with a 45-minute response time for some transactions.

Planning-Technical Factors

This last section targets two factors that may lead to systems failure: experimenting with new technology and technical architecture (i.e., designing the system independently of technical considerations).

Experimenting with New Technology

Experimenting with new technologies is not a problem unless managers take a mainstream business systems development project and jeopardize it by experimenting with unproven technology. The following example illustrates this problem with a data base system.

Example 7

In the midst of the design of a large online materials management system, a hardware vendor suggested to a client that an advanced data base system it was working on would solve the client's data base tuning and performance problems.

This data base system relied on transferring part of the data base search logic to the disk read heads, which would allow it to search an entire disk for unstructured information very rapidly without draining the CPU resources. One of the authors pointed out that it would be useful for unstructured queries, but the application being designed was for designated transactions that knew which data base records they required. The vendor persisted and sent senior designers of the product to the client to argue the case.

Fortunately for the client, the vendor's own statistics provided evidence that this product would not help the performance of the application and indeed could hinder it significantly. It seemed that as more users got on the system and began queuing up unstructured queries, the system degraded exponentially. Although this particular client was spared the expense and distraction of this technical experimentation, another client (in the same city) purchased the system to use it for transaction processing and unstructured queries. These unstructured queries so degraded the transaction processing that a separate machine had to be set up to provide queries on a nonintegrated, standalone basis.

Technical Architecture

Not too long ago, it was popular to design systems independent of their technical architecture. The intention was to prevent knowledge of technical details from biasing the best functional solution. However, this does not work well, as shown in the following example.

Example 8

Analysts familiar only with minicomputer architectures were trying to develop an application for a mainframe environment. In this case, a minicomputer architecture would not work because in a mainframe online architecture, information is keyed into the terminal where it is buffered and stored until the user completes a full screen of information. Then, the user presses a send key, and the entire screen’s worth of information is sent as one block to the mainframe.

This initiates the application program long enough to process the incoming message, perform any necessary data base updates, and format and send another screen of information to the user. The application program then effectively terminates. In a minicomputer online architecture, however, the application program is constantly active when the user is at the workstation. Every time the user interacts with the screen, the application program responds.
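The contrast is easier to see in code. The following Python caricature is not any real TP monitor or minicomputer API; it only illustrates where the application program is active relative to the user's typing.

```python
# Caricature of the two interaction styles. This is not any real TP-monitor or
# minicomputer API; it only shows where the application program is active.

FIELDS = ["part_number", "quantity", "due_date"]

def validate(field: str, value: str) -> str:
    """Toy validation rule: nothing may be blank. Returns an error message or ""."""
    return f"{field} is required" if not value.strip() else ""

def mainframe_style(screen: dict) -> list:
    """Block mode: the whole screen arrives as one message when the user presses
    the send key; the program replies with one screen and effectively terminates."""
    return [e for f in FIELDS if (e := validate(f, screen.get(f, "")))]

def minicomputer_style(keystrokes: list) -> list:
    """Field-at-a-time: the program stays resident and responds after every field,
    which is the interaction the analysts' user design assumed."""
    return [validate(field, value) or f"{field} accepted" for field, value in keystrokes]

print(mainframe_style({"part_number": "A-100", "quantity": ""}))
print(minicomputer_style([("part_number", "A-100"), ("quantity", "")]))
```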

In one of its late design reviews of this project, management noted that the user design implied a minicomputer architecture; that is, interaction with the CPU after entry of every field. Not only was this design used to create error messages, but also to actually change the layout of the screen. At this point, the analysts refused to change their design and convinced users to purchase a standalone minicomputer. This meant a needless expense to the users and gave them a nonintegrated, standalone system.

Conclusion

Structural and civil engineering made some of their great strides when they systematically studied failed structures and then incorporated lessons from those investigations into their methods, standards, and approaches. The information systems industry is still in the early stages of a similar evolution.

It is as if it is just beginning to investigate collapsing bridges and broadly categorize the failures (e.g., structural problems, weakness in materials, and unanticipated environmental forces such as flooding). Systems failures are commonplace. Heuristic advice about how to prevent systems failure once a project is underway is less common. The 15 project-risk factors identified in the systems failure risk framework, and the case examples illustrating how each of these factors can contribute to project failure, are designed to help Information Systems managers understand and control their own systems development projects.

Using this framework as a guide, Information Systems managers can broaden their perspective on the sources of potential problems, and in so doing, prevent some of the unnecessary project failures they currently face.

 


David McComb is president of First Principles Inc, a consulting firm that specializes in the application of object-oriented technology to business systems. Prior to founding the company, McComb was a senior manager for Andersen Consulting, where for 12 years he managed major systems development projects for clients on six continents. He received a BS degree and an MBA from Portland State University. Jill Y. Smith is an assistant professor of MIS in the College of Business Administration at the University of Denver. She obtained a PhD in business computer information systems from the University of North Texas. She is also a research associate in UNT's Information Systems Research Center.

Notes

1. J. Rothfeder, "It's Late, Costly, Incompetent, But Try Firing a Computer System," Business Week (November 7, 1988), pp. 164-165.

2. D. Kull, "Anatomy of a 4GL Disaster," Computer Decisions (January 11, 1986), pp. 58-65.

3. Rothfeder.

4. An Evaluation of the Grand Design Approach to Developing Computer Based Application Systems, Information Resources Management Service, US General Services Administration (September 1988).

Written by Dave McComb

Deplorable Software


Why can't we deploy software as well as we did fifty years ago?

The way we build and deploy software is deplorable. The success rate of large software projects is well under 50%. Even when successful, the capital cost is hideous. In his famous "Mythical Man-Month," Frederick Brooks observed that complexity comes in two flavors: essential (the complexity that comes from the nature of the problem) and accidental (the complexity that we add in the act of attempting to solve the problem). He seemed to be suggesting that the essential portion was unavoidable (true) and the larger of the two. That may have been true in the 1960s, but I would suggest that most of what we deal with now is complexity in the solution.

Let's take a mundane payroll system as a case in point. The basic functionality of a payroll system has barely changed in 50 years. There have always been different categories of employees (exempt and non-exempt), different types of time (regular, overtime, hazardous, etc.), different types of deductions (before and after tax), different types and jurisdictions of taxes (federal, state, local), various employer contributions (pension, 401K, etc.), and all kinds of weird formulas for calculating vacation accrual. There have always been dozens of required government reports, and the need to print checks or make electronic deposits. But 50 years ago, with tools that today look as crude as a flint axe, we were able in a few dozen man-months to build payroll systems that could pay thousands of employees. Nowadays our main options are either to pay a service (hundreds of thousands per year in transaction costs for thousands of employees) or to implement a package (typically many millions of dollars and dozens to hundreds of person-years to get it implemented).
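To underline how modest the essential complexity is, here is a toy gross-to-net calculation in Python. Every category, rate, and rule is invented for illustration; a real payroll adds jurisdictions and edge cases without changing the basic shape.

```python
# Toy gross-to-net payroll calculation. Every rate, category, and rule here is
# invented for illustration; real payroll adds jurisdictions and edge cases,
# but the essential shape is the same one we coded fifty years ago.

def gross_pay(hours: dict, hourly_rate: float, exempt: bool) -> float:
    if exempt:
        return hourly_rate * 80                       # salaried: fixed biweekly amount
    multipliers = {"regular": 1.0, "overtime": 1.5, "hazardous": 2.0}
    return sum(hourly_rate * multipliers[kind] * hrs for kind, hrs in hours.items())

def net_pay(gross: float, pretax_deductions: float, posttax_deductions: float,
            tax_rates: dict) -> float:
    taxable = gross - pretax_deductions               # 401K etc. reduce taxable wages
    taxes = sum(taxable * rate for rate in tax_rates.values())
    return taxable - taxes - posttax_deductions

g = gross_pay({"regular": 80, "overtime": 5}, hourly_rate=30.0, exempt=False)
print(round(net_pay(g, pretax_deductions=200.0, posttax_deductions=50.0,
                    tax_rates={"federal": 0.12, "state": 0.04, "local": 0.01}), 2))
```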

I’m not a Luddite. I’m not pining for the punched cards. But I really do want an answer to the question: what went wrong? Why have we made so little progress in software over the last 50 years?

Written by Dave McComb

Interested in a Solution? Read Dave McComb's "Software Wasteland"

It Isn’t Architecture Until It’s Built

It’s our responsibility as architects to make sure our work is implemented. We’ve been dealing a lot lately with questions about what makes a good architecture, what should be in an architecture, what’s the difference between a technical architecture and an information architecture, etc. But somewhere along the line we failed to emphasize perhaps one of the most important points in this business.

“It isn’t architecture until it’s built.” While that seems quite obvious when it’s stated, it’s the kind of observation that we almost need to have tattooed or at least put on our wall where we won’t forget it. It’s very easy to invest a great deal of time in elegant designs and even plans. But until and unless the architecture gets implemented it isn’t anything at all; it’s just a picture. What does get implemented has an architecture. It may not be very good architecture. It may not lend itself to easy modification or upgrade or any of the many virtues that we desire in our “architected” solutions. But it is architecture.

So the implication for architects is that we need to dedicate whatever percent of our time is necessary to ensure that the work gets implemented. It’s really very binary. You belong to an architectural group. Maybe you are the only architect, or maybe there are five people in your group. In either case, if your work product results in a change to the information architecture the benefits can be substantial. Almost any architectural group could be justified by a 10 or 20% improvement. Frankly, in most shops a 50 to 90% improvement in many areas is possible. So on the one side, if a new architecture gets adopted at all it’s very likely to have a high payback. But the flip side is that a new architecture, no matter how elegant, is not worth anything if it’s not implemented and the company would be acting rationally if it terminated all the architects.

The implication is that as architects we need to determine the optimal blend between designing the "best architecture" and investing our time in the various messy and political activities that ensure an architecture will get implemented. These range from working through governance procedures, to making sure that management is clear about the vision, to continually returning to the cost-benefit advantages. The specifics are many and varied. In some organizations you may be lucky enough not to have to invest a great deal of your time to get a new architecture adopted and implemented. Perhaps you're fortunate enough to have insightful leadership or a culture that is eager to embrace a new architecture. If that's the case, you might get away with spending 10 or 20% of your time ensuring that your architecture is getting implemented and spend the vast majority on developing, designing, and enhancing the architecture itself.

However, if you're like many organizations, life for the architect will not be that easy. You might find it profitable to spend nearly half your time in activities meant to promote the adoption of the architecture. Certainly you should never pass up an opportunity to make a presentation or help goad a developer along. Indeed, the stakes are high enough that, given an opportunity to present, you would do well to invest disproportionately in the quality of the presentation: a perfunctory status update on a particular technical standard is not likely to move developers or management to adopt it, and you may need to return to the theme over and over again.

As someone once pointed out to me, in matters such as this the optimal amount of communication is to over-communicate. The rule is: when you're absolutely certain that you've communicated an idea so many times, so thoroughly, and so exhaustively that it just is not possible that anyone could tolerate hearing it any more, that's probably just about right. Experience says that when we think we've communicated something thoroughly and repeatedly, three-fourths of the audience has still not internalized the message. People are busy, and messages bounce off them and need to be repeated over and over and over. You'll find that each part of your audience will accept and internalize a message in a different way, from a different presentation, and at a different rate. I'm continually amazed when I get feedback from a particular stakeholder at some meeting: the coin finally drops and they become an advocate for some position we'd taken. In some cases I've gone back and realized that we'd presented it many times to that person and somehow finally one time it took. In other cases we realized that, in fact, we hadn't presented it to that individual at all. We thought we'd covered everyone, but they weren't in certain meetings; or we repeated something so many times we thought everybody must have heard it when, in fact, that was not the case at all.

In closing, I’d like to recommend that every architect make a little plaque and put it near their desk that says: “It isn’t architecture until it’s built.” That might help you decide what you are going to do tomorrow morning when you come into work.

Written by Dave McComb

Event-Driven Architecture

Event-driven architecture as the latest buzzword in the enterprise architecture space.

If you’ve been reading the trade press lately, you no doubt have come across the term event-driven architecture as the latest buzzword in the enterprise architecture space.

So you dig around a bit to find out just what this event-driven architecture is, and you'll find that event-driven architecture (EDA) is an application architecture style defined primarily by the interchange of real-time or near-real-time messages or events.

Astute readers of this web site and our white papers, attendees at our seminars, and of course our clients will recognize that this is exactly what we have been espousing for years as what a good service oriented architecture looks like. You may recall our Enterprise Message Modeling architecture, which prominently featured publishing; event analysis was used to define the key messages being sent from application to application. You may recall our many exhortations to use "publish and subscribe" approaches for message dispatch whenever possible. You may recall us relying on events to populate managed replicated stores for just this purpose.
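For readers who have not seen it spelled out, here is a minimal in-process publish-and-subscribe sketch in Python. A real event-driven architecture would put a message broker between the parties, but the shape of the coupling is the same.

```python
from collections import defaultdict

# Minimal in-process publish-and-subscribe sketch. A real event-driven
# architecture would put a message broker between the parties, but the shape
# is the same: publishers emit events without knowing who consumes them.

_subscribers = defaultdict(list)

def subscribe(event_type, handler):
    _subscribers[event_type].append(handler)

def publish(event_type, payload):
    for handler in _subscribers[event_type]:
        handler(payload)                 # loose coupling: no request/reply round trip

# Example: one event feeds a managed replicated store and billing independently.
subscribe("order.placed", lambda e: print("update replicated store:", e["order_id"]))
subscribe("order.placed", lambda e: print("notify billing:", e["order_id"]))
publish("order.placed", {"order_id": 42})
```

The publisher never names its consumers; that is the property a request/reply style gives up.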

So, you might ask, why does the industry need a new acronym to do what it should have been doing all along?

First, a bit of history. In the 1960s, MRP (Material Requirements Planning) was born. To the best of my knowledge, the first commercial implementation was at the GE Sylvania television plant. The system started from the relatively simple idea that a complex Bill of Material could be exploded and time-phased to create a set of requisitions for either inventory parts or purchased parts (sketched below). But these early systems went considerably beyond that and "closed the loop," checking inventory, lead times, etc. After the successes of these early systems, a number of packaged software vendors began offering MRP software. However, to meet the lowest common denominator and make the product as simple as possible, these products very often did not "close the loop"; they did not factor changes in demand into already existing schedules, etc. Then a mini-industry, APICS (the American Production and Inventory Control Society), sprang up to help practitioners deal with these systems. What they soon proposed was that these MRP systems needed to be "closed loop." Sure enough, a few vendors did produce "closed loop" systems. This created a marketing problem. The response was MRP II and a change in the acronym; it now stood for Manufacturing Resource Planning.
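For those who never lived through it, the core MRP calculation is short enough to sketch in Python. The bill of material, quantities, and on-hand figures below are invented, and real MRP also time-phases the requisitions by lead time.

```python
from collections import Counter

# Sketch of the core MRP calculation: explode a bill of material, then net the
# gross requirements against on-hand inventory. Parts, quantities, and on-hand
# figures are invented; real MRP also time-phases requisitions by lead time.

bill_of_material = {                 # parent -> list of (component, qty per parent)
    "television": [("chassis", 1), ("tube", 1), ("knob", 4)],
    "chassis": [("frame", 1), ("circuit board", 2)],
}
on_hand = Counter({"knob": 100, "circuit board": 50})

def explode(item: str, qty: int, requirements: Counter) -> Counter:
    for component, per_parent in bill_of_material.get(item, []):
        requirements[component] += qty * per_parent
        explode(component, qty * per_parent, requirements)
    return requirements

gross = explode("television", 500, Counter())
net = {part: max(need - on_hand[part], 0) for part, need in gross.items()}
print(net)   # the requisitions to be placed for inventory or purchased parts
```

"Closing the loop" is everything this sketch leaves out: recalculating when demand, inventory, or lead times change.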

The message became "MRP II is what everyone needs," and most of the education and marketing was about the shortcomings of the earlier MRP systems. Of course, the earlier MRP systems were, for the most part, just bad implementations, not something more primitive in the way that we look at Paleolithic art.

And so it is with SOA. Apparently what has happened is that the Web services movement has become associated with service oriented architecture. However, most practitioners of Web services are comfortable using Web services as a simple replacement for the remote procedure call (RPC). As a result, many organizations are finding their good SOA intentions being sucked down into a distributed request/reply environment, which neither addresses the issues the architecture was meant to solve nor delivers on its promises: loose coupling and the commoditization of shared services.

Perhaps it’s inevitable we’ll have to deal with new acronyms like EDA. But if you’ve been tuned in here for a while, think of EDA as SOA done right.

Written by Dave McComb

 

Strategy and Your Stronger Hand

Those of us in the complex sale sector need to be aware that volume operations from adjacent marketplaces will soon enter ours.

The December 2005 issue of the Harvard Business Review has excellent articles by two of my favorite business authors, Geoffrey Moore ("Strategy and Your Stronger Hand") and Clayton Christensen ("Marketing Malpractice: The Cause and the Cure," which is applicable as we start looking at commercializing Semantic Technology).

Moore's article has many fresh insights; chief among them is that companies have a dominant business model. The model does not depend on the industry a company is in, nor on its age or size. He likens this to our dominant handedness, and as the editor pointed out on the editorial page, "It's easier to convert a shortstop into an outfielder than it is to change a southpaw into a righty."

Some firms' dominant model is "volume operations," and for others it is "complex systems." The first relies on many customers, brands, advertising, channels, and compelling offers. The latter relies on targeted customers and the integration of third-party products into total solutions. For each, the grass often looks greener in the other model, but almost no business succeeds when it attempts to change models.

The rhythm of most high tech sectors is that the complex sale companies forge new territories and solve unique customer problems. The volume companies come in later and try to commoditize the solution. To survive, the complex sale companies need to do two things simultaneously: defend, for as long as possible, the position they have already won, and move up the solution chain and incorporate the newly commoditized components into an even more interesting solution.

The one thing they need to avoid is trying to convert their own early wins into volume opportunities. What does this have to do with semantics? We are just beginning the commercial rollout of this technology. We will have all the fits and starts of any new high-tech sector. We have an opportunity to be a bit more self-aware.

Those of us in the complex sale sector need to be aware that volume operations from adjacent marketplaces will soon enter ours. We need to be continually vigilant about incorporating rather than competing, and moving on up the solution chain. Consumers of this technology have the opposite challenge: how to recognize which aspects of their problems require “complex” solutions and which aspects are ripe to be solved with “volume” solutions.

 

The Zachman Framework

Shortly after you start your inquiry about software architecture, or enterprise architecture as it is often called, you will come across the Zachman Framework.

The Zachman Framework is a product of John Zachman who has been championing this cause for at least 15 years, first with IBM and then on his own. As with so many things in this domain, we have an opinion on the Zachman Framework and are more than willing to share it.

What Is the Zachman Framework?

First though, let's describe just what the Zachman Framework is. John Zachman believes, as we do, that software is a large-scale human-created artifact, and as such we may learn a considerable amount from the analogy between software and other large-scale artifacts of human creation. In the early days of software development, there were many who felt that perhaps software was a creative activity more akin to writing or artistry, or perhaps craftsmanship on a small scale.

However, this idea has mostly faded as the scale and interconnectedness of the pieces has continued to increase. Some of John's early writings compared software or enterprise architecture to the architectures needed to support airframe manufacturing, aircraft, rockets, and the like. His observation was that in order to deal with the complexity of the problem in these domains, people have historically divided the problem into manageable pieces. The genius of John's approach and his framework has been in the orthogonality of the dimensions along which he divided the framework.

The Zachman Framework is displayed as a matrix. However, John is careful to point out that it is not a “matrix” but a “schema” and should not be extended or modified, as in his belief it is complete in its depth and breadth.

The orthogonal dimensions referred to above are shown as rows and columns. In the rows are the points of view of the various stakeholders of the project or the enterprise. So, for instance, at the highest level is the architecture from the point of view of the owner of the project or the enterprise, and as we transform the architecture through the succeeding rows we gradually get to a more and more refined scope, as would be typical of the people who need to implement the architecture or, eventually, the products of the architecture. In a similar way, the columns are orthogonal dimensions, and in this case John refers to them as the interrogatives. Each column is an answer to one of Rudyard Kipling's six honest serving men: who, what, when, where, how, and why. Each column is a different take on the architecture; so, for instance, the "what" column deals with information, the "things" about which the system is dealing.

In the aircraft analogy, it would be the materials and the parts. Likewise, the "how" column refers primarily to functions or processes; the "where" to distribution and networking; the "when" to scheduling, cycle times, and workflow; the "who" to the people and organizations involved in each of the processes; and the "why" to strategy, eventually down to business rules.
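Since the rows and columns are fixed, the grid itself can be written down as a small schema. The following Python sketch uses one common labeling of the rows; what goes in each cell is, of course, left to the enterprise describing itself.

```python
# The Zachman rows (stakeholder perspectives) and columns (the interrogatives)
# written out as a tiny schema. The labels follow one common presentation of
# the framework; the content of each cell is up to the enterprise describing itself.

ROWS = ["planner (scope)", "owner", "designer", "builder",
        "subcontractor", "functioning enterprise"]

COLUMNS = {
    "what":  "data, the things the system deals with",
    "how":   "functions and processes",
    "where": "distribution and networking",
    "who":   "people and organizations",
    "when":  "timing, cycles, and workflow",
    "why":   "strategy down to business rules",
}

# One model per cell, each needing "excruciating" detail at every row.
framework = {(row, col): None for row in ROWS for col in COLUMNS}
print(len(framework), "cells")   # 6 x 6 = 36
```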

Behind this framework, then, are models that allow you to describe an architecture or any artifact (a specific design of a part, a product, a database table, or whatever) within the domain of that cell. Many people have the misperception that at the high level there is less detail and that you add detail as you come down the rows. As John is very fond of pointing out, you need "excruciating" detail at every level; what is occurring at the transition from row to row is the addition of implementation constraints that make the thing buildable.

John has been a tireless champion of this cause, and from that standpoint we have him to thank for pointing out that this is an issue, and furthermore for championing it and keeping it in the forefront of discussion for a long, long period of time. He’s been instrumental in making sure that senior management understands the importance and the central role of enterprise architecture.

What the Zachman Framework Is Not

At this point though, we need to point out that the Zachman Framework is not an architecture. And the construction of models behind the framework is not, in and of itself, an architecture. It may be a way to describe an architecture, it may be a very handy way for gathering and organizing the input you need into an architectural definition project, but it is not an architecture nor is it a methodology for creating one. We believe the framework is an excellent vehicle for explaining, communicating, and understanding either a current architecture or a proposed architecture.

However, it is our belief that a software architecture, much like a building architecture or an urban plan, is a created and designed artifact that can only be described and modeled after it has been created and that the act of modeling it is not the act of creating it. So in closing, to reconcile our approach with the Zachman Framework we would say that firstly, we have a methodological approach to creating enterprise software architecture. Secondly, we have considerable experience in actually performing this and creating architectures that people have used to develop and implement systems. Thirdly, these architectures that we have designed/developed can be modeled, described, and communicated using the Zachman Framework. But that does not mean that they were, or in our opinion, even could be created through a methodological modeling process as suggested by the Zachman Framework.
