We’ve been working on something we call “SemOps” (like DevOps, but for Semantic Technology + IT Operations). The basic idea is to create a pipeline that takes proposed enterprise ontology or taxonomy enhancements to “in production” as frictionlessly as possible.
As so often happens, when we shine the Semantic Light on a topic area, we see things anew. In this very circuitous way, we’ve come to some observations and benefits that we think will be of interest even to those who aren’t on the Semantic path.
DevOps for Data People
If you’re completely on the data side, you may not be aware of what developers are doing these days. Most mature development teams have adopted some version of DevOps (Software Development + IT Operations) along with CI/CD (Continuous Integration / Continuous Deployment).
To understand what they are doing, it helps to harken back to what preceded DevOps and CI/CD. Once upon a time, software was delivered via the waterfall methodology. Months, or occasionally years, would be spent getting the requirements for a project “just right.” The belief was that if you didn’t get a requirement right up front, adding even a single new feature later would cost 40 times what it would have cost had the requirement been identified up front. It turns out there was some good data behind this cost factor, and it still casts its shadow: any time you try to make a modification to a packaged enterprise application, 40x is a reasonable benchmark compared to what it would cost to implement that feature outside the package. This, as a side note, is the economics that creates the vast number of “satellite systems” that seem to spring up alongside large packaged applications.
Once the requirements were signed off on, design began (more months or years), then coding (more months or years), and finally system testing (more months or years). Then came the big conversion weekend: the system went into production, tee shirts were handed out to the survivors, and the system became IT Operations’ problem.
There really was only ever one “move to production,” and few thought it worthwhile to invest the energy in making it more efficient. Most sane people, once they’d stayed up all night on a conversion weekend, were loath to sign up for another, and it certainly didn’t occur to them to figure out a way to make it better.
Then agile came along. One of the tenets of agile was that you always had a working version that you could, in theory, push to production. In the early days people weren’t actually pushing to production on any frequent schedule, but the fact that you always could was a good discipline: it kept teams from accumulating technical debt and from straying off to build hypothetical components.
Over time, the idea that you could push to production became the idea that you should. As teams invested more and more in unit testing, regression testing, and pipelines to move code from dev to QA to production, they became used to the idea of pushing small incremental changes into production systems. That was the birth of DevOps and CI/CD. In mature organizations like Google and Amazon, new versions of their software are pushed to production many times per day (some reports say many times per second, but that may be hyperbole).
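To make the CI/CD idea a little more concrete for data people, here is a minimal sketch in Python of what such a pipeline boils down to: every change runs the automated test suite, and only a green build gets promoted through the environments. The function names, environment list, and the pytest call are illustrative assumptions, not a description of any particular team’s tooling.

```python
# Illustrative sketch only: run the automated tests, and promote the build
# through the environments only if everything passes. Names here (run_tests,
# deploy_to, ENVIRONMENTS) are hypothetical, and the pytest call assumes
# pytest is installed and a test suite exists.
import subprocess
import sys

ENVIRONMENTS = ["dev", "qa", "production"]

def run_tests() -> bool:
    """Run the unit/regression suite; a zero exit code means a green build."""
    result = subprocess.run(["pytest", "--quiet"])
    return result.returncode == 0

def deploy_to(environment: str) -> None:
    """Stand-in for whatever actually promotes the build to an environment."""
    print(f"Deploying current build to {environment} ...")

if __name__ == "__main__":
    if not run_tests():
        sys.exit("Tests failed: the change never leaves the developer's branch.")
    for env in ENVIRONMENTS:
        deploy_to(env)  # a small, incremental change flows all the way to production
```

The point of the sketch is not the code itself but the discipline it encodes: the move to production is cheap, automated, and repeated constantly, rather than a one-time conversion weekend.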
The reason I bring this up is that there are some things in there that we expect to duplicate with SemOps, and some that we already have with data. (As I was writing this sentence, I was tempted to write “DataOps” and thought: “is there such a thing?”) A nanosecond of googling later, I found this extremely well-written article on the topic from our friends at DataKitchen. They focus more on the data analytics part of the enterprise, which is a hugely important area. The points I was going to make were more focused on the data source end of the pipeline, but the two ideas tie together nicely.