How to fix a humongous computer system
Just to show you how much hot water Healthcare.Gov is in, I present an anecdote from a previous life as a (Cobol/DB2) programmer.
Prerequisite: a functioning system that processes hundreds of millions of lines of data every night in batch.
Change Control: the system leaks $$$ or new functionality is desired (both real world examples).
Stage 1 – Unit
This is the fun part, and also the shortest. The programmer happily writes new code, and feeds it a small, customized data file to see if it does its job.
Stage 2 – Integration
Next, he imitates the programs just before and after his new code in the production stream, to make sure that it still functions as specified.
Stage 3 – System
Now, he imitates the whole shootin' match with his new bits included. He runs it against production data, and compares the result to the existing production system. He is looking to make sure that he has not broken the system, that there are no data integrity issues, and that the change really has the desired effect (I have seen all these types of failures in various projects I have been involved with).
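To give a feel for what that comparison involves, here is a minimal sketch in Python (not the Cobol/DB2 tooling we actually used). It assumes each run's output has already been loaded as a list of records and that every record carries a unique key; the function name compare_runs and the field names id and amount are made up for illustration.

```python
def compare_runs(baseline, candidate, key):
    """Compare two batch outputs record by record.

    baseline  -- records from the existing production run
    candidate -- records from the run that includes the new code
    key       -- field that uniquely identifies a record (assumed unique)
    """
    base = {rec[key]: rec for rec in baseline}
    cand = {rec[key]: rec for rec in candidate}

    # Records the new code dropped or invented point to data integrity issues.
    missing = sorted(base.keys() - cand.keys())
    extra = sorted(cand.keys() - base.keys())

    # Records present in both runs but with different contents.
    changed = sorted(k for k in base.keys() & cand.keys() if base[k] != cand[k])

    return {"missing": missing, "extra": extra, "changed": changed}


# Hypothetical sample data: the change was only supposed to touch record 2.
baseline = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 250},
    {"id": 3, "amount": 75},
]
candidate = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 245},
    {"id": 4, "amount": 10},
]

report = compare_runs(baseline, candidate, "id")
print(report)
```

In this made-up run the report flags record 3 as missing and record 4 as extra, which is exactly the kind of surprise the system test exists to catch. Only the difference in record 2 was intended.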
Stage 4 – Acceptance
Now, the project manager presents the finished product to the business user who commissioned the changes in the first place for final approval. It is not uncommon for the user, a couple of days later, to present a page of other changes that are desired. This is where most projects run into problems. Yes, these changes can certainly be done, but you have to restart the clock all over again. You absolutely cannot just slip in these few changes at the last minute and still meet the original deadline.
Stage 5 – Installation
You are done, and everyone gets treated to cheap Chinese or pizza to celebrate.
This entire process, if all goes well, takes 6 months end-to-end for something relatively straightforward.
On occasion, I was called on to do emergency repairs to a system written in object-oriented (OO) code (Forte). Being a linear programmer, I was totally not qualified, but it was necessary when the company laid off all its OO programmers. I learned that if you change one class, the changes ripple out to all other dependent classes, kinda like tossing a pebble into a pond. It propagates out in all directions, including areas you did not even know existed: at the other end, you could easily have created a tidal wave. Some of the changes I made were deeply buried in several layers of OO classes.
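The pebble-in-a-pond effect can be sketched as a walk over a dependency graph. This is only an illustration in Python, not anything taken from the Forte system: the class names and the affected_by function are hypothetical, and it assumes you already have a map of which class depends on which.

```python
from collections import deque


def affected_by(changed, depends_on):
    """Return every class that directly or indirectly depends on `changed`.

    depends_on maps a class name to the list of classes it uses.
    """
    # Invert the map: for each class, who depends on it?
    dependents = {}
    for cls, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(cls)

    # Breadth-first walk outward from the changed class, ring by ring.
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for cls in dependents.get(current, ()):
            if cls not in seen and cls != changed:
                seen.add(cls)
                queue.append(cls)
    return seen


# Hypothetical classes: Report never mentions Account, yet it is still hit.
depends_on = {
    "Invoice": ["Account"],
    "Report": ["Invoice"],
    "Audit": ["Report", "Account"],
    "Login": [],
}

ripple = affected_by("Account", depends_on)
print(sorted(ripple))
```

Here a change to Account reaches Invoice and Audit directly, and Report only through Invoice. That second ring is the part you did not know existed.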
The problem here is that the website front-end and the middleware that connects to the database are written in OO code, Java and Oracle Forms to name just 2 popular choices. Not only does the system not work in the first place, many parts are not yet constructed. Add that to the job of normalizing OO code, and you have a task that is several orders of magnitude more complicated than anything I have ever done. At one point, I was involved in the clean-up effort to iron out the kinks of a brand new Cobol system that simply did not work. In the end, it took 3 years to get running correctly.