IT Infrastructure As a House of Cards
snydeq writes "Deep End's Paul Venezia takes up a topic many IT pros face: 'When you've attached enough Band-Aids to the corpus that it's more bandage than not, isn't it time to start over?' The constant need to apply temporary fixes that end up becoming permanent are fast pushing many IT infrastructures beyond repair. Much of the blame falls on the products IT has to deal with. 'As processors have become faster and RAM cheaper, the software vendors have opted to dress up new versions in eye candy and limited-use features rather than concentrate on the foundation of the application. To their credit, code that was written to run on a Pentium-II 300MHz CPU will fly on modern hardware, but that code was also written to interact with a completely different set of OS dependencies, problems, and libraries. Yes, it might function on modern hardware, but not without more than a few Band-Aids to attach it to modern operating systems,' Venezia writes. And yet breaking this 'vicious cycle of bad ideas and worse implementations' by wiping the slate clean is no easy task. Especially when the need for kludges isn't apparent until the software is in the process of being implemented. 'Generally it's too late to change course at that point.'"
In most organizations, the IT department is treated as pure cost instead of something that provides strategic value. These IT departments have no chance of getting a budget approved that will allow them to "start over" on any part of their implementation; hence the constant onslaught of temporary fixes and patches.
"It's a reverse vampire...they....they crave the sun!"
...but I believe in Duct Tape.
As long as your backup and tertiary machines have different kludges keeping them running, there's no problem...
Wait, you mean there have been newer and faster processors released since then? So Mordac really has been hiding something from me...
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
This the essence of technical debt. Whether you're programming or deploying IT infrastructure, it's inescapable that sometimes you're going to have to include kludges to work around edge conditions, a vocal 1% of your users, or whatever. These kludges are eyesores, and fragile, but they're also as far as you could go with the time and budget you had.
Sometimes, accruing debt like this enhances your liquidity and ability to respond to change, so avoiding all kludges introduces other more obvious costs that slow you down and make you seem unresponsive to users or customers. But you can't just go on letting your debt grow all the time and not eventually come up technically bankrupt. Let it grow when you have to, but just as importantly make time to pay it down. A lot of this stuff can be paid down a little at a time, as you come across it a few months later. The pay-off if you're vigilant is that the next ridiculously urgent fix to that system can often be handled much more easily, without dipping down further... with patience and attention to maintaining this balance, you can reduce your technical debt and make the whole system hum.
The downside is that there isn't a quick fix when you find yourself deep in technical debt. You can't just spend all your time reducing it; your highest aspiration at that point should be maintaining the level of technical debt, rather than letting it grow, but it's generally been my experience that altering the curve of debt growth even a little can set you on the right path.
--Matthew
The problem with the whole idea of "if we only had enough documentation and change control" is that it becomes a non-trivial event to actually read through the documentation. Let's take an imaginary system that's been in production for 5 years...assume every last drib and drab of change has been documented...now you've got a 2000 page document and several hundred change records that tell you *everything*. Except, when it comes right down to it, mastering that 2000 pages of documentation and all the changes made afterwards is a months if not years long project - hardly effective for dealing with production problems that need to be solved in minutes or hours.
The illusion being perpetrated here is that people are interchangeable, and if you just have enough documentation, you can replace Mr. Jones with 20 years of hands on experience with the system with Mr. Vishnu living in Bangalore (or even Mr. Smith in the next cube, for that matter), with a net cost savings.
Now, I'm not saying documentation is a bad thing -> lord knows, it helps to have a knowledge base you can search...but knowing what to search for is knowledge you only get by real world experience with maintaining a production system. This is not digging ditches, boys and girls, this is skilled, if not essentially artistic labor.
Simply put, people matter more than process.