Slashdot Mirror


IT Infrastructure As a House of Cards

snydeq writes "Deep End's Paul Venezia takes up a topic many IT pros face: 'When you've attached enough Band-Aids to the corpus that it's more bandage than not, isn't it time to start over?' The constant need to apply temporary fixes that end up becoming permanent is fast pushing many IT infrastructures beyond repair. Much of the blame falls on the products IT has to deal with. 'As processors have become faster and RAM cheaper, the software vendors have opted to dress up new versions in eye candy and limited-use features rather than concentrate on the foundation of the application. To their credit, code that was written to run on a Pentium-II 300MHz CPU will fly on modern hardware, but that code was also written to interact with a completely different set of OS dependencies, problems, and libraries. Yes, it might function on modern hardware, but not without more than a few Band-Aids to attach it to modern operating systems,' Venezia writes. And yet breaking this 'vicious cycle of bad ideas and worse implementations' by wiping the slate clean is no easy task, especially when the need for kludges isn't apparent until the software is in the process of being implemented. 'Generally it's too late to change course at that point.'"

22 of 216 comments

  1. All comes down to budget by Admodieus · · Score: 5, Informative

    In most organizations, the IT department is treated as pure cost instead of something that provides strategic value. These IT departments have no chance of getting a budget approved that will allow them to "start over" on any part of their implementation; hence the constant onslaught of temporary fixes and patches.

    --
    "It's a reverse vampire...they....they crave the sun!"
    1. Re:All comes down to budget by Opportunist · · Score: 4, Insightful

      Budget and the lack of ability to see ahead, on the side of the decision makers.

      Far too often decision makers are not the people who also have to suffer, I mean work with the tools they bought. They are often easily swayed by a nifty presentation from a guy who doesn't know too much either but promises everything, and of course the ability to cut cost in half, if not more, so they buy. Only to find out that the solution they bought is not suitable to the problem at hand. And then the bandaids start to pop up.

      --
      We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
    2. Re:All comes down to budget by Megaweapon · · Score: 4, Insightful

      They are often easily swayed by a nifty presentation from a guy who doesn't know too much either but promises everything, and of course the ability to cut cost in half, if not more, so they buy.

      If you've worked in a huge shop, you know that the big software vendors send reps out to IT managers for golf outings and the like. Screw it if the software works or not, just fluff up the guy with the budget rubber stamp.

      --
      I'm sure "SlashdotMedia" will improve on all the wonders that Dice Holdings blessed us all with
    3. Re:All comes down to budget by eln · · Score: 5, Informative

      The problem is not with kludges themselves, but with the fact that IT management does not stress documentation and proper change control procedures enough. If a kludge works, is documented, was implemented with proper change controls, and can be repeated, is it really a kludge anymore? IT has to screw around with stuff to make it work, that's what they (we) get paid for. If all we ever had to do was click on an install button and have everything work perfectly from there, what would be the purpose of an IT department at all? Off-the-shelf software and hardware can never be made to work perfectly for everyone's requirements. IT folks are paid to get non-unique components to work for unique requirements.

      The problem is not with these fixes, it's that nobody ever documents what they did, and documentation is not readily available when needed. So, these kludges become tribal knowledge, and people only know about them because they were around when they were implemented or they've heard stories. When this happens, these wacky fixes can come back and bite you in the ass later when something mysteriously crashes and no one can get it to work like it did because nobody remembers what was done to make it work before. As people come and go, and institutional knowledge of older systems slowly erodes, we end up in a situation where everyone thinks the current system is crap, nobody knows why it was built that way, and everyone figures the only way out is to nuke the site from orbit and start over. The trick is keeping it from getting to that point.

      Of course, nobody likes jumping through all these hoops like filing change control requests or writing (and especially maintaining!) documentation, so it gets dropped. IT management is more worried about getting things done quickly than documenting things properly, so there's no incentive for anyone to do any of it. Before long, you get a mass of crap that some people know parts of, but nobody knows all of, and nobody knows how or where to get information about any of it except by knowing that John Geek is the "network guru" and Jane Nerd is the "linux guru".

      We will never get hardware and software that works together exactly the way we want them to. We will always have to tweak things to get them to work right for us. Citing lack of budgets or bug-ridden software may be perfectly valid, but those problems are never really going to be solved. Having our own house in order does not mean fixing all the bugs or being able to refresh our technology every 6 months. Having our own house in order means we know exactly what we did to make each system work right, we can repeat what we did, and everyone knows how to find information on what we did and why.
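One way to picture "we know exactly what we did, we can repeat it, and everyone can find it" is a small structured record per workaround. The sketch below is only an illustration: the `WorkaroundRecord` class, its field names, the `find_records` helper, and the sample entries (including the ticket numbers) are all hypothetical and do not come from any particular change-control tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class WorkaroundRecord:
    """One documented kludge: what was done, why, and how to repeat it."""
    system: str
    what_was_done: str
    why: str
    steps_to_repeat: list[str]
    change_ticket: str  # hypothetical ticket identifier
    recorded_on: date


def find_records(records: list[WorkaroundRecord], keyword: str) -> list[WorkaroundRecord]:
    """Return records whose system, description, or rationale mentions the keyword."""
    needle = keyword.lower()
    return [
        r for r in records
        if needle in r.system.lower()
        or needle in r.what_was_done.lower()
        or needle in r.why.lower()
    ]


# Hypothetical sample entries, for illustration only.
records = [
    WorkaroundRecord(
        system="payroll-app",
        what_was_done="Run the installer in compatibility mode",
        why="Vendor installer rejects newer OS versions",
        steps_to_repeat=["Open installer properties", "Enable compatibility mode", "Run setup"],
        change_ticket="CHG-0001 (hypothetical)",
        recorded_on=date(2010, 1, 15),
    ),
    WorkaroundRecord(
        system="print-server",
        what_was_done="Pinned the driver to an older version",
        why="Newer driver crashes the spooler",
        steps_to_repeat=["Uninstall current driver", "Install pinned driver", "Block driver updates"],
        change_ticket="CHG-0002 (hypothetical)",
        recorded_on=date(2010, 3, 2),
    ),
]

for record in find_records(records, "spooler"):
    print(record.system, "-", record.what_was_done)
```

The point of the `why` and `steps_to_repeat` fields is that the fix stops being tribal knowledge: someone who was not around when it was applied can still find it and redo it.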

    4. Re:All comes down to budget by Vellmont · · Score: 4, Insightful


      If a kludge works, is documented, was implemented with proper change controls, and can be repeated, is it really a kludge anymore?

      Yes.

      You either don't know what a kludge is, or don't have enough ability to see how fixing things or implementing something the wrong way can really be a horrible mistake that feeds on itself and creates other mistakes. Kludges aren't something you can simply document around. The rest of your post isn't really worth responding to, since it makes the false assumption that kludges are simply poorly documented behavior. If that's the worst you've seen, you're lucky.

      --
      AccountKiller
    5. Re:All comes down to budget by HangingChad · · Score: 4, Insightful

      the IT department is treated as pure cost instead of something that provides strategic value.

      I can't count the times I've gone in somewhere and seen major deficiencies in their IT infrastructure. I mean really bad, O-M-G size problems. And when you point them out they act like you're trying to pad your billing. Just fix whatever isn't working that day. One of them was a doctor's office.

      Imagine if their patients acted that way. I don't care if I have cancer, just remove that lump in my underarm.

      That's what you get when the problem is dictating the solution.

      --
      That's our life, the big wheel of shit. - The Fat Man, Blue Tango Salvage
    6. Re:All comes down to budget by Cryacin · · Score: 4, Funny

      Yeah, that's why the sane firms have rules on accepting gifts.

      Yes, and both of them have never looked back!

      --
      Science advances one funeral at a time- Max Planck
  2. I don't believe in a lot of things by Culture20 · · Score: 5, Funny

    ...but I believe in Duct Tape.
    As long as your backup and tertiary machines have different kludges keeping them running, there's no problem...

  3. As a non-developer, this is what I see by Em+Emalb · · Score: 4, Insightful

    Maintaining code is boring.

    Everyone wants to work on the latest and greatest stuff, no one wants to maintain or even release patches.

    It sucks, especially since it isn't limited to just software development.

    I've seen companies where their "core switch" was a Cisco 2548. This wasn't 10 years ago, this was last year! Unreal.

    --
    Sent from your iPad.
    1. Re:As a non-developer, this is what I see by drachenstern · · Score: 4, Interesting

      As a dev, what's the problem with a 24 port gigabit switch as the "core" on a medium sized office? Aside from the fact that 10Gb is becoming popular (has become popular?) in the datacenter? Most desktops are only at the 1Gb level (and most users at below 100Mb), and most inbound internet pipes are much smaller. I don't understand the downfall here.

      Can you elaborate?

      --
      2^3 * 31 * 647
    2. Re:As a non-developer, this is what I see by JerkBoB · · Score: 4, Interesting

      As a dev, what's the problem with a 24 port gigabit switch as the "core" on a medium sized office?

      If all you've got is 24 hosts (well, 23 and an uplink), then it's fine. I suspect that the reality he's alluding to is something more along the lines of multiple switches chained together off of the "core" switch. The problem is that lower-end switches don't have the fabric (interconnects between ports) to handle all those frames without introducing latency at best and dropped packets at worst. For giggles, try hooking up a $50 8-port "gigabit" switch to 8 gigabit NICs and try to run them all full tilt. Antics will ensue... The cheap switches have a shared fabric which doesn't have the bandwidth to handle traffic between all the ports simultaneously. True core switches are expensive because they have dedicated connections between all the ports (logically, if not physically... I'm no switch designer), so there's no fabric contention.
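The fabric contention described above can be put into rough numbers. This is a back-of-the-envelope sketch, assuming full-duplex ports and the usual vendor convention of counting both directions toward switching capacity; the function names and the 4 Gbps shared-fabric figure are made up for illustration and do not describe any specific switch.

```python
def required_fabric_gbps(ports: int, port_speed_gbps: float, full_duplex: bool = True) -> float:
    """Capacity needed for every port to run at line rate simultaneously."""
    directions = 2 if full_duplex else 1
    return ports * port_speed_gbps * directions


def oversubscription_ratio(ports: int, port_speed_gbps: float, fabric_gbps: float) -> float:
    """How many times the worst-case demand exceeds what the fabric can carry."""
    return required_fabric_gbps(ports, port_speed_gbps) / fabric_gbps


# An 8-port gigabit switch needs 16 Gbps of fabric to be non-blocking.
print(required_fabric_gbps(8, 1.0))

# With a hypothetical 4 Gbps shared fabric, worst-case demand is 4x capacity.
print(oversubscription_ratio(8, 1.0, 4.0))

# A 24-port gigabit switch needs 48 Gbps to be non-blocking.
print(required_fabric_gbps(24, 1.0))
```

A ratio above 1.0 only matters when enough ports are busy at once, which is why a cheap switch can look fine on a quiet office network and fall over when everything runs full tilt.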

      --
      A host is a host from coast to coast...
      Unless it's down, or slow, or fails to POST!
  4. Don’t patch bad code - rewrite it by D4C5CE · · Score: 4, Interesting

    Don’t patch bad code – rewrite it.

    Kernighan & Plauger
    The Elements of Programming Style
    2nd edition, 1974 (exemplified in FORTRAN and PL/1!)

    1. Re:Don’t patch bad code - rewrite it by eggoeater · · Score: 4, Insightful

      I couldn't agree more, but that's very expensive and very very dangerous. Why? Two factors:
      1. Rewriting means rethinking; most legacy code is functional and is usually rebuilt in OOP. Whenever you rethink how something works it tends to change the entire behavior, to say nothing of all the new bugs you'll have to hunt down. Your customers will definitely notice this.

      2. Scope creep!! Rebuilding it? Why not throw in all that cool functionality we've been talking about for the past 10 years but couldn't implement because the architecture couldn't handle it. You get the idea.

      Want an example? Netscape 5

  5. Written for a P-II 300Mhz? by damn_registrars · · Score: 5, Funny

    Wait, you mean there have been newer and faster processors released since then? So Mordac really has been hiding something from me...

    --
    Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
  6. Re:Take responsibility and stop the magical thinki by FooAtWFU · · Score: 4, Interesting

    "they simply fail to properly advise the units that are making decisions of the cost and consequence of such a short-sighted approach."

    In the defense of IT, those people they're trying to advise aren't always the best at taking advice. (But then again, neither are IT admins always the best at giving it.)

    --
    The World Wide Web is dying. Soon, we shall have only the Internet.
  7. pay off your credit cards? by Matthew+Weigel · · Score: 5, Informative

    This is the essence of technical debt. Whether you're programming or deploying IT infrastructure, it's inescapable that sometimes you're going to have to include kludges to work around edge conditions, a vocal 1% of your users, or whatever. These kludges are eyesores, and fragile, but they're also as far as you could go with the time and budget you had.

    Sometimes, accruing debt like this enhances your liquidity and ability to respond to change, so avoiding all kludges introduces other more obvious costs that slow you down and make you seem unresponsive to users or customers. But you can't just go on letting your debt grow all the time and not eventually come up technically bankrupt. Let it grow when you have to, but just as importantly make time to pay it down. A lot of this stuff can be paid down a little at a time, as you come across it a few months later. The pay-off if you're vigilant is that the next ridiculously urgent fix to that system can often be handled much more easily, without dipping down further... with patience and attention to maintaining this balance, you can reduce your technical debt and make the whole system hum.

    The downside is that there isn't a quick fix when you find yourself deep in technical debt. You can't just spend all your time reducing it; your highest aspiration at that point should be maintaining the level of technical debt, rather than letting it grow, but it's generally been my experience that altering the curve of debt growth even a little can set you on the right path.

    --
    --Matthew
  8. like bubblegum under a desk... by Thud457 · · Score: 4, Insightful

    There's nothing more permanent than a temporary fix.

    --

    the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff

  9. Re:Kludges are short-time fixes and long-time prob by Grishnakh · · Score: 4, Insightful

    It doesn't look like "doing it right the first time" is an option here. RTFA. They're talking about vendor applications being crappy and crufty, and IT departments being required to support them. The IT department didn't pick the app, and isn't allowed to not support it. They can't switch to another app (usually apps like this have little or no competition, and they're probably locked-in anyway).

    So there's really nothing they can do but complain as long as they're required to support some shitty application on the latest version of Windows, as these are the requirements set down by upper management.

  10. Re:Take responsibility and stop the magical thinki by Bunny+Guy · · Score: 4, Insightful

    I'm going to tackle some of the conceptual problems that are hinted at above, which is usually where the difficulties lie, usually in trying to use the wrong software and expecting to somehow "make everything better" if you just make it work "my way" - the true "Magical Thinking".

    I tend to agree with your conclusions, "wipe the slate clean" is a drastic action. I disagree with some of the approach you use to arrive at them:

    a.) Problems are solved by people being invested in solving them, not process. This requires the antithesis of "Units" - Ownership; Ownership in the company, Ownership of the mission, and a direct heart felt connection to the success of the company. Until you have staff, from the CEO down, that own problems, from the mess in the coffee room to server down time, you will have a "business house of cards" no matter how good the process. In fact, most of the time, fixing things involves re-writing and/or reconsidering process - usually starting with asking the question - "Do we really need that?"

    b.) Sometimes you really do have a train wreck on your hands. If you have mastered a.), b.) follows almost effortlessly, because now you can *talk* about this behemoth that is eating your company and everybody sees the discussion for what it is, not empire building or managerial fingerprinting.

    When you run into a train wreck, assess your tech problem: is the fix easily found? Are your processes using the software at cross purposes? If so, which is cheaper to fix? No amount of bug fixing will repair using the wrong software. It won't even fix using the right software in the wrong way.

    In the end, reassess often and be frugal, not cheap; if it truly is a requirement to run your business, buy the most appropriate tool. If you've made the mistake of buying a Kenworth long hauler when you needed 3 old UPS trucks, admit it, sell it back, take your loss and get what you really need.

    That's not "magical thinking"; it's just common sense.

  11. The meaning of Quality by bartwol · · Score: 4, Insightful

    More than any other type, businesses are run by salesmen. These are people whose strongest attributes are the ability to build relationships, to communicate value, and a strong inclination to increase their personal wealth.

    Increasingly, the stuff salesmen sell is based on complex technologies that, really, are beyond the reach of their comprehension. They kind of understand the products they sell, but really, they don't. If the world only had salesmen, there wouldn't be any sophisticated products.

    Say hello to the engineer...a person who builds products. His strongest attributes are a desire to solve problems, a willingness to absorb the tedious but essential details needed to build a complex system, and a personality that derives gratification from doing so.

    We now begin the business cycle. The salesman says, "Build me something I can sell."

    The engineer says, "I will build you something that works well."

    And therein begins a lifetime of the two, symbiotically, talking past each other. The engineer serves the salesman, and the salesman serves himself. But make no mistake about it: the salesman is in control.

    For a salesman, QUALITY means it works well enough for him to sell more, and most importantly, to make more money for himself. For an engineer, QUALITY means it works reliably and efficiently. To be sure, QUALITY is an abstract and moving target that varies according to the eyes of the beholder. But to understand why we have the predicament described in this article, we need only understand the SIGNIFICANCE OF QUALITY TO A SALESMAN.

    I would continue to expound, but then, most readers here need only reflect on their already frustrated pasts to understand the mechanics of this convenient but often vacuous relationship.

  12. Just how much documentation can you read? by hsthompson69 · · Score: 5, Insightful

    The problem with the whole idea of "if we only had enough documentation and change control" is that it becomes a non-trivial event to actually read through the documentation. Let's take an imaginary system that's been in production for 5 years...assume every last drib and drab of change has been documented...now you've got a 2000 page document and several hundred change records that tell you *everything*. Except, when it comes right down to it, mastering those 2000 pages of documentation and all the changes made afterwards is a months-long, if not years-long, project, which is hardly effective for dealing with production problems that need to be solved in minutes or hours.

    The illusion being perpetrated here is that people are interchangeable, and if you just have enough documentation, you can replace Mr. Jones with 20 years of hands on experience with the system with Mr. Vishnu living in Bangalore (or even Mr. Smith in the next cube, for that matter), with a net cost savings.

    Now, I'm not saying documentation is a bad thing; lord knows, it helps to have a knowledge base you can search...but knowing what to search for is knowledge you only get by real world experience with maintaining a production system. This is not digging ditches, boys and girls, this is skilled, if not essentially artistic labor.

    Simply put, people matter more than process.

  13. Re:I was torn between modding this up and commenti by gillbates · · Score: 4, Insightful

    Everyone wants to work on the latest and greatest stuff, no one wants to maintain or even release patches.

    I don't really know how to address this, except by the people who think they are going to be the next great video game designer remaining unemployed.

    Here's how you address it: you hire one of those 9 out of 10 CS graduates who "Just got in it for the money". Had you offices in the Midwest, you'd have no problem finding programmers whose only ambition is to crunch out brain-dead code until they can move into management. Trust me, I work with these people and they're even worse than the prima donnas interested only in the "cool" things. Naturally, not everyone can be the next game programmer, or work on cool things, but you probably don't want to hire those whose only ambition is to do the grunt work.

    Typically, the prima donna has to have his ego coaxed into doing the grunt work. But you can usually count on him to do it fast, and not to make a total mess of things. Granted, some people have a higher estimation of their abilities than their peers. But at least someone passionate about coding can be inspired to improve their code; they'll actually accept coding standards once reasonably explained. But here's a short list of problems with the typical "career type":

    1. Because they don't have the intelligence or the initiative to do things right, they'll happily plod along, even when the given design can't possibly work, or can't be delivered on time. And when it does fail, rather than trying to understand *why* it failed, or *what* they could do differently next time, they blame their coworkers/subordinates, etc.
    2. They are more sensitive to the political implications than the technical ramifications of their decisions. Consequently, they'll often run with an inefficient, or sometimes even incorrect design so as to placate their superiors. And once again, the blame always lies with *someone else*.
    3. And speaking of blame, they'll frequently blame others when things go wrong, and even sometimes when they don't. There are *certain people* at the office around whom I can't have a technical discussion with coworkers because they understand neither about what we are talking, nor that such conversations are a normal part of the job. I've actually been reprimanded for discussing architectural decisions, because "we've already decided on the architecture..." Which is great, but the fact that you've decided doesn't help me understand it better. Supposedly, we're all mind readers here, and no discussion is necessary.
    4. The career types usually promise unrealistic deadlines, and write horribly unmaintainable code. After all, writing code is just a stepping stone into management, and maintaining that code will soon become *someone else's* problem, not theirs...
    5. And perhaps the worst part is that they have a corrosive effect on teamwork and morale. With a politician in the office, *no one* wants to do the grunt work out of fear that it will adversely affect their career.

    It's easier to convince a rock-star programmer that documentation is necessary than it is to convince the career-track political programmer that a race condition is a problem, that architecture matters, that maintainability and scalability are important. Just the other day, I had a department manager question the value of writing reusable code - in fact, he was so hostile as to suggest that it wasn't worth our time to make code reusable... (And not only that, but reported to my boss that my suggestion otherwise was "distracting to what we're trying to accomplish here"...)

    I know the starry-eyed programmers can be a handful at times, but those indifferent to technical issues will lay a minefield in your company. Suddenly, years after they've moved on, you'll find your new hires telling you the projects they built aren't worth salvaging, that you'll have to start over, etc... I've seen these types move into management and turn an otherwise fun profession into a death march. You don't want the stupid, or the political, types of people writing code. They'll set your company up for failure every time.

    --
    The society for a thought-free internet welcomes you.