Slashdot Mirror


New Virginia IT Systems Lack Network Backup

1sockchuck writes "Virginia's new state IT system is experiencing downtime in key services because of a mind-boggling oversight: the state apparently neglected to require network backup in a 10-year, $2.3 billion outsourcing deal with Northrop Grumman. The issue is causing serious downtime for state services. This fall the Virginia DMV has suffered 12 system outages spanning a total of more than 100 hours, and downtime hampered the state transportation department when a state of emergency was declared during the Nov. 11 Northeaster."

14 of 211 comments

  1. Blame Northrop? by betterunixthanunix · · Score: 3, Insightful

    In my experience, it is rare for a customer, even with professional IT staff, to properly specify their needs when it comes to technology. Why did Northrop, which presumably has experience in government systems, not design backups?

    --
    Palm trees and 8
    1. Re:Blame Northrop? by eht · · Score: 5, Insightful

      Likely the state was told it should have a backup, was quoted a price, and said nah, we will be fine.

    2. Re:Blame Northrop? by skgrey · · Score: 4, Insightful

      And not just backups, it sounds like they had no BCP plan at all. This is a massive oversight, but a fairly common one. I've consulted for a number of years, and it's amazing how many companies don't have a BCP plan at all, and sometimes that means not even simple backups of data.

      The companies where I've seen this basically do a risk assessment and say "well, we are willing to accept the risk of downtime because BCP is too costly". Unfortunately they don't weigh the chance of an outage or disaster appropriately, and then find themselves severely screwed when a tornado, storm system, or fire occurs, and then they are either out of business (in a small company) or take enough of a hit to make a headline on Slashdot and cripple the business.

      Seriously, when are companies going to realize that this is a critical component of IT? I feel like I've talked until I was blue in the face about this over the years.

    3. Re:Blame Northrop? by mcgrew · · Score: 4, Insightful

      Why did Northrop, which presumably has experience in government systems, not design backups?

      Because they didn't have to. It wasn't in the contract, so they're not going to spend the money doing it. They're not in business to keep the state government afloat, their only purpose is to make money.

      If you don't properly specify your needs, that's your fault. Don't rely on corporate good will, because there is no such thing.

    4. Re:Blame Northrop? by Eivind · · Score: 3, Insightful

      True enough. But as you say, Northrop is in the business of making money, so it would've made sense for them to do the following:

      * Deliver an offer for the system requested.
      * Get the deal signed.
      * Say: We notice you've not specified any backup, do you want that additionally?

      Gives them a chance to upsell, AND potentially makes the customer happier -- a win-win.

    5. Re:Blame Northrop? by WinterSolstice · · Score: 4, Insightful

      You must not deal with the government much :)

      If you are bidding for a government contract, it's a public bid. They state their requirements very precisely, and every single dollar you spend over is counted against you.

      Basically to do network backup, you'd have to eat it out of the goodness of your heart. There is a potential to upsell later, of course, but it has to go back through the public approvals process.

      --
      An operating system should be like a light switch... simple, effective, easy to use, and designed for everyone.
    6. Re:Blame Northrop? by TheLink · · Score: 3, Insightful

      > they have no trouble waking you up to make you fix it, but if you suggest an HA/failover?
      > Sorry, too expensive. We have weighed the risk, and decided it's an acceptable risk.

      Yes because they can count on waking you up to fix it.

      So it seems the bosses are perhaps doing the right thing for the organization. They hired you, you will wake up to fix it, and they don't need to spend on HA/failover.

      Now if they hired someone who can't fix it fast, or sleeps really soundly, then they should spend on HA/failover, or hire you instead ;).

    7. Re:Blame Northrop? by nine-times · · Score: 4, Insightful

      They're not in business to keep the state government afloat, their only purpose is to make money.

      I hate when this is offered as an excuse for shoddy work. "It's not their job to do good work. It's their job to make money." Yeah? So what. It strikes me a little like saying, "Hey, can't blame a con man for stealing your money. That's what con men do!"

      I don't know this particular situation well enough to say who is at fault and to what degree, but it's part of their business to service their customers well. It's part of every company's business to provide service to their customers in an ethical manner.

  2. Easy by Spad · · Score: 5, Insightful

    During the first six months of the year, state Department of Transportation workers faced 101 significant IT outages totaling 4,677 hours: an average of more than 46 hours per outage. One took 360 hours to fix.

    That's almost 28 weeks of downtime in the space of 26 weeks, which raises a much more important question than why there's no network redundancy: what kind of fucking morons have they got running their systems?

  3. outsourcing by Clover_Kicker · · Score: 5, Insightful

    But I thought the magic pixie dust of free enterprise would make outsourcing something to the private sector cheaper, more efficient, and better in every possible way?

  4. Re:They have bigger problems than just this one... by mcgrew · · Score: 3, Insightful

    Bureaucracy is bureaucracy. Government involvement doesn't mean ineptitude, and the free market doesn't guarantee competence. Whether private or public, ineptitude as well as competence abound.

  5. Network redundancy not backups by zerofoo · · Score: 4, Insightful

    The article does not mention "backups" as in tape drives and off-site storage.

    The article does mention lack of redundancy at the network carrier level.

    My guess is that Northrop Grumman designed a network around single circuits connecting offices to data centers, and did not design the network to tolerate WAN link failures.

    A stupid oversight for sure, but nothing that can't be easily remedied by ordering redundant WAN circuits from your telco of choice. Redundant routing gear would also be smart.

    For all that are blaming government for this - they outsourced the design and implementation to a private company. That company screwed the pooch in design and implementation. Shame on both parties for not recognizing the risk of WAN failure.

    -ted

  6. Funny math or multiple systems? by Cprossu · · Score: 3, Insightful

    "During the first six months of the year, state Department of Transportation workers faced 101 significant IT outages totaling 4,677 hours: an average of more than 46 hours per outage. One took 360 hours to fix."

    Wait, 4,677 hours? How could that be? There were 181 days in the first 6 months of this year, and that's only 4,344 hours. There was more downtime on the system than there were hours in its operational life! (did someone /0 here?)
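
    The arithmetic here can be checked with a minimal Python sketch. The variable names are made up for illustration, and the 181-day figure assumes a non-leap year. A total that exceeds the calendar hours is only possible if outages overlapped, for example at different sites; that reading is an assumption, not something the quoted figures state.

```python
# Figures quoted from the article; all names here are hypothetical.
OUTAGE_HOURS = 4677      # total hours of "significant IT outages"
OUTAGE_COUNT = 101       # number of outages reported
DAYS_IN_PERIOD = 181     # Jan 1 through Jun 30 of a non-leap year (assumed)

# Wall-clock hours available in the six-month period.
hours_in_period = DAYS_IN_PERIOD * 24

# Mean duration of a single outage.
avg_hours_per_outage = OUTAGE_HOURS / OUTAGE_COUNT

# How far the reported downtime overshoots the calendar.
excess_hours = OUTAGE_HOURS - hours_in_period

print(f"hours in period:       {hours_in_period}")
print(f"average outage length: {avg_hours_per_outage:.1f} hours")
print(f"downtime beyond clock: {excess_hours} hours")
```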

    Outsourced, no thanks... I think I'd rather dig up a Univac I to do work on, at least it would be more reliable

    1. Re:Funny math or multiple systems? by Tino · · Score: 3, Insightful

      4,677 hours of failure in 4,344 hours of time means that at any given time, an average of about 1.08 locations were offline.

      There are 131 DMV offices in Virginia; I don't know how many other Department of Transportation locations are included in the same bucket. If we assume that it's *only* the 131 DMV offices, 1.08 failures at any given time means that about 129.9 locations are working, meaning that this statewide patchwork of network connections is 99.18% reliable.

      If your 'redundant' connections cut the failures in half (which they wouldn't), you'd have 99.59% reliability at more than twice the cost for the network.
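
      The per-location reliability estimate can be reproduced with a short Python sketch. It rests on the same assumptions as the comment: only the 131 DMV offices are counted, outage hours are summed across locations, and the period is 181 days. The function and variable names are hypothetical.

```python
OUTAGE_HOURS = 4677        # total outage hours quoted in the article
PERIOD_HOURS = 181 * 24    # hours in the six-month period (assumed 181 days)
LOCATIONS = 131            # assumes only the DMV offices are in the bucket


def availability(outage_hours: float, period_hours: float, locations: int) -> float:
    """Fraction of location-hours during which a location was online."""
    # Average number of locations offline at any given moment.
    avg_offline = outage_hours / period_hours
    return 1 - avg_offline / locations


current = availability(OUTAGE_HOURS, PERIOD_HOURS, LOCATIONS)
# Hypothetical case from the comment: redundancy halves the outage hours.
halved = availability(OUTAGE_HOURS / 2, PERIOD_HOURS, LOCATIONS)

print(f"current availability: {current:.2%}")
print(f"with failures halved: {halved:.2%}")
```

      Changing `LOCATIONS` shows how sensitive the estimate is to the unknown number of non-DMV sites included in the outage total.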

      Adding 'redundancy' would more than double the network cost (since presumably currently they're using the lowest bidder), and in most places it wouldn't add any real redundancy anyway. Getting actual network redundancy is *fiendishly* difficult, even when you're spending a lot of money and siting a facility in a place that's well-served for networking. In small-town Virginia, you're almost certainly going to wind up paying for having redundant wires hanging on the same poles.