State of Virginia Technology Centers Down

HA fail by Anonymous Coward · 2010-08-27 04:03 · Score: 4, Insightful

How does a fault in a single SAN controller cause an outage of the entire data storage network? Expensive SAN solutions are expensive & highly redundant for a reason. This smells like a "Let's buy the cheaper solution" and/or an infrastructure design fail.

Re:HA fail by Anonymous Coward · 2010-08-27 04:14 · Score: 0

It is far worse than that. The summary says it is a meltdown! I don't know how IT could cause that, but terrorism must be involved. From what I've heard, they are evacuating New Jersey and calling in the National Guard.
Re:HA fail by cgenman · 2010-08-27 04:39 · Score: 4, Interesting

Also, this can happen when you hire an external firm to manage something that you should be managing yourself. External managers for projects like this are motivated by extracting as much money as possible from you. Internal departments of technology, by comparison, are motivated by convincing co-workers to not shout at them.

--
The ______ Agenda
Re:HA fail by Even+on+Slashdot+FOE · 2010-08-27 04:40 · Score: 2, Interesting

Step 1) Design system so a single SAN controller is the only thing keeping the network running.
Step 2) Use money saved by not adding redundancy/designing the system correctly to give self money.
Step 3) Expect one component to last long enough for you to leave the job before it fails.
Step 4) ????
Step 5) Profit anyway because they don't get the concept of failures==bad things and keep paying you.
Re:HA fail by g0bshiTe · 2010-08-27 04:53 · Score: 1, Insightful

I live in Virginia, it's more like business as usual for a Commonwealth.

--
I am Bennett Haselton! I am Bennett Haselton!
Re:HA fail by NotBornYesterday · 2010-08-27 05:00 · Score: 1

Because there was more than one failure. FTFA:

The system was built with redundancies and backup storage. It was hailed as being able to suffer a failure to one part but continue uninterrupted service because standby parts or systems would take over. But when the memory card failed Wednesday, a fallback that attempted to shoulder the load began reporting multiple errors, Nixon said.
Cheap solution problem? Possibly. Infrastructure design fail? Possibly, but not likely. Couldn't critique it without seeing their setup, but it sounds like they designed some redundancy in. I wonder what kind of "memory card" failed. From the description, it sounds like it might be a cache module.

--
I prefer rogues to imbeciles because they sometimes take a rest.
Re:HA fail by Anonymous Coward · 2010-08-27 05:04 · Score: 0

Too many Quantum BigFoot drives
Re:HA fail by Local+ID10T · 2010-08-27 05:06 · Score: 1

Because there was more than one failure. FTFA:

The system was built with redundancies and backup storage. It was hailed as being able to suffer a failure to one part but continue uninterrupted service because standby parts or systems would take over. But when the memory card failed Wednesday, a fallback that attempted to shoulder the load began reporting multiple errors, Nixon said.
Cheap solution problem? Possibly. Infrastructure design fail? Possibly, but not likely. Couldn't critique it without seeing their setup, but it sounds like they designed some redundancy in. I wonder what kind of "memory card" failed. From the description, it sounds like it might be a cache module.
Regular testing of redundant systems is critical. Anyone who has done disaster planning knows this.
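To put a rough number on why the test interval matters, here is a minimal sketch. It assumes the standby part fails silently at a constant rate and that a failover test always catches a dead standby; the 50,000-hour MTBF and the function name are made up for illustration and have nothing to do with the actual Virginia hardware.

```python
import math

def prob_standby_dead(mtbf_hours, test_interval_hours):
    """Average chance the standby has silently died since its last good test.

    Assumes a constant failure rate; averages 1 - exp(-t/MTBF) over one
    test interval, which works out to 1 - (1 - e^-x)/x with x = T/MTBF.
    """
    x = test_interval_hours / mtbf_hours
    return 1.0 + math.expm1(-x) / x

# Hypothetical standby with a 50,000-hour MTBF
weekly = prob_standby_dead(50_000, 168)    # failover tested every week
yearly = prob_standby_dead(50_000, 8_760)  # failover tested once a year
print(f"tested weekly: {weekly:.3%}")
print(f"tested yearly: {yearly:.3%}")
```

Under those assumptions the yearly-tested standby is dozens of times more likely to be dead when the primary finally fails.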

--
"You want to know how to help your kids? Leave them the fuck alone." -George Carlin
Re:HA fail by Daniel+Dvorkin · 2010-08-27 05:08 · Score: 2, Funny

Also, this can happen when you hire an external firm to manage something that you should be managing yourself. External managers for projects like this are motivated by extracting as much money as possible from you. Internal departments of technology, by comparison, are motivated by convincing co-workers to not shout at them.
B-b-but you're saying that the bloated corrupt government that takes money from people at gunpoint and has no incentives for efficiency might have done a better job than a private contractor that works on the God-given free enterprise system that rewards efficiency and punishes waste! That's unpossible!

--
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
Re:HA fail by Wyatt+Earp · 2010-08-27 05:08 · Score: 3, Informative

Sweet Zombie Jesus.
If the RAM in our 8TB Netgear SAN fries it doesn't blow up my office, what the hell are they and Northrop Grumman doing?
Re:HA fail by Wyatt+Earp · 2010-08-27 05:10 · Score: 1

Did the dude from the City of SF design this network so that if he wasn't there to SSH in with a modem he had hidden in his toaster oven, the RAM in a SAN would bring the whole network down?
Re:HA fail by donnyspi · 2010-08-27 05:18 · Score: 1

Yeah really. Before we got away from traditional hardware (NAS, SAN, etc.) we had piece of crap Dot Hill arrays and they had redundant power supplies and redundant controllers. There must be more to this story.
Re:HA fail by Anonymous Coward · 2010-08-27 05:19 · Score: 0

>If the RAM in our 8TB Netgear SAN fries it doesn't blow up my office, what the hell are they and Northrop Grumman doing?
If I had to speculate, I'd guess they are thinking far beyond the 'single office' blowup scenario and dealing with an order of magnitude more volume.
At least I hope so. If the whole state government used a single, consumer-grade SAN, that wouldn't really surprise me but it bothers me a lot that people get billion dollar contracts for things where I'd do a far better job at $200/hr for a few months.
Re:HA fail by Anonymous Coward · 2010-08-27 05:20 · Score: 0

Agreed. Cheaper is better and worth 1-2 days downtime a year. How does the Cost per Downtime compare to other states?
Re:HA fail by pnutjam · 2010-08-27 05:35 · Score: 3, Funny

I started working for a city government earlier this year, let's just say I was amazed, I won't qualify it as amazed in a good way or bad way, but, you know...

--
Cheap storage VM.
Re:HA fail by CharlyFoxtrot · 2010-08-27 05:41 · Score: 1

How does a fault in a single SAN controller cause an outage of the entire data storage network? Expensive SAN solutions are expensive & highly redundant for a reason. This smells like a "Let's buy the cheaper solution" and/or an infrastructure design fail.
On the plus side if the US government ever builds Skynet we know where to strike.

--
If all else fails, immortality can always be assured by spectacular error.
Re:HA fail by Anonymous Coward · 2010-08-27 05:43 · Score: 2, Insightful

Did the dude from the City of SF design this network so that if he wasn't there to SSH in with a modem he had hidden in his toaster oven, the RAM in a SAN would bring the whole network down?
No, he asked them repeatedly to buy a spare, which was denied, then he refused to yank it out of the live production system when another department's boss said to give it to the chick he was banging so she could be a computer expert too.
Re:HA fail by wkcole · 2010-08-27 05:49 · Score: 4, Interesting

How does a fault in a single SAN controller cause an outage of the entire data storage network? Expensive SAN solutions are expensive & highly redundant for a reason. This smells like a "Let's buy the cheaper solution" and/or an infrastructure design fail.
RTFA!
The problem was a dual (or worse) failure. What the article reveals is that while they may have had all of the right hardware in place and a mechanism for it to handle the most likely failures, they were missing the 'soft' components of a good HA system: routine testing of failover and a rapid repair plan. In the auto industry where failed systems can halt factories and rack up hundreds of thousands of dollars of cost per hour of downtime, it is the norm for HA systems to have frequent failover tests, to have on-site spares for critical components that can be replaced by on-site staff, and to have support arrangements that put a skilled human on-site with replacement hardware in a small amount of time. This is why traditional "enterprise class" systems are so expensive. They are designed for rapid diagnosis and repair, and a well-run enterprise that needs truly HA systems pays for expensive HUMAN support by their own staff and/or from IBM, Sun^WOracle, EMC, HP, etc. and monitoring systems on top of that. If you fail over your HA systems every Sunday at 02:00 (or whatever time is safe...) and have the right staff, processes, and support contracts in place, you will find nearly all of the latent failures and have them fixed before a true production failure exposes them.
The most appalling thing about this to me isn't the failure. Some systems don't have safe times for testing failovers, and I know from personal experience that a component in an HA system that was working perfectly Saturday and has been idle since Sunday can go tits-up when needed on Wednesday. The real problem is the long outage. If the clowns in the VA state government were doing their jobs, they would not have a system like this without vendor support contracts to fix well-defined hardware problems (e.g. "bad memory card" ) within a few hours at most. This was something I always loved about working in a shop with the top-grade EMC contract. The Symmetrix and its associated gadgetry would call EMC about failures and we'd have a tech show up at the DC with parts before we even noticed anything unusual: costly, but nowhere near as expensive as killing all of the SAN-reliant systems for a random day every 3 years. The 4th 9 is not cheap or simple, because it always requires humans.
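For anyone who hasn't run the numbers on the nines, a quick sketch of what they mean in downtime. The only assumptions are a 365-day year and my own function names. The last line checks the "random day every 3 years" figure from above.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours

def downtime_budget_hours(nines):
    """Allowed downtime per year for N nines of availability (4 -> 99.99%)."""
    return HOURS_PER_YEAR * 10 ** (-nines)

def availability(downtime_hours, period_hours):
    """Fraction of the period the system was up."""
    return 1.0 - downtime_hours / period_hours

for n in (2, 3, 4, 5):
    print(f"{n} nines: {downtime_budget_hours(n) * 60:.1f} minutes of downtime per year")

# One full day lost every three years
print(f"{availability(24, 3 * HOURS_PER_YEAR):.4%}")
```

Four nines leaves less than an hour a year, and a single lost day every three years already drops you to roughly three nines. No hardware gets diagnosed and swapped inside that budget without people and parts standing by.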
Re:HA fail by Foobar+of+Borg · 2010-08-27 05:53 · Score: 1

terrorism must be involved. From what I've heard, they are evacuating New Jersey and calling in the National Guard.
No, that was the Martians.

--
Similar to the upcoming US election results
Re:HA fail by Anonymous Coward · 2010-08-27 06:29 · Score: 0

It sounds like a case of, "oh, it's a government contract. Let's charge them the price for the good system and then deliver a bunch of consumer-grade Linksys SAN boxes held together with a token ring network and duct tape." That's the efficiency of private industry for you ... efficiency at robbing the public and then using that money to buy off politicians who turn around and say, "look how inept and inefficient government is. We should send more business to entity X."
Re:HA fail by Peeteriz · 2010-08-27 06:42 · Score: 1

They had pretty much the most expensive support contracts possible. The problem is that apparently all this waste of taxpayers' money has bought nothing useful.
Re:HA fail by swb · 2010-08-27 07:07 · Score: 1

If you fail over your HA systems every Sunday at 02:00 (or whatever time is safe...)
(voice of tech ignorant executive)
"We can't be down then. We have remote workers that want to do things at that time."
"The overtime for that window is too expensive, and we can't do it during production hours. We'll just assume you planned carefully."
"You just told me part of the reason that system is so expensive is that it is much less likely to fail. Well, we're not paying for a spare."
And after hearing that, I want to duct tape those fucking executives to their $1500 chair and let them watch while I take a powder-actuated nailer to their precious Mercedes S550.
Re:HA fail by Anonymous Coward · 2010-08-27 07:34 · Score: 0

What the article reveals is that while they may have had all of the right hardware in place and a mechanism for it to handle the most likely failures, they were missing the 'soft' components of a good HA system
So, like I said, it was an HA infrastructure failure.
Re:HA fail by operagost · 2010-08-27 08:10 · Score: 0, Troll

No, it kinda sounds like he believes IT outsourcing is a bad idea for any organization. But thanks for nothing.

--

Gamingmuseum.com: Give your 3D accelerator a rest.
Re:HA fail by ultranova · 2010-08-27 08:15 · Score: 3, Insightful

B-b-but you're saying that the bloated corrupt government that takes money from people at gunpoint and has no incentives for efficiency might have done a better job than a private contractor that works on the God-given free enterprise system that rewards efficiency and punishes waste!

On the contrary, the free market did exactly as it was supposed to: it eliminated the inefficiency of redundant systems and a safety margin. Efficiency or the safety of redundancy, you can have one or the other but not both. That's why any important system should be managed by the government, and free enterprise should be limited to the role of logistical optimization it's actually good at.
Unfortunately some people nowadays consider the free market their religion, so we got deregulation and the resulting financial crisis. Oh well...

--
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Re:HA fail by Kymermosst · 2010-08-27 08:30 · Score: 1

It is far worse than that. The summary says it is a meltdown! I don't know how IT could cause that, but terrorism must be involved. From what I've heard, they are evacuating New Jersey and calling in the National Guard.
No. Their IT infrastructure is so power-hungry that they co-located a nuclear plant with their main data center.
*That* is what is melting down.

--
"Alcohol, Tobacco, Firearms, and Explosives" should be a convenience store, not a government agency.
Re:HA fail by Americano · 2010-08-27 08:54 · Score: 0, Troll

ZING! You really made a great point there about how government is well-suited to do things like this better than private enterprises - we all know that companies like Google, Apple, Microsoft, and dozens of Financial Services firms know nothing about operating redundant, fault-tolerant, and high-availability systems! This single case has amply demonstrated that the free market is a sham, and the government should pretty much manage everything.
I suspect this has nothing to do with "government" vs. "free market," and everything to do with "we hired a contractor who didn't know what the fuck they were doing, and so our system has design flaws, and so it crashed. Oh, and we also delivered poorly documented requirements, a shitload of scope creep, and gave preference to some contractors' bids because they were friends with the head of our agency!"
Not surprisingly, this is a risk when the support contract goes to the person with the best connections and lobbying, rather than the best design specs and capability.
Re:HA fail by conspirator57 · 2010-08-27 08:59 · Score: 2, Interesting

Funny, I was unaware that Northrop Grumman were a scion of the free market. Could you name some of their non-government customers that provide more than 1% of their total revenue? It's called a Military-Congressional-Industrial Complex for a reason. But thanks for playing the strawman game.

--
"If still these truths be held to be
Self evident."
-Edna St. Vincent Millay
Re:HA fail by Americano · 2010-08-27 09:03 · Score: 1

Does the government not have responsibility to:
1) Manage the delivery and implementation of the contracted items, and
2) Verify that what was contracted for is actually delivered?
Are you actually suggesting that a bunch of "average salary" mid-level IT drones would have done a better job at implementing a high availability / fault tolerant system than a private contractor that specializes in design and implementation of this type of system, and has done it dozens of times?
I think it's far more likely that the contractor chosen (Northrop) was chosen less for their skill in this area, and more based on how much money they spent lobbying somebody.
Re:HA fail by conspirator57 · 2010-08-27 09:06 · Score: 1

don't be selfish. i want to watch the executive too. it would be far more socially valuable for you to sell tickets and publish a web video. plus you could make mad bank doing a real public service by providing a cautionary tale to other executives so as to avoid similar asshattery in future.

--
"If still these truths be held to be
Self evident."
-Edna St. Vincent Millay
Re:HA fail by cgenman · 2010-08-27 09:46 · Score: 4, Insightful

If you're big enough that you're not just going to be scaling staff up and immediately down again, hire your people in-house. It's not a question of government vs private companies. It's a question of hiring your best people to be on staff, or outsourcing to someone who doesn't have the same motivations. This is true if you're a government, a corporation, a private entity, or a high school marching band. Plus the markup on external IT services is just obscene.
Poorly managed projects will be poorly managed internally or externally. But externally poorly managed projects are a lot more expensive, and harder to rein back under control.

--
The ______ Agenda
Re:HA fail by Americano · 2010-08-27 10:05 · Score: 1

If you're big enough that you're not just going to be scaling staff up and immediately down again, hire your people in-house.
I agree. But you also have to consider whether state government pay scales are going to allow you to hire the "best" people for the job onto your staff, and I think largely the answer to that is, "No."
This is essentially "data center design" and building the systems to integrate with a host of state services. When you're talking about a project of that complexity, you can't just say, "Hey, you're an Exchange admin, why don't you build out a fault-tolerant, high-availability SAN network for ~20 state agencies when you have a couple spare hours?"
Obviously, the contractors can't work in a bubble - the people who will use the systems have to communicate requirements and use cases to the contractors, which requires good management and oversight, but that's not *necessarily* a fault of the contractor if there isn't proper management and oversight of the project from the government. And again, that has nothing to do with "free market" vs. "government," which was really the point I was getting at in my response.
Re:HA fail by Anonymous Coward · 2010-08-27 11:10 · Score: 0

I heard EMC was on site and they were doing a remediation. The REAL FAIL was that they were making changes in the middle of the day?? NG made the decision, so they should pay the price! Virginia needs to ditch their terrible monopoly contract with NG.
Re:HA fail by hawguy · 2010-08-27 11:37 · Score: 1

Cheap solution problem? Possibly. Infrastructure design fail? Possibly, but not likely. Couldn't critique it without seeing their setup, but it sounds like they designed some redundancy in. I wonder what kind of "memory card" failed. From the description, it sounds like it might be a cache module.
Uhh, if they designed in redundancy and it failed, isn't that the definition of "Infrastructure design fail"?
Re:HA fail by hawguy · 2010-08-27 12:05 · Score: 2, Interesting

If you fail over your HA systems every Sunday at 02:00 (or whatever time is safe...)
(voice of tech ignorant executive)
"We can't be down then. We have remote workers that want to do things at that time."
"The overtime for that window is too expensive, and we can't do it during production hours. We'll just assume you planned carefully."
"You just told me part of the reason that system is so expensive is that it is much less likely to fail. Well, we're not paying for a spare."
And after hearing that, I want to duct tape those fucking executives to their $1500 chair and let them watch while I take a powder-actuated nailer to their precious Mercedes S550.
Why the rage? Just spell out very clearly (and in writing) exactly what will happen if component X fails, and the cost to implement redundancy now. When component X fails and the company loses Y dollars of revenue and the CEO comes to you, just pull out the email and say "I tried to design redundancy but he wouldn't spend the money".
It worked for me when I tried to get money for a spare battery cabinet on our primary UPS. I told my boss that if a single battery in the string fails during a power failure, we'll lose most of the server room. That happened during a power outage -- the UPS went offline, we lost power to most of the equipment (we were able to scrounge up enough money for redundant power to some core network devices, but nearly all of the servers went down). Everything came up again after the generator restarted, then went down again after utility power came back because of a delay in transfer from generator to utility. My boss screamed at me, I told her I warned her it could happen and pulled out the email where I outlined exactly that scenario (except for the part where everything tripped off again during the switch back to utility power).
I had a new battery cabinet installed within 2 weeks, *and* a redundant UPS for most servers.
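If you need numbers for that kind of email, a series string is easy to model. This is only a sketch: the 99% per-battery figure and the 40-battery string are invented, and it assumes batteries fail independently, which aging batteries from the same lot often don't.

```python
def string_survival(per_battery_reliability, batteries_in_series):
    """Chance a series string holds up: every battery in it has to work."""
    return per_battery_reliability ** batteries_in_series

def with_parallel_strings(string_reliability, strings):
    """Chance at least one of several independent strings holds up."""
    return 1.0 - (1.0 - string_reliability) ** strings

one_string = string_survival(0.99, 40)  # hypothetical: 40 batteries at 99% each
two_strings = with_parallel_strings(one_string, 2)
print(f"one string:  {one_string:.1%}")
print(f"two strings: {two_strings:.1%}")
```

With those made-up inputs a single string gets through the outage only about two times in three, and the spare cabinet moves that to roughly nine in ten.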
Re:HA fail by davester666 · 2010-08-27 17:15 · Score: 1

$2.4 Billion for 10 years...
Obviously, they went with the low-bidder. You get what you pay for!

--
Sleep your way to a whiter smile...date a dentist!
Re:HA fail by NotBornYesterday · 2010-08-29 04:45 · Score: 1

Impossible to say without seeing the design, but my guess is no. To me, "infrastructure design fail" implies the architect foolishly designed single points of failure into the plan.

What happened here would likely be an "infrastructure fail", or possibly a "maintenance fail".

--
I prefer rogues to imbeciles because they sometimes take a rest.
Re:HA fail by wkcole · 2010-08-30 05:06 · Score: 1

Why the rage? Just spell out very clearly (and in writing) exactly what will happen if component X fails, and the cost to implement redundancy now. When component X fails and the company loses Y dollars of revenue and the CEO comes to you, just pull out the email and say "I tried to design redundancy but he wouldn't spend the money".
BTDTGTPS
Or very nearly so. I admit I have never been immediately and explicitly fired for "I told you so" moments, but both as an employee and as a consultant I've had predominantly negative experiences of citing unheeded advice in my own defense. Sometimes a blamethrower is rational and accepts the fact of having made a mistake, but that is not the most common outcome. If a problem gets to the point of someone trying to narrow the focus of accountability, it is unlikely to end fairly or well for the people with the lowest management stature.

I had a new battery cabinet installed within 2 weeks, *and* a redundant UPS for most servers.
I've had that sort of thing happen as well, but it isn't necessarily a victory. My first instructive experience of this sort (about 20 years ago, so obviously I'm a cynical old fart) seemed a success in the short term direct sense, but 3 months later it was the centerpiece of a brutally unfair annual performance review, zeroing out my bonus because I was not a "team player" and had not "stood behind" my boss (a hard trick when she was actively trying to make me a human shield while stabbing me in the back.) And of course the ongoing cost of belatedly following my advice was used to rationalize a pay freeze for our whole team. My primary lesson from that was that I needed to work elsewhere, but a more generally useful rule of thumb is that there is no such thing as winning in a workplace CYA game. Being right and even being acknowledged as such don't guarantee anything and being subject to the decisions of someone who doesn't take responsibility for their errors is a crappy way to live.
Re:HA fail by wkcole · 2010-08-30 05:29 · Score: 1

They had pretty much the most expensive support contracts possible. The problem is that apparently all this waste of taxpayers' money has bought nothing useful.
"Most expensive" != "Best"
"Most expensive" != "Adequate"
I have not seen all the details and they probably are not public but given that NG has a history of being primarily a military contractor, I'd bet on whatever they offered being vastly overpriced, deceptively sold, specified well short of best-in-class, and provided objectively short of that spec.
Re:HA fail by swb · 2010-08-30 05:30 · Score: 1

Why the rage? Just spell out very clearly (and in writing) exactly what will happen if component X fails, and the cost to implement redundancy now. When component X fails and the company loses Y dollars of revenue and the CEO comes to you, just pull out the email and say "I tried to design redundancy but he wouldn't spend the money".
It's never that easy. IT is *always* wrong when disaster strikes, even with memos, charts, graphs and color glossy photographs with circles and arrows and a paragraph on the back of each one.
The first excuse is always that you didn't explain the issue clearly enough, so they can't be responsible for refusing funding. -- "I've warned him about his communication skills, they're consistently full of technical language he can't or won't explain." I like when they make their ignorance your problem.
This is often coupled with "IT's solution wasn't redundant and couldn't be made redundant without spending significant money, that, at the time and considering the economic situation, could not be justified, given the focus on saving jobs and salaries." In other words, I don't get my bonus without suppressing spending, and daddy needs a new S550.
And all this is assuming that, after the disaster, you're given an opportunity to even explain the outcome and management's role in not preventing it. Usually the CTO/CIO is told they'll "be OK after this" but we "probably need to clean house a little to get rid of the people that always blame management." In other words, for keeping the real reasons hushed up, somebody gets paid and the smart people get shitcanned without ever getting a chance to explain management's failure. This is sometimes sold as in "everyone's best interest" if there's an ESOP and/or bonuses paid in options, since keeping the stock price up becomes the only thing that matters.
All in all, there's just too many ways for execs to control the debate and dole out blame, regardless of "proof" that their willingness to line their own pockets outweighed doing the right thing. I think it's also common in non-high-tech industries for CIO-type positions to be filled by rejects and trustees from the accounting & finance department with an over-arching mandate to "cut costs". That's great, except they end up with cash incentives (stock, bonus money, etc) to cripple DR or other "just in case" systems because they are so expensive, and when shit does happen, they get away with it by saying they were "told" to keep costs down.
Re:HA fail by waveman · 2010-09-03 09:21 · Score: 1

> Why the rage? Just spell out very clearly (and in writing) exactly what will happen if component X fails, and the cost to implement redundancy now. When component X fails and the company loses Y dollars of revenue and the CEO comes to you, just pull out the email and say "I tried to design redundancy but he wouldn't spend the money".
This happened to a friend of mine. The internal "customer" raged about the waste and cost of the proposed redundant network infrastructure. He insisted on having it removed. My friend agreed but insisted on the "customer" signing off on the risks involved and acknowledging it was he who made the decision.
In due course the network failed and due to lack of redundancy it was out for about 1/2 a day. The "customer" then came for my friend's head, demanding that the person responsible for this disaster be fired.
My friend produced the documentation and asked the "customer" if he wanted this forwarded to his boss so he could be fired. The issue suddenly went very quiet. Without this documentation my friend would have been fired.
Re:HA fail by c6gunner · 2010-09-03 11:29 · Score: 1

The free market isn't some magical pixie that just does whatever you want. If the government got fucked over by Northrop Grumman, it's either because they didn't bother to look into what they were buying, or they failed to hold NG accountable when it failed to deliver on the promised contract. The free-market is about choice and competition - neither of which is worth a damn if the customers are too lazy or too stupid to verify that they're actually getting what they want for the best price available.

card? by The+Lyrics+Guy · 2010-08-27 04:04 · Score: 1

What is a SAN memory card?

Re:card? by snookerhog · 2010-08-27 04:06 · Score: 3, Insightful

sounds like nobody in Virginia knows either
Re:card? by Culture20 · 2010-08-27 04:08 · Score: 3, Informative

A technically correct term, albeit against normal colloquialism which calls them memory chips. Memory chips are the black things on the cards.
Re:card? by NotBornYesterday · 2010-08-27 05:02 · Score: 2, Informative

From the awkward phrasing, my completely uninformed guess is they are referring to a cache module on a controller somewhere.

--
I prefer rogues to imbeciles because they sometimes take a rest.
Re:card? by GaryOlson · 2010-08-27 05:45 · Score: 1

Unless the unit is from Texas Memory Systems -- completely flash based storage. Note in the FAQ where scheduled down time is required to replace a faulty Flash drive.

I wonder if some genius decided to use one of these units as primary storage instead of using it as caching storage.

--
Every mans' island needs an ocean; choose your ocean carefully.
Re:card? by Anonymous Coward · 2010-08-27 06:38 · Score: 0

Interesting that they would use that kind of infrastructure instead of say redundant hard drives in multiple physical locations. I don't have any experience with SAN architecture but it sounds like they are using solid state hdd's. I was unaware that they had achieved that kind of capacity. Shoehornjob

It's always money by Anonymous Coward · 2010-08-27 04:05 · Score: 2, Interesting

I'll tell you exactly how. Some manager somewhere said that it cost too much to add redundancy. It's happened over and over at my extremely large company, and it will continue to happen as long as money is the prime concern.

Re:It's always money by jsnipy · 2010-08-27 04:12 · Score: 1

At least now they can quantify their (bad) decision with their loss of productivity and perhaps loss of revenue.

--
-- if you mod me down, I will become more powerful than you can possibly imagine
Re:It's always money by snookerhog · 2010-08-27 04:12 · Score: 1

why do we need redundancy when the MTBF is 500000 hours? That's more than 57 years! Surely we will replace the whole system in less than that time, so why bother with redundancy?
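In case anyone takes that at face value, here is what a 500000-hour MTBF buys you. This is a sketch assuming a constant failure rate (the usual exponential model, my assumption and not something from any spec sheet) and a hypothetical fleet of 200 such parts.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days * 24 hours

def prob_failure_within(hours, mtbf_hours):
    """P(one unit fails within `hours`), assuming a constant failure rate."""
    return 1.0 - math.exp(-hours / mtbf_hours)

def prob_any_failure(hours, mtbf_hours, units):
    """P(at least one of `units` independent units fails within `hours`)."""
    return 1.0 - math.exp(-units * hours / mtbf_hours)

mtbf = 500_000
print(f"MTBF in years: {mtbf / HOURS_PER_YEAR:.1f}")
print(f"one unit fails this year: {prob_failure_within(HOURS_PER_YEAR, mtbf):.2%}")
print(f"any of 200 units fails this year: {prob_any_failure(HOURS_PER_YEAR, mtbf, 200):.2%}")
```

A single part has a bit under a 2% chance of dying in any given year, and a shop with a couple hundred of them should expect to lose at least one almost every year.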
Re:It's always money by Daniel_Staal · 2010-08-27 04:14 · Score: 3, Insightful

Add in politics: Get a couple of representatives arguing over where the money (if any) should be spent, and all possibility of real redundancy and fault-tolerance go out the window.
It's true in larger government organizations than this. The failures just haven't occurred yet.

--
'Sensible' is a curse word.
Re:It's always money by Anonymous Coward · 2010-08-27 04:17 · Score: 1, Insightful

There's not a lot of money left over for redundancy after you take out the kickbacks, graft and bribes.
Re:It's always money by jeffmeden · 2010-08-27 04:20 · Score: 1

What does mean mean again? Ah nevermind. Odds are 2 out of 3 it will fail outside of business hours anyway. And if that's the case, no one will notice!
Re:It's always money by Anonymous Coward · 2010-08-27 04:25 · Score: 0

No, we don't want to sleep with your mom.
Re:It's always money by joebok · 2010-08-27 04:34 · Score: 1

Money is always the prime concern for a business. If the cost of adding redundancy is higher than the expected cost of dealing with network failure, then why would a business do it?
That being said, I often see the cost of dealing with a significant network interruption being underestimated - either the $ cost or the probability of it happening.
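The comparison itself is one multiplication, which is exactly why the inputs matter so much. A sketch with invented numbers (one outage every two years, eight hours each, $5,000 an hour, $5,000 a year for the redundant piece):

```python
def expected_annual_outage_cost(outages_per_year, hours_per_outage, cost_per_hour):
    """Expected yearly loss from outages: frequency * duration * hourly cost."""
    return outages_per_year * hours_per_outage * cost_per_hour

def redundancy_pays_off(redundancy_cost_per_year, outages_per_year,
                        hours_per_outage, cost_per_hour):
    """True when the expected outage loss exceeds the yearly cost of redundancy."""
    loss = expected_annual_outage_cost(outages_per_year, hours_per_outage, cost_per_hour)
    return loss > redundancy_cost_per_year

# Hypothetical inputs, not figures from the article
print(expected_annual_outage_cost(0.5, 8, 5_000))
print(redundancy_pays_off(5_000, 0.5, 8, 5_000))
```

Halve the guessed outage frequency or the hourly cost and the answer can flip, which is where the underestimating usually happens.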
Re:It's always money by cgenman · 2010-08-27 04:42 · Score: 2, Interesting

I love how people can determine a MTBF of 50 years after testing a piece of hardware for a month.
For my money, the only computer that should be able to claim a 50 year MTBF is the Univac. And that's really, really not accurate.

--
The ______ Agenda
Re:It's always money by cgenman · 2010-08-27 04:49 · Score: 4, Insightful

Everyone seems to think that a network outage is no big deal, until the network goes down. That's when people start thinking of the burn rate of an entire organization sitting on their thumbs while that network of off-the-shelf Linksys routers is replaced by some kid at Best Buy. Or how that 5k dollars per year for a backup external line suddenly pales in comparison to the 5k dollars per hour your organization is wasting because you were a cheap bastard.
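
The comparison above can be sketched as a break-even calculation. The two dollar figures come from the comment; the function names and the eight-hour outage are illustrative assumptions only.

```python
def break_even_outage_hours(redundancy_cost_per_year, outage_cost_per_hour):
    """Hours of outage per year at which the backup pays for itself."""
    return redundancy_cost_per_year / outage_cost_per_hour


def net_annual_saving(expected_outage_hours, redundancy_cost_per_year,
                      outage_cost_per_hour):
    """Positive when the backup line costs less than the outage it avoids.

    Assumes the backup eliminates the outage entirely, which is optimistic.
    """
    avoided_loss = expected_outage_hours * outage_cost_per_hour
    return avoided_loss - redundancy_cost_per_year


# Figures from the comment: 5k dollars/year backup line, 5k dollars/hour burn rate.
print(break_even_outage_hours(5_000, 5_000))
# Hypothetical: one eight-hour outage in a year.
print(net_annual_saving(8, 5_000, 5_000))
```

With those numbers the backup line breaks even after a single hour of avoided downtime per year.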

--
The ______ Agenda
Re:It's always money by jeffmeden · 2010-08-27 04:52 · Score: 1

What does mean mean again? Oh, that's right. If you want a MTBF of 50 years, you can either get one unit and run it for 50 years to prove yourself, or you can get 100 units and run them for 6 months... To be sure, it doesn't automatically take into account mechanical wear but any engineer worth their salt can extrapolate acceptable wear rates with 6 months of data (and that's only if you are talking about systems with moving parts)...
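
A minimal sketch of that estimate: total unit-hours on test divided by the number of failures observed. It assumes a constant failure rate over the test window, and the single observed failure is a hypothetical count chosen to make the numbers line up.

```python
HOURS_PER_YEAR = 24 * 365  # 8760


def estimate_mtbf_hours(units, hours_per_unit, failures):
    """Point estimate of MTBF: accumulated unit-hours / observed failures.

    Only meaningful if the failure rate is roughly constant during the test.
    """
    if failures <= 0:
        raise ValueError("need at least one observed failure for a point estimate")
    return units * hours_per_unit / failures


six_months = HOURS_PER_YEAR // 2  # 4380 hours
mtbf = estimate_mtbf_hours(units=100, hours_per_unit=six_months, failures=1)
print(mtbf / HOURS_PER_YEAR)  # estimated MTBF in years
```

One failure among 100 units over six months gives 50 unit-years per failure, which is how a 50-year MTBF can be quoted without a 50-year test.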
Re:It's always money by geekoid · 2010-08-27 05:09 · Score: 3, Interesting

This is a private sector failure. NG is the culprit here, not the government.
This is why you should be very wary of bidding out work to a third party. They don't care about your city. They are not thinking about how their decisions will impact the city in 10, 20, or 50 years.
And while infrastructure is far more complex and expensive than people who don't deal with it realize, 2.4 billion over 10 years? 240 million a year? That is a price where they should have a tested redundancy system. A single-point SAN failure? Shame on NG.
I hate to burst your preconceive bubble, but as my years in the private sector and public sector have taught me, most government agencies are far better at keeping their own infrastructure. More reliable and long standing.

--
The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
Re:It's always money by Anonymous Coward · 2010-08-27 05:15 · Score: 0

Yeah, the failures have occurred, but they've been spun better.
Re:It's always money by Anonymous Coward · 2010-08-27 05:19 · Score: 2, Funny

Oh yeah? Well YOUR momma so stank, she lay down on train tracks and nothing happened 'cause not even the train would hit that.
Re:It's always money by Daniel_Staal · 2010-08-27 05:20 · Score: 2, Insightful

My 'preconceive bubble' is based on my current job for the US government, and the situation we have in our department.
It might be true on average that government agencies are better at keeping their own infrastructure, especially if they can manage to keep their accounting and design of that infrastructure at a lower level. However, once those decisions pass the level from the internal to the external (or: From those hired for the job, to those elected/appointed into it), that long-term planning appears to break down, in favor of political squabbles.

--
'Sensible' is a curse word.
Re:It's always money by Anonymous Coward · 2010-08-27 05:49 · Score: 0

Yeah? Well YOUR momma is so stupid, she thought Tupac Shakur was a Jewish holy day!
Re:It's always money by cgenman · 2010-08-27 05:52 · Score: 3, Informative

"get 100 units and run them for 6 months..."
Which works if you presume a linear fail rate, which is bonkers. Systems always run better at the beginning of their lifecycle. Static buildup, electrical interference, repeated heating and cooling cycles, etc all take a toll on the electronics. Would you really personally estimate a real-world MTBF of off-the-shelf SATA drives at 70 years? No, because they work perfectly well for the first year, start having trouble the second, and are all dead by the 8th. But if you presume linear dropoff using just that first year of testing, they look pretty damn bomb proof because that's when they work best. It's a stupid system that's only valid if you replace all of your hardware every year.
And all systems have moving parts. Electrons move. The circuit boards expand and contract. Crap builds up on important components. Electroplating can move metals from one part of the design to another. Stuff gets plugged in and unplugged.
I realize that MTBF has a very technical definition that is different from how marketing departments use it. I might agree with you that any engineer worth their salt can extrapolate a proper MTBF. But most of the MTBFs I've seen are just stupidly wrong. If people really believe those published fantasy numbers, no wonder they don't put enough redundancy in their systems.
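
To illustrate the objection, here is a sketch that models wear-out with a Weibull lifetime distribution. This is a swapped-in textbook model rather than anything from the article, and the shape and scale values are invented for illustration. It compares the MTBF you would infer from first-year failures alone with the actual mean life.

```python
import math


def weibull_survival(t_years, shape, scale_years):
    """Fraction of units still working at time t under a Weibull model."""
    return math.exp(-((t_years / scale_years) ** shape))


def weibull_mean_life(shape, scale_years):
    """True mean time to failure for the same Weibull model."""
    return scale_years * math.gamma(1.0 + 1.0 / shape)


def naive_mtbf_from_first_year(shape, scale_years):
    """MTBF inferred by assuming the first-year failure rate holds forever.

    Approximates one unit-year of exposure per unit tested.
    """
    failed_fraction = 1.0 - weibull_survival(1.0, shape, scale_years)
    return 1.0 / failed_fraction


# Hypothetical wear-out behaviour: shape > 1 means the hazard rises with age.
SHAPE, SCALE_YEARS = 3.0, 5.0
print(round(naive_mtbf_from_first_year(SHAPE, SCALE_YEARS)))  # extrapolated, in years
print(round(weibull_mean_life(SHAPE, SCALE_YEARS), 2))        # actual mean life, in years
```

With these made-up parameters the first-year extrapolation suggests a lifetime above a century while the modelled mean life is under five years, which is the gap between early-life test data and real-world wear-out described above.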

--
The ______ Agenda
Re:It's always money by gclef · 2010-08-27 06:14 · Score: 1

However, once those decisions pass the level from the internal to the external (or: From those hired for the job, to those elected/appointed into it), that long-term planning appears to break down, in favor of political squabbles.
As someone who's worked both sides of the public/private line, allow me to assure you that this is not unique to government. I've seen plenty of boneheaded design decisions made by upper management for obscure/bizarre/just-plain-wrong reasons in both private and government gigs.
Re:It's always money by Anonymous Coward · 2010-08-27 06:24 · Score: 0

yeah. um. You know what a "mean" is right? That's a statistic. You get them with math and stuff. Not by sitting with a clipboard in a room for 50 years waiting for vacuum tubes to pop.
Re:It's always money by sjames · 2010-08-27 07:36 · Score: 1

It MAY be a government failure as well. When you write the impossible into a bid, make the bidding process tremendously complex and make the cost of even bidding too high for most potential contractors (by expecting a complex analysis up-front for free) you eliminate all but the largest contractors with a fat legal department. If you then require acceptance of the lowest bid with no allowance for confidence level you set up a perfect storm for a ripoff. You assure that each bid you receive will be a lie based more on how much they think they can underperform based on how intimidating their legal staff is and how many low friends they have in high places.
The one thing you won't get is a sincere bid from someone who expects to make a fair profit for a job well done.
Government bidding is the process of spending thousands of dollars to make sure you don't lose tens of dollars and getting it wrong anyway.
Re:It's always money by conspirator57 · 2010-08-27 09:10 · Score: 1

no, no, no. those costs are incalculable and soft costs not in management's budget. they're covered by the emergency fund. not to mention it'd look bad for the manager if they got toted up and ascribed to him/her.

--
"If still these truths be held to be
Self evident."
-Edna St. Vincent Millay
Re:It's always money by QuantumBeep · 2010-08-30 13:56 · Score: 1

You would not believe the number of businesses that use a bog-standard DSL line that doesn't even carry an SLA - and then have the hubris to call their ISP and demand compensation for "lost revenue".

Northrup Grumman by elrous0 · 2010-08-27 04:05 · Score: 1

Northrup Grumman already runs the U.S. military. Might as well turn over IT to them too.

--
SJW: Someone who has run out of real oppression, and has to fake it.

Re:Northrup Grumman by Anonymous Coward · 2010-08-27 04:23 · Score: 1, Informative

Northrop Grumman is actually only the 4th largest player in defense contracting. If you want to look at the big players, look at Boeing, Lockheed Martin, and BAE.

Northrop Grumman, to some of the other contractors, is also known to be a screw-up that puts out mediocre quality work with a high price tag.

LMT, Raytheon, SAIC, and Honeywell would have all been better choices for making quality products.
Re:Northrup Grumman by Anonymous Coward · 2010-08-27 04:33 · Score: 0

Yeah, that Lunar Module they used to put men on the moon was a real piece of shit.
Re:Northrup Grumman by Anonymous Coward · 2010-08-27 04:38 · Score: 0

Lunar module, right. Got any examples that aren't pushing 40 years old?
Re:Northrup Grumman by brainboyz · 2010-08-27 04:43 · Score: 1

B2 Bomber?
Re:Northrup Grumman by Amouth · 2010-08-27 05:00 · Score: 1

Rephrase - Got any examples that aren't more than 20 years old?

--
'...if only "Jumping to a Conclusion" was an event in the Olympics.'
Re:Northrup Grumman by Wyatt+Earp · 2010-08-27 05:14 · Score: 1

USS Ronald Reagan (CVN-76) and USS George H.W. Bush (CVN-77) are two things they build that don't break easily.
Re:Northrup Grumman by Anonymous Coward · 2010-08-27 05:18 · Score: 0

The construction of your moms titanium composite dildo practically reinvented the field of tolerance engineering. That has to count for something.
Re:Northrup Grumman by Anonymous Coward · 2010-08-27 06:52 · Score: 0

What is it with you defense groupies and phallic references? Do big weapons compensate for a certain anatomical inadequacy?
Re:Northrup Grumman by tsm_sf · 2010-08-27 08:24 · Score: 1

They were both robots? That explains so much...

--
Literalism isn't a form of humor, it's you being irritating.
Re:Northrup Grumman by Fnord666 · 2010-08-27 13:09 · Score: 1

All 2.5 billion gets you in the military is a manger and toilet seat.

Both useful things if you have to deal with an ass.

--
'The tyrant will always find pretext for his tyranny.' - Aesop's Fables

They need a better network admin by Nemesisghost · 2010-08-27 04:07 · Score: 4, Funny

Maybe they should hire Terry Childs, at least he won't let their network go down for something like this.

Re:They need a better network admin by Bogtha · 2010-08-27 04:22 · Score: 0

Who modded this insightful? The problems that Virginia are experiencing right now are caused by a single point of failure. Terry Childs, regardless of your opinions of his conviction, made himself a single point of failure. So out of every sysadmin in the entire fucking world, you picked the absolute least appropriate person for the job.

--
Bogtha Bogtha Bogtha
Re:They need a better network admin by Anonymous Coward · 2010-08-27 04:54 · Score: 2, Insightful

That's insane. Terry Childs failed (he was arrested and unable to make changes to the network)--and the city kept running.
Re:They need a better network admin by Anonymous Coward · 2010-08-27 05:37 · Score: 0

They did have their own version of Terry Childs-- and they had to let him go.
How else would they end up outsourcing to Northrop Grumman?

Well..... by Anonymous Coward · 2010-08-27 04:09 · Score: 0

HAHAHAHHAHAHHAHHA - stupids

"This is supposed to be the best system you can buy, and it's never supposed to fail, but this one did," he said

And I've got a bridge for sale in San Francisco...

Re:Well..... by jeffmeden · 2010-08-27 04:17 · Score: 2, Funny

HAHAHAHHAHAHHAHHA - stupids
"This is supposed to be the best system you can buy, and it's never supposed to fail, but this one did," he said
And I've got a bridge for sale in San Francisco...
Throw in your city's cisco-powered WAN and I'll take it!
Re:Well..... by Yunzil · 2010-08-27 07:29 · Score: 1

And I've got a bridge for sale in San Francisco...
No thanks. I got a sweet deal on one in Brooklyn.
Re:Well..... by aix+tom · 2010-08-27 09:50 · Score: 1

Well, they just proved it:
The difference between a system that can fail, and a system that can not possibly fail is, that when a system that can not possibly fail fails, the fault will be at a place impossible to get at and fix.

Redundancy by CmdrPorno · 2010-08-27 04:10 · Score: 2, Funny

Silly state, expecting to get redundancy for only $2.4 billion dollars. Don't they realize they're going to have to pay a lot more than that to get a reliable network?

--
Sent from my iPhone

Re:Redundancy by Wonko+the+Sane · 2010-08-27 04:24 · Score: 2, Interesting

What makes you think that the legislators expect redundancy? When that kind of money changes hands the only thing they care about is getting favors and campaign contributions.
Re:Redundancy by Darth_brooks · 2010-08-27 05:23 · Score: 1

Your sig makes that comment *that* much more hilarious.

--
There are some people that if they don't know, you can't tell 'em.
Re:Redundancy by CmdrPorno · 2010-08-27 06:20 · Score: 1

Thankfully, I didn't pay $2.4 billion for AT&T's crappy network. (The reception is actually not that bad here, but I'm in a rural area with no 3G, which really sucks.)

--
Sent from my iPhone

Awful. by boneclinkz · 2010-08-27 04:11 · Score: 4, Insightful

Our primary concern should be a complete audit of World of Warcraft server hardware, to ensure that this vulnerability does not exist in other, more vital networks.

Re:Awful. by Chris+Snook · 2010-08-27 10:39 · Score: 1

Given that Blizzard monitors local weather in places where they have data centers, to be aware of potential power supply and cooling issues before the alarms go off, I'm going to take a shot in the dark and guess their SANs use redundant controllers.
http://www.crunchgear.com/2009/09/18/blizzard-reveals-some-technical-data-about-world-of-warcraft/

--
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.

Re:It's always schedule by rwa2 · 2010-08-27 04:13 · Score: 2, Interesting

Heh, it shouldn't be about the money, though... they should have specified high availability from the very beginning. They often throw it out during the prototyping stage, saying they need to Keep It Simple Stupid just to get things working, but then all the software is never designed to be able to handle redundancy, and shoehorning it in later becomes pretty much like starting again from scratch.

Also, designing in redundancy is usually worse than having no redundancy at all if it's never tested. There should be a pretty simple test plan, where, say, the CTO comes in and is allowed to pull any single random wire or component out of the rack and see how the system reacts / recovers. But unfortunately people are usually using the system by that time, and it's too much of a hassle to come in off-hours and pay everyone overtime for such a test.

Sorry, has to be said... by Omega+Hacker · 2010-08-27 04:15 · Score: 2, Funny

I think the id10ts who pulled off this stunt are rather DIMM....

--
GStreamer - The only way to stream!

Question. by U8MyData · 2010-08-27 04:15 · Score: 2, Insightful

Umm, so what's the point of having a SAN if it weren't redundant? Me thinks there is more to this story.

Re:Question. by bswooden · 2010-08-27 04:23 · Score: 1

Umm, so what's the point of having a SAN if it weren't redundant? Me thinks there is more to this story.
from working with VITA on a daily basis, I can assure you there is probably not much more to the story than this. I have never seen a more disorganized bunch of clowns in my life.
Re:Question. by MightyMartian · 2010-08-27 04:25 · Score: 2, Insightful

Probably involving executives vacationing in nice tropical locales by rewarding themselves with hefty bonuses. Meanwhile some poor IT guys weren't given the budget that reflected how much the State was paying out, and had to cobble together a SAN solution, or pick the cheapest one off the shelf. The IT guys will, of course, be the patsies for this whole episode, with the CEO and CTO all huffing and puffing and vowing to State officials and lawmakers that they're doing everything they can to get to the bottom of this.

--
The world's burning. Moped Jesus spotted on I50. Details at 11.
Re:Question. by EmperorKagato · 2010-08-27 04:34 · Score: 1

Even the cheapest solutions, in the $20k to $40k range, have redundancy.
It's not a SAN if it has 1 point of failure, it's just a virtual storage box or NAS. Hell they could have spent just $10k and just run a windows file server with a bunch of disks in a redundancy configuration.
I run a SAN network and you have no idea how much I'm raging over the stupidity of this incident right now.

--
----- You know you have ego issues when you register a domain in your name.
Re:Question. by Anonymous Coward · 2010-08-27 04:40 · Score: 0

The main server boots from a Sandisk SD card, because the admin had an old one on his camera and the brass wouldn't pay for something more reliable.
Re:Question. by Necron69 · 2010-08-27 04:44 · Score: 1

Ditto. I test SAN configurations for a living, and I'm stumped by this one. I'd love to know some details.
Necron69
Re:Question. by Locke2005 · 2010-08-27 04:53 · Score: 1

You'd think they'd at least do RAID 1 Mirroring. Then they could just hot swap in another drive, sync it, and be on their merry way. Why centralize your data services if you're not going to do it right?

--
I've abandoned my search for truth; now I'm just looking for some useful delusions.
Re:Question. by jeffmeden · 2010-08-27 04:57 · Score: 2, Funny

"What could possibly be the difference between raid0 and raid1? Come on, who would put those radio button choices so close together if they really meant opposite things!"
Re:Question. by MightyMartian · 2010-08-27 04:58 · Score: 3, Informative

Well, as Sherlock Holmes' greatest axiom goes "When you have eliminated the impossible, whatever remains, however improbable, must be the truth." Using that logic, the answer is simple. They're not using a SAN. Somewhere along the line someone is bullshitting, and my gut tells me it's management. A lot of folks who get government contracts pretty much view them as an opportunity to skim off the top. Why, take what should be a $50,000 solution and mock something up for $10,000, and that's $40,000 profit.

--
The world's burning. Moped Jesus spotted on I50. Details at 11.
Re:Question. by Anonymous Coward · 2010-08-27 05:05 · Score: 0

It's not actually all that odd to boot a server from a cheap USB key. HP blades come with a little internal USB cradle just for this purpose. If all you're doing is booting the machine into some sort of appliance E.g. a NAS controller or a VM host, the root filesystem doesn't get touched once it's booted. The integrity of the root filesystem is only an issue when you reboot the machine, which shouldn't be happening often in the above scenarios.
Re:Question. by Darth_brooks · 2010-08-27 05:13 · Score: 3, Interesting

Depends on the SAN. The article (as most tech articles are) is very short on scope & details. So "one chip" went bad. Should that bring everything to a screeching halt? The answer should be "no" but in practice we can all say that it's more often a case of "not usually." From TFA:
It was hailed as being able to suffer a failure to one part but continue uninterrupted service because standby parts or systems would take over. But when the memory card failed Wednesday, a fallback that attempted to shoulder the load began reporting multiple errors, Nixon said.
So Array Alpha shits the bed. You follow your failover procedures and start running on Array Zappa. That immediately starts throwing errors. Ok armchair QBs, let me switch to my Keanu Reeves voice and ask "What do you do?" You built a pretty damned redundant system there and you're still down. Sure, it'd be nice if they had a backup in another DC they could fail to, but they don't. Doesn't matter, eventually you're playing the double / triple / quadruple hulled oil tanker game. Either way, redundant SANs aren't cheap and aren't all that easy (it's not exactly a "the boss's nephew who 'knows all about computers' set it up last weekend" level of complexity.) TFA also has these points:
Full function may not be restored until Monday.
Experts who examined the system determined that no data were lost except for those being keyed into the system at the moment it failed, Nixon said.
Other than the fact that proofreading and the usage of proper grammar are no longer requirements to work for a Virginia newspaper, what do those points tell us? Sounds to me like they hit the last line in the DR procedures: Restore from backup. Depending on what their backup strategy is (maybe they're splitting several terabytes across a tape robot that only supports 200/400gig tapes because that robot is the only device the vendor supports) and how truly important the affected system is (this may be a system where the powers that be said "fsck it, they can process renewals by hand and we'll bring everything back up on Monday after we test on Saturday"), a return to business on Monday might be SOP. But that wouldn't sell newspapers (or make talking points with the voters...) now, would it?
Maybe there was a major screwup here. Maybe they never tested their failovers and maybe that 2nd SAN was bad out of the box. I'm a little more willing to cut some slack and say "man, that sucks. Glad it's not my ass on the line." Karma's a bitch like that. I like to take these stories as an opportunity to rethink what my own single points of failure are rather than point & laugh and tell everyone how I'll never lose any data because I'm running RAID 5......

--
There are some people that if they don't know, you can't tell 'em.
Re:Question. by geekoid · 2010-08-27 05:16 · Score: 1

Here is an educated guess:
When getting the bid, NG promised redundancy.
NG stalled and then was behind schedule.
The redundancy system became less 'important' due to time.
NG went live.
NG let a bunch of contractors go.
NG says their in-house staff will take care of it.
NG new hires get stuck at the end of the project, do enough to consider it 'done'. Several amateur mistakes were made.
What's happening right now:
People who work for the state IT are showing everyone the email they got from NG saying the system was redundant.
They then show all the emails clearly detailing why it wasn't.
Politicians who don't want to blame anyone they may have to deal with give lip service until American Idol is back on.

--
The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
Re:Question. by fishbowl · 2010-08-27 05:35 · Score: 1

I have a production server that boots from a USB thumb drive. I couldn't get any loader to boot a linux kernel from a partition on its hardware RAID, and I didn't think it was a better solution to add a boot drive, so I used a USB stick. To be honest, I didn't really intend for it to be a permanent solution, and I'm still a little surprised that this machine so adamantly refused to boot from the RAID, but I'm not seeing the problem.

--
-fb Everything not expressly forbidden is now mandatory.
Re:Question. by Anonymous Coward · 2010-08-27 08:16 · Score: 0

For what it's worth, usually these kinds of contracts are at least in principle cost-plus - meaning that the hardware costs are billed directly from a fixed budget. Salaries and overhead/profit are paid separately, hence "plus" ...
Re:Question. by conspirator57 · 2010-08-27 09:17 · Score: 1

or it could be a service level agreement that abstracts how the specified performance metrics are to be achieved, assuming that they are fungible. A bad assumption, as it turns out.

--
"If still these truths be held to be
Self evident."
-Edna St. Vincent Millay

Safely Travel in VA by Anonymous Coward · 2010-08-27 04:22 · Score: 0

Excellent, I guess this means I'll be able to safely travel through Virginia without risking getting picked up on all my outstanding warrants.

Money by Anonymous Coward · 2010-08-27 04:23 · Score: 0

The only thing I can think of is that they decided it cost too much money. This is the problem with letting penny-counters make these decisions. "Oh, this one costs a fraction as much, and they're pretty much all the same. Right???"

When are people going to stop trusting business people for technical decisions? When are they going to figure out that they hired us for our knowledge, and not to just push buttons? We don't talk about backups and failovers just to sound cool. We're trying to save their butts from a meltdown like this. My advice is that if you're in a position that you have a penny-counter telling you what to buy, then just point at this story to give your opinion more weight -- especially if you've been trying to tell them for years or something.

I guess I'm lucky that I have a boss who used to work in IT, and so she gives my opinion a lot more weight than most supervisors do. We have several redundant backups, and we have two servers that can each pick up the slack of the other at a moment's notice (it's not that big a network). Not the best solution, but far better than the State of Virginia, apparently. We've already had a couple of hiccups that this arrangement worked great through. The users didn't even notice.

I'm not saying this to brag. We have a non-profit-sized budget (read: shoestring budget). If we can do it on our budget, then so should a US state.

Re:Money by PaulIsTheName · 2010-08-27 04:41 · Score: 1

When are people going to stop trusting business people for technical decisions?
The moment tech people accept that taking risk of system failure to save cost is an acceptable business decision sometimes. I agree that this story proves that you need reasonable risk assessment to do that.

We don't talk about backups and failovers just to sound cool.
Yes we do. Too.

Single points of failure... by forkfail · 2010-08-27 04:23 · Score: 1

.... rrrr bad, m'kay?

--
Check your premises.

Typical liberal overreaction by BitHive · 2010-08-27 04:27 · Score: 5, Funny

Guys, accidents happen. This "Northrop Grumman", whoever they are, will no doubt be fired and not receive any more contracts once word of this gets out. This will put pressure on them to provide better services, or be out-competed by other entrepreneurs. Our free market system works, you just need to expect this kind of thing when it's government doing the hiring.

Re:Typical liberal overreaction by Ben+Chu · 2010-08-27 04:35 · Score: 1

How is not incorporating basic redundancy into your SAN an "accident"?
Re:Typical liberal overreaction by Fjandr · 2010-08-27 04:39 · Score: 1

Wooooooooosh!
Re:Typical liberal overreaction by plbowler · 2010-08-27 04:41 · Score: 1

uhhhhhhhh This is a joke right? If you really don't know who they are, then I can understand why you think they are at risk of losing business over this. And on what planet does the U.S. operate in a free market system? wow
Re:Typical liberal overreaction by Anonymous Coward · 2010-08-27 04:41 · Score: 0

How do you know Grumman didn't already advise them of the problem and was ignored?
Re:Typical liberal overreaction by Mr.Intel · 2010-08-27 04:42 · Score: 1

Woooooosh!

--
ASCII tastes bad dude.
Binary it is then.
Re:Typical liberal overreaction by Anonymous Coward · 2010-08-27 04:44 · Score: 0

They put in the bid for a non-redundant network. They won the bid and began building. Some people got worried about the non-redundant thing, asked how much it would cost to add redundancy, and got quoted a huge number.
The system was born a clusterfuck. Still a clusterfuck. Can never be changed from a clusterfuck. Cultural forces beyond my comprehension insist on a Clusterfuck.
Re:Typical liberal overreaction by EmperorKagato · 2010-08-27 04:50 · Score: 1

How do you know Grumman didn't already advise them of the problem and was ignored?
I bet you the warning was brushed aside several times.

--
----- You know you have ego issues when you register a domain in your name.
Re:Typical liberal overreaction by idiotnot · 2010-08-27 05:17 · Score: 1

They get all defensive (no pun intended) when you point out that this was one of Mark Warner's crowning achievements as governor......
Re:Typical liberal overreaction by mounthood · 2010-08-27 05:17 · Score: 1

Guys, accidents happen. This "Northrop Grumman", whoever they are, will no doubt be fired and not receive any more contracts once word of this gets out. This will put pressure on them to provide better services, or be out-competed by other entrepreneurs. Our free market system works, you just need to expect this kind of thing when it's government doing the hiring.
The problem is that it's the government selecting the vendor. If the government would just get out of the vendor-hiring-business maybe the Free Market could fix this mess.

--
tomorrow who's gonna fuss
Re:Typical liberal overreaction by Anonymous Coward · 2010-08-27 05:23 · Score: 0

Given Northrop Grumman's history with the state of Virginia, they will probably be rewarded with an extension to their contract and a bonus, rather than being fired.
Re:Typical liberal overreaction by Daniel+Dvorkin · 2010-08-27 06:54 · Score: 1

The problem is that it's the government selecting the vendor. If the government would just get out of the vendor-hiring-business maybe the Free Market could fix this mess.
I'm not sure if you're joking or not. If you're serious ... um, who do you suggest should hire people to run the government's servers, other than the government?

--
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
Re:Typical liberal overreaction by adolf · 2010-08-27 12:32 · Score: 1

I think you're missing the point: If the government would just get out of the business of governing, they wouldn't need servers to begin with.

--
Kid-proof tablet..
Re:Typical liberal overreaction by Daniel+Dvorkin · 2010-08-27 12:34 · Score: 1

And in the anarchy that would follow, no one else would need servers either. I guess that's one solution to the problem.

--
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.

Work in DMV by Anonymous Coward · 2010-08-27 04:35 · Score: 0

I work in the DMV with each jurisdiction. It is sad, but Virginia is head and shoulders above Maryland and DC. Maryland's access to criminal records goes down weekly for extended periods. DC has been working to update their system to NCIC 2000 standards for 10 years. Virginia has put in more money than either jurisdiction and usually they are the most coordinated.

Re:Work in DMV by EmperorKagato · 2010-08-27 04:51 · Score: 1

Meanwhile in the 49 other states....

--
----- You know you have ego issues when you register a domain in your name.
Re:Work in DMV by pnutjam · 2010-08-27 05:42 · Score: 1

48

--
Cheap storage VM.
Re:Work in DMV by QuantumBeep · 2010-08-30 14:07 · Score: 1

Uh, 47
Re:Work in DMV by pnutjam · 2010-08-31 03:05 · Score: 1

46

--
Cheap storage VM.

Typical Republican Corruption by jedidiah · 2010-08-27 04:38 · Score: 0

> Guys, accidents happen. This "Northrop Grumman", whoever they are, will no doubt be fired
> and not receive any more contracts once word of this gets out. This will put pressure on
> them to provide better services, or be out-competed by other entrepreneurs. Our free market
> system works, you just need to expect this kind of thing when it's government doing the hiring.

What? Are you joking? Do you even know who these people are?

At worst they will get a pat on the back after this. They are
an incestuous government contractor. That's why they got this
job and someone else didn't to begin with. The real IT outfits
can't because of the great advantage that legacy players have here.

--
A Pirate and a Puritan look the same on a balance sheet.

Re:Typical Republican Corruption by oodaloop · 2010-08-27 04:47 · Score: 1

Yes, he was joking. I think there's a whoooosh around here somewhere for you.

--
Tic-Tac-Toe, Global Thermonuclear War, and relationships all have the same winning move.
Re:Typical Republican Corruption by Anonymous Coward · 2010-08-27 04:51 · Score: 0

Actually, even though this sub-contract has been enthusiastically supported by Republicans all along the way, the original privatization of the Virginia IT infrastructure was spear-headed by Democratic governor Mark Warner (now the senior senator from Virginia), and has been supported just as enthusiastically by Democrats. Also, it will be very hard to "crack the whip" on Northrop Grumman, since the present Virginia administration bent over backwards to get NG to locate their world headquarters in Virginia rather than Maryland, both of which are near Washington DC, where the real money lives, as opposed to the chump change 2.4 billion contract with VA. "Incestuous" in this case is far too mild a word for what is actually going on.
Re:Typical Republican Corruption by Anonymous Coward · 2010-08-27 04:58 · Score: 0

Wait, what color is this state again? How about it being political corruption, not Republican corruption. Last I checked the Democrats had more lead news stories over corruption recently than Republicans, and that's saying something. If there's a politician involved, it's probably corrupt or on its way to being corrupt.
Note: I'm not Democrat or Republican, I just hate the idiotic pot calling the kettle black BS.
Re:Typical Republican Corruption by Anonymous Coward · 2010-08-27 05:43 · Score: 1, Informative

Actually, from the article, you would have seen that the contract was signed back in 2005, when Virginia enjoyed the presence of Mark Warner, Democrat, and now US Senator for Virginia.
Amusingly, Aneesh Chopra, the current CTO of the Obama administration, was the Virginia Secretary of Technology starting shortly after this was signed, and he never added redundancy to the service contract. This was during Warner's tenure and during Tim Kaine's (D-Va) tenure.
Also, counter to your argument, it was actually Bob McDonnell, the current Republican Governor, that renegotiated the contract to include redundancy.
With all of that said, I do not think Northrop Grumman was the best fit for this job and after so many egregious failures, they deserve to have their contract reworked in VA's favor, but bureaucracy being what it is, regardless of party politics, I doubt this will change. I really feel like this kind of contract could have gone to a small-to-medium sized VA business that could have handled it extremely well, and locally, for much less. The real sad thing is that the guy whose largest job was to oversee this contract, and did nothing, is now the CTO for the entire country. I don't care what party you are, that's a scary thought.
Re:Typical Republican Corruption by sjames · 2010-08-27 07:15 · Score: 1

The real IT outfits are deeply disadvantaged by feeling the need to actually deliver on the contract. That drives costs up and caps promises.

Ok, this really sucks!!!!!!! I know why and can by Anon-Admin · 2010-08-27 04:40 · Score: 1

not say. The F***Ing NDA stops me from saying anything about the stuff I saw in NGC's IT.

Well, I guess I can say it is BROKE NOW and you have to fix it. Told you so!

Re:Ok, this really sucks!!!!!!! I know why and can by Ironhandx · 2010-08-27 05:03 · Score: 1

NDAs are such a bitch.
I think you should talk to Julian Assange at Wikileaks so that those of us that want the juicy details can get them.
P.S. There's a fat unmarked manila envelope in it for you. We all chipped in. It's a really nice envelope.
Re:Ok, this really sucks!!!!!!! I know why and can by NeutronCowboy · 2010-08-27 05:05 · Score: 1

There's a Post Anonymously button for that reason. Given the state of their IT department, I doubt they'll be able to figure out who broke their NDA, even if the police manage to give them an IP.

--
Those who can, do. Those who can't, sue.
Re:Ok, this really sucks!!!!!!! I know why and can by Wyatt+Earp · 2010-08-27 05:18 · Score: 1

This. Go down to a Starbucks, post anonymously, and dish.
Their network is down, 2.4 billion dollars had everything running through a single DIMM in a netgear box they got off Ebay, they won't figure out who it was.
Re:Ok, this really sucks!!!!!!! I know why and can by fishbowl · 2010-08-27 05:38 · Score: 1

Show each individual under NDA a different, completely false but very juicy tidbit.

--
-fb Everything not expressly forbidden is now mandatory.
Re:Ok, this really sucks!!!!!!! I know why and can by NeutronCowboy · 2010-08-27 08:03 · Score: 1

This assumes smartness on the part of the department. Looking at the clusterfuck that is the current meltdown, I doubt they have the smarts for that.

--
Those who can, do. Those who can't, sue.

Northrop Grumman? Thats why... by Nadaka · 2010-08-27 04:44 · Score: 1

My company works on a project that N G lost on a re-compete bid. I can not go much into details, but suffice it to say: I am not at all surprised that they screwed up maintenance and management based on what I have had to deal with on the software they developed.

Northrup Grumman by fermion · 2010-08-27 04:47 · Score: 1

This is what you get for hiring a military contractor to do a civilian person's job. All 2.5 billion gets you in the military is a manger and toilet seat. You don't start getting functional hardware until the budget reaches 100 billion.

--
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black

What brand? by CambodiaSam · 2010-08-27 04:48 · Score: 1

Anyone know what brand of SAN went down? My company had a similar issue where our SAN had a major outage, and the vendor claimed it was "an error that never happens, we swear".

Re:What brand? by Anonymous Coward · 2010-08-27 05:13 · Score: 0

I have a confession: I slept with your wife. But it NEVER happens, I swear! You believe me, right? Surely you can't be mad until there are *two* smoking guns...
Re:What brand? by Anonymous Coward · 2010-08-27 05:49 · Score: 1, Informative

The following document contains a detailed inventory of the Storage systems used by them, and we can presume it was one of these that failed.
Full details of the contracts also enclosed.
http://www.google.com/url?url=http://www.vita.virginia.gov/procurement/contracts/docs/abstracts/va-040330-stk.pdf&rct=j&sa=X&ei=yfl3TOOLCI-54gbuteG0Bg&ved=0CBkQzgQoADAA&q=Virginia+Information+Technologies+Agency+hp+storage&usg=AFQjCNGRQFneR1OJXIxQGO0mYbizW67bow
Lol.
Re:What brand? by Chris+Snook · 2010-08-27 11:06 · Score: 1

There's a first time for everything. When I was at Red Hat, a customer (maybe you?) experienced a SAN-wide outage due to an error, caused by a rare hardware failure mode, that the vendor's engineers told me in private they had never seen before. It was one of the more reputable SAN vendors, and they worked with us on a kernel patch to recover from that error more intelligently. There's now a patch in the Linux kernel to gracefully recover from an error that has only been seen once outside of a hardware lab.
I've also talked to plenty of engineers and support people who had simply never heard of a particular problem before, because their companies lacked sufficiently well-organized support and bug tracking systems, and couldn't hold on to their experienced employees long enough to have someone around who knew what was going on the next time the problem came up.
In the world of enterprise computing, the law of large numbers is working against you. Some vendors understand this, and treat each novel failure as an opportunity to harden the product further. You usually pay a premium for this, but it's worth it. Others just swap the bad board and update their resumes. It sounds like NG went with the lowest bidder.

--
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.

Northrop hiring event by confused+one · 2010-08-27 04:52 · Score: 1

Funny that I should receive an email today inviting me to a Northrop Grumman Information Systems Hiring Event. The event occurs on the 25th of August and I received the email on the afternoon of the 27th. Failed there too!

Who is the vendor? by Anonymous Coward · 2010-08-27 04:54 · Score: 0

I wonder if they are using 3PAR.

Even funnier by SteveFoerster · 2010-08-27 04:59 · Score: 3, Interesting

As a leftover from when Virginia-headquartered AOL was the king of connectivity, you see license plates here in Virginia touting us as the Internet Capital.

--
Space game using normal deck of cards: http://BattleCards.org

It's not always the bureaucracy by roc97007 · 2010-08-27 05:03 · Score: 1

Ok, in this case it probably is the bureaucracy at fault. But it isn't in all cases. In my previous job we had an architect who would take it upon himself to "value engineer" a vendor's solution, with unpredictable results. I'm not sure why -- we had budget. Maybe it was his way of seeming more valuable? This led to "solutions" like a SAN cobbled together from disk arrays, controllers and switches from three different vendors that were not meant to work together, had never been tested in the chosen configuration, and had to be integrated and maintained in-house. Word rapidly got around that if you wanted reliable access to your data, you didn't put it on the corporate SAN.

What I don't fully understand is how NG could get what amounts to a quarter billion dollars a year to manage the state's IT infrastructure and still allow a situation like this to occur. I mean, I understand how it can HAPPEN, I don't understand why it's allowed to. Over and over again I've seen companies who have outsourced their infrastructure enter into a "battered wife" relationship with the vendor, lacking anyone with the authority, cojones and understanding to bring the vendor to heel and get the uptime they've paid for. Instead corporate IT management will often enter into a dark relationship with vendor sales management to spin downtime to the stockholders as teething issues, inadequate documentation, out of scope, or some other hand-waving to explain why the savings from outsourcing has been more than offset by loss of revenue, IT management essentially working for the vendor while drawing a paycheck from the company. But don't get me started...

--
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.

Re:It's not always the bureaucracy by pnutjam · 2010-08-27 05:45 · Score: 1

Hey now! I love to "value engineer", but I would never work back from a series of proprietary solutions.

And I never have a budget. I've dreamed of having a budget, must be nice.
I would stroke it and love it and call it George...

--
Cheap storage VM.

Offer by XanC · 2010-08-27 05:09 · Score: 1

I'll do it for $2.3 billion!

Hardware fails, Salesman fail and budgets=tight by cjdavis618 · 2010-08-27 05:21 · Score: 0

While there is more to this story than meets the eye, there is no excuse for not having redundancy if you are a state body. It could be a case that there is a backup of the data and maybe that the needed parts to fix the issue are not available yet or haven't been delivered. Nevertheless, without details of the infrastructure, we cannot jump to conclusions about why this happened. Most contracting companies like that clearly state that they will design the system to run and build it, but backup management is the responsibility of the customer. That is the normal CYA tactic. We don't know that NG was even tasked with the backup or redundancy. I will be looking for more of this to come to light in regards to the actual cause and resolution.

You want to know how it can happen? by MikeRT · 2010-08-27 05:27 · Score: 1

What I don't fully understand is how NG could get what amounts to a quarter billion dollars a year to manage the state's IT infrastructure and still allow a situation like this to occur. I mean, I understand how it can HAPPEN, I don't understand why it's allowed to.

What makes you think Northrop Grumman had a choice? They still work for the state IT department at the end of the day. If the state IT department says "buy this POS because it's cheaper and don't build in redundancy because it's too expensive," then those are the NGC employees' marching orders.

This likely happened because of a perfect storm (I feel dirty even using that term) of government cheapness, government contractors lacking backbone and an event ramming the two together at supercollider speeds. I bet you there are admins on both sides of the contractor/employee divide right now saying "cheap sons of bitches wouldn't do $X" because that is usually how these things work.

Re:You want to know how it can happen? by roc97007 · 2010-08-27 05:51 · Score: 1

> What makes you think Northrop Grumman had a choice? They still work for the state IT department at the end of the day.
There are typically very high penalties for not meeting your service levels. A 24 hour unplanned outage can blow a half year's profits for the contract. Like any outsourcing company, NG did have a choice -- don't take that contract under those conditions.
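To put a number on how badly a 24 hour outage misses a typical service level, here is a rough sketch. The availability targets are illustrative assumptions, not figures from the Virginia contract, and the function names are made up for this example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_pct: float) -> float:
    """Hours of downtime per year permitted by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def breaches_sla(outage_hours: float, availability_pct: float) -> bool:
    """True if a single outage alone exceeds the yearly downtime allowance."""
    return outage_hours > allowed_downtime_hours(availability_pct)


if __name__ == "__main__":
    # Hypothetical targets; the real contract terms are not given in the thread.
    for target in (99.0, 99.9, 99.99):
        print(target, round(allowed_downtime_hours(target), 2), breaches_sla(24, target))
```

Under a 99.9% yearly target the allowance is under nine hours, so a single day-long outage uses it up nearly three times over.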

--
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.

oblig. quote by StripedCow · 2010-08-27 05:30 · Score: 1

There's something rotten in the state of Virginia?

--
If Pandora's box is destined to be opened, *I* want to be the one to open it.

It happens by kilodelta · 2010-08-27 05:30 · Score: 3, Informative

But $2.4 billion over ten years comes out to $240,000,000 per YEAR! With that kind of money they could replace their infrastructure a few times over every year.

This is a clear example of the malfeasance that happens when government gets corrupted by corporate interests. Taxpayers in VA should be up in arms about this one.

Here's my story of state agency screw-ups. Two jobs ago I was working for the Secretary of State's office here. We had the opportunity and funding to get our IT infrastructure in order when the Help America Vote Act (HAVA) became law. We were able to build out a secure and redundant room to house our critical infrastructure.

Physical access by key and alarm code only, Redundant power which included an APS Symmetra UPS system, backed up by a 125kW natural gas fired generator. Even made sure to extend tendrils from the redundant power out to the MDF so the ISP could use our power system. Also had redundant cooling tied to the generator.

The one Achilles Heel of the operation was DNS. Ours was provided from outside our space. I suggested they build a zone locally; that way we'd have DNS services if the state's went down. But they quashed it as being too difficult! Ut si!

Well one day there's a massive power outage in the city. They were still up and running, lights on, air conditioning on but couldn't get in or out of the internal network even though the ISP circuits were still up. Yup, DNS!
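For anyone wondering what "have DNS locally" buys you, here is a minimal client-side sketch of the idea: try the normal resolver first, and fall back to a locally held mapping when it fails. This is a stand-in for illustration only, not the local zone I proposed at the time; the hostname, the address, and the `resolve_with_fallback` helper are all hypothetical.

```python
import socket
from typing import Callable, Mapping


def resolve_with_fallback(
    name: str,
    local_zone: Mapping[str, str],
    upstream: Callable[[str], str] = socket.gethostbyname,
) -> str:
    """Resolve a hostname, falling back to a local table if upstream DNS fails."""
    try:
        return upstream(name)
    except OSError:
        # socket.gaierror (lookup failure) is a subclass of OSError.
        if name in local_zone:
            return local_zone[name]
        raise  # no local record either, so surface the original failure


if __name__ == "__main__":
    # Hypothetical internal record kept on site.
    zone = {"intranet.example": "10.0.0.5"}
    print(resolve_with_fallback("intranet.example", zone))
```

A real fix would be a secondary or locally authoritative name server inside the protected room, so every client benefits without code changes.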

Re:It happens by Anonymous Coward · 2010-08-27 07:37 · Score: 0

I disagree that 240M/year is too much for a state level IT system. Consider this:
* Thousands and thousands of workstations
* Help desk for thousands and thousands of state government users
* Hundreds of servers, hosting thousands of websites
* Hosting, and perhaps development of perhaps 100 web applications
When you consider security, redundancy, and legal complications, this is a completely reasonable fee. Fubars happen, and they ought to learn a thing or two (both the government and the contractor), but getting all up in arms about this is stupid.
Re:It happens by CAIMLAS · 2010-08-27 08:17 · Score: 1

Are you kidding? That's a trivial amount to:
* maintain tens if not hundreds if not thousands of proprietary (legacy) applications
* maintain the many, many workstations
* maintain the fabric for many, many workstations
* maintain the servers which provide services, many of which are interconnected and do not cope with modern technologies well.
* maintain the storage for all of that
* SECURE all of the above
* make it as fault tolerant as possible
Shit, I suspect the Cisco contract is probably a good 3rd of that per year alone.

--
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
Re:It happens by Anonymous Coward · 2010-08-27 08:23 · Score: 0

Ken Cuccinelli (AG) will be right on it!
No? Oh, right: free enterprise at work. Gotcha. That and he's busy bashing gays, suing UVA over the temperature hockey stick (Mann), and covering up the bare breasts on the state seal. Nevermind, won't be hearing from him about this.
Re:It happens by Anonymous Coward · 2010-08-27 09:51 · Score: 0

With that kind of money they could replace their infrastructure a few times over every year

But you forget that the gov't department needs to pay all those pensions, medical, insurance, and incentives. And don't forget all those $150/hr IT contractors, their product upgrades and their $250/hr tech consultants.

That sort of leaves about $-50K to spend on the infrastructure within a year's budget.
Re:It happens by Anonymous Coward · 2010-09-03 04:10 · Score: 0

DNS is very often overlooked as one of the most critical parts of a network. People just don't get how important that piece of infrastructure really is.

Reminds me of the good ol' days by Locke2005 · 2010-08-27 05:31 · Score: 1

In my first job, I changed the boot-up message on the VAX to "If only my girlfriend went down as often as this computer!" I kinda assumed it would scroll up off the terminal and nobody would see it. It, uh, didn't. One of our female programmers, who was famous for overreacting, came into work and threw a hissy fit. We fixed the message and decided to tell everyone we couldn't figure out who put it there. This is why you shouldn't give all developers administrator privileges!

--
I've abandoned my search for truth; now I'm just looking for some useful delusions.

This is what Politics in VA is all about by Anonymous Coward · 2010-08-27 05:32 · Score: 1, Insightful

This is what Politics in VA is all about.

Favors handed out; tax money wasted; public screwed.

Rinse and repeat.

service level agreements by Anonymous Coward · 2010-08-27 05:49 · Score: 0

Which is why service level agreements are so important. You never have to fire them. When their profit margin on the project hits zero, they'll quit.

I heard... by Anonymous Coward · 2010-08-27 05:53 · Score: 1, Insightful

That they were going to do "Remediation" of the NAS when the problem started, and they had EMC guys on site, and everything. They must have killed the primary when they attached the secondary. Don't wait for a weekend outage window, let's just do it now on a Tuesday afternoon at 2pm. No one will know... oops....

Re:I heard... by MightyMartian · 2010-08-27 05:56 · Score: 1

Chuckle...
My wife gets pissed when I have to stay late or go in on the weekend to replace a switch or move some wires around. "Plumbers don't do that... Electricians don't do that..." she says. "No, they don't, and everybody gets pissed off when you can't flush the toilet and all the lights are off."

--
The world's burning. Moped Jesus spotted on I50. Details at 11.

What do they have that could be down? by dazedNconfuzed · 2010-08-27 06:04 · Score: 1

Whenever I drive thru Virginia (up I-81) there's a sign announcing "Entering Virginia's Technology Corridor" which is followed by hundreds of miles of rolling green pastures.

What, there was a proliferation of cow-tipping?

--
Can we get a "-1 Wrong" moderation option?

Happened before? by mr100percent · 2010-08-27 06:12 · Score: 1

Reminds me of a Classic TheDailyWTF: I'm Sure You Can Deal

Grrrr to the incredulous... by bartwol · 2010-08-27 06:12 · Score: 1

To anybody who feels incredulous at the notion of a single point of failure taking down a purportedly redundant system: I suspect you have limited experience with the issues and challenges of managing a very large system infrastructure. The complexity of such systems goes well beyond the knowledge of any individual, so notions of fault tolerance across the enterprise are highly theoretical. Even with extensive planning and testing, the gotcha is in what you don't know. Sometimes, one of those What-You-Don't-Knows reveals itself, and that is when it first becomes known.

The need for continued live operation of production systems typically precludes the opportunity to test them as realistically or extensively as one would wish. In fact, across large organizations and locations and departments and applications, systems managers don't even attempt to assert that they are free of single-points-of-failure, nor do they provide guarantees of non-stop operation. Real attempts at non-stop fail-safe systems are generally limited to narrower, truly mission critical applications such as aeronautical systems where lives or huge measures of capital depend upon system availability. Such criticality can rarely be ascribed to administrative systems, and they therefore rarely get the attention or funding needed to build and assure non-stop operation. And rightfully so...the cost of non-stop operation is not justified by the costs/risks of occasional failures.

So for those of you who assert that Virginia's systems should never go down, or shouldn't go down for more than 24 hours, I ask: How do you justify that assertion? Does it have a cost/benefit basis, or is it perhaps just a "soft" assertion?
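To make the cost/benefit question concrete, here is a back-of-the-envelope sketch. Every number in it is a placeholder assumption, not data about Virginia's systems, and the function names are invented for the example.

```python
def expected_annual_outage_cost(
    outages_per_year: float, hours_per_outage: float, cost_per_hour: float
) -> float:
    """Expected yearly loss from outages: frequency x duration x hourly cost."""
    return outages_per_year * hours_per_outage * cost_per_hour


def redundancy_pays_off(
    redundancy_cost_per_year: float, baseline_loss: float, residual_loss: float
) -> bool:
    """True if the yearly redundancy spend is less than the loss it avoids."""
    return redundancy_cost_per_year < baseline_loss - residual_loss


if __name__ == "__main__":
    # Hypothetical inputs: one 24 hour outage every two years at $100k per hour.
    baseline = expected_annual_outage_cost(0.5, 24, 100_000)
    # Hypothetical residual risk once redundancy is in place.
    residual = expected_annual_outage_cost(0.5, 2, 100_000)
    print(baseline, residual, redundancy_pays_off(500_000, baseline, residual))
```

The hard part in practice is not the arithmetic but getting defensible estimates for the hourly cost of an administrative system being down.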

Northrup Grumman outsource part of there own IT as by Anonymous Coward · 2010-08-27 06:14 · Score: 0

Northrop Grumman outsources part of their own IT as well. They don't own any of their own hardware; no, they lease it, and the leasing firm has its own tech guys as well.

I do it by Anonymous Coward · 2010-08-27 06:17 · Score: 0

I'll manage the system for you. It will only cost you $0.37 and transferred to my account in India.

Not the only failure VITA has had by Anonymous Coward · 2010-08-27 06:32 · Score: 1, Informative

They tried to do a full renumbering of the state's IP address space. This has morphed into a MPLS rollout when it turned out that too much was breaking as they moved various offices around.

State workers hate the whole VITA idea - it has been nothing but disaster and failure since it started.

I love the idea that my tax dollars have been funding this clusterfuck.

Re:It's always schedule by sjames · 2010-08-27 07:25 · Score: 1

Not putting all of your eggs in one basket, even a double walled basket with 2 handles and shock absorbers, would be a good start.

Same as Harris- 2009 Air Traffic Control sys prob by BubbaDave · 2010-08-27 07:29 · Score: 1

Let's privatize our most important infrastructure!

When a Salt Lake City router went offline, only government telecom contractor Harris knew that the backup card was not immediately available and one technician had access to where it was kept. Meanwhile, hundreds of aircraft and thousands of passengers were thrown off schedule as the lack of an FAA filing system left pilots submitting flight plans manually.
http://www.eweek.com/c/a/Enterprise-Networking/The-Story-Behind-FAAs-FlightPlan-System-Crash-773289/
http://www.eweek.com/c/a/Data-Storage/FAAs-FlightPlan-System-Crashes-Again-Delays-Hundreds-of-US-Flights-199160/

Dave

Typical State Job Interview by hackus · 2010-08-27 07:56 · Score: 1

So tell us a little bit about your education Mr. X.

"Well, I have a certificate in Microsoft Administration and a computer science degree, plus I am Cisco certified."

Oh excellent!!! Thank you for your time.

So tell us a little bit about your education Mr. Y.

"Well, I have a degree in computer science and I ran several storage area networks for several years at my previous employer, Widgets are Us."

But, do you have any certifications?

"No."

Thanks Mr. Y. It has been nice speaking with you, don't call us, we will call you.

Mr. X gets hired, and promptly tanks the whole storage network for the entire state.

-Hack

--
Got Geometrodynamics? Awe, too hard to figure out? Too bad.

Having had... by Rhys · 2010-08-27 08:05 · Score: 1

Memory go bad in a "san device" (I say in quotes because nobody in their right mind would actually think a single-pathed non-redundant disk array is really san-grade hardware) from a fruit-flavored vendor before, I can actually have some pity for the guys responsible/working on it. Debugging it is a great time too, because your filesystem rebuild generally works. As does copying small amounts of data. It is only once you try to copy a couple terabytes that things go to hell.

Filesystem data and inode corruption both coming and going. Best part is fsck of course just makes things worse as it detects the real errors and the fake errors induced by reads of the bad ram.

Luckily we had backups.

--
Slashdot Patriotism: We Support our Dupes!

money saving measure! by Thud457 · 2010-08-27 08:49 · Score: 1

S.R. Hadden: First rule in government spending: why build one when you can have two at twice the price?

--

the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff

Re:money saving measure! by ultranova · 2010-08-27 09:55 · Score: 2, Interesting

First rule in government spending: why build one when you can have two at twice the price?

And sometimes that's exactly the right approach, except you should really build three or four or ten. One might argue that that's the very purpose of the government: to force inefficiency where short-term self-interest would result in long-term disaster - in other words, to avoid the tragedy of the commons.

--
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

That's not the worst of it. by Anonymous Coward · 2010-08-27 09:36 · Score: 0

If you think that's bad, Virginia Tech's main educational portal, Scholar, has been down for like two hours now!

Acronym Fail by Nefarious+Wheel · 2010-08-27 09:50 · Score: 1

Redundant Array of Inexpensive Disks. RAID. Ok, maybe that scared them. Redundant Array of Raid Controllers - RARC? Nope, sounds Chinese. How about Redundant Infrastructure Array Audits? Nope, that definitely will not do...

--
Do not mock my vision of impractical footwear

You think this is bad... by breser · 2010-08-27 14:43 · Score: 1

The Kansas Department of Health has had their systems offline for nearly a month due to a hard drive failure. As a result nobody can get birth certificates.

http://www.google.com/hostednews/ap/article/ALeqM5iWdp8MfL7qrxjB8X8UWvKYC8Jw-AD9HRDAI00
