Anatomy of the VA's IT Meltdown

Posted by Zonk on Tuesday November 20, 2007 @05:04AM from the use-your-words dept.

Lucas123 writes "According to a Computerworld story, a relatively simple breakdown in communications led to a day-long systems outage within the VA's medical centers. The ultimate result of the outage: the cancellation of a project to centralize IT systems at more than 150 medical facilities into four regional data processing centers. The shutdown 'left months of work to recover data to update the medical records of thousands of veterans. The procedural failure also exposed a common problem in IT transformation efforts: Fault lines appear when management reporting shifts from local to regional.'"

In other words.... by Like2Byte · 2007-11-20 05:08 · Score: 4, Insightful

Business as usual for the VA.

Once again, the VA shows its true colors and mucks up another project funded by taxpayers for the well-being of our nation's veterans. A more screwed-up organization one will not find.
1. Re:In other words.... by Enry · 2007-11-20 06:04 · Score: 4, Informative
  
  I can follow up a bit on this, since I worked for the DVA for a few years in the early 90s. Even then, just about all records were online and searchable. A veteran that went from Albany, NY to Tampa, FL and got sick could get his records transferred overnight (electronically) between the two hospitals, and there were ways to get metadata about the veteran immediately, including recent visits at any location and reason for the visit. I imagine that improvements in networks mean that these records can be viewed immediately.
  
  At the time, there seemed to be a lot of waste (think $10,000 CD burner in 1993ish, optical cards with images and data impressed on them, etc). But they really were trying to be ahead of the game - a friend of mine showed me his green card and it was almost identical to a design I was working with when I was at the DVA. They also had mechanisms for charging back to private insurance companies in the event a veteran was only partially covered for a visit.
  
  Oh, and just about all the software that was written and in use by those hospitals is in the public domain and downloadable for free - many other hospitals use VistA as their base.
2. Re:In other words.... by Like2Byte · 2007-11-20 06:43 · Score: 4, Insightful
  
  The VA is far more than just another hospital. It is supposed to aid US veterans of all service branches and see to their needs, from educational loans and home purchases to medical care/assistance and others. See their site: http://va.gov./
  
  If any one hospital or chain of hospitals performed as consistently lousy as the VA has, that hospital would have been sued into oblivion decades ago. Hundreds of thousands of vets who've used the VA's services can attest. But we can't necessarily sue the VA because they're part of the government. Go to any VA hospital in the US. Odds are that after you pass through the pretty facade they've set up, you'll find patient after patient sitting in a wheelchair or bed lined along some wall, waiting for some over-worked, over-stressed and under-staffed doctor and not getting the care they deserve.
  
  The VA needs to take a lesson from the corporate world and change its face. Rename itself, start fresh. AND START DOING THEIR G-D JOB! That's the best dismal chance they've got to make things right. As it is right now, there isn't a vet in the US or abroad that thinks highly of the VA. And if there is, I'd find 100 that would refute any positive statement made about the VA.
  
  And, yes - I'm a Vet. My Father is a Vet. My Grandfather is a Vet. My Uncle is a Vet. I don't recall them looking forward to communicating with the VA, either.
  
  In closing, if the VA *did* do their job the homeless wouldn't consist of 25% US Veterans that couldn't re-adjust to civilian life after witnessing the horrors of war!
  
  http://www.cnn.com/2007/US/11/08/homeless.veterans/
  http://www.cnn.com/HEALTH/blogs/paging.dr.gupta/2007/05/mia-in-plain-sight.html
3. Re:In other words.... by bockelboy · 2007-11-20 08:19 · Score: 3, Informative
  
  I beg to differ. If you think the VA is crap, go to a private hospital. The VA consistently ranks better than any hospital system in the US. The following article is 2 years old, but it outlines how it beats the crap out of other hospital systems:
  
  http://www.washingtonmonthly.com/features/2005/0501.longman.html
  
  If you think the VA is bad, you can always go to your favorite HMO and have a higher chance of death.
  
  Did I mention that the VA is a leader in hospital IT infrastructure and is decades ahead of other hospitals?
  
  http://en.wikipedia.org/wiki/Veterans_Health_Information_Systems_and_Technology_Architecture
  
  The VA is the largest hospital system in the US and its budget is decreased most years after adjusting for inflation. Given the predicament that Congress puts them in, they've done pretty well.
  
  However, every single mistake they make is a public headline. Private hospitals have the luxury of being sued and quietly settling for $$$. Instead, the VA has to endure lots of bad publicity.
  
  If the VA was a corporation, costs would skyrocket and even more corners would be cut. If you want to make it better, how about you ask Congress to provide adequate funding for the avalanche of people they are getting?
I see the problem by moogied · 2007-11-20 05:10 · Score: 4, Funny

Too many disciplines combined..
Anatomy: Medical.
centralize IT systems: IT.
four regional: Topographical.
Fault lines appear: Seismology.
There clearly is just not enough synergy..

--
So basically, -1 troll/offtopic is really slashdots way of saying "I hate that you thought of something before me."
Assumption junction, what's your function? by digitaldc · 2007-11-20 05:15 · Score: 3, Insightful

Volpp assumed that the data center in Sacramento would move into the first level of backup -- switching over to the Denver data center. It didn't happen.

DOH! Looks like it was all just due to someone's assumption that someone else would do their job.
From my experience, you can assume things happened, but if you don't verify that they actually happened - you are DOOMED.
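
The "verify, don't assume" point can be sketched in a few lines of Python. This is only an illustration: `trigger_failover` and `backup_is_serving` are hypothetical callables standing in for whatever a real site would use, and nothing here reflects how the VA's systems actually work.

```python
import time
from typing import Callable


def fail_over_and_verify(
    trigger_failover: Callable[[], None],   # hypothetical: asks for the switch
    backup_is_serving: Callable[[], bool],  # hypothetical: checks it happened
    attempts: int = 3,
    delay: float = 0.0,
) -> bool:
    """Request a failover, then confirm it took effect instead of assuming so."""
    trigger_failover()
    for attempt in range(attempts):
        if backup_is_serving():
            return True
        # Wait before the next check, but not after the final one.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A `False` return is the signal to escalate to a human rather than carry on as if the backup had taken over.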

--
He who knows best knows how little he knows. - Thomas Jefferson
1. Re:Assumption junction, what's your function? by Qzukk · 2007-11-20 05:29 · Score: 5, Informative
  
  DOH! Looks like it was all just due to someone's assumption that someone else would do their job.
  
  DOH! Looks like someone was making assumptions without reading the article. They considered switching to the backup, but since they didn't know whether the problem was on their end or the server's end, they were afraid that switching to the backup data center would destroy that one as well.
  
  --
  If I have been able to see further than others, it is because I bought a pair of binoculars.
my 2 cents. by Brigadier · 2007-11-20 05:24 · Score: 4, Insightful

Unfortunately, one of the best ways to learn how well your disaster recovery system works is to have a disaster. The problem with scheduled drills is that the scenarios themselves are planned out and typically not run system-wide, i.e. test this part of the system, then that part of the system, etc. On reading TFA, it seems much of the breakdown occurred because too many people assumed. There were also no centralized decision-making entities with access to all the information. Each decision, when viewed from its maker's individual perspective, seemed to be the right one. However, when implementing a global recovery plan, sometimes one system may have to be sacrificed for another.
awesome! by 192939495969798999 · 2007-11-20 05:25 · Score: 3, Informative

Awesome, sorry if someone already posted but I just couldn't resist the following quote:

Instantly, technicians present began to troubleshoot the problem. "There was a lot of attention on the signs and symptoms of the problem and very little attention on what is very often the first step you have in triaging an IT incident, which is, 'What was the last thing that got changed in this environment?'" Raffin said.
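
The triage question in that quote, "what was the last thing that got changed?", amounts to a lookup against a change log. Here is a minimal Python sketch; the `Change` record, the `CHANGE_LOG` entries, the names and the dates are all made up for illustration and are not taken from the article.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Change:
    when: datetime
    who: str
    what: str


def last_change_before(changes: list[Change], incident: datetime) -> Optional[Change]:
    """Return the most recent change made at or before the incident time."""
    earlier = [c for c in changes if c.when <= incident]
    return max(earlier, key=lambda c: c.when, default=None)


# Hypothetical change log; every entry is invented for this example.
CHANGE_LOG = [
    Change(datetime(2007, 1, 1, 9, 0), "alice", "rotated backup tapes"),
    Change(datetime(2007, 1, 2, 7, 30), "bob", "changed a network port setting"),
    Change(datetime(2007, 1, 3, 12, 0), "carol", "patched a print server"),
]

if __name__ == "__main__":
    # First suspect for an incident reported at 08:00 on the second day.
    print(last_change_before(CHANGE_LOG, datetime(2007, 1, 2, 8, 0)))
```

This only helps if changes are actually recorded, which is the procedural gap the quote is pointing at.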

p.s. I am shocked at how many junior cowboy IT people remain employed, given the supposed glut of hire-able and knowledgeable folks.

--
stuff
Zonk, you retard by sootman · 2007-11-20 05:28 · Score: 5, Insightful

I'm sure I'll get modded to -5, Flamebait, but fucking A, Zonk, Slashdot isn't a newspaper. You don't need to be so economical in your headlines. When I saw the headline, I first thought of VA Linux--you know, the guys who kinda sorta own you. "Medical centers" threw me, so I thought for a second that it might mean the state of Virginia. Then it dawned on me that you probably meant the United States Department of Veterans Affairs. I'm sure I'm not the only one.

Please, God, isn't there some kind of Editing 101 correspondence-school course we can send all these guys to? I mean, I love Slashdot to death, but please God, can you give the staff just one ounce of basic editorial skills: spelling, grammar, etc? Teach them to write for clarity, not just brevity? Maybe go for broke and touch on dupe-checking, fact-checking, changing links so they point to the original article instead of some guy's AdSense-laden blog page that says nothing more than "here's the story"?

You're EDITORS, for God's sake (even if in name only), you are indeed allowed to EDIT submissions.

--
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
Why always centralizing? by guruevi · 2007-11-20 05:31 · Score: 3, Insightful

I wonder why higher management always wants to centralize their resources. The internet protocol and many subsequent IT applications were built to be efficient in small and decentralized environments.

1) Centralizing gives us large, expensive computers that are made out of the same components as smaller ones and thus fail just as the smaller ones do. However, cramming ever more crap onto the same machine means everything goes down at once whenever it fails.
2) Centralizing has the ultimate goal of eliminating jobs, but they need those people since they know all the little details and hiccups their systems have. If people know a project is going to eliminate their job, they won't be cooperative. IT not being cooperative is very bad in this world where everything is computerized.
3) Eventually the same number of people is going to have to work in the centralized system just because you also centralize the problems and more problems will bring more people, more people will bring more overhead and inefficiency, more inefficiency will bring more people (at least that's the default in today's business world, throwing more people at an IT problem doesn't make it disappear faster)
4) More people in a project that was designed to be more cost efficient means the managers will have to cut expenses. Cut expenses brings underpaid people, underpaid people bring less or no experience and higher turnover, higher turnover means more cutting expenses.

Therefore: keep your local IT guy(s) and infrastructure although you can't squeeze 100% of work/day and it will bring a little more expense. The end-users have a better relationship with the guy(s) and that makes happier people. Centralizing brings more overhead, less customer-interaction with IT and thus more inefficiency throughout the business.

--
Custom electronics and digital signage for your business: www.evcircuits.com
+1 C'mon Editors by ggvaidya · 2007-11-20 06:01 · Score: 4, Funny

I had a real fun time parsing this article.

1. Looks at title: omg! Slashdot's parent company had an IT meltdown! ha-ha! But waitaminute ...
2. Looks at icon: a ... crown? The Queen? Perhaps they mean *our* overlords, VA Linux? Or is VA Linux a monarchist organisation now?
3. Looks at summary: and ... medical? Why are th... oh HANG ON WAIT A MINUTE
4. Looks at icon: I remember that! It means ... government! Crown, government, get it? So, VA Linux screwed up a government's medical system? That makes ...
5. Looks into the inner recesses of my mind: ... sense, but ... something's out of place, something's ... just ... not ... quite ...
6. Looks at lightbulb over head: of course! There *is* no VA Linux! It's Sourceforge, Inc now! But that must mean ...
7. Looks at summary: ... carefully ... the VA, why the VA, shouldn't it be ... Vir..ginia?!

Gee thanks, Zonk, just what I needed before going to sleep. Now I'll dream of the Queen in Virginia melting down medical computers for Slashdot's open source overlords. Again.

Last thing I needed ...
It happens by ACMENEWSLLC · 2007-11-20 06:27 · Score: 3, Insightful

What they were doing was a major change to their IT infrastructure. That's massive. Things happen. The fact that they were down at 17 of 128+3 (131) data centers because some IT staffer changed a port # at one of their hub data centers without following proper procedure -- that's minor.

Seems to me that things otherwise working well is a major accomplishment. They are still on the old system and are entering data back into that system and migrating into the new system. But it seems things went well otherwise.

Anytime you do a major shift like this, it's hard. The users hate it because they can do their job very quickly on the system they are used to, but now have to learn a new system and slow down.

Things happen.
I work at the heart of this... by Anonymous Coward · 2007-11-20 06:52 · Score: 5, Insightful

1st off... VISTA is not Windows VISTA. It's the "Veterans Health Information Systems and Technology Architecture". Do a google search on that.

VISTA runs on HP's VMS, and on top of that it runs Cache from Intersystems. (And yes it costs the taxpayers a lot! But a lot less since we've been centralizing it over the last 3 or 4 years.)

It is a HUGE system.

The centralization that we're currently undergoing is massive. This problem was (IMHO) scapegoated onto a poor change control process.

I know what was changed, I know who changed it, and I know when they changed it. However, this 'melt down' has happened three times... (Not to the same drastic outcome.) It comes down to VMS locking out logons because locks aren't being released properly. (Now you could argue that the reason locks got behind was this change... But I don't think that is the real reason because of our previous problems.)

It's that simple. Ask the VISTA manager over lunch sometime. They weren't afraid of data corruption. They were afraid if they moved the systems, the other system would lock up too with too much user load.

There goes "VISTA". Everyone logged in is fine. Everyone not on... Isn't getting on.

Now comes the bad part... No procedures!

We take 32 medical centers, and throw their IT into a data center. You 'had' clear lines of who owns what, and what happens when they go down. Now you centralize all that... Who raises the flag when something bad happens? Is it the site that has the problem? Is it someone who now controls the system at the data center? Who is responsible for what?

Oh wait... OI&T only has a dozen staff... And almost NONE of those people are technical. Everyone's pay was simply moved from one appropriation to another. But what about the IT systems?!?! We moved those too, but didn't hire any permanent staff to take care of it? We just rubber banded a bunch of people together that work across the whole west coast, handed them a pager and said good luck?

Suffice it to say, we have some REALLY REALLY hard working people... And some really bad management. (Congress forcing us to do things on a time table is really annoying. Especially since they expect results, but don't expect any documentation... What do you think is going to get skipped?)

Congress: How is that data center move going!
Howard: We've moved 28 sites!
Congress: Good Job!
Howard: .:Thinks:. Too bad they don't know about everything we've shortchanged to make such an obscene deadline!

Then again... Howard doesn't even know everything we skip to get things done.

Bah