Failed Win XP Upgrade Wipes Out UK Government Agency
Lurker McLurker writes "The BBC and the Register report that the UK Government's Department for Work and Pensions attempted to upgrade seven PCs from Windows 2000 to Windows XP, and ended up with BSODs on over 60,000 machines. I wonder if the National Health Service is regretting awarding Microsoft a £500 million contract now." The Guardian also has a good story.
They wanted that new version of Internet Explorer with the fancy built-in pop-up blocker.
Never email donotemail@WeAreSpammers.com
I can imagine it now
Intern: "Sir, Microsoft have brought out Windows XP Service Pack 2. It's had numerous bug reports of dying PCs and software not working anymore. THIS is the time to upgrade to Windows XP, then upgrade to SP2 because windowsupdate won't stop bugging the hell out of us until we do!"
Boss: "You mean we could cock something up, and it might not even be our fault for a change?! Let's pay someone vast amounts of money to do it!"
The Guardian reports it was a week-long outage. Now, I may be completely wrong here, but surely all they had to do was restore those PCs back to their previous Windows 2000 state using the daily backups they do... I mean, it's only common sense to do backups on such a critical syst...oh, wait, nevermind.
</cynical>
OH SHI-
If only they had reached the conclusion hinted at in this BBC News article a year or two ago, this would not have happened.
It's certainly bad PR for Microsoft though, perhaps this will serve as a wake-up call to other governments that "other options" are out there.
Every time I hear about a big government IT fuck-up it seems to be caused by EDS. Yet the government keep awarding them contracts. Why?
Incompetent admins can turn any minor upgrade into a catastrophic failure. Don't blame M$ for this one unless there is irrefutable proof that the admins did everything by the numbers.
Quidquid latine dictum sit, altum sonatur.
Every Desktop Shutdown.
All those moments will be lost in time, like tears in rain.
It's like a thousand solitaire players suddenly cried out in frustration and then silence...
I like muppets.
Jon.
The BBC article mentions that EDS is responsible for the upgrade. They're partnered with Altiris, so I'd be willing to bet that the upgrade was carried out using the Altiris Client Management Suite.
It's a great set of tools--we own it at work and managed our own Win2k -> WinXP upgrade using the PC Transplant and Deployment Server tools--but it can massively bone you if you don't do enough testing. PC Transplant, in particular, can hurt you--that's the application that lifts your profile off of one PC and slaps it down on another, so that you don't have to re-configure your Exchange settings, Office personalizations, backup documents and application settings and bookmarks, and a whole mess of other things. When doing an OS migration, if you don't design your personality transplant template correctly, you can end up with all kinds of Win2k-specific settings stuffed into your WinXP profile, which can lead to all kinds of crazy-ass problems.
From the article: Another source says that the DWP was trialing Windows XP on a small number ("about seven") of machines. "EDS were going to apply a patch to these, unfortunately the request was made to apply it live and it was rolled out across the estate, which hit around 80 per cent of the Win2k desktops. This patch caused the desktops to BSOD and made recovery rather tricky as they couldn't boot to pick any further patches or recalls. I gather that MS consultants have been flown in from the US to clear up the mess." EDS is also thought to be flying in fire brigades.
Brilliant work on the part of EDS, trying to patch the wrong systems; lord only knows what can happen then.
You could force an XPSP2 onto a 2k machine... would you still blame Microsoft for it? That seems to be the case here: EDS screwed up, and of course it's Microsoft's fault in the eyes of /.
"On another note, How did upgrading seven machines to XP BSOD 60000"
If you read the Register article, it says that they were attempting to only push the update out to 7 PCs, but it actually went to all 60,000.
I would imagine they were using something like Microsoft's SMS or BigFix to push out packages, and simply selected "push out to all" instead of a test community.
I don't think this is a nail in Microsoft's coffin; I have seen similar things happen in the mainframe world, where patches intended for dev hit live production systems with similar bad consequences. It has to count as a bad day at the office for the person pushing the button though.
It also highlights the difficulty in pushing out big updates to major networks of PCs, be they running Windows or Linux. Moving from Win NT to XP has proved so complex in my organisation that for the future Longhorn upgrade and beyond we are now looking to Citrix to allow the migration of applications across servers, essentially using the PC as a thin client for all but core office and email apps.
Obviously these sysadmins were incompetent. Everybody knows that a BSOD is impossible under Windows XP. If they had simply upgraded the other 60,000 machines to XP first, and then updated these 7 problem systems, this whole problem would easily have been avoided.
So ... 5 working days, 60,000 PCs (= 60,000 employees?)
Assume £8/hr employee. 40 hours of work a week. 60,000 unusable systems.
=> TCO increased by £19.2m for the seven PCs they upgraded (before costs incurred fixing the problem)! Roughly £2.7m TCO per system for Windows XP, eh? A clear example that Windows TCO can increase rather horribly if something goes wrong, and this was a standard upgrade. It's £320 per PC if you count all 60,000 systems - that's still horrendous.
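For anyone who wants to check the sums, here is a rough sketch of that back-of-envelope estimate. The wage, hours and machine counts are the poster's assumptions (plus the "about seven" trial machines quoted from the Register), not official figures, and the variable names are made up for illustration.

```python
# Back-of-envelope cost of the outage, using the assumptions from the comment:
# 60,000 idle employees, a 40-hour week, GBP 8 per hour.
# TRIAL_PCS is the "about seven" figure quoted from the Register.
HOURLY_WAGE_GBP = 8
HOURS_LOST = 40
AFFECTED_PCS = 60_000
TRIAL_PCS = 7


def lost_wages(pcs: int, hours: int, wage: int) -> int:
    """Wages paid to people who could not work, in GBP."""
    return pcs * hours * wage


total = lost_wages(AFFECTED_PCS, HOURS_LOST, HOURLY_WAGE_GBP)
per_affected_pc = total / AFFECTED_PCS
per_trial_pc = total / TRIAL_PCS

print(f"Total lost wages:       GBP {total:,}")
print(f"Per affected PC:        GBP {per_affected_pc:,.0f}")
print(f"Per intended trial PC:  GBP {per_trial_pc:,.0f}")
```

This only counts idle wages for one week; it says nothing about the cost of the actual cleanup or of delayed benefit payments.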
the UK Government's Department for Work and Pensions attempted to upgrade seven PCs from Windows 2000 to Windows XP, and ended up with BSODs on over 60,000 machines.
In actual fact, the Register quotes:
According to one, a limited network upgrade from Windows 2000 to Windows XP was taking place, but instead of this taking place on only a small number of the target machines, all the clients connected to the network received a partial, but fatal, 'upgrade.'
and then below it:
Another source says that the DWP was trialing Windows XP on a small number ("about seven") of machines. "EDS were going to apply a patch to these, unfortunately the request was made to apply it live and it was rolled out across the estate, which hit around 80 per cent of the Win2k desktops.
So, by merging them you get the following story:
There was a trial of seven PCs; instead of patching only those seven, the rollout was accidentally performed across the network and every computer attempted to install a botched version of XP.
Slightly different to the Slashdot version, wouldn't you say?
In addition, I'm pretty sure that if you accidentally deployed a botched version of the Linux kernel then it too would probably have a similar effect.
When a government ends up with BSODs on 60000 computers, it can't be good for Microsoft.
No, but that doesn't necessarily mean it's bad for the rest of us!
Let's hope Congress plans to upgrade soon!
See? Even Microsoft is good for something!
was the government spokesperson. After the intro to this piece on Radio 4 this morning, her opening sentence was "Let me correct you, 20% of our workstations are functioning". Talk about a positive spin.
I have found that many MPs when questioned on anything related to technology simply say that "it is a complex issue", which to me isn't good enough when such huge amounts of money and significant impact on people's lives are involved.
There is a huge contract that'll be up for grabs soon - EDS are preparing themselves to manage the UK national identity database and identity card scheme. This is one we could lobby our representatives on to ensure they do it right.
Where to have the debate where it might be read by those who matter:
Free service to fax your MP
Boris
Richard Allan
Tom Watson
Shaun Woodward
Citing the recent and ongoing failures such as that cited in the article, and the UK Child Support Agency's computer failure, as well as the NHS computer system UK
UK Laptops
Microsoft sells itself as easy to administer, which in management terms means that the systems are so /user friendly/ that any moron can administer them.
So, admin stupidity can also be blamed on MS; it's part of the TCO studies that drive the decision to buy MS.
Aside from that, a point-and-click update should not be able to fail so miserably. A script made by the admin, of course, should, because you can assume that someone smart (and bold) enough to write a little script should be responsible for their decisions. Some guy clicking checkboxes shouldn't be allowed by those means to break 60000 computers through a /user friendly/ GUI program.
GUIs for dummies should have enough checks to prevent such undesirable effects; they have a sufficiently constrained domain to be able to do so. If the guy wanted to do a legitimate task that the tools don't allow, he could always write some Visual Basic Script, and then he would be on his own. Bringing down an organization by mis-clicking checkboxes is the responsibility of the guy that provided the checkboxes, too.
"What happened to all the competent people??"
They emigrated, most likely. One of the problems with incompetence is that it's self-reinforcing, the competent get more and more fed up with having to deal with incompetence all day and find something better to do with their time.
different manufacturers and different configurations
You know that (re)installing Windows on a large number of systems of different types, for example when an upgrade fails, is a total fucking nightmare, yes?
At least Linux comes with 99% of drivers pre-installed. With Windows you have to find them on the net first, then find some way of getting them to the target system (because you don't have a NIC driver, remember?).
Something that makes me curious: you hear Ballmer tout the lower TCO of Windows. You hear the Linux community shriek about its lower TCO. The bottom line is really this: if your sysadmins are less than competent and bugger up something like this, which system would have a lower cost to recover? This is a really good thing to know when you are considering any enterprise system. Call it TCCR (total cost of catastrophic recovery). Ballmer, Linux communities, answer me this!
All your database are belong to us
Yes. It's not like the upgrade could detect the version of the program it's being applied to, and only install if the version matches the version it is intended for. That is completely unheard of, and would be impossible technically.
This was sarcasm, FYI.
This situation is more analogous to a wrong signal causing the door to open and then jam. And yes, such a door manufacturer deserves to be blamed.
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
The question about all of this that I am left with is, how did the patch even install? Microsoft has had sanity checking on their patches for ages, checking not only the Windows version, but even service pack levels and any other prerequisites. Ever tried installing a patch intended for IE6-SP1 over plain IE6 for example? I'm assuming that this is some custom patch rolled by EDS, rather than an official Microsoft one downloadable by all and sundry. Still, the story appears to have made it onto UK prime time news, so no doubt more details will emerge...
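To make the "sanity checking" idea concrete, here is a minimal sketch of a pre-install check that refuses a patch on the wrong OS or service pack level. It is an illustration only, not how Microsoft's or EDS's installers actually work; `PatchRequirements`, `Machine`, `may_install` and the hostnames are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchRequirements:
    """What a (hypothetical) patch declares it needs."""
    os_name: str
    min_service_pack: int


@dataclass(frozen=True)
class Machine:
    """The facts an installer would read from the target (hypothetical)."""
    hostname: str
    os_name: str
    service_pack: int


def may_install(patch: PatchRequirements, machine: Machine) -> bool:
    """Allow the install only if the OS matches and the SP level is high enough."""
    return (
        machine.os_name == patch.os_name
        and machine.service_pack >= patch.min_service_pack
    )


xp_patch = PatchRequirements(os_name="Windows XP", min_service_pack=1)
trial_box = Machine("trial-01", "Windows XP", 1)
win2k_box = Machine("desk-4711", "Windows 2000", 4)

for box in (trial_box, win2k_box):
    verdict = "install" if may_install(xp_patch, box) else "skip"
    print(f"{box.hostname} ({box.os_name} SP{box.service_pack}): {verdict}")
```

A check like this only helps if it runs on the target before any files are overwritten; a tool that blindly copies files would bypass it.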
UNIX? They're not even circumcised! Savages!
Easy, a dialog like this appeared:
"Do you want to update the machines on your network now?"
[Accept]
No cancel button.
--
Wiki de Ciencia Ficcion y Fantasia, un cuento por Fly.
you can call them senators if it makes you feel better.
...but is there any actual evidence it was a Microsoft error? I like bashing Windows as much as the next guy, but it seems this is at least as likely to be a huge fumble by the admins.
They go over budget because when a project is accurately costed, some idiot manager somewhere goes berserk and says it must be done in time-(large chunk of time) and for cost-(managers' & directors' bonuses). Knowing this, most s/w projects are unrealistically timed and funded. Anyway, EDS has right royally screwed up on all the big govt. projects yet the govt. continues to use them. Is that as a result of competence?
Did he inhale?
Frankly, I think "a scant few" is pushing it ... despite the number of clueless morons, many here do have at least some idea what's going on and how to sensibly address some IT problems.
As for managing large networks of desktops, that's another very different matter. Not many people have high-level experience doing that.
My network, for example, is only thirty machines. Hardly huge. In fact, it gives me the opposite perspective on a lot of issues, because I find many of the large-site friendly features of Windows networks utterly useless for a small site, and no small-site friendly manageability features to compensate.
Personally, I've trialled XP at work as a possible upgrade for our 9x machines, and come to the conclusion that it's not worth the pain. It might be good if you have the management tools, a dedicated test network, and an admin team dedicated to designing and rolling out updates. For small sites, however, it's pure hell. Even controlling how the clients update themselves is hard without an extra server to do the job. I also found accessible information for small-site management to be very thin on the ground.
We're now using thin clients for some of our network, and seeing very good results. Yes, they're Linux based - MS looked good until we figured in the CALs and the issues with NT-based terminal server security. I'm far from floored by the results with Linux - the bugs, oh, the bugs, I'm drowning in stupid f***ing bugs. There's also more than a little totally retarded design, and the classic issues with no two apps having the same open/save dialog.
That said, for our basic users the results have been very good. They need little support, hardware and software costs are both low, and things generally run very smoothly. Trials with more demanding users aren't going as well (see above rant about bugs and bad design), but current development in the OS is addressing most of the issues I've run into and I expect to be able to move the 9x users across to the thin clients mid-late next year.
I do agree with you that managing a large collection of Linux desktops would probably be pure hell. It's awful to even think about, frankly, especially upgrades. *shudder*. My solution would be to simply not use desktops, but instead move most users to department level thin client services hanging off a redundant set of beefy servers. I'd use LDAP to store user and system information (yes, much like AD) as I currently do on my network. For many users, such a setup can be expected to work very well, and dramatically reduces the admin nightmare compared to Linux desktops. I also wouldn't even try to migrate all users to Linux - only basic users for whom it would work well, such as those who only need email, a browser, a word processor, and access to a couple of specific in-house apps.
As for migration - I can't possibly imagine how it could be done in a sane way. I suspect a lot of custom tools would have to be written, the migration would need to be a rolling one, and there would need to be a lot of staff on hand to handle glitches. That doesn't sound like fun to me.
The worst part of moving my users over to the thin clients was migrating their data and settings. That despite the fact that almost all of it was already on the servers, and their systems were pretty basic and very uniform. Doing it in a large company wouldn't be nice.
They're probably using something like Novadigm's Radia. And instead of linking the correct 7 PCs, they linked to all of them (misconfigured group). In that case, it's not a case of installing a patch through the new mechanisms; the "Patch Manager" simply dumps the files onto all the machines that connect using its client, and forces an overwrite.
Granted, they should actually have an install script that checks the OS before it actually dumps the install package on there, but hey.
Not normally an MS apologist, but this isn't really Microsoft's problem. It's the contracted company that made the update package failing to assign it to the right download group.
So, the analogy. It's like some perfectly good system being installed, and someone presses the button marked 'open all doors' instead of simply 'open door 7'.
I don't see anyone really blaming the door manufacturer here (Microsoft or the contractors), although I'd hazard a guess that the person who skipped over the part of the process that said 'double check the groups you assign this patch to' will be sorely chastised...
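The "double check the groups" step is the kind of thing a deployment tool could enforce for you. Below is a rough sketch, assuming a made-up tool where the operator must state how many machines they expect to hit; the rollout is refused if the resolved group is a different size. `plan_rollout`, `RolloutRefused` and the group names are hypothetical and do not correspond to Radia, SMS, Altiris or any real product.

```python
class RolloutRefused(Exception):
    """Raised when the resolved target list does not match expectations."""


def plan_rollout(groups: dict, group_name: str, expected_count: int) -> list:
    """Resolve a group to hostnames, refusing if the size is not what was expected."""
    targets = list(groups[group_name])
    if len(targets) != expected_count:
        raise RolloutRefused(
            f"group '{group_name}' resolves to {len(targets)} machines, "
            f"but the operator expected {expected_count}"
        )
    return targets


# Hypothetical estate: a seven-machine trial group and everything else.
groups = {
    "xp-trial": [f"trial-{i:02d}" for i in range(1, 8)],
    "all-desktops": [f"desk-{i:05d}" for i in range(60_000)],
}

# The intended rollout goes through.
print(len(plan_rollout(groups, "xp-trial", expected_count=7)), "machines targeted")

# Picking the wrong group is caught before anything is pushed.
try:
    plan_rollout(groups, "all-desktops", expected_count=7)
except RolloutRefused as err:
    print("Refused:", err)
```

It is door 7 versus 'open all doors' again: the button still exists, but pressing it by mistake gets you an error instead of 60,000 blue screens.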
Upgrades NEVER work! Not for Windows 95, 98, ME, 2000, XP, Longhorn, whatever! It will never be a good idea to try and replace a MS OS without doing a clean install.
This is first day stuff.
The public sector in the UK is nothing more than unemployment benefit for the middle classes.
In my experience (having worked for both), in terms of inefficiency and stupidity, there's only one thing worse than the British public sector and that's the British private sector.
My company used to be part of a large public sector concern and was sold off. Since then we seem to spend nearly all of our time/money:
Changing company logo and name every 6-12 months
Adding a new problem management system which we have to learn every 6 months (we currently have about 5 each of which was supposed to replace all the others).
Paying huge bonuses to upper management.
Paying huge car allowances to middle management including those who refuse to drive.
Not giving any rises under the so-called performance related pay scheme for 4 years despite meeting profit targets because all the money has gone on the above 2 items.
Making skilled people redundant then recruiting at vast expense people with the same skills 2 months later.
Making skilled people redundant then reemploying them at twice the pay as contractors for the next 2 years because they're still needed.
Repeatedly shuffling kit from datacenter to datacenter around the country at vast expense and disruption to our customers.
Ordering expensive buffets for management meetings, 95%+ of which get thrown away.
Managers having a schedule involving meetings all over the country which means that they spend about 25 hours out of 40 driving.
Managers refusing to use video-conferencing for meetings even in the light of the above.
How many of these things happened when I was in the public sector? Virtually none. We didn't have the money to throw around on such things. We were forced to be efficient.
Also if this private sector company I'm referring to was atypically inefficient, presumably it would do so badly it would collapse or be taken over. So this implies that many private sector companies are like this.
It's very easy to slag off the public sector if you use stereotypes, generalizations and distortions.
This isn't Microsoft that ballsed it up, nor is it inherently the fault of DWP. Chances are it's an underpaid sysadmin somewhere who hit the wrong checkbox when rolling out the patch.
If someone can manage this by selecting the "wrong checkbox" then the system is broken by design.
Microsoft sell a complex system with the claim idiots can administer it. The DWP employ/contract idiots to administer a complex, but vital, system. Neither of these are "innocent parties".
If you give a chimp an Uzi with a defective trigger mechanism and a bunch of people get shot, whose fault is it: the chimp's or the Uzi's? My first networking experience was with AppleTalk; plug it in and you had a network. I was subsequently required--with a co-worker--to learn everything we could about Windows networking so we could implement it in one of our products.
My co-worker and I spent the next period AMAZED that Windows networking even worked at all. The system of domain controllers and WINS servers and browse lists and host files... it's too byzantine to be believed. There is, without doubt, a corporate network somewhere that could be completely undone by someone opening a wireless laptop in the wrong place at the wrong time. Add Windows XP and the attendant SP2 fun they're having and you get chaos.
Yes, those delightful folks at EDS are the chimps in this scenario, but Microsoft's products are definitely the defective Uzi. And I note that the BBC News article studiously avoided mentioning either of them. Hmm... Microsoft wouldn't be doing everything it can to tamp down this PR disaster, would it?
Naaah!