WGA Meltdown Blamed On Human Error


Posted by Zonk on Monday September 3, 2007 @12:51AM from the kinda-of-big-for-an-oopsie dept.

Erris writes "As commentators like Ars Technica slam WGA as deeply flawed, Microsoft is blaming human error and swearing it won't happen again. 'Alex Kochis, Microsoft's senior WGA product manager, wrote in a blog posting that the troubles began after preproduction code was installed on live servers. ... rollback fixed the problem on the product-activation servers within 30 minutes ... but it didn't reset the validation servers. ... "we didn't have the right monitoring in place to be sure the fixes had the intended effect"' Critics were not impressed. '"A system that's not totally reliable really should not be so punitive," said Gartner Inc. analyst Michael Silver. Michael Cherry, an analyst at Directions on Microsoft in Kirkland, Wash., said he was surprised that it was even possible to accidentally load the wrong code onto live servers ... [and asks], "what other things have they not done?"' This is not the first time this has happened, either."

"won't happen again"? by haeger · 2007-09-03 01:00 · Score: 5, Insightful

So, if it's human error that caused the problem, how can they swear that it won't happen again? Will there be no more humans working at Microsoft anymore?
I don't get it.
People make mistakes and as long as people are involved in any process they will cock up from time to time.

The point about systems not being so punitive is a valid one and should be brought up more often and louder. People who've paid money for their product should not be punished for an error on Microsoft's end.

.haeger

--
You are not entitled to your opinion. You are entitled to your informed opinion. -- Harlan Ellison

Re:It's a fair point by Anonymous Coward · 2007-09-03 01:06 · Score: 5, Insightful

Sure, for a 24 hour window pirates would have a free-for-all in getting perfectly valid WGA results.

Actually, pirates would probably very quickly figure out how to set the WGA server failure condition in Windows to get the automatic pass without ever actually contacting the real WGA servers, which would render WGA completely worthless. Well... more so.

I don't use Windows, can't stand Microsoft, and had a hearty laugh at the news of the WGA meltdown, but the problem is not as easy to solve from a technical standpoint as you believe.

Re:Zoom by gatzke · 2007-09-03 01:08 · Score: 5, Insightful

Slashdot is not about journalistic integrity, it never has been. It is about nerd topics and dupes.

ACs complaining about twitter does look like astroturfing. MS has enough money to pay a few guys to beat back public opinion on well-known public tech sites. Without facts disputing the current article, it looks like you are just pro-MS, ranting against an anti-MS article without any substance.

Fact- WGA broke for a while causing many people troubles.

Fact- Some people don't like having to phone MS all the time to keep a product running.

Fact- MS has paid astroturfers to anonymously post pro-MS grassroots stuff online.

Not an acceptable answer by Anonymous Coward · 2007-09-03 01:14 · Score: 5, Insightful

Look, most of us here work (directly or indirectly) in software. Who hasn't had a launch fail, or a product go bad, in a way that's negatively impacted customers? Such things DO happen. Usually not out of malice, and even sometimes not from carelessness--there are things that sometimes you can't catch on a test system. So to that extent, I feel for the folks who caused this problem.

So why do I call it unacceptable? Because of the difference in standards. On Microsoft's side, they are holding the user to a high level of scrutiny, and reserve the right to cripple some OS features if Microsoft believes the install is pirated. No discussions. Go directly to "aero jail".

Which is possibly understandable if their stance is "look, we're losing billions here--we need to fight piracy." But if they're going to take such radical and punitive measures as locking down OS features based on their tool, then they have to have an absolutely rock solid fail resistant totally monitored system. Basically, they need to hold WGA to a higher standard than most business software. This needs to be the gold standard if they want people to trust the system (and TFA links to a number of other reasonably well-balanced Ars articles that suggest it is not).

Oops, we forgot to monitor the validation boxes? You can't be organic about this--add monitoring for problems as they're discovered on a system this critical not just to Microsoft, but to their customers. You have to anticipate what MIGHT happen, even if "there's no way that should ever occur." You have to think of things that should never happen, but would be problematic if they did.

The fact that they failed here, if it never happens again, might not be a huge deal. But their answer shreds confidence that this is an isolated issue. The fact that this specific failure might not happen again gives me no comfort. Because their answer indicated that they didn't get it when they designed the system, and they don't get it now.

What they SHOULD have said is "boy, this was something we never thought could happen. We have fixed the issue, and are confident we have the monitoring to prevent this specific issue going forward. And we are undertaking a comprehensive review of our validation and monitoring systems to make sure nothing even remotely close to this could ever possibly happen again." Nothing less should be acceptable.