WGA Meltdown Blamed On Human Error
Erris writes "As commentators like Ars Technica slam WGA as deeply flawed, Microsoft is blaming human error and swears it won't happen again. 'Alex Kochis, Microsofts senior WGA product manager, wrote in a blog posting that the troubles began after preproduction code was installed on live servers. ... rollback fixed the problem on the product-activation servers within 30 minutes ... but it didnt reset the validation servers. ... "we didnt have the right monitoring in place to be sure the fixes had the intended effect"' Critics were not impressed. 'A system thats not totally reliable really should not be so punitive, said Gartner Inc. analyst Michael Silver. Michael Cherry, an analyst at Directions on Microsoft in Kirkland, Wash., said he was surprised that it was even possible to accidentally load the wrong code onto live servers ... [and asks], "what other things have they not done?' This is not the first time this has happened, either."
I don't get it?
People make mistakes and as long as people are involved in any process they will cock up from time to time.
The point about systems not being so punitive is a valid one and should be brought up more often and louder. People who've paid money for their product should not be punished for an error on microsofts end.
You are not entitled to your opinion. You are entitled to your informed opinion. -- Harlan Ellison
Critics were not impressed. 'A system thats not totally reliable really should not be so punitive, said Gartner Inc. analyst Michael Silver. Michael Cherry, an analyst at Directions on Microsoft in Kirkland, Wash.,
WGA is a natural, if not perfect (or even good) business response to the problem of piracy (leaving out all the debate over whether it's a good or bad thing for Microsoft as a whole). But the technical implementation leaves a lot to be desired; if anything, the response to a WGA server failure should be automatic pass (fail safe) instead of an automatic fail (fail deadly).
Sure, for a 24 hour window pirates would have a free-for-all in getting perfectly valid WGA results, but at the same time legitimate customers would not be inconvenienced. As far as I can see, that's the only way to keep WGA while minimising the backlash against it.
I write bullshit
Slashdot is not about journalistic integrity, it never has been. It is about nerd topics and dupes.
ACs complaining about twitter does look like astroturfing. MS has enough money to pay a few guys to beat back public opinion on well-known public tech sites. Without facts disputing the current article, it looks like you are just pro-MS ranting against a anti-MS article without any substance.
Fact- WGA broke for a while causing many people troubles.
Fact- Some people don't like having to phone MS all the time to keep a product running.
Fact- MS has paid astroturfers to anonymously post pro-MS grassroots stuff online.
Look, most of us here work (directly or indirectly) in software. Who hasn't had a launch fail, or a product go bad, in a way that's negatively impacted customers. Such things DO happen. Usually not out of malice, and even sometimes not from carelessness--there are things that sometimes you can't catch on a test system. So to that extent, I feel for the folks who caused this problem..
So why do I call it unacceptable? Because of the difference in standards. On Microsoft's side, they are holding the user to a high level of scrutiny, and reserve the right to cripple some OS features if Microsoft believes the install is pirated. No discussions. Go directly to "aero jail".
Which is possibly understandable if their stance is "look, we're losing billions here--we need to fight piracy." But if they're going to take such radical and punitive measures as locking down OS features based on their tool, then they have to have an absolutely rock solid fail resistant totally monitored system. Basically, they need to hold WGA to a higher standard than most business software. This needs to be the gold standard if they want people to trust the system (and TFA links to a number of other reasonably well-balanced Ars articles that suggest it is not).
Oops, we forgot to monitor the validation boxes? You can't be organic about this--add monitoring for problems as they're discovered on a system this critical not just to Microsoft, but to their customers. You have to anticipate what MIGHT happen, even if "there's no way that should ever occur." You have to think of things that should never happen, but would be problematic if they did.
The fact that they failed here, if it never happens again, might not be a huge deal. But their answer shreds confidence that this is an isolated issue. The fact that this specific failure might not happen again gives me no comfort. Because their answer indicated that they didn't get it when they designed the system, and the don't get it now.
What they SHOULD have said is "boy, this was something we never thought could happen. We have fixed the issue, and are confident we have the monitoring to prevent this specific issue going forward. And we are undertaking a comprehensive review of our validation and monitoring systems to make sure nothing even remotely close to this could ever possibly happen again." Nothing less should be acceptable.
Strictly speaking, there are no tasks I do today that I couldn't do in 1997.
Speak for yourself. Just because *you personally* don't use the extra processing power, memory, and storage that are available doesn't mean that lots of others don't. For example, I'm in the middle of digitizing and OCRing 110 years of local newspapers from microfilm into archival-quality PDFs for an historical society. Quite simply, you *cannot* have too much processing power when doing OCR -- I'm running multiple instances of ABBYY FineReader Corporate on a 2x Quad Core Xeon that has been pegged for two weeks now. It's quick, multithreads across all 8 cores and does a great job, but there's simply too much data. Note that this project would have been completely impossible in 1997 -- there simply wasn't enough processing power, memory or storage available to do it on anything less than a supercomputer. And that's not even considering truly bandwidth- and processor-intensive tasks related to video, weather meodeling, etc.
As for your task, it may not have been done on single machine in a reasonable timeframe and certainly not in a point and click fashion. However you could have easily integrated the ABBY engine into a networked batch OCR solution and then hired the capacity to run it (eg: a renderfarm).
Ahhh, spoken like someone who's never done a project like this before. So easy to plan in your head on Slashdot in 30 seconds, isn't it?
If creating the required integration work to ABBYY's OCR engine to some sort of distributed processing farm wasn't cost-prohibitive (which it is -- historical societies aren't exactly made of money), how would you suggest I upload over a terabyte of raw image data in a timely fashion to said render farm? And then download it again once completed (not as big of a problem, but still an issue)?
The bigger question is whether or not to take on OCR in-house at all. If you want to sub-out OCR, then you have to wait until the scanning is complete (weeks) -- sending partial jobs via hard drive is more expensive than sending everything at once at the end. It's still too much money at the end of the day -- much, much cheaper to keep it in-house, and the QA process is better. The cheapest option is to buy the fastest server your budget permits and run it 24x7 in parallel with scanning and final PDF assembly / burning. ABBYY FineReader multithreads on recognition, but NOT on opening batches or writing out PDFs. That is the real bottleneck, and the reason it's necessary to run multiple instances.
Who is General Failure and why is he reading my hard disk?