WGA Meltdown Blamed On Human Error

Posted by Zonk on Monday September 3, 2007 @12:51AM from the kinda-of-big-for-an-oopsie dept.

Erris writes "As commentators like Ars Technica slam WGA as deeply flawed, Microsoft is blaming human error and swears it won't happen again. 'Alex Kochis, Microsoft's senior WGA product manager, wrote in a blog posting that the troubles began after preproduction code was installed on live servers. ... rollback fixed the problem on the product-activation servers within 30 minutes ... but it didn't reset the validation servers. ... "we didn't have the right monitoring in place to be sure the fixes had the intended effect"' Critics were not impressed. '"A system that's not totally reliable really should not be so punitive," said Gartner Inc. analyst Michael Silver. Michael Cherry, an analyst at Directions on Microsoft in Kirkland, Wash., said he was surprised that it was even possible to accidentally load the wrong code onto live servers ... [and asks], "what other things have they not done?"' This is not the first time this has happened, either."

It's a fair point by Joe+Jay+Bee · 2007-09-03 01:01 · Score: 5, Interesting

Critics were not impressed. '"A system that's not totally reliable really should not be so punitive," said Gartner Inc. analyst Michael Silver. Michael Cherry, an analyst at Directions on Microsoft in Kirkland, Wash.,

WGA is a natural, if not perfect (or even good), business response to the problem of piracy (leaving out all the debate over whether it's a good or bad thing for Microsoft as a whole). But the technical implementation leaves a lot to be desired; if anything, the response to a WGA server failure should be an automatic pass (fail safe) instead of an automatic fail (fail deadly).

Sure, for a 24-hour window pirates would have a free-for-all in getting perfectly valid WGA results, but at the same time legitimate customers would not be inconvenienced. As far as I can see, that's the only way to keep WGA while minimising the backlash against it.
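
To make the fail-safe versus fail-deadly distinction concrete, here is a minimal sketch in Python. It is not how WGA is implemented; `Verdict`, `ValidationServiceError`, `validate`, and the `query_server` callable are all hypothetical names invented for illustration. The only point is what happens when the server cannot give an answer.

```python
from enum import Enum


class Verdict(Enum):
    GENUINE = "genuine"
    NOT_GENUINE = "not_genuine"


class ValidationServiceError(Exception):
    """Hypothetical error: the validation server could not give an answer."""


def validate(query_server, product_key, fail_open=True):
    """Ask the server for a verdict; decide what to do if it is broken.

    query_server is a hypothetical callable that returns a Verdict or
    raises ValidationServiceError when the service itself has failed.
    """
    try:
        return query_server(product_key)
    except ValidationServiceError:
        # Fail safe: an outage passes everyone, pirates included.
        # Fail deadly: an outage punishes everyone, paying customers included.
        return Verdict.GENUINE if fail_open else Verdict.NOT_GENUINE
```

With `fail_open=True`, a server outage costs the vendor a window of unchecked installs; with `fail_open=False`, the same outage flags every customer who happens to validate during it.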

--
I write bullshit
Re:Have we gone backwards? by PeeAitchPee · 2007-09-03 01:21 · Score: 5, Interesting

Strictly speaking, there are no tasks I do today that I couldn't do in 1997.

Speak for yourself. Just because *you personally* don't use the extra processing power, memory, and storage that are available doesn't mean that lots of others don't. For example, I'm in the middle of digitizing and OCRing 110 years of local newspapers from microfilm into archival-quality PDFs for an historical society. Quite simply, you *cannot* have too much processing power when doing OCR -- I'm running multiple instances of ABBYY FineReader Corporate on a 2x Quad Core Xeon that has been pegged for two weeks now. It's quick, multithreads across all 8 cores and does a great job, but there's simply too much data. Note that this project would have been completely impossible in 1997 -- there simply wasn't enough processing power, memory or storage available to do it on anything less than a supercomputer. And that's not even considering truly bandwidth- and processor-intensive tasks related to video, weather modeling, etc.
Re:Have we gone backwards? by PeeAitchPee · 2007-09-03 02:25 · Score: 5, Interesting

As for your task, it may not have been done on a single machine in a reasonable timeframe and certainly not in a point-and-click fashion. However, you could have easily integrated the ABBYY engine into a networked batch OCR solution and then hired the capacity to run it (eg: a renderfarm).

Ahhh, spoken like someone who's never done a project like this before. So easy to plan in your head on Slashdot in 30 seconds, isn't it?

Even if integrating ABBYY's OCR engine with some sort of distributed processing farm weren't cost-prohibitive (which it is -- historical societies aren't exactly made of money), how would you suggest I upload over a terabyte of raw image data in a timely fashion to said render farm? And then download it again once completed (not as big of a problem, but still an issue)?
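
For a rough sense of scale, the upload time can be worked out directly. The sketch below assumes a sustained 1 Mbit/s uplink, which is an illustrative figure and not one given in the post; only the "over a terabyte" size comes from the comment above.

```python
def transfer_days(size_bytes, rate_bits_per_sec):
    """Days needed to move size_bytes over a link sustaining rate_bits_per_sec."""
    seconds = size_bytes * 8 / rate_bits_per_sec
    return seconds / 86_400  # seconds per day


ONE_TERABYTE = 10**12          # bytes; the post says "over a terabyte"
ASSUMED_UPLINK = 1_000_000     # bits per second; hypothetical, not from the post

print(round(transfer_days(ONE_TERABYTE, ASSUMED_UPLINK), 1))  # 92.6
```

Under that assumption the upload alone takes about three months; a link ten times faster still needs more than nine days of uninterrupted transfer.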

The bigger question is whether or not to take on OCR in-house at all. If you want to sub-out OCR, then you have to wait until the scanning is complete (weeks) -- sending partial jobs via hard drive is more expensive than sending everything at once at the end. It's still too much money at the end of the day -- much, much cheaper to keep it in-house, and the QA process is better. The cheapest option is to buy the fastest server your budget permits and run it 24x7 in parallel with scanning and final PDF assembly / burning. ABBYY FineReader multithreads on recognition, but NOT on opening batches or writing out PDFs. That is the real bottleneck, and the reason it's necessary to run multiple instances.
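
The multiple-instance approach can be sketched as a small batch runner. This is an illustration only: `run_batches` and `process_batch` are hypothetical names, and `process_batch` stands in for whatever launches one OCR instance on one batch and blocks until it finishes. No FineReader command line or API is assumed here.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batches(batches, process_batch, instances=4):
    """Run process_batch over every batch, keeping up to `instances` in flight.

    process_batch is a hypothetical callable that handles one batch end to
    end (open, recognize, write PDF). While one instance sits in a
    single-threaded open or write phase, the others keep the cores busy
    with recognition.
    """
    with ThreadPoolExecutor(max_workers=instances) as pool:
        # pool.map returns results in the same order as the input batches.
        return list(pool.map(process_batch, batches))
```

Threads are enough for the runner itself if each call mostly waits on an external process; the right value for `instances` depends on how long the open and write phases take relative to recognition.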