The Story Behind a Windows Security Patch Recall

What the... by P2PDaemon · 2007-05-04 08:47 · Score: 2, Insightful

Why are the trolls out in force here? Oh, Microsoft... Nevermind...

Fascinating by wbean · 2007-05-04 09:00 · Score: 4, Insightful

This is fascinating. The system for exiting a process is so complicated that a lot of implementations fail. In fact, it's so complicated that even Microsoft can't get it right. Sounds like an unbounded loop to me.

Re:Fascinating by Anonymous Coward · 2007-05-04 09:39 · Score: 2, Insightful

Sounds like an unbounded loop to me.
That's quite an appropriate analogy. If you RTFA, you would know that the loop in question is designed to be bounded by a guard variable/event, but they had already terminated the thread that sets the guard to the state that allows the loop to terminate.

The root cause of the hang is that most programmers are not really aware of the states involved at process termination, so they assume invalid things about the DLL process termination event -- namely that it's okay to wait for something that may have been locked/entered by a child thread.
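To make that state concrete, here is a minimal Python sketch. It is an analogy, not the Win32 code from the article. It assumes a hypothetical worker that is supposed to set a guard event but exits early, standing in for a thread the OS has already torn down. The names `doomed_worker` and `wait_for_guard` are invented for illustration, and checking whether the setter is still alive is just one possible way to keep the loop bounded.

```python
import threading


def doomed_worker(guard: threading.Event) -> None:
    """Stand-in for a thread that is gone before it could set the guard."""
    return  # exits without ever calling guard.set()


def wait_for_guard(guard: threading.Event, setter: threading.Thread,
                   poll_seconds: float = 0.01) -> bool:
    """Loop until the guard is set, but give up if the setter is gone.

    Returns True if the guard was set, False if the setter died first.
    """
    while not guard.wait(poll_seconds):
        if not setter.is_alive():
            # Final check in case the setter set the guard just before exiting.
            return guard.is_set()
    return True


guard = threading.Event()
worker = threading.Thread(target=doomed_worker, args=(guard,))
worker.start()
worker.join()  # the setter is now gone, as it would be at process termination

result = wait_for_guard(guard, worker)
print("guard set:", result)
```

Without the liveness check, the `while` loop would spin forever, which is the hang described above.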

p.s. To the sibling AC, I know you were going for funny, but they're not trying to solve the halting problem.

An error he committed? by drinkypoo · 2007-05-04 09:02 · Score: 5, Insightful

he talks about how an error he committed led to the recall of a Windows security patch.

Okay, he made an error. Why the HELL wasn't it caught in QA? Microsoft wants us to believe that the reason that we have to wait for patches is that they are getting some kind of exhaustive QA. This patch and executable were specifically created to avoid problems with invalid shell extensions. Don't you think that given that fact the thing to do would be to test it with some invalid shell extensions?

This is the reason that Windows admins have to be so much more paranoid about patches than the rest of us. A Windows patch is highly likely to be a big pile of crap that causes your system to not work properly. I think we can all remember certain service packs that broke various versions of Windows NT pretty much completely...

If you can't have confidence that security patches will fix more than they break, how can you have sufficient confidence to even install that vendor's products, let alone count on them for mission-critical applications?

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

Re:An error he committed? by geekoid · 2007-05-04 09:28 · Score: 3, Insightful

Kudos to him for taking responsibility, but the QA excuse doesn't seem to fit.

It was an error that happened all the time, under its most basic use.

While the global OS QA might be excused for some weird bug that happens under unforeseen circumstances, this wasn't even tested to see if it fixed what it was supposed to.

Sounds like sloppy (i.e., no) QA to me.

--
The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect

Re:An error he committed? by bmajik · 2007-05-04 10:28 · Score: 5, Insightful

I'm a software tester at Microsoft, although I'm not involved with the Windows team or the security process.

Just so we're clear:

Microsoft is not selling you products that have gone through exhaustive QA, nor are we issuing patches that have gone through exhaustive QA.

The key word here is "exhaustive".

You can imagine that as much as it costs a business when they get a hotfix from us that breaks them, it costs us _at least_ that much in real employee hours (dollars), not to mention the direct and indirect, monetary and non-monetary costs of having to admit that we screwed up a patch.

Software testing cannot tell you how good your product is, only in what ways it doesn't appear to be bad. Every release decision is a _decision_, and it's based on necessarily incomplete data put together by imperfect humans with non-infinite time.

A release decision is a culmination of many nested risk/reward tradeoffs. Sometimes, that decision gets made incorrectly, or at least gets made in a way with known or even unknown downsides.

You'll notice that the patch was an interaction problem with an antique 3rd party product. From my time doing admin work on Solaris, IRIX, and Linux machines, I can tell you the big difference between this situation and "those" situations. I never _ran_ 3rd party software on Solaris, IRIX, or Linux (well, I ran 3rd party software on Linux all the time, but I just expected it to break anytime I patched anything... it was a mandatory recompile of any dependent libraries and applications).

I also think your glasses are a little rosy. There were some IRIX patches back in the day that you couldn't back out. Or that wrecked your XFS volumes. I think in every operating system there has been at least one instance of a patch / upgrade / new version that some user opted to back out, because it hurt them and their scenarios more than it helped.

I run very little non-Microsoft software on my Windows machines and thus I rarely worry about patches from MS. If you're doing something weird, you need to be more risk averse. IIRC, Microsoft's official recommendation for businesses with critical systems is to install patches in a pre-production environment to ensure compatibility with the specific intricacies of your business. You can choose to play fast and loose, but you should be aware that you're making a risk/reward tradeoff decision, based on incomplete data.

Just like we have to do.

--
My opinions are my own, and do not necessarily represent those of my employer.

Re:An error he committed? by Blakey+Rat · 2007-05-04 11:46 · Score: 4, Insightful

And the manpower to run it all costs... how much?

Seriously, though, just putting all that equipment in one building would create a zeppelin-hangar-sized building. Finding any specific router or PCI modem would be near impossible. The logistical difficulties of your plan I think would be insurmountable, not even considering the manpower question.

The real point Raymond mentions is that if MS does tons of testing on all the hardware they have available, they get bad press for being slow to release patches. If not, they get bad press for having to recall buggy patches. It's a lose/lose situation for them.

--
Comment of the year

Re:An error he committed? by mosch · 2007-05-07 07:32 · Score: 2, Insightful

So, somebody installed 10.3 on what was either the oldest possible supported laptop, or possibly an unsupported laptop, then a video driver update caused some problems, all of which were fixable by somebody who fully admitted that they don't know OS X, only Unix and Windows.

Obnoxious, sure, but not really different than any other OS. (In fact I have had brand-spanking new Windows hardware that would lose video if I applied WindowsUpdate recommended driver updates.)

I'd have to be pretty stupid to think that any OS is perfect, but I'd have to be even dumber to think that your awful, whiny article (hosted on MSDN, LOL) is worth any discussion at all.

Lesson by Jeffrey+Baker · 2007-05-04 09:09 · Score: 4, Insightful

I think the lesson here is not that this guy should have been more careful about programming, it's that no amount of careful programming can overcome a stupid design. It's stupid that there are magical filenames in the form of UUIDs that cause Explorer to load and run arbitrary DLLs. You can't get around this stupidity with some kind of speculative watchdog thread that works with what sound to me like some seriously questionable heuristics.

They should have simply got rid of the magic naming system in favor of something explicit, such as a Shell Extension Interface that a shell extension must fully implement.

Honesty by florescent_beige · 2007-05-04 10:01 · Score: 5, Insightful

This illustrates the kind of employee I like to have. One who can talk about his mistakes the same way he talks about anything else work-related.

Some years ago I myself made a rather expensive mistake which involved the design of an aircraft structure. The fellow I was working for at the time had one of those razor-blade intellects and I got called into his office for a chat. When he asked me what happened I had two choices, weasel or turkey. In engineering it's always possible to talk the complicated talk and hope to obfuscate your way out of a situation, but fortunately I said "I made a mistake." And you know what? That was exactly the answer he was looking for.

You see, the most important thing is not to be perfect, it's to be honest. That's what a boss, of which I am one now, wants.

If you have a boss that doesn't want that, better watch out for yourself.

--
Equine Mammals Are Considerably Smaller

Re:Honesty by labnet · 2007-05-04 12:39 · Score: 3, Insightful

Being a boss as well, that's exactly what our culture looks for.
Honesty, but without emotional baggage.
A stuffup is a stuffup, learn and move on.

Reading /. for so many years now, you would think 90% of posters are uber humans that never make a mistake, and be damned if you do. Not sure if I would want to work for most of the /. crowd.

--
46137

education by Anonymous Coward · 2007-05-04 10:58 · Score: 2, Insightful

Reminds me of a famous story about Jack Welch, former GE CEO. One of the company's division managers made a mistake costing the company $10 million in one quarter. When the quarterly reports came out, he got a call from headquarters telling him to be in Welch's office in NY the next morning. Welch grilled the man for some time, asking him what he was thinking and how he could possibly lose so much money. When it seemed Welch had finished, the manager said he understood that Welch had to fire him now. To which Welch replied, "Why would I fire you when I just invested $10 million in your education?"

Re:Backwards compatibility by IamTheRealMike · 2007-05-05 06:00 · Score: 2, Insightful

Its only useful in a closed-source world where you cannot modify programs to suit the new API's.

How much open source work have you actually done? I've done a lot, and this idea is one I see very often in people who haven't done any serious API development work before. The approach of attempting to patch every app when an API changes simply doesn't scale. There's a reason all the important open source APIs (gtk, glibc, alsa, X etc) have "gone stable" in the past 5 years, and it's simply a better approach.

Anyway, ignoring the obvious (!!) problems of scaling such an approach, you are confusing two unrelated things. Microsoft can simplify/clean up APIs too - they have done it with DirectX and .NET, but that's irrelevant. The problem here is that there are lots of people in the world writing software who perhaps aren't well qualified, and even the ones that are well qualified make mistakes, even with the implementation of quite simple interfaces like IUnknown. I myself have messed up IUnknown before, in fact.

The root problem that caused the hang was attempting to cleanly handle buggy software. This is a common motif in software, hell, it practically motivated the move from the Windows 9x design to the NT fully protected architecture.

The result is that even Microsoft can't get reasonably trivial things right.

Multi-threading is never trivial.

Not to mention almost all Windows software code being highly complicated compared to equivalent code on other systems.

I worked on Wine for a long time, which implements or maps the Win32 API. Linux, Windows and MacOS X are all much the same in complexity - they are of the same design era, even OS X which is based on lots of older code at its heart. While the more modern parts of the Linux APIs like GTK+ are better than the Win32 equivalents, that's just an age thing: the Win32 API has evolved over a much longer period of time. That means it's uglier (the world has learned a lot about API design since the 80s), but it also means there are far more people out there who know it, better tools support, and critically, more apps that use it!

Re:Backwards compatibility by Bill+Dog · 2007-05-05 11:15 · Score: 2, Insightful

Using a multi-threaded approach here, when SMP scalability is not an issue, suggests that either their API design is crap, and requires threading, or that their engineers are incompetent and use threads unnecessarily. Threads are never trivial - but what they were trying to do was quite trivial. Its their fault they involved threads in there.

This is one of the stupidest comments I've read here in a long time. A secondary "watchdog" thread was employed to enforce a time-out on the helper program's sniffing of a given shell extension, so in case the main thread hung while trial-hosting a faulty shell extension, there would still be another thread of logic outside of the infinite loop that could run and tell Windows Explorer the result.
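For anyone who hasn't seen the pattern, here is a rough Python sketch of a watchdog-enforced time-out. It is not the helper program's actual code: `probe_with_watchdog` and the three sample probes are hypothetical, and the roles are flipped compared to the description above (here the probe runs on the secondary thread and the caller does the bounded wait), because that is the simpler arrangement in Python.

```python
import threading
import time


def probe_with_watchdog(probe, timeout_seconds: float) -> str:
    """Run `probe` on a worker thread; report 'timeout' if it never returns."""
    outcome = {}
    finished = threading.Event()

    def run_probe() -> None:
        try:
            probe()
            outcome["status"] = "ok"
        except Exception:
            outcome["status"] = "failed"
        finally:
            finished.set()

    # daemon=True so a hung probe cannot keep the interpreter alive at exit.
    threading.Thread(target=run_probe, daemon=True).start()

    # The watchdog side: a bounded wait, so even a hang yields a verdict.
    if not finished.wait(timeout_seconds):
        return "timeout"
    return outcome["status"]


def well_behaved() -> None:
    pass


def hangs() -> None:
    time.sleep(30)  # stand-in for an extension stuck in a loop


def crashes() -> None:
    raise RuntimeError("bad extension")


print(probe_with_watchdog(well_behaved, 1.0))
print(probe_with_watchdog(hangs, 0.1))
print(probe_with_watchdog(crashes, 1.0))
```

The point is the same either way: one thread may get stuck in somebody else's code, so the verdict has to come from a thread that cannot.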

If you knew anything about what you're trying to talk about, you'd know that multi-threading is used for these kinds of situations, as well as in GUI programming. And not just "when SMP scalability is an issue". This has nothing to do with the Win32 API design, it just was tackling a very specific problem. It doesn't mean that the Win32 API "requires threading", or that MS's engineers are incompetent, and that they used an additional thread unnecessarily here. Threads can be trivial, and this is I would say actually the most trivial case of their use. It's to their credit that they involved threads here (and might actually have been the only way), and it's to your ignorance that you don't understand any of this and got everything wrong about it.

The flaw was in doing the WaitForSingleObject() in the DLL's detach process function without specifying a timeout value. Even if you have no reason to think that the thread won't be there to signal you eventually, sometimes the unthinkable occurs.
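In Win32 terms that means passing a finite millisecond count instead of INFINITE and handling the WAIT_TIMEOUT return. As a loose Python analogy (a sketch only; `on_detach` is a made-up cleanup hook, not a real API, and the 0.2 second default is arbitrary):

```python
import threading


def on_detach(worker_done: threading.Event, timeout_seconds: float = 0.2) -> bool:
    """Cleanup hook: wait for the worker's signal, but never forever.

    Returns True if the worker signalled in time, False if we gave up.
    """
    signalled = worker_done.wait(timeout=timeout_seconds)
    # If not signalled, the worker may already be gone, so the caller
    # should skip the graceful handshake and finish cleanup anyway.
    return signalled


never_set = threading.Event()    # the signalling thread no longer exists
already_set = threading.Event()  # the signalling thread finished normally
already_set.set()

gave_up = not on_detach(never_set)
clean = on_detach(already_set)
print(gave_up, clean)
```

Picking the time-out value is its own judgment call: too short and you abandon a worker that was about to finish, too long and a hang just becomes a long stall.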

--
Attention zealots and haters: 00100 00100

Slashdot Mirror
