The Story Behind a Windows Security Patch Recall
bheer writes "Raymond Chen's blog has always been popular with Win32 developers and those interested in the odd bits of history that contribute to Windows' quirks. In a recent post, he talks about how an error he committed led to the recall of a Windows security patch."
Why are the trolls out in force here? Oh, Microsoft... Never mind...
This is fascinating. The system for exiting a process is so complicated that a lot of implementations fail. In fact, it's so complicated that even Microsoft can't get it right. Sounds like an unbounded loop to me.
Okay, he made an error. Why the HELL wasn't it caught in QA? Microsoft wants us to believe that the reason that we have to wait for patches is that they are getting some kind of exhaustive QA. This patch and executable were specifically created to avoid problems with invalid shell extensions. Don't you think that given that fact the thing to do would be to test it with some invalid shell extensions?
This is the reason that Windows admins have to be so much more paranoid about patches than the rest of us. A Windows patch is highly likely to be a big pile of crap that causes your system to not work properly. I think we can all remember certain service packs that broke various versions of Windows NT pretty much completely...
If you can't have confidence that security patches will fix more than they break, how can you have sufficient confidence to even install that vendor's products, let alone count on them for mission-critical applications?
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
I think the lesson here is not that this guy should have been more careful about programming, it's that no amount of careful programming can overcome a stupid design. It's stupid that there are magical filenames in the form of UUIDs that cause Explorer to load and run arbitrary DLLs. You can't get around this stupidity with some kind of speculative watchdog thread that works with what sound to me like some seriously questionable heuristics.
They should have simply got rid of the magic naming system in favor of something explicit, such as a Shell Extension Interface that a shell extension must fully implement.
This illustrates the kind of employee I like to have. One who can talk about his mistakes the same way he talks about anything else work-related.
Some years ago I myself made a rather expensive mistake which involved the design of an aircraft structure. The fellow I was working for at the time had one of those razor-blade intellects and I got called into his office for a chat. When he asked me what happened I had two choices, weasel or turkey. In engineering it's always possible to talk the complicated talk and hope to obfuscate your way out of a situation, but fortunately I said "I made a mistake." And you know what? That was exactly the answer he was looking for.
You see, the most important thing is not to be perfect, it's to be honest. That's what a boss, of which I am one now, wants.
If you have a boss that doesn't want that, better watch out for yourself.
Equine Mammals Are Considerably Smaller
Reminds me of a famous story about Jack Welch, former GE CEO. One of the company's division managers made a mistake costing the company $10 million in one quarter. When the quarterly reports came out, he got a call from headquarters telling him to be in Welch's office in NY the next morning. Welch grilled the man for some time, asking him what he was thinking and how he could possibly lose so much money. When it seemed Welch had finished, the manager said he understood that Welch had to fire him now. To which Welch replied, "Why would I fire you when I just invested $10 million in your education?"
How much open source work have you actually done? I've done a lot, and this idea is one I see very often in people who haven't done any serious API development work before. The approach of attempting to patch every app when an API changes simply doesn't scale. There's a reason all the important open source APIs (gtk, glibc, alsa, X etc) have "gone stable" in the past 5 years, and it's simply a better approach.
Anyway, ignoring the obvious (!!) problems of scaling such an approach, you are confusing two unrelated things. Microsoft can simplify/clean up APIs too - they have done it with DirectX and .NET, but that's irrelevant. The problem here is that there are lots of people in the world writing software who perhaps aren't well qualified, and even the ones that are well qualified make mistakes, even with the implementation of quite simple interfaces like IUnknown. I myself have messed up IUnknown before, in fact.
The root problem that caused the hang was attempting to cleanly handle buggy software. This is a common motif in software, hell, it practically motivated the move from the Windows 9x design to the NT fully protected architecture.
Multi-threading is never trivial.
I worked on Wine for a long time, which implements or maps the Win32 API. The complexity of Linux, Windows and MacOS X is much the same - they are of the same design era, even OS X which is based on lots of older code at its heart. While the more modern parts of the Linux APIs like GTK+ are better than the Win32 equivalents that's just an age thing: the Win32 API has evolved over a much longer period of time. That means it's uglier (the world has learned a lot about API design since the 80s), but it also means there are far more people out there who know it, better tools support, and critically, more apps that use it!
Using a multi-threaded approach here, when SMP scalability is not an issue, suggests that either their API design is crap, and requires threading, or that their engineers are incompetent and use threads unnecessarily. Threads are never trivial - but what they were trying to do was quite trivial. It's their fault they involved threads in there.
This is one of the stupidest comments I've read here in a long time. A secondary "watchdog" thread was employed to enforce a time-out on the helper program's sniffing of a given shell extension, so in case the main thread hung while trial-hosting a faulty shell extension, there would still be another thread of logic outside of the infinite loop that could run and tell Windows Explorer the result.
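For anyone who wants to see the shape of the watchdog idea, here is a minimal sketch in Python rather than Win32. It is not the code from the patch: `probe_with_watchdog`, its parameters and the status strings are all made up for illustration, and the roles are simplified so that the risky call runs on a worker thread while the caller plays the watchdog.

```python
import threading

def probe_with_watchdog(probe, timeout_s):
    """Run probe() on a worker thread and report 'ok', 'failed' or 'timeout'.

    Hypothetical helper: 'probe' stands in for trial-hosting an extension.
    """
    result = {}
    done = threading.Event()

    def worker():
        try:
            probe()
            result["status"] = "ok"
        except Exception:
            result["status"] = "failed"
        finally:
            # Always signal completion, even if the probe raised.
            done.set()

    # Daemon thread: a probe that never returns cannot keep the process alive.
    threading.Thread(target=worker, daemon=True).start()

    # The caller acts as the watchdog and gives up after timeout_s seconds.
    if not done.wait(timeout_s):
        return "timeout"
    return result["status"]
```

The point is only that some thread other than the one that might hang is still able to report a result. A probe that spins forever is simply abandoned here, which is tolerable in a throwaway helper process.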
If you knew anything about what you're trying to talk about, you'd know that multi-threading is used for these kinds of situations, as well as in GUI programming. And not just "when SMP scalability is an issue". This has nothing to do with the Win32 API design, it just was tackling a very specific problem. It doesn't mean that the Win32 API "requires threading", or that MS's engineers are incompetent, and that they used an additional thread unnecessarily here. Threads can be trivial, and this is I would say actually the most trivial case of their use. It's to their credit that they involved threads here (and it might actually have been the only way), and it's to your ignorance that you don't understand any of this and got everything wrong about it.
The flaw was in doing the WaitForSingleObject() in the DLL's detach process function without specifying a timeout value. Even if you have no reason to think that the thread won't be there to signal you eventually, sometimes the unthinkable occurs.
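To illustrate the bounded-wait point, here is a rough Python analogue. In Win32 terms the idea is to pass a finite timeout in milliseconds to WaitForSingleObject() instead of INFINITE and to handle the WAIT_TIMEOUT return. The `shutdown` helper and its parameters below are hypothetical names, not anything from the actual patch.

```python
import threading

def shutdown(worker, stop_event, timeout_s):
    """Ask the worker to stop, then wait at most timeout_s seconds for it.

    Returns True if the worker exited in time, False if we gave up waiting.
    Hypothetical helper for illustration only.
    """
    stop_event.set()           # request a cooperative exit
    worker.join(timeout_s)     # bounded wait instead of waiting forever
    return not worker.is_alive()
```

If this returns False, the cleanup code can log the problem and carry on, rather than hanging the whole process on a thread that will never signal.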
Attention zealots and haters: 00100 00100