Finding More Than One Worm In the Apple
davecb (6526) writes "At Guido van Rossum's urging, Mike Bland has a look at detecting and fixing the "goto fail" bug at ACM Queue. He finds the same underlying problem in both the Apple and Heartbleed bugs, and explains how to not suffer it again."
An excerpt: "WHY DIDN'T A TEST CATCH IT?
Several articles have attempted to explain why the Apple SSL vulnerability made it past whatever tests, tools, and processes Apple may have had in place, but these explanations are not sound, especially given the above demonstration to the contrary in working code. The ultimate responsibility for the failure to detect this vulnerability prior to release lies not with any individual programmer but with the culture in which the code was produced. Let's review a sample of the most prominent explanations and specify why they fall short.
Adam Langley's oft-quoted blog post [13] discusses the exact technical ramifications of the bug but pulls back on asserting that automated testing would have caught it: "A test case could have caught this, but it's difficult because it's so deep into the handshake. One needs to write a completely separate TLS stack, with lots of options for sending invalid handshakes.""
The "Apple" had only one bug, the Goto Fail bug - since Apple did not use OpenSSL they never had the second bug.
So why is the headline painting Apple as the source of both bugs?
"There is more worth loving than we have strength to love." - Brian Jay Stanley
For the same reason new viruses will always defeat anti-virus software: Each virus is tested against existing anti-virus programs and only released into the wild when it has defeated all of them.
I've often said that you don't fix a software bug until you've fixed the process that allowed the bug to be created. The above quote is of a similar sentiment.
goto fail;
goto fail;
Sorry, but it needs to be said: this is sloppy, he-man coding. Is there a problem with using brackets? Is it your carpal tunnel syndrome? Are you charged by the keystroke?
This is how mistakes happen. For shame!
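For anyone who hasn't seen the shape of the bug, here is a minimal sketch in C. The check() helper and both verify_* functions are hypothetical stand-ins, not Apple's actual code. They only illustrate how a duplicated line behaves with and without braces, assuming the duplicate lands inside the braced block.

```c
/* Hypothetical stand-in for a real handshake check; 0 means success. */
static int check(int ok) { return ok ? 0 : -1; }

/* Without braces: the second "goto fail;" is not guarded by the if. */
int verify_unbraced(int a_ok, int b_ok, int c_ok)
{
    int err;

    if ((err = check(a_ok)) != 0)
        goto fail;
    if ((err = check(b_ok)) != 0)
        goto fail;
        goto fail;                  /* duplicated line: always executed */
    if ((err = check(c_ok)) != 0)   /* never reached */
        goto fail;
fail:
    return err;                     /* err is still 0 if a and b passed */
}

/* With braces: the same duplicated line is dead code inside the block. */
int verify_braced(int a_ok, int b_ok, int c_ok)
{
    int err;

    if ((err = check(a_ok)) != 0) {
        goto fail;
    }
    if ((err = check(b_ok)) != 0) {
        goto fail;
        goto fail;                  /* duplicated line: harmless here */
    }
    if ((err = check(c_ok)) != 0) {
        goto fail;
    }
fail:
    return err;
}
```

With the third check failing, verify_unbraced(1, 1, 0) reports success because that check is skipped, while verify_braced(1, 1, 0) reports the error.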
quiquid id est, timeo puellas et oscula dantes.
I've been in this field for 20+ years now, and I don't necessarily (in fact, I usually don't) agree with whatever the current trend is (which is probably why my karma is negative). One underlying trend has been to make software something that can be made by anyone - to remove the requirement of having a special mind that is able to think through algorithms and code. This has generally been accomplished through process and abstraction. Process - if we can describe a method well enough, then anyone should be able to follow it to its logical conclusion. Abstraction - we keep adding layers upon layers in an effort to simplify and streamline that which is a complex thing (lots of numbers in sequence to control a microprocessor and its accompanying hardware). You can probably tell that I'm not a great fan of either - though I'm really, really trying to not be a negative type, and to go with the flow more. But I can't help my fundamental feeling that there is just no substitute for a smart individual with a gift for understanding the logic of code. I'm always against process because it takes the gift that I was given and neutralizes it. Personal feelings aside, I just don't think that all the process in the world is ever going to get ahead of the curve that is the battle between perfectly functional software and bugs.
If you make brilliant code that only you can understand, sorry to be harsh but you aren't that brilliant. We definitely need to value people who can generate and perfect algorithms, but do you think anyone would remember/value the Pythagorean Theorem if it was 40 steps long? No, he thought of a (then brilliant) way to do it simply and easily so that one only needs to understand basic math to pull it off. This is what we need more of; a single elegant algorithm that is so short it is hard to misuse is better than 1,000 algorithms so hard to understand that only their authors know exactly how they work, and that will be forgotten as soon as the particular language or application fades into the past.
Turning on all warnings and forcing them to errors certainly would have caught the bug in Apple's SSL code. Anyone who just lets warnings fly by in C code is an idiot. Even if the warning is mildly silly, getting it out of the way lets the important warnings stand out. Sensible warnings from C compilers are the very reason we don't use lint anymore. Even then you still have to watch out, because some warnings won't appear at low optimization levels, and I recall hearing that there are a few obscure warnings not turned on by -Wall.
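A small sketch of what the compiler gets to see. The function names are made up for illustration, and the flags in the comment are from memory: as far as I know, Clang's -Wunreachable-code is one of the warnings not enabled by -Wall, and GCC 6 and later report this pattern through -Wmisleading-indentation, which -Wall does include. Check your compiler's documentation before relying on either.

```c
/*
 * Assumed invocations (file name is hypothetical):
 *   clang -Wunreachable-code -Werror -c stray_goto.c
 *   gcc   -Wall -Werror -c stray_goto.c        (GCC 6 or later)
 * Either should turn the stray line below into a build failure.
 */

/* Hypothetical stand-in for a real check; 0 means success. */
static int check_step(int ok) { return ok ? 0 : -1; }

int verify_with_stray_goto(int a_ok, int b_ok)
{
    int err;

    if ((err = check_step(a_ok)) != 0)
        goto fail;
        goto fail;                       /* stray duplicate: always taken */
    if ((err = check_step(b_ok)) != 0)   /* dead code the compiler can flag */
        goto fail;
fail:
    return err;
}
```

Compiled without those flags, the function builds cleanly and returns 0 even when the second check would have failed.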
Also, it could have possibly been introduced by a bad merge. One of the things that putting braces on every if/for/while/etc. does is give merges more context to keep from fucking up, or at least a chance to cause brace mismatch.
As for Heartbleed, just the fact that the code wouldn't work with a compile time option to use the system malloc instead of a custom one should have been enough to raise some red flags. Because rolling your own code to do something "more efficiently" than the system libraries never introduces new problems, right?
#naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
We have some lovely elements coming together right here on the slashdot blurb:
1. Stupid pun instead of a descriptive title
2. Full caps in the article excerpt
3. Trying to bring up coding "culture"
4. Assertion that it totally could have been caught beforehand, but they aren't sure exactly how.
Somehow, I don't think I'm missing much by not reading the article.
This is clearly the automatic resolution of a merge conflict by the version control software. These are such a nightmare to debug and happen all the time. Developers rarely check their entire change visually post-merge, though this can be found using static analysis that enforces coding standards (such as requiring braces or proper indentation for the lexical scope). Beyond that, bugs from automatic conflict resolution can only really be reduced through better version control software. These are without question the worst and most frustrating bugs.
If you make brilliant code that only you can understand
There's a false dichotomy here. He said that only *some* are qualified enough to create solutions to complex problems. You are saying his claim is that only *one* can understand, implying that the problem can't possibly be too hard, and that any hard code to follow is just because the developer is terrible at coding.
As a counter to your example of the Pythagorean Theorem, what about post-graduate math and science? There are tons of things which would make 40 steps seem easy by comparison. Should society forgo those just because only some people are realistically going to be able to understand and apply them correctly?
A very common case in point: under the 'anyone can understand it or else it shouldn't exist at all' philosophy, there is no way we'd have cryptographic libraries at all.
I will agree that his stance against processes is a bit too harsh, but I've been around enough to know that in some scenarios such a jaded perspective would be perfectly understandable. I've seen some projects that had appropriate and helpful processes that did help quality, but I've witnessed many, many more that had ineffective processes that achieved nothing but creating busy work while still churning out crap code.
XML is like violence. If it doesn't solve the problem, use more.
Yeah, the "culture" is "hurry up and get it done so you can get on to the next thing because if something takes more than an hour to do it's not worth doing" and it exists in every single software development organization on planet Earth. Until these things actually start costing real money to people with real power, this will continue.
Proud neuron in the Slashdot hivemind since 2002.
Ok, writing "goto fail;" twice in a row is a bug. But it's not the real bug. This code was checking whether a connection was safe, and executed a "goto fail;" statement if one of the checks failed. It also executed one "goto fail;" by accident, skipping one of the checks. But one would think that a statement "goto fail;" would make the connection fail! In that case, sure, there was a bug, but the bug should have led to all connections failing, which would have made it obvious to spot (because the code wouldn't have worked, ever).
So the real problem is a programming style where executing a statement "goto fail;" doesn't actually fail! If a function returns 0 for success / non-zero for failure like this one, it should have been obvious to add an "assert (err != 0)" to the failure case following the fail: label. And that would have _immediately_ caught the problem. There should be _one_ statement in the success case "return 0;" and _one_ statement in the failure case "assert (err != 0); return err;".
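A minimal sketch of that layout in C, with one success exit and one failure exit. The check_step() helper and verify_connection() are hypothetical names for illustration, not the real function.

```c
#include <assert.h>

/* Hypothetical stand-in for a real check; 0 means success. */
static int check_step(int ok) { return ok ? 0 : -1; }

int verify_connection(int a_ok, int b_ok)
{
    int err;

    if ((err = check_step(a_ok)) != 0)
        goto fail;
    if ((err = check_step(b_ok)) != 0)
        goto fail;

    return 0;             /* the one success exit */

fail:
    assert(err != 0);     /* arriving here with err == 0 is a logic error */
    return err;           /* the one failure exit */
}
```

With a stray "goto fail;" after a passing check, err would be 0 at the label and the assert would abort on the first successful handshake. Keep in mind that assert() is compiled out when NDEBUG is defined, so a release build would need an explicit check there instead.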