Finding More Than One Worm In the Apple
davecb (6526) writes "At Guido von Rossum's urging, Mike Bland has a look at detecting and fixing the "goto fail" bug at ACM Queue. He finds the same underlying problem in both in the Apple and Heartbleed bugs, and explains how to not suffer it again."
An excerpt: "WHY DIDN'T A TEST CATCH IT?
Several articles have attempted to explain why the Apple SSL vulnerability made it past whatever tests, tools, and processes Apple may have had in place, but these explanations are not sound, especially given the above demonstration to the contrary in working code. The ultimate responsibility for the failure to detect this vulnerability prior to release lies not with any individual programmer but with the culture in which the code was produced. Let's review a sample of the most prominent explanations and specify why they fall short.
Adam Langley's oft-quoted blog post13 discusses the exact technical ramifications of the bug but pulls back on asserting that automated testing would have caught it: "A test case could have caught this, but it's difficult because it's so deep into the handshake. One needs to write a completely separate TLS stack, with lots of options for sending invalid handshakes.""
The "Apple" had only one bug, the Goto Fail bug - since Apple did not use OpenSSL they never had the second bug.
So why is the headline painting Apple as the source of both bugs?
"There is more worth loving than we have strength to love." - Brian Jay Stanley
For the same reason new viruses will always defeat anti-virus software: Each virus is tested against existing anti-virus programs and only released into the wild when it has defeated all of them.
After reading TFA, I'm not sure I like the suggested approach to the "fix" in the code by replacing the two if blocks with a common method where you pass in all sorts of parameters.
Yes duplicate code is bad, I agree that's a "code smell" (one of the worst coding terms every to be invented BTW).
But just as odiferous to me, is a method with a billion arguments like the combined extracted method has. Sure duplicate if statements smell bad, but the replacement is worse and also harder to comprehend.
I know it's in theory more testable, but at what cost when the code is more obsfucated? If the code and tests are harder to understand are you really better off?
"There is more worth loving than we have strength to love." - Brian Jay Stanley
If you're selling that you coulda/woulda caught all X for X that haven't happened yet, you're selling snake oil. The reality is that this computer stuff is a little harder than it looks to do properly, and if all you have to offer is marketing bullshit and a History of Art degree, maybe you should leave it to the professionals, and push for budget to do things correctly rather than just do them.
.
I've often said that you don't fix a software bug until you've fixed the process that allowed the bug to be created. The above quote is of a similar sentiment.
goto fail;
goto fail;
Sorry, but it needs to be said: this is sloppy, he-man coding. Is there a problem with using brackets? Is it your carpal tunnel syndrome? Are you charged by the keystroke?
This is how mistakes happen. For shame!
quiquid id est, timeo puellas et oscula dantes.
So good test should catch this goto fail for sure, either functional test or an unit test. Looks like neither are thorough for the library.
Bot more importantly, if static analysis or structural coverage of code was done, both would point out that there is something wrong with the code.
All of these testing strategies should be done for such s critical piece of software.
Automated unit test stubs with range checking and input fuzzing. Took me two weekends to build one atop Doxygen. If your build environment does not do this you're maliciously stupid.
Compiler and static-analysis warnings also could have detected the unreachable code, though false warnings might have drowned out the signal if such tools weren't already being used regularly.
I'd purpose that these tools weren't being used properly rather than turning the issue into a nail for the unit testing hammer.
I've been in this field for 20+ years now, and I don't necessarily (in fact, I usually don't) agree with whatever the current trend is (which is probably why my karma is negative). One underlying trend, has been to make software something that can be made by anyone - to remove the requirement of having a special mind that is able to think through algorithms and code. This has generally been accomplished through process, and abstraction. Process - if we can describe a method well enough, then anyone should be able to follow it to it's logical conclusion. Abstraction - we keep adding layers upon layers in an effort to simplify and streamline that which is a complex thing (lots of numbers in sequence to control a microprocessor and it's accompanying hardware). You can probably tell that I'm not a great fan of either - though I'm really really trying to not be a negative type, and to go with the flow more. But I can't help my fundamental feelings that there is just no substitute for a smart individual with a gift of understanding the logic of code. I'm always against process because it takes the gift that i was given and neutralizes it. Personal feelings aside, I just don't think that all the process in the world is ever going to get ahead of the curve that is the battle between perfectly functional software and bugs.
If you make brilliant code that only you can understand, sorry to be harsh but you aren't that brilliant. We definitely need to value people who can generate and perfect algorithms, but do you think anyone would remember/value the Pythagorean Theorem if it was 40 steps long? No, he thought of a (then brilliant) way to do it simply and easily so that one only needs to understand basic math to pull it off. This is what we need more of; a single elegant algorithm that is so short it is hard to misuse is better than 1,000 algorithms that are all so hard to understand that only the author knows exactly how it works and will be forgotten as soon as the particular language or application fades into the past.
Turning on all warnings and forcing them to errors certainly would have caught the bug in Apple's SSL code. Anyone who just lets warnings fly by in C code is an idiot. Even if the warning is mildly silly, getting it out of the way lets the important warnings stand out. Sensible warnings from C compilers are the very reason we don't use lint anymore. Even then you still have to watch out, because some warnings won't appear at low optimization levels, and I recall hearing that there are a few obscure warnings not turned on by -Wall.
Also, it could have possibly been introduced by a bad merge. One of the things that putting braces on every if/for/while/etc. does is give merges more context to keep from fucking up, or at least a chance to cause brace mismatch.
As for Heartbleed, just the fact that the code wouldn't work with a compile time option to use the system malloc instead of a custom one should have been enough to raise some red flags. Because rolling your own code to do something "more efficiently" than the system libraries never introduces new problems, right?
#naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
We have some lovely elements coming together right here on the slashdot blurb:
1. Stupid pun instead of a descriptive title
2. Full caps in the article excerpt
3. Trying to bring up coding "culture"
4. Assertion that it totally could have been caught beforehand, but they aren't sure exactly how.
Somehow, I don't think I'm missing much by not reading the article.
This is clearly the automatic resolution of a merge conflict by the versioning control software. These are such a nightmare to debug and happen all the time. Developers rarely check their entire change visually post merge. Though this can be found using static analysis that force coding standards (such as forcing the use of brackets or proper indentation for the lexical scope). Though the bugs from automatic conflict resolution can only be really improved through better versioning software. These are without question the worst and most frustrating bugs.
I kinda agree... I mean, I never go out of my way to artificially complicate my code. I'm not one of those people that uses macros just for the sake of showing how clever I can make them. The problem is that - to use your example - how many people do we have now that actually learn how to derive the Pythagorean Theorem? How do we build on that? The gifted people that COULD build on it, can't - because they're sandboxed into a process, or a higher level abstraction. My kids - they were in awe of the Saturn V, the F-1 Engine - all of that technology (with the slide rules!) and knowhow that we had back in the 60's (after watching Apollo 13)... I told them that I didn't think we had the capacity to do that anymore. Then I read an article recently where they were looking at the F-1 engine like it was some alien artefact. Point being, I just don't think that we value our smart people anymore because we just stick them into the same process as every dope that has a connection to get his job. We need find the smart people, encourage them with financial incentives and get THEM to write the code.
Who benefitted? The TLAs. Accidental, my ass.
The article doesn't give the revision history of how that code got there. Who put it there, when did they put it there, and what did the code look like before and after they put it there. We need the names!
It's remarkable how many organizations don't enable aggressive compiler warnings (or worse, ignore or disable them). One of the best practices I've learned is to turn on every warning that you possibly can and use the option that treats all warnings as compiler errors. The code from Apple may have been properly unit tested. However, if this was the result of a bad automated merge, unit tests are often not repeated on the resulting code base headed for system test. The GCC "-Wunreachable-code" option would have caught this type of error.
The article contains the same flaw that people who rabidly declare unit tests as a panacea. The article basically shows that after discovery of a bug, a unit test can retroactively be constructed that would have caught the bug, therefore it's inexcusable that the bug got released, ignoring the fact that is hindsight. Unit tests are not without their utility certainly, but practically speaking you will not be able to construct unit tests that catches every single possible scenario. This is tricky enough for trying to catch functional problems, but for security problems where an adversary is explicitly trying to bend something beyond even what the developer conceived of in design, unit tests become even more tricky. If someone has the foresight in implementing a feature to craft a test case to explicitly try malicious things, then they probably wouldn't have messed up the code in the first place. Of course, there is value in having the first developer with that awareness institute such a test case so that a follow up activity gets checked, but I think in most of the cases the bug came with the first checkin of the function, meaning the developer just never considered the possibility at all. This means they made buggy code and they would have or in fact did also made inadequate test cases. You can't just say 'if Apple had done unit tests, their code would have been perfect!'. There are projects without unit tests that fare pretty well and there are projects with unit tests that fail miserably in terms of quality.
I have heard people claim with a straight face that they now have '100% coverage' through unit tests and then go on to say at-will releases are therefore safe to do without any particular testing.
XML is like violence. If it doesn't solve the problem, use more.
They are all tools that can be applied to improve the quality of the code. No one thing is "The Solution".
* Test Driven Development (TDD) is a good approach to ensure that the code you write is testable. This will not work for things like UI code, but other code will benefit.
* Unit Tests can either be developed via a TDD-like approach (easier to do), or after the code is written (harder to do).
* Automated Regression Tests (a superset of Unit Tests) provide good coverage for ensuring code works as expected without involving a large manual testing team. These will only detect the things covered by the automated tests.
* Static Code Analysis tools can pick up a lot of problem areas, but will not detect every problem. These results can be used to identify what tests need to be created to prevent future regression.
* Fuzz testing is good at providing strange data to e.g. a protocol or file format parser. These are intended to be soak tests -- e.g. "does my regular expression parser handle all these strange and possibly invalid constructs". Fuzz testing would have most likely found the heartbleed bug (because it would have permutated the length of data to request). Any failures here should be converted to Unit/Regression tests to ensure that the problem is (a) fixed by any code changes made and (b) does not occur in the future. Fuzz testing will typically find hard to identify bugs (e.g. data races) that are not easy to identify from manually constructed tests or static analysis.
* Manual/ad hoc testing is important as it can uncover bugs that the developers are not aware of.
* Code and Security Reviews help identify potential issues (e.g. if you have someone knowledgeable about SQL injection, they can assess whether some code is vulnerable to that attack).
None of these is a silver bullet, but the more you have the better the code will be.
If you make brilliant code that only you can understand
There's a false dichotomy here. He said that only *some* are qualified enough to create solutions to complex problems. You are saying his claim is that only *one* can understand, implying that the problem can't possibly be too hard, and that any hard code to follow is just because the developer is terrible at coding.
As a counter to your example of the Pythagorean Theorem, what about post-graduate math and science? There are tons of things which would make 40 steps seem easy by comparison. Should society forgo those just because only some people are realistically going to be able to understand and apply that correctly?
A very ubiquitous situation is that with the 'anyone can understand it or else it shouldn't exist at all' philosophy, there is no way we'd have cryptographic libraries at all.
I will agree that his stance against processes is a bit too harsh, but I've been around enough to know in some scenarios such a jaded perspective would be perfectly understandable. I've seen some projects that had appropriate and helpful processes that did help quality, but been witness to many many more that had ineffective process that achieved nothing but create busy work while still churning out crap code.
XML is like violence. If it doesn't solve the problem, use more.
Of course, it's obvious today that a test for behavior on inconsistent requests should have been done in OpenSSL. As well as a test for each failure cause should have been done by Apple. And next week, when an off-by-one bug bites us on an integer overflow in libfoobar, people will say testing for that condition should have been trivial.
So, yes, some conditions can be found with fuzzers. Of course, fuzzers work in an erratic way, and not all bugs can be triggered by them. But maybe fuzzing our code (more importantly, our security-sensitive code) will yield better results than preparing tests for those components in the system we are aware of.
Then again, properly fuzzing takes quite a bit of time. It is way less fun to watch a fuzzer than to see tests making green check marks...
Yeah, the "culture" is "hurry up and get it done so you can get on to the next thing because if something takes more than an hour to do it's not worth doing" and it exists in every single software development organization on planet Earth. Until these things actually start costing real money to people with real power, this will continue.
Proud neuron in the Slashdot hivemind since 2002.
At least Apple's bug could've been caught with basic unit-testing. This is the snippet of code from Apple's bug:
static OSStatus ...
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
uint8_t *signature, UInt16 signatureLen)
{
OSStatus err;
if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0) ...
goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
goto fail;
goto fail;
if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
goto fail;
fail:
SSLFreeBuffer(&signedHashes);
SSLFreeBuffer(&hashCtx);
return err;
}
Just implement a unit test with the following logic:
1. When SSLHashSHA1.update() is called, DO NOT return an error.
2. Expect 2 calls to SSLHashSHA1.update() and check the input parameter on each call.
3. Expect 1 call to SSLHashSHA1.final() and check the input parameters are what you'd expect.
That simple unit test would've caught this issue without any need of duplicating code.
diegoT
I mean, if unbraced if statements are so deadly, then why are they not outlawed unless specifically allowed by compiler directive. On your head be it.
Ok, writing "goto fail;" twice in a row is a bug. But it's not the real bug. This code was checking whether a connection was safe, and executed a "goto fail;" statement if one of the checks failed. It also executed one "goto fail;" by accident, skipping one of the checks. But one would think that a statement "goto fail;" would make the connecction fail! In that case, sure, there was a bug, but the bug should have led to all connections failing, which would have made it obvious to spot (because the code wouldn't have worked, ever).
So the real problem is a programming style where executing a statement "goto fail;" doesn't actually fail! If a function returns 0 for success / non-zero for failure like this one, it should have been obvious to add an "assert (err != 0)" to the failure case following the fail: label. And that would have _immediately_ caught the problem. There should be _one_ statement in the success case "return 0;" and _one_ statement in the failure case "assert (err != 0); return err;".
It takes a genius to write code that can be understood by an idiot.
With buffer overflows, over and over it's said they can be tested for, that there's no just reason for buffer overflows in this day and age.
The fact is it takes more money, more time, and can easily be patched if one pops up.