Finding More Than One Worm In the Apple
davecb (6526) writes "At Guido von Rossum's urging, Mike Bland has a look at detecting and fixing the "goto fail" bug at ACM Queue. He finds the same underlying problem in both in the Apple and Heartbleed bugs, and explains how to not suffer it again."
An excerpt: "WHY DIDN'T A TEST CATCH IT?
Several articles have attempted to explain why the Apple SSL vulnerability made it past whatever tests, tools, and processes Apple may have had in place, but these explanations are not sound, especially given the above demonstration to the contrary in working code. The ultimate responsibility for the failure to detect this vulnerability prior to release lies not with any individual programmer but with the culture in which the code was produced. Let's review a sample of the most prominent explanations and specify why they fall short.
Adam Langley's oft-quoted blog post13 discusses the exact technical ramifications of the bug but pulls back on asserting that automated testing would have caught it: "A test case could have caught this, but it's difficult because it's so deep into the handshake. One needs to write a completely separate TLS stack, with lots of options for sending invalid handshakes.""
The "Apple" had only one bug, the Goto Fail bug - since Apple did not use OpenSSL they never had the second bug.
So why is the headline painting Apple as the source of both bugs?
"There is more worth loving than we have strength to love." - Brian Jay Stanley
For the same reason new viruses will always defeat anti-virus software: Each virus is tested against existing anti-virus programs and only released into the wild when it has defeated all of them.
... security experts are not that fond of unit testing.
After reading TFA, I'm not sure I like the suggested approach to the "fix" in the code by replacing the two if blocks with a common method where you pass in all sorts of parameters.
Yes duplicate code is bad, I agree that's a "code smell" (one of the worst coding terms every to be invented BTW).
But just as odiferous to me, is a method with a billion arguments like the combined extracted method has. Sure duplicate if statements smell bad, but the replacement is worse and also harder to comprehend.
I know it's in theory more testable, but at what cost when the code is more obsfucated? If the code and tests are harder to understand are you really better off?
"There is more worth loving than we have strength to love." - Brian Jay Stanley
If you're selling that you coulda/woulda caught all X for X that haven't happened yet, you're selling snake oil. The reality is that this computer stuff is a little harder than it looks to do properly, and if all you have to offer is marketing bullshit and a History of Art degree, maybe you should leave it to the professionals, and push for budget to do things correctly rather than just do them.
.
I've often said that you don't fix a software bug until you've fixed the process that allowed the bug to be created. The above quote is of a similar sentiment.
goto fail;
goto fail;
Sorry, but it needs to be said: this is sloppy, he-man coding. Is there a problem with using brackets? Is it your carpal tunnel syndrome? Are you charged by the keystroke?
This is how mistakes happen. For shame!
quiquid id est, timeo puellas et oscula dantes.
So good test should catch this goto fail for sure, either functional test or an unit test. Looks like neither are thorough for the library.
Bot more importantly, if static analysis or structural coverage of code was done, both would point out that there is something wrong with the code.
All of these testing strategies should be done for such s critical piece of software.
Automated unit test stubs with range checking and input fuzzing. Took me two weekends to build one atop Doxygen. If your build environment does not do this you're maliciously stupid.
I've been in this field for 20+ years now, and I don't necessarily (in fact, I usually don't) agree with whatever the current trend is (which is probably why my karma is negative). One underlying trend, has been to make software something that can be made by anyone - to remove the requirement of having a special mind that is able to think through algorithms and code. This has generally been accomplished through process, and abstraction. Process - if we can describe a method well enough, then anyone should be able to follow it to it's logical conclusion. Abstraction - we keep adding layers upon layers in an effort to simplify and streamline that which is a complex thing (lots of numbers in sequence to control a microprocessor and it's accompanying hardware). You can probably tell that I'm not a great fan of either - though I'm really really trying to not be a negative type, and to go with the flow more. But I can't help my fundamental feelings that there is just no substitute for a smart individual with a gift of understanding the logic of code. I'm always against process because it takes the gift that i was given and neutralizes it. Personal feelings aside, I just don't think that all the process in the world is ever going to get ahead of the curve that is the battle between perfectly functional software and bugs.
Compiler and static-analysis warnings also could have detected the unreachable code, though false warnings might have drowned out the signal if such tools weren't already being used regularly.
I'd purpose that these tools weren't being used properly rather than turning the issue into a nail for the unit testing hammer.
Turning on all warnings and forcing them to errors certainly would have caught the bug in Apple's SSL code. Anyone who just lets warnings fly by in C code is an idiot. Even if the warning is mildly silly, getting it out of the way lets the important warnings stand out. Sensible warnings from C compilers are the very reason we don't use lint anymore. Even then you still have to watch out, because some warnings won't appear at low optimization levels, and I recall hearing that there are a few obscure warnings not turned on by -Wall.
Also, it could have possibly been introduced by a bad merge. One of the things that putting braces on every if/for/while/etc. does is give merges more context to keep from fucking up, or at least a chance to cause brace mismatch.
As for Heartbleed, just the fact that the code wouldn't work with a compile time option to use the system malloc instead of a custom one should have been enough to raise some red flags. Because rolling your own code to do something "more efficiently" than the system libraries never introduces new problems, right?
#naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
You guys, I swear. Must be nice up there on your thrones. Captain Hindsight, where are you? Ahh .. over on slashdot posting ..
There isn't a piece of useful software EVER RELEASED IN THE HISTORY OF HISTORY that hasn't had bugs. It was a bug, the fixed it, move along, nothing more to see here ...
If you are that worried about it, go write your own OS/Compliler/Language and while you're at it design your own hardware just to be sure. Oh ...
We have some lovely elements coming together right here on the slashdot blurb:
1. Stupid pun instead of a descriptive title
2. Full caps in the article excerpt
3. Trying to bring up coding "culture"
4. Assertion that it totally could have been caught beforehand, but they aren't sure exactly how.
Somehow, I don't think I'm missing much by not reading the article.
This is clearly the automatic resolution of a merge conflict by the versioning control software. These are such a nightmare to debug and happen all the time. Developers rarely check their entire change visually post merge. Though this can be found using static analysis that force coding standards (such as forcing the use of brackets or proper indentation for the lexical scope). Though the bugs from automatic conflict resolution can only be really improved through better versioning software. These are without question the worst and most frustrating bugs.
heartbleed
...
WHY DIDN'T A TEST CATCH IT?
A couple of months before this posting, I would have thought OpenSSL has done positive things for the community. Now I know that this is unreliable software that must be replaced as soon as viable. I've since learned better. Maybe OpenSSL didn't always suck. But, at the time of this writing in May of 2014, it sucketh.
A couple of days ago (at the time of this writing), OpenSSL Vahalla Rampage: "Sometimes bad tests can be more harmful than no tests at all" documents yet another bad test in the OpenSSL code. The OpenSSL code is riddled with problems, including tests that don't work as intended. So, there is a simple answer to the question: why a test didn't catch heartbleed. The answer is that reliable testing was not being performed. Multiple tests that were created, didn't work, and so couldn't be relied upon anyway.
Thank goodness there is a competent team on this planet who knows how to do things right, and is now taking care of the problem. (The makers of OpenSSH are ripping OpenSSL to shreds, and creating LibReSSL, the library that is a re-implementation of SSL code.)
For tons more examples of how cripplingly bad OpenSSL has become, see other articles on the OpenSSL Vahalla Rampage site. The site will probably mean nothing to people who don't know programming (or don't have enough experience to have been introduced to techniques like manually freeing up memory that was dynamically allocated earlier). I don't quite comprehend some of the intricacies, but some I do, and those that I do, are ROFL-making material. Like this amusing observation (which will amuse anyone who has used to C preprocessor), and many others.
Perhaps one of the best examples is the code that uses a goto statement, to jump to this label:
if (0)
{
err:
That gem was found from Flingpoo! : OpenSSL is written by monkeys, yet another ridiculing site documenting the current tragedy of OpenSSL. That site has some other wretched examples, like the #ifdef...if...#endif...else mis-construct.
Fuzztesting doesn't work for some of these things. I'll leave it as an exercise to your ego to figure out why.
NOTE: IF fuzz testing was "The Solution", as you allude to, we would not need unit testing at all. And clearly we need unit tests.
Who benefitted? The TLAs. Accidental, my ass.
The article doesn't give the revision history of how that code got there. Who put it there, when did they put it there, and what did the code look like before and after they put it there. We need the names!
It's remarkable how many organizations don't enable aggressive compiler warnings (or worse, ignore or disable them). One of the best practices I've learned is to turn on every warning that you possibly can and use the option that treats all warnings as compiler errors. The code from Apple may have been properly unit tested. However, if this was the result of a bad automated merge, unit tests are often not repeated on the resulting code base headed for system test. The GCC "-Wunreachable-code" option would have caught this type of error.
The article contains the same flaw that people who rabidly declare unit tests as a panacea. The article basically shows that after discovery of a bug, a unit test can retroactively be constructed that would have caught the bug, therefore it's inexcusable that the bug got released, ignoring the fact that is hindsight. Unit tests are not without their utility certainly, but practically speaking you will not be able to construct unit tests that catches every single possible scenario. This is tricky enough for trying to catch functional problems, but for security problems where an adversary is explicitly trying to bend something beyond even what the developer conceived of in design, unit tests become even more tricky. If someone has the foresight in implementing a feature to craft a test case to explicitly try malicious things, then they probably wouldn't have messed up the code in the first place. Of course, there is value in having the first developer with that awareness institute such a test case so that a follow up activity gets checked, but I think in most of the cases the bug came with the first checkin of the function, meaning the developer just never considered the possibility at all. This means they made buggy code and they would have or in fact did also made inadequate test cases. You can't just say 'if Apple had done unit tests, their code would have been perfect!'. There are projects without unit tests that fare pretty well and there are projects with unit tests that fail miserably in terms of quality.
I have heard people claim with a straight face that they now have '100% coverage' through unit tests and then go on to say at-will releases are therefore safe to do without any particular testing.
XML is like violence. If it doesn't solve the problem, use more.
They are all tools that can be applied to improve the quality of the code. No one thing is "The Solution".
* Test Driven Development (TDD) is a good approach to ensure that the code you write is testable. This will not work for things like UI code, but other code will benefit.
* Unit Tests can either be developed via a TDD-like approach (easier to do), or after the code is written (harder to do).
* Automated Regression Tests (a superset of Unit Tests) provide good coverage for ensuring code works as expected without involving a large manual testing team. These will only detect the things covered by the automated tests.
* Static Code Analysis tools can pick up a lot of problem areas, but will not detect every problem. These results can be used to identify what tests need to be created to prevent future regression.
* Fuzz testing is good at providing strange data to e.g. a protocol or file format parser. These are intended to be soak tests -- e.g. "does my regular expression parser handle all these strange and possibly invalid constructs". Fuzz testing would have most likely found the heartbleed bug (because it would have permutated the length of data to request). Any failures here should be converted to Unit/Regression tests to ensure that the problem is (a) fixed by any code changes made and (b) does not occur in the future. Fuzz testing will typically find hard to identify bugs (e.g. data races) that are not easy to identify from manually constructed tests or static analysis.
* Manual/ad hoc testing is important as it can uncover bugs that the developers are not aware of.
* Code and Security Reviews help identify potential issues (e.g. if you have someone knowledgeable about SQL injection, they can assess whether some code is vulnerable to that attack).
None of these is a silver bullet, but the more you have the better the code will be.
The real solution is actually dry-running your code.
Despite my absolute hatred for shit like UML, the basic foundations are incredibly helpful, those being the pseudocode and sequence diagrams. DETAILED sequence diagrams. The rest of UML is literally the same apple viewed from different angles and a waste of time.
Pseudocode, basic flow. Do a good basic sequence diagram of what needs to happen based on that, anonymous "functions / entities". Detailed pseudocode with specific actions and/or detailed sequence diagram that contains the separate compartments and how they communicate. Done.
This will catch most errors. Even with programmers that are only learning. When they are taught to go through things line-by-line, it gets them actually thinking about it. And when you do the iterative sequence / pseudocode method, it is hard to fail as easily as the heartbleed bug. It was an honest, basic mistake, but one that could have been avoided.
Then when you test that, it should bring up very little errors. Unit tests, fuzz tests. And very specifically use pre-built libraries out there that test the absolute hell out of things and try to break them. Remaking the wheel is often the biggest reason so many insecure things exist (ESPECIALLY PHP, it is so easy to write bad code in that due to the way the language works. Horribly).
Re-use others code / tools for testing. It has most likely been created by large groups of people over many years and is bulletproof.
So many people want to do it on their own and it is a silly attitude and you have nothing to prove to anyone.
I don't understand why everyone is harping about fuzztesting either. It's randomly poking at a black box. Pointless.
If you aren't sure what the code does and you're too lazy or incompetent to properly white box test it, then for the love of god, do a full input permutation test. Test all possible input values. If you have to, let it run until it completes or until you get a better idea.
Of course, it's obvious today that a test for behavior on inconsistent requests should have been done in OpenSSL. As well as a test for each failure cause should have been done by Apple. And next week, when an off-by-one bug bites us on an integer overflow in libfoobar, people will say testing for that condition should have been trivial.
So, yes, some conditions can be found with fuzzers. Of course, fuzzers work in an erratic way, and not all bugs can be triggered by them. But maybe fuzzing our code (more importantly, our security-sensitive code) will yield better results than preparing tests for those components in the system we are aware of.
Then again, properly fuzzing takes quite a bit of time. It is way less fun to watch a fuzzer than to see tests making green check marks...
I fucking HATE split infinitives.
Nothing makes you look like a bigger imbecile than splitting an infinitive.
Yeah, the "culture" is "hurry up and get it done so you can get on to the next thing because if something takes more than an hour to do it's not worth doing" and it exists in every single software development organization on planet Earth. Until these things actually start costing real money to people with real power, this will continue.
Proud neuron in the Slashdot hivemind since 2002.
At least Apple's bug could've been caught with basic unit-testing. This is the snippet of code from Apple's bug:
static OSStatus ...
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
uint8_t *signature, UInt16 signatureLen)
{
OSStatus err;
if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0) ...
goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
goto fail;
goto fail;
if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
goto fail;
fail:
SSLFreeBuffer(&signedHashes);
SSLFreeBuffer(&hashCtx);
return err;
}
Just implement a unit test with the following logic:
1. When SSLHashSHA1.update() is called, DO NOT return an error.
2. Expect 2 calls to SSLHashSHA1.update() and check the input parameter on each call.
3. Expect 1 call to SSLHashSHA1.final() and check the input parameters are what you'd expect.
That simple unit test would've caught this issue without any need of duplicating code.
diegoT
I mean, if unbraced if statements are so deadly, then why are they not outlawed unless specifically allowed by compiler directive. On your head be it.
Ok, writing "goto fail;" twice in a row is a bug. But it's not the real bug. This code was checking whether a connection was safe, and executed a "goto fail;" statement if one of the checks failed. It also executed one "goto fail;" by accident, skipping one of the checks. But one would think that a statement "goto fail;" would make the connecction fail! In that case, sure, there was a bug, but the bug should have led to all connections failing, which would have made it obvious to spot (because the code wouldn't have worked, ever).
So the real problem is a programming style where executing a statement "goto fail;" doesn't actually fail! If a function returns 0 for success / non-zero for failure like this one, it should have been obvious to add an "assert (err != 0)" to the failure case following the fail: label. And that would have _immediately_ caught the problem. There should be _one_ statement in the success case "return 0;" and _one_ statement in the failure case "assert (err != 0); return err;".
With buffer overflows, over and over it's said they can be tested for, that there's no just reason for buffer overflows in this day and age.
The fact is it takes more money, more time, and can easily be patched if one pops up.