Security Review Summary of NIST SHA-3 Round 1
FormOfActionBanana writes "The security firm Fortify Software has undertaken an automated code review of the NIST SHA-3 round 1 contestants (previously Slashdotted) reference implementations. After a followup audit, the team is now reporting summary results. According to the blog entry, 'This just emphasizes what we already knew about C, even the most careful, security conscious developer messes up memory management.' Of particular interest, Professor Ron Rivest's (the "R" in RSA) MD6 team has already corrected a buffer overflow pointed out by the Fortify review. Bruce Schneier's Skein, also previously Slashdotted, came through defect-free."
That is what they get for mandating the code be in ANSI C. How about allowing reference implementations in SPARK, Ada, or something else using design-by-contract? After all, isn't something as critical as an international standard for a hash function the type of software d-b-c was meant for?
Learning HOW to think is more important than learning WHAT to think.
If you step into my heap one more time with your fucking malloc, I'm going to derefernce your null pointer bitch!
-Christian Bale
... because implementation is where people screw up.
Your null pointer bitch derefernced [sic] herself and crashed, or I'll take out your fucking lights. How would you like that?
In a word, no. A reference implementation is supposed to be a working version of the code, not just a mathematical description. With a working version, it's possible to do things like test its real world performance or cut and paste directly into a program that needs to use the function. That's obviously only possible if you have a version that works on real-world processors.
Consider Skein as an example. One of the things that Bruce Schneier described as a major goal of its design is that it uses functions that are highly optimized in real-world processors. That means that it's possible to make a version that's both very fast and straightforward to program, an important criterion for low-powered embedded applications. You won't discover that kind of detail until you implement it.
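For the curious: Skein's core is the Threefish block cipher, and as far as I understand the Skein spec, its rounds are built from a MIX step that is nothing but a 64-bit add, a rotate, and an XOR. Here's a minimal C sketch of that one step, assuming 64-bit words. The rotation amount is left as a parameter rather than reproducing the spec's per-round constants, and rotl64/threefish_mix are names I made up for illustration.

```c
#include <stdint.h>

/* Rotate a 64-bit word left by r bits. r is reduced mod 64 and r == 0
   is special-cased, because shifting a 64-bit value by 64 is undefined. */
uint64_t rotl64(uint64_t x, unsigned r)
{
    r &= 63;
    return r ? (x << r) | (x >> (64 - r)) : x;
}

/* Sketch of the Threefish MIX step: one add, one rotate, one XOR.
   Unsigned arithmetic wraps mod 2^64, which is exactly the modular
   addition the design relies on. */
void threefish_mix(uint64_t *x0, uint64_t *x1, unsigned rot)
{
    *x0 += *x1;                    /* y0 = (x0 + x1) mod 2^64 */
    *x1 = rotl64(*x1, rot) ^ *x0;  /* y1 = (x1 <<< R) XOR y0  */
}
```

On a typical 64-bit CPU with a modern compiler, each of those operations maps to roughly one instruction, which is the kind of property you only really appreciate once you've implemented it.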
There's no point in questioning authority if you aren't going to listen to the answers.
I just read that as unmangled code.
I'm really behind on the latest programming paradigms.
I should add that I work for Fortify and that I initiated the SHA-3 review in my spare time as a private project. The Slashdot article on December 21 caught my interest.
Take off every 'sig' !!
"... because implementation is where people screw up."
"Bruce Schneier's Skein, ... came through defect-free."
So by deductive logic, Bruce is a robot. Also previously slashdotted.
Yes, I can't wait for managed Linux to come out. That sounds like a great idea....
This doesn't follow from TFA. The blog points out two instances of buffer overflows. The first one you could argue they messed up "memory management" because they used the wrong bounds for their array in several places... but they don't sound very "careful" or "security conscious" since checking to make sure you understand the bounds of the array you're using is pretty basic.
But that's not what bothered me. The second example is a typo where TFA says someone entered a "3" instead of a "2". In what dimension is mis-typing something "messing up memory management"? That just doesn't follow.
Oooooh good *odd pause* for you!
Read something about http://research.microsoft.com/en-us/groups/os/singularity/ :)
The summary is kind of a troll, since most of the submissions actually managed to get through without ANY buffer overflows.
Buffer overflows are not hard to avoid; they are just something that must be tested for. If you don't test, you are going to make a mistake, but they are easy to find with a careful test plan or an automated tool. Apparently the authors whose code had buffer overflows didn't really check for them.
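A minimal sketch of what "just test for it" can look like, assuming a hypothetical copy_checked() helper (not taken from any of the SHA-3 submissions). The function refuses to write past its destination, and the test plan deliberately hits the exact boundary and one byte past it, which is where off-by-one overflows live.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes from src into dst, whose capacity
   is cap bytes. Returns 0 on success, -1 if the copy would overflow. */
int copy_checked(unsigned char *dst, size_t cap,
                 const unsigned char *src, size_t n)
{
    if (n > cap)
        return -1;              /* refuse instead of writing past dst */
    memcpy(dst, src, n);
    return 0;
}

/* Boundary-focused tests: empty, exactly full, and one byte too many. */
void test_copy_checked(void)
{
    unsigned char buf[4];
    const unsigned char src[5] = {1, 2, 3, 4, 5};

    assert(copy_checked(buf, sizeof buf, src, 0) == 0);   /* empty        */
    assert(copy_checked(buf, sizeof buf, src, 4) == 0);   /* exactly full */
    assert(buf[3] == 4);
    assert(copy_checked(buf, sizeof buf, src, 5) == -1);  /* one too many */
}
```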
C is just a tool, like any other, and it has tradeoffs. The fact that you are going to have to check for buffer overflows is just something you have to add to the final estimate of how long your project will take. But C gives you other advantages that make up for it. Best tool for the job, etc.
Qxe4
The alternative was supposed to be throwing money at Fortify by the way. If your conclusion is to switch to SPARK then Fortify needs to work on their PR, *cough*, I mean blogging.
At the very least, using a C-like language with safety, like Cyclone, would be a reasonable performance/safety tradeoff for a lot of users compared to the current tradeoffs (which leave quite a bit to be desired). I'm guessing the main stumbling block would be reimplementation overhead (Linux already exists in C, has a lot of code, and is a fairly quickly moving target) and lack of interest on the part of kernel hackers (who have little interest in using non-C languages), rather than performance of the resulting system.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
My respect for Microsoft (represented as a percent) just had an underflow error and is now equal to zero.
Microsoft talking about reliable systems as though they know what they're talking about?
I suspect the problem is related to the poor coding practices used in academia. I see college professors who write code that barely compiles in GCC without a bunch of warnings about anachronistic syntax. Some of the C code used constructs that are unrecognizable to someone who learned the language within the past 10 years, and is completely type unsafe.
I can't tell much from the code at the link, but I do see #define used for constants, which is no longer appropriate (yet is EXTREMELY common to see). The const keyword has been in the language since C89 (ANSI C), not just C99.
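For comparison, a small sketch of the typed alternatives (the names are made up). One caveat worth knowing: in C, unlike C++, a const object is not an integer constant expression, so it can't size a file-scope array, while an enum constant can. That is one reason #define and enum constants still hang around in C code.

```c
#include <stddef.h>

/* Old style: a textual macro with no type and no scope. */
#define BLOCK_BYTES_MACRO 64

/* Typed constant: has a type and obeys normal scope rules. */
static const size_t block_bytes = 64;

/* Enum constant: a true integer constant expression in C. */
enum { BLOCK_BYTES = 64 };

/* OK: an enum constant may size a file-scope array. */
static unsigned char block[BLOCK_BYTES];

/* Would NOT compile as C (it would as C++):
   static unsigned char bad[block_bytes]; */

size_t block_size(void)     { return block_bytes; }
size_t block_capacity(void) { return sizeof block; }
```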
Eventually, some code has to do the memory management, whether it's in C, asm, or some HDL. You cannot escape this no matter how many pointless layers of code you place between the program and hw. Modern OS's are supposed to do this, the ones that don't do it right being broken, making so-called 'managed code' redundant. All it does is take excess resources, slowing things down. They might be quick to develop, but they are slow to execute, making the user experience a living hell. Because of this, I avoid 'managed' applications whenever possible. Yuck.
But that just raises the question: how do you define a hash function mathematically? The lambda calculus? Gödel numbers? Things like cryptographic hash functions don't tend to be nice algebraic thingies like f(x)=x*x+7, especially since they're usually iterative and deliberately messy; the pretty functions are likely to be less secure.
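For what it's worth, the "iterative" part can at least be written down compactly. The classic Merkle–Damgård construction used by MD5, SHA-1 and SHA-2 (not necessarily by every SHA-3 entry) chains a compression function f over the padded message blocks; all the deliberate messiness lives inside f:

```latex
\begin{aligned}
M &= M_1 \,\|\, M_2 \,\|\, \cdots \,\|\, M_n \quad \text{(after padding)} \\
H_0 &= \mathit{IV}, \qquad H_i = f(H_{i-1}, M_i) \quad (1 \le i \le n) \\
\mathrm{Hash}(M) &= H_n
\end{aligned}
```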
On the other hand, there are languages like Cryptol in which you may be able to specify hash functions more mathematically. For example, here is a Cryptol implementation of Skein.
I know nothing of the sort. How about asking some developers who have a history of getting both the security and the memory management correct which intellectual challenge they lose the most sleep over?
The OpenBSD team has a history of strength in both areas. I suspect most of these developers would laugh out loud at the intellectual challenge of the memory management required to implement a hash function. It's about a hundred thousand lines of code short of where the OpenBSD team gets grey hair over memory management problems in the C language.
I was just having an intense conversation about restrictive land covenants with my GF. If the economic cycle tips downward, the covenant holder (often a not-for-profit manila file folder which is legally distinct from the insolvent main entity) ceases to afford regular maintenance. Suddenly it turns into a dandelion orchard, and everyone in the community is dead certain that every dandelion on every lawn originated from this single source, whereupon some soon to be re-elected politico harpoons the legal infrastructure that permitted these things to flourish in the first place.
"What we know about C" and "what we know about dandelions" are surprisingly equivalent.
I wouldn't hire a programmer who can't get memory management right to take on any significant intellectual challenge. It's just a way to feel good about yourself without having the aptitude to cut your way out of a wet paper bag.
90% of software development projects are not aptitude driven. Let's stop fooling ourselves into thinking that the languages that work well in those contexts have anything to offer those of us dealing with the other 10%.
Memory management is a subcase of resource management with a particularly harsh way of delivering the news: you suck. A memory managed language deprives the environment of so many golden opportunities to deliver this message, despite the fact that you still suck. By the time you don't suck, you've ceased to regard unmanaged memory as a core intellectual challenge, and trained yourself to work within an idiom where you hardly ever get it wrong anyway.
The C string functions that cause the worst of the grief were widely known to be a bad idea by the mid 1980s. They originated at a point in history where linking sprintf() was sometimes considered a luxury you couldn't afford. In the microcontroller world, it's still common for an environment to provide three different versions of printf/sprintf: basic; basic plus more of the format options and maybe long integers; and then the full version, which also includes floating point. The middle option is the beer budget. The first option is for when you can't even afford beer. These micros are not so different from the minicomputers on which C and Unix were originally created.
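The bounded replacement eventually standardized in C99 is snprintf(). A small sketch, with a made-up format_pair() wrapper, showing the two things sprintf() can't give you: it never writes more than the buffer size (including the terminating NUL), and its return value tells you how long the full output would have been, so truncation is detectable.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical wrapper: format "name=value" into buf of cap bytes.
   Returns 0 if it fit, 1 if the output was truncated (buf is still
   NUL-terminated as long as cap > 0), -1 on an encoding error. */
int format_pair(char *buf, size_t cap, const char *name, int value)
{
    int needed = snprintf(buf, cap, "%s=%d", name, value);
    if (needed < 0)
        return -1;               /* encoding error */
    if ((size_t)needed >= cap)
        return 1;                /* would have needed more room */
    return 0;                    /* fit completely */
}
```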
Furthermore the efficiency of the string functions tends to ripple outward, as they tend to carry the torch for the platform's memory subsystem performance in most C code bases. What do the Z80, 6809, and 8086 all have in common? Instruction set tweaks to make zipping along a string of bytes a lot zippier.
These tricks are then rolled into vendor optimized string libraries and made available to the developer via the ratified ANSI C string functions.
It's unfortunate that all this industry of tweaking toward core performance was consolidated under a string API whose modern legacy is to have informed so many programmers that "you suck" that the general sentiment is to vote it off the island, as if such a thing is possible with a cockroach or a rat or dandelion dandruff.
Microsoft Research is not the Microsoft you know. They are not bound by 28 years of incompatibility. Their main purpose is to remove the brains from the hands of the other companies...
$ cat bo.c
int a[3];
void f()
{
a[3] = 1;
}
$ lint bo.c
bo.c:4: warning: array subscript cannot be > 2: 3
Lint is so basic, I can't imagine not using it....
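Once lint (or the compiler) has flagged it, the usual cure is to stop hand-typing bounds. A sketch of bo.c fixed that way, assuming the intent was to set the last element; ARRAY_LEN and set_a() are illustrative names, not from any library. Note that the macro only works on real arrays, not on pointers.

```c
#include <stddef.h>

/* Element count of a true array (NOT a pointer), computed by the
   compiler so the bound can't drift away from the declaration. */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

int a[3];

/* Fixed f(): writes the last element, a[2], not the nonexistent a[3]. */
void f(void)
{
    a[ARRAY_LEN(a) - 1] = 1;
}

/* Bounds-checked setter: returns 0 on success, -1 if i is out of range. */
int set_a(size_t i, int value)
{
    if (i >= ARRAY_LEN(a))
        return -1;
    a[i] = value;
    return 0;
}
```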
What's a "null pointer bitch"?
Any sufficiently advanced intelligence is indistinguishable from stupidity.
Because you can't compile a mathematical definition.
If you've read the works of E.W. Dijkstra (start with "On the Cruelty of Really Teaching Computing Science"), you'd understand that a programming language isn't much more than a formal system for expressing mathematical definitions. Perhaps Haskell or another purely functional language might fit your intuitive understanding of a "mathematical definition" better than a procedural language like C, C++, P*, or Java.
Look up bootstrapping. Most modern compilers can compile themselves using a stage or two of simpler compilers.
Look up trusting trust. Defects in the compiler, whether intentional or unintentional, can propagate themselves to the compiled work, even if the compiled work is the compiler itself.
Why the fuck is dynamic memory allocation used in a C implementation of a hashing algorithm?
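For what it's worth, a hash's working state is normally a small fixed-size struct, so the API can let the caller keep it on the stack. A sketch with a hypothetical hash_init/hash_update/hash_final interface; the mixing inside is 64-bit FNV-1a purely as a short stand-in, and FNV-1a is NOT a cryptographic hash.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-size state: the caller declares it (e.g. on the stack), so the
   implementation never needs malloc/free. */
typedef struct {
    uint64_t state;
} hash_ctx;

void hash_init(hash_ctx *ctx)
{
    ctx->state = 0xcbf29ce484222325ULL;     /* FNV-1a 64-bit offset basis */
}

/* Can be called repeatedly to stream data in pieces. */
void hash_update(hash_ctx *ctx, const void *data, size_t len)
{
    const unsigned char *p = data;          /* implicit void * conversion */
    for (size_t i = 0; i < len; i++) {
        ctx->state ^= p[i];
        ctx->state *= 0x100000001b3ULL;     /* FNV-1a 64-bit prime */
    }
}

uint64_t hash_final(const hash_ctx *ctx)
{
    return ctx->state;
}
```

Usage is just `hash_ctx c; hash_init(&c); hash_update(&c, buf, n); uint64_t h = hash_final(&c);` with nothing to free afterwards.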
One of those female pointer dogs, but one that doesn't actually exist.
No existe.
Lint is so basic, I can't imagine not using it....
Lint is not found in Ubuntu. Did you mean Splint? And can you recommend anything analogous for C++ programs, for which Splint has no front-end?
If you're still writing unmanaged code, you get what you deserve. It's 2009, not 1989.
Try running managed code in the 4 MB RAM of a widely deployed handheld computer. Now try making that managed code time-competitive and space-competitive with an equivalent program in C++ compiled to a native binary.
Really? What film is this from?
It's OK to admit you're incapable of understanding kernel design. There are plenty of idiots like you.
Go peddle your FUD somewhere else.
In other news, the first SHA-3 conference will be held in Belgium this week. NIST hopes to reduce the number of contestants for the SHA-3 contest to a more manageable level by the end of it; for more info, read on here.
Are they ready for Round 2??
Wait, why is this managed code not compiled to native binaries and optimized?
MD6 by Rivest and Skein by Schneier et al. seem to be getting a lot of attention, but another celebrity cryptographer, Dan J. Bernstein, also has a hash in this race, called "CubeHash."
DJB continued his tradition of offering cash rewards for people to find security problems with his code, giving out (so far) monthly prizes of 100 Euros to the most interesting cryptanalysis of CubeHash.
So far, the primary criticism of CubeHash is that it's slow, running some 10 to 20 times slower than many of the others in the competition. Dan brushes off this criticism by stating on his site: "for most applications of hash functions, speed simply doesn't matter."
To be honest, compared to efforts like MD6 and Skein, with their mathematical proofs of security, VHDL and other in-hardware reference implementations, and their amazing optimizations in both speed and efficiency (Skein can process half a GByte of data per second on modern hardware, and consumes only 100 bytes), entries like CubeHash seem to have that longshot underdog appeal, like a New Zealand soccer World Cup team.
"With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea...."
RFC 1925
I've had to work on an app where the main developer didn't know / didn't care about void *, and used char * everywhere instead. In fact he used char* even when the type was unique, and type cast at every call, and at the beginning of the called function.
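For anyone who hasn't run into it: in C, any object pointer converts to and from void * implicitly, so a generic interface needs no casts at the call site or in the callee. A sketch with a made-up for_each()/sum_items() pair.

```c
#include <stddef.h>

/* Hypothetical callback type: ctx carries caller-specific state. */
typedef void (*visit_fn)(void *ctx, int item);

void for_each(const int *items, size_t n, visit_fn fn, void *ctx)
{
    for (size_t i = 0; i < n; i++)
        fn(ctx, items[i]);
}

/* Concrete context: fully typed everywhere except the void * boundary. */
struct sum_ctx {
    long total;
};

static void add_item(void *ctx, int item)
{
    struct sum_ctx *s = ctx;    /* implicit conversion, no cast needed */
    s->total += item;
}

long sum_items(const int *items, size_t n)
{
    struct sum_ctx s = { 0 };
    for_each(items, n, add_item, &s);   /* &s -> void * implicitly */
    return s.total;
}
```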
When I called him on it, he said that I was doing philosophy and that he had real work to do.
My phone has a Java runtime. It works, and it's in fact a very sensible choice for the application (where security and binary portability matters more than performance). Even today, many embedded devices are powerful enough to run bytecode-interpreted languages, and this will only become more true in the future.
xkcd is not in the sudoers file. This incident will be reported.
it's in fact a very sensible choice for the application (where security and binary portability matters more than performance)
Except that binary portability doesn't matter, and while security is an absolute requirement, performance must be as high as possible.
Many applications hash huge volumes of data. SHA-256 can hash around 60 MBps on a ~2 GHz core, and that's too slow for many applications. WAY too slow. I have an application where I'd like to be able to hash over 20 MBps on an XScale processor. The rest of the system can easily sustain this data rate, but the hash is the bottleneck. The hash should not be the bottleneck.
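Rough arithmetic behind that, treating MBps as 10^6 bytes per second and assuming, purely for illustration, a 400 MHz XScale (the actual clock isn't stated):

```latex
\text{cycles/byte} = \frac{f_{\text{clk}}}{\text{throughput}}, \qquad
\frac{2 \times 10^{9}\ \text{cycles/s}}{60 \times 10^{6}\ \text{B/s}} \approx 33\ \text{cycles/byte}, \qquad
\frac{400 \times 10^{6}\ \text{cycles/s}}{20 \times 10^{6}\ \text{B/s}} = 20\ \text{cycles/byte}
```

So under that assumption the hash alone would have to run at about 20 cycles per byte on the XScale, with nothing left over for the rest of the system, versus roughly 33 cycles per byte for SHA-256 on the desktop core.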
Performance is a critically-important characteristic of a secure hash function and it will likely be the factor that decides the winner, as it was with AES.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Unmanaged code is dead.
Yes, because it warms my heart to think of reference crypto code that would crash if it wasn't running in a sandbox.
Dewey, what part of this looks like authorities should be involved?
Wait, why is this managed code not compiled to native binaries and optimized?
Because the device uses different digital signing keys for managed and unmanaged code, and end users don't have the unmanaged one.
And, if I'm not mistaken, the basis of almost all platforms is ASM and C, i.e. unmanaged code.
And this includes all versions of Windows too, right?
This always gets me though: why doesn't [the trusting trust attack] apply to C?
Bruce Schneier pointed out (writing about David A. Wheeler's "diverse double-compiling" work) that one can bootstrap a compiler using a different implementation of the language as a (probabilistic) measure against defects introduced by trusting trust. Build it on systems with different compilers, bit-compare the binaries generated on each system, and if they match, you can be reasonably sure that there is no such defect. But unlike C, which has implementations from GNU, Borland, Watcom, M$, Green Hills, and numerous other vendors, a lot of the managed languages lack multiple widely used complete implementations. For example, there really isn't an alternative to Sun Java.