Imparting Malware Resistance With a Randomizing Compiler
First time accepted submitter wheelbarrio (1784594) writes with this news from the Economist: "Inspired by the natural resistance offered to pathogens by genetically diverse host populations, Dr Michael Franz at UCI suggests that common software be similarly hardened against attack by generating a unique executable for each install. It sounds like a cute idea, although the article doesn't provide examples of what kinds of diversity are possible whilst maintaining the program logic, nor what kind of attacks would be prevented with this approach." This might reduce the value of MD5 sums, though.
You think you have buggy software now, this idea will multiply a single bug into a dozen.
Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
Can you imagine parsing a stack trace or equivalent from one of these? Each stack is different.
Ignoring the fact that Heisenbugs would be much more prevalent.
Part of programming is paring of states. The computer is an (effectively) infinite-state machine. When you add bounds and checks you're reducing the number of states. This would add a great deal, making bugs more prevalent. Since a lot of attacks are based on bugs, this may increase the likelihood of some attacks.
..would a professor of CompSci think this is a good idea, despite the hundreds of problems it *causes* with existing practices and procedures?
Oh, wait.. maybe because the idea is patented and he'll get paid a lot.
http://www.google.com/patents/US8239836
So we should use something like ABS with that randomisation enabled? Or should we trust to download distinct blobs for every download? For the latter, nice try NSA, but I don't want you to be abled to incorporate spyware into my download and not be noticed. ... extra features. The blobs should be signed by more entities, so then all would have to be NSLed.
Its already a pity software gets signed only by so few entities (usually one at a time, at least for deb). Perhaps I know that the blob came from Debian, but I can't verify whether it is the version the public gets, or the special version with some
Some malware already does this, which definitely helps it evade heuristic scans. Sounds worth exploring, but i bet it will make the AV they force me to run at work that much more frustratingly restrictive.
You can already do this with Gentoo, you're highly unlikely to use the same combination of compiler, kernel, assembler, libraries, use flags, compiler flags etc as anyone else...
http://spamdecoy.net - free throwaway anonymous email - avoid spam!
The problem with any nondeterministic compiler is that it prevents use of diverse double-compiling, a method to detect the sort of compiler backdoor described by Ken Thompson in "Reflections on Trusting Trust". You'd have to bootstrap the compiler with nondeterminism turned off (and with GUIDs, timestamps, and multithreaded allocation of symbols for anonymous objects turned off too) in order for the DDC bootstrap construction to converge.
In any case, I've implemented a technique like this on the Nintendo Entertainment System. I wrote a preprocessor that shuffles the order of functions in the file, the order of opcodes within a function that don't depend on each other's results, and the order of global variables (or the order of fields in an object). One reason I implemented it was to use one variable as another's canary to make buffer overflows easier to detect in an assembly language program. The other is watermarking the binary so that I can tell who leaked a particular copy of the beta version to the public. If you're interested, you can find my shuffle tool in the source code of Concentration Room.
I must respectfully disagree with you on every point you raise.
A randomised stack would cause certain types of bugs to manifest themselves much earlier in the development process. Nothing decreases the cost of a bug hunt more than proximity to the actual coding event.
Such an environment rewards programmers who invest more to validate their loops and bounds more rigorously in the first place. Nothing reduces the cost of a bug more than not coding it in the first place.
There's nothing that stops the debugging team from debugging against a canonical build, if they wish to do so. If they have a bug that the canonical build won't manifest, they wouldn't even have known about the bug without this technique added to the repertoire. If many such bugs become known early in the development process—bugs that manifest on some randomised builds, but not on the canonical debug build—you're got an excellent warning klaxon telling you what you need to know—your coding or management standards suck. Debugging suck, if instigated soon enough to matter, returns 100x ROI as compared to debugging code.
Certainly the number of critical vulnerabilities that exist against some compiled binary can only increase in number. So what? The attacker most likely doesn't know in advance which version any particular target will run. The attacker must now develop ten or one hundred exploits where previously one sufficed (or one exploit twice as large and ten times more clever).
If the program code mutated on every execution, you would have some valid points. That would be stupid beyond all comprehension. An attacker could just keep running your program until it comes up cherries.
The developer controls the determinism model. It's an asset in the war. There can be more when it helps our own cause, and less when it assists our adversaries.
Determinism should be not reduced to a crutch for failing to code correctly in the first place. Get over it. Learn how. Live in an environment that punishes mistakes early and often.
The problem with this in "Explain like I'm Five" terms:
You can have no idea what the program you are running does.
You cannot trust it. You cannot know it hasn't been tampered with. You cannot know a given copy works the same as another copy. You cannot know your executable has no back doors.
On the security minded front we have a trend towards striving for deterministic build capability; so that we have some confidence and method of validating that a source code to executable transformation hasn't been tampered with, that the binaries you just downloaded were actually generated from the source code in a verifiable way.
Another technique I'm seeing in secure conscious areas is executable whitelisting, where IT hashes and whitelists executables, and stuff not on the whitelist is flagged and/or rejected.
Now this guy comes along and runs headlong in the other direction suggesting every executable should be different. And I'm not sure I see any real benefit, nevermind a benefit that offsets the losses outlined above.
..would a professor of CompSci think this is a good idea, despite the hundreds of problems it *causes* with existing practices and procedures? Oh, wait.. maybe because the idea is patented and he'll get paid a lot.
http://www.google.com/patents/...
As an employee of the University of California a professor is *required* to report any discovery or method that *might* be patentable to the University.
The University takes it from there, it has an office that researches viability, handles the process and then licenses the patents to "industry". With respect to licensing small local companies are given a better deal than larger internationals. As for the licensing fees collected, 50% goes to the University, 25% to the department (UC Irvine's Computer Science department in this case) and 25% to the employee(s).
At least that is how it was a few years ago when I was a grad student at UC.
What kinds of bugs do you think would manifest earlier using this technique ...
The GP mentioned a randomized stack. An uninitialized variable would be one, something that often accidentally has a value that does no harm (a zero possibly).
... and why do you think that earlier manifestation of that class of bugs will outweigh the tremendous burden of chasing down all the heisenbugs that only occur on some small percentage of randomized builds?
You do realize that your argument for the status quo and not dealing with the "heisenbugs" is essentially arguing to leave a coding bug in place? Recompiling will not necessarily introduce new bugs, rather change the behavior of existing bugs.
I've seen many of the sort of bugs this recompiling technique may expose, I spent some years porting software between different architectures. Not only did we have different compilers but we had different target CPUs. It was a friggin awesome environment for exposing unnoticed bugs. Software that had run reliably under internal testing for weeks on its original platform failed immediately when run on a second platform. And it kept failing immediately after several crashing bugs were fixed. The original developers, who were actually quite skilled, looked at several of the bugs eventually found and wondered how the program ever ran at all. I've seen this repeated on multiple teams at multiple companies over the years.
Also developers working on one platform eventually learned to visit a colleague working on the "other" platform when they had a bug that was hard to reproduce. There was a good chance that a hard to manifest bug on one platform would be easier to reproduce on the other.
There is nothing like cross platform development to help shake out bugs.
This recompilation idea would seem to offer some of these same benefits. Yes it complicates reproducibility of crashes in the field but if one can get a recompilation seed with that crash dump/log its more like of dealing with an extra step not some impossible hurdle.
Plus recompiling with a different seed each time the developer does a test run at their workstation could help find bugs in the first place, reducing the occurrences of these pesky crashes in the field.
I'm not saying these proposed recompilations in the field are definitely a good idea, just that the negatives seem to be exaggerated. It looks like something interesting, worth looking into a bit more.