Imparting Malware Resistance With a Randomizing Compiler
First time accepted submitter wheelbarrio (1784594) writes with this news from the Economist: "Inspired by the natural resistance offered to pathogens by genetically diverse host populations, Dr Michael Franz at UCI suggests that common software be similarly hardened against attack by generating a unique executable for each install. It sounds like a cute idea, although the article doesn't provide examples of what kinds of diversity are possible whilst maintaining the program logic, nor what kind of attacks would be prevented with this approach." This might reduce the value of MD5 sums, though.
You think you have buggy software now, this idea will multiply a single bug into a dozen.
Can you imagine parsing a stack trace or equivalent from one of these? Each stack is different.
Ignoring the fact that Heisenbugs would be much more prevalent.
Part of programming is paring of states. The computer is an (effectively) infinite-state machine. When you add bounds and checks you're reducing the number of states. This would add a great deal, making bugs more prevalent. Since a lot of attacks are based on bugs, this may increase the likelihood of some attacks.
How about writing good, exploit-free code instead of closing the barn doors after the cows have escaped?
..would a professor of CompSci think this is a good idea, despite the hundreds of problems it *causes* with existing practices and procedures?
Oh, wait.. maybe because the idea is patented and he'll get paid a lot.
http://www.google.com/patents/US8239836
So we should use something like ABS with that randomisation enabled? Or should we trust to download distinct blobs for every download? For the latter, nice try NSA, but I don't want you to be able to incorporate spyware into my download and not be noticed.
It's already a pity software gets signed only by so few entities (usually one at a time, at least for deb). Perhaps I know that the blob came from Debian, but I can't verify whether it is the version the public gets, or the special version with some extra features. The blobs should be signed by more entities, so then all would have to be NSLed.
Some malware already does this, which definitely helps it evade heuristic scans. Sounds worth exploring, but I bet it will make the AV they force me to run at work that much more frustratingly restrictive.
by generating a unique executable for each install
... and cloning a unique customer support team for each install!
You can already do this with Gentoo, you're highly unlikely to use the same combination of compiler, kernel, assembler, libraries, use flags, compiler flags etc as anyone else...
This technique would probably be more effective for making detection resistant malware than protecting against malware. The software would still function almost the same, so if it is still interacted with in the same manner, it could still be vulnerable to the same exploit. It also makes it much more difficult to verify the software is valid, meaning that it actually INCREASES the risk factor for malware on account of being a perfect recipe for trojans.
The real solution to the problem he is trying to solve is not having a monoculture. This does nothing to solve it. If you have different code bases for operating systems, browsers, etc., the ability to infect all of them may be hampered. That's the same advantage of humans and dogs and snakes not being susceptible to the same pathogens. His form of diversity is more of an environmental one, so it's like different potatoes in a bag looking different despite the fact that they are almost certainly clones of each other. That does nothing against a blight.
The problem with any nondeterministic compiler is that it prevents use of diverse double-compiling, a method to detect the sort of compiler backdoor described by Ken Thompson in "Reflections on Trusting Trust". You'd have to bootstrap the compiler with nondeterminism turned off (and with GUIDs, timestamps, and multithreaded allocation of symbols for anonymous objects turned off too) in order for the DDC bootstrap construction to converge.
In any case, I've implemented a technique like this on the Nintendo Entertainment System. I wrote a preprocessor that shuffles the order of functions in the file, the order of opcodes within a function that don't depend on each other's results, and the order of global variables (or the order of fields in an object). One reason I implemented it was to use one variable as another's canary to make buffer overflows easier to detect in an assembly language program. The other is watermarking the binary so that I can tell who leaked a particular copy of the beta version to the public. If you're interested, you can find my shuffle tool in the source code of Concentration Room.
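A rough sketch of the shuffling idea described above, in Python rather than the NES assembly toolchain the poster used. The function and variable names and the string seeds are made up for illustration; this is not the Concentration Room tool, only a guess at the general shape. Deriving the order from a per-recipient seed is what makes the layout double as a watermark: the same seed always rebuilds the same layout.

```python
import random


def shuffle_layout(functions, variables, seed):
    """Return a per-build ordering of functions and variables.

    `functions` and `variables` are lists of names; `seed` identifies the
    build (here a hypothetical tester name). The same seed always yields
    the same layout, so a leaked copy can be matched back to its seed.
    """
    rng = random.Random(seed)
    shuffled_functions = functions[:]   # copy so the inputs stay untouched
    shuffled_variables = variables[:]
    rng.shuffle(shuffled_functions)
    rng.shuffle(shuffled_variables)
    return shuffled_functions, shuffled_variables


# Hypothetical program contents, for illustration only.
functions = ["init", "read_input", "update", "draw", "play_sound"]
variables = ["score", "lives", "cursor_x", "cursor_y", "buffer", "canary"]

layout_a = shuffle_layout(functions, variables, seed="tester-alice")
layout_b = shuffle_layout(functions, variables, seed="tester-bob")
```

Shuffling opcodes within a function would additionally need dependency analysis, which this sketch leaves out.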
It would probably cause more problems than it's worth, but it might be able to render some form of cheating worthless. If each program had a different layout then knowing what address you needed to hook into to cheat could be a problem. I don't see how it could cause more problems than anti-cheat software already does.
If you think a bit further... An operating system could load an executable at a different address every time it is used, without recompilation!
extern warranty;
main()
{
(void)warranty;
}
Okay, this technology is described in depth in a 2013 paper called librando: Transparent Code Randomization for Just-in-Time Compilers. There might be even newer information available somewhere, if Mr. Franz or his colleagues have continued the research.
I must respectfully disagree with you on every point you raise.
A randomised stack would cause certain types of bugs to manifest themselves much earlier in the development process. Nothing decreases the cost of a bug hunt more than proximity to the actual coding event.
Such an environment rewards programmers who invest more to validate their loops and bounds more rigorously in the first place. Nothing reduces the cost of a bug more than not coding it in the first place.
There's nothing that stops the debugging team from debugging against a canonical build, if they wish to do so. If they have a bug that the canonical build won't manifest, they wouldn't even have known about the bug without this technique added to the repertoire. If many such bugs become known early in the development process—bugs that manifest on some randomised builds, but not on the canonical debug build—you've got an excellent warning klaxon telling you what you need to know—your coding or management standards suck. Debugging suck, if instigated soon enough to matter, returns 100x ROI as compared to debugging code.
Certainly the number of critical vulnerabilities that exist against some compiled binary can only increase in number. So what? The attacker most likely doesn't know in advance which version any particular target will run. The attacker must now develop ten or one hundred exploits where previously one sufficed (or one exploit twice as large and ten times more clever).
If the program code mutated on every execution, you would have some valid points. That would be stupid beyond all comprehension. An attacker could just keep running your program until it comes up cherries.
The developer controls the determinism model. It's an asset in the war. There can be more when it helps our own cause, and less when it assists our adversaries.
Determinism should not be reduced to a crutch for failing to code correctly in the first place. Get over it. Learn how. Live in an environment that punishes mistakes early and often.
This might reduce the value of MD5 sums, though.
In order to compile a different version of the executable on each install it would need to be distributed as source code. Why would the source code MD5 sums take a hit if the compiler is randomizing at compile time? The source MD5 sums would remain intact, it's the binary that gets mixed up.
I swapped all the data bits around on my motherboard!
Hahaha!
Good luck!
Oh wait...
Mostly random stuff.
Why bother with this at the compiler level?
Just find 10,000 instruction pairs that can be reordered as they have no interdependencies, and reorder each of the pairs at random during the install phase. That gives you 2^10,000 unique executables, but all the debugging symbols and so on will remain the same.
I guess that doesn't help you against stack-smashing and so on. But will allow you to fingerprint who leaked your binary onto bittorrent - which would be its eventual use.
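To make the fingerprinting use concrete, here is a minimal sketch, assuming each pair really is independent and the distributor keeps the canonical order on file. The mnemonics are placeholder strings, and `embed_id` / `extract_id` are names invented for this example. Bit i of the build ID decides whether pair i is swapped, so n pairs carry n bits.

```python
def embed_id(pairs, build_id):
    """Encode build_id in the order of independent instruction pairs.

    pairs: list of (first, second) tuples whose relative order does not
    affect program behaviour (assumed, not checked here).
    """
    if build_id < 0 or build_id >= 1 << len(pairs):
        raise ValueError("build_id does not fit in the available pairs")
    marked = []
    for i, (first, second) in enumerate(pairs):
        swapped = (build_id >> i) & 1
        marked.append((second, first) if swapped else (first, second))
    return marked


def extract_id(canonical_pairs, observed_pairs):
    """Recover the build ID by comparing a leaked copy to the canonical order."""
    build_id = 0
    for i, (canon, seen) in enumerate(zip(canonical_pairs, observed_pairs)):
        if seen != canon:
            build_id |= 1 << i
    return build_id


# Placeholder instruction pairs, for illustration only.
canonical = [("inc x", "dec y"), ("lda a", "ldx b"), ("clc", "sei"), ("tax", "tay")]
marked = embed_id(canonical, 0b1010)
recovered = extract_id(canonical, marked)
```

With 10,000 pairs the same scheme would carry a 10,000-bit identifier, which matches the 2^10,000 figure above.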
Why not compile at install and have the user "move the mouse around" to generate some "randomness" .... (who does that again.......ah... putty.exe I think). This way the md5 of the transferred files is same, but the exe's would be different. Sucks to be the user waiting for that new app to "install" though.
Forgive my ignorance, but I just started using FreeNAS and was really impressed by the "jails" each plugin runs in. Wouldn't sandboxing or jails help out with this problem? I believe OS X does it, and I'm not sure how Linux does it. My only Windows box is Win 98 so I can run Civ 2 properly.
This might reduce the value of MD5 sums, though.
Yeah, maybe just a little...
The problem with this in "Explain like I'm Five" terms:
You can have no idea what the program you are running does.
You cannot trust it. You cannot know it hasn't been tampered with. You cannot know a given copy works the same as another copy. You cannot know your executable has no back doors.
On the security minded front we have a trend towards striving for deterministic build capability; so that we have some confidence and method of validating that a source code to executable transformation hasn't been tampered with, that the binaries you just downloaded were actually generated from the source code in a verifiable way.
Another technique I'm seeing in security-conscious areas is executable whitelisting, where IT hashes and whitelists executables, and stuff not on the whitelist is flagged and/or rejected.
Now this guy comes along and runs headlong in the other direction suggesting every executable should be different. And I'm not sure I see any real benefit, never mind a benefit that offsets the losses outlined above.
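A small sketch of why hash-based whitelisting and per-install randomization pull in opposite directions. The byte strings are stand-ins for two builds of the same program, and `is_whitelisted` is a hypothetical helper, not any real product's API. Any reordering of the bytes produces a different digest, so a whitelist built from one reference binary rejects every randomized variant.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest of a binary image."""
    return hashlib.sha256(data).hexdigest()


def is_whitelisted(binary: bytes, whitelist: set) -> bool:
    """True when the binary's hash appears on the approved list."""
    return sha256_hex(binary) in whitelist


# Stand-in bytes: same instructions, different order.
reference_build = b"\x90\x90\xc3"
randomized_build = b"\x90\xc3\x90"

whitelist = {sha256_hex(reference_build)}
reference_ok = is_whitelisted(reference_build, whitelist)
randomized_ok = is_whitelisted(randomized_build, whitelist)
```

Whitelisting could still work if the hash were taken over the source or over the pre-randomization binary, but that is a different check from the one described above.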
That might have to wait for formal verification methods to be made cheap enough for mass-market software. We have automated type checking and memory-safe languages, but there are still ways to write exploitably incorrect code in a managed environment.
I can't see how Franz's idea is materially different from "Randomized instruction set emulation" by Barrantes, Ackley, Forrest, and Stefanovic (2005).
I worked in this field a good many years ago, and I remember how we hoped that new Windows environments would suppress the prevalence of viral executables.
Then Macro Viruses turned up.
Now, Macro Viruses work at a higher level than machine code. They will therefore work on ANY machine that recognises, for instance, the WORD macro language - a mainframe, if WORD was ported to it. And you can't change macro languages - they are standardised.
I've seen many academics propose the 'answer' to viruses, and watched them ALL fall flat on their faces.
Back in the real world jumbling things around in an effort to sometimes mitigate against a static adversary (cough cough.. in ur dreams) is also likely to make matters worse than they would be in the first place for some unfortunate souls.
This isn't biology ... you don't get to play natural selection with widely distributed code and discard paying customers whose mutation was not "fit enough" to survive an attack.
"Dr Franz puts the chance of a hacker successfully penetrating one of his randomised application programs at about one in a billion. "
This is foolish beyond imagination.
No doubt these odds would shorten if his approach were taken up widely, for hackers are endlessly ingenious. But at the moment they mean that, if his system of multicompilers were used universally, any given hack would affect but a handful of the machines existing on the entire planet.
Also foolish beyond imagination.
Dr Franz has already built a prototype that can diversify programs such as Firefox and Apache Linux. Test attacks designed to take over computers running the resulting machine code always failed. The worst thing that happened was that the attack crashed the target machine, requiring a reboot
Please tell me this article was written by a computer generated nonsense generator.
(n/t)
The anti-virus product makers are really going to hate this.
Malware usually comes with its own software/executables and interfaces via the Windows API.
So randomizing operating system/software executables seems of little benefit.
What am I missing here ?
This is only an issue because of unchecked pointer arithmetic. For garbage collected and range checked items, you can't take advantage of co-location of data. In a JVM, if you try to cast an address to a reference to a Foo, it will throw an exception at the VM level. Indexing arrays? Push index and array on the stack, and it throws an exception if index isn't in range when it gets an instruction to index it. In these cases, pointer arithmetic isn't used. In some contexts, you MUST use pointer arithmetic. But if the pointer type system is rich enough (see Rust) then the compiler will have no trouble rejecting wrong references, and even avoiding races involving them.
In C, an "int*" is not a pointer to an int. It is really a UNION of three things until compiler proves otherwise: "ptr|null|junk". If the compiler is satisfied that it can't be "junk", its type is then a union of "ptr|null". You can't dereference this type, as you must have a switch that matches it to one or the other. The benefit of this is that you can never actually deref a null pointer, and you end up having exceptions at the point where the non-null assumption began, rather than deep inside of some code at some random usage of that nullable variable.
As for arrays, if an array "int[] y" is claimed, then that means that y[n] points to an int in the same array as y[0] does. Attempts to dereference should be proven by the compiler or rejected; even if that means that like the nullable deref, you end up having to explicitly handle the branch where the assumption doesn't hold. You can't prove the correctness of anything in the presence of unlimited pointer arithmetic. You can set a local variable to true, and it can be false on the next line before you ever mention it because some thread scribbled over it. Pointers are ok. Pointer arithmetic is not ok except in the limited cases where the compiler can prove that it's correct.
If the compiler can't prove it, then you should rewrite your code; or in the worst case annotate that area of code with an assumption that doubles as an assert; so that you can go find this place right away when your program crashes.
I assume kernels would be subject to the same kind of "random build" procedure. I can on.j09nxk
*core dumped*
Address Space Layout Randomization. Hey look, your programs are randomized in memory -each- time you load them. A randomizing compiler will only protect you from malware that modifies binaries, at which point you've already been h@x0rd lol. Next.
This is what polymorphic software does, and I think you'll find it on pretty much every computer that's part of a botnet.
By this measure, botnet software should be really difficult to detect and compromise -- and yet it isn't.
Also, it's worth noting that while government-sponsored and targeted attacks would be more difficult using this method, most malware depends on whatever the current security flaws are and/or human failure to initially get its foot in the door.
And the logic path wouldn't be changing, even if the compiled structure was randomized.
Plus, I think you'll find that many AM scanners these days include "doesn't follow the structure of a standard compiler" as one of the major red flags in looking for malware.
Viruses in nature mutate randomly. Computer viruses don't.
Computer virus designers are intelligent, hostile, and evil in intent.
If there's a way around it, they'll find it and it's game over.
Besides, many if not most attack vectors wouldn't care a whit - tricking a user into executing code would still work, SQL injection, cross site scripting...
This seems to me the wrong level for software diversity, too low. A bug in the source will be executed in all variants (think sql injection), while an exploit that depends on particular bytes in particular locations can already be made difficult by ASLR.
What about having higher level protocols that the software of a given category must adhere to, and various programs that treat data according to those protocols? You know, like that internet thing before the prevalence of web2.0 megasites, or like posix. Then every piece of malware cannot do universal damage and every botnet has to deal with a different host configuration.
would never introduce bugs of its own now, would it? Programmers don't want their compilers to get this prettified. This is a feature no one is asking for.
Nor do I view this approach as particularly effective against malware, which is its stated purpose.
They use lists of known file hashes to search for files unique to your computer. If this were done they would have to examine every file.
..would a professor of CompSci think this is a good idea, despite the hundreds of problems it *causes* with existing practices and procedures? Oh, wait.. maybe because the idea is patented and he'll get paid a lot.
http://www.google.com/patents/...
As an employee of the University of California a professor is *required* to report any discovery or method that *might* be patentable to the University.
The University takes it from there: it has an office that researches viability, handles the process and then licenses the patents to "industry". With respect to licensing, small local companies are given a better deal than larger internationals. As for the licensing fees collected, 50% goes to the University, 25% to the department (UC Irvine's Computer Science department in this case) and 25% to the employee(s).
At least that is how it was a few years ago when I was a grad student at UC.
"Inspired by the natural resistance offered to pathogens by genetically diverse host populations, Dr Michael Franz at UCI suggests that common software be similarly hardened against attack by generating a unique executable for each install."
..
What a good idea, isn't this what they did with the Space Shuttle
"Microcode is a layer of hardware-level instructions or data structures involved in the implementation of higher level machine code instructions in central processing units" ref.
What kinds of bugs do you think would manifest earlier using this technique ...
The GP mentioned a randomized stack. An uninitialized variable would be one, something that often accidentally has a value that does no harm (a zero possibly).
... and why do you think that earlier manifestation of that class of bugs will outweigh the tremendous burden of chasing down all the heisenbugs that only occur on some small percentage of randomized builds?
You do realize that your argument for the status quo and not dealing with the "heisenbugs" is essentially arguing to leave a coding bug in place? Recompiling will not necessarily introduce new bugs, rather change the behavior of existing bugs.
I've seen many of the sort of bugs this recompiling technique may expose, I spent some years porting software between different architectures. Not only did we have different compilers but we had different target CPUs. It was a friggin awesome environment for exposing unnoticed bugs. Software that had run reliably under internal testing for weeks on its original platform failed immediately when run on a second platform. And it kept failing immediately after several crashing bugs were fixed. The original developers, who were actually quite skilled, looked at several of the bugs eventually found and wondered how the program ever ran at all. I've seen this repeated on multiple teams at multiple companies over the years.
Also developers working on one platform eventually learned to visit a colleague working on the "other" platform when they had a bug that was hard to reproduce. There was a good chance that a hard to manifest bug on one platform would be easier to reproduce on the other.
There is nothing like cross platform development to help shake out bugs.
This recompilation idea would seem to offer some of these same benefits. Yes, it complicates reproducibility of crashes in the field, but if one can get a recompilation seed with that crash dump/log, it's more like dealing with an extra step, not some impossible hurdle.
Plus recompiling with a different seed each time the developer does a test run at their workstation could help find bugs in the first place, reducing the occurrences of these pesky crashes in the field.
I'm not saying these proposed recompilations in the field are definitely a good idea, just that the negatives seem to be exaggerated. It looks like something interesting, worth looking into a bit more.
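The seed-per-test-run idea can be sketched as a simulation, under heavy assumptions: `build_layout` stands in for a randomizing build that orders three variables, and `run_buggy_program` models an off-by-one write past `buffer`. Both are invented for this example. The point is only that a sweep over seeds finds the layouts where the latent bug bites, and that recording the seed lets the failure be replayed.

```python
import random

CELLS_PER_VAR = 4


def build_layout(seed):
    """Hypothetical stand-in for a randomizing build: order three variables."""
    names = ["buffer", "flag", "padding"]
    random.Random(seed).shuffle(names)
    return names


def run_buggy_program(layout):
    """Simulate an off-by-one write past `buffer`.

    Returns True when `flag` is still intact, False when the overflow
    clobbered it (which only happens if `flag` sits right after `buffer`).
    """
    memory = [0] * (CELLS_PER_VAR * len(layout))
    start = layout.index("buffer") * CELLS_PER_VAR
    # Bug: writes one cell more than the buffer holds.
    for offset in range(CELLS_PER_VAR + 1):
        if start + offset < len(memory):
            memory[start + offset] = 0xFF
    flag_start = layout.index("flag") * CELLS_PER_VAR
    return all(cell == 0 for cell in memory[flag_start:flag_start + CELLS_PER_VAR])


def sweep(seeds):
    """Return the seeds whose layout exposes the bug, for later replay."""
    return [seed for seed in seeds if not run_buggy_program(build_layout(seed))]


failing_seeds = sweep(range(50))
```

A crash report that carries its seed can be reproduced by calling `build_layout` with that seed again, which is the "extra step" described above.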
Instructions unclear, anus stuck in ceiling fan.
Considering most delivery mechanisms consist of an exact duplication, this will definitely increase costs to market.
I mean how the costs don't outweigh the benefits. Dammit, I always proof-read what I think I wrote, not what I actually wrote.
> This might reduce the value of MD5 sums, though.
Much to the contrary. Now, you could no longer download twice and compare. Instead you would be forced to compare to the MD5 sum.
As a professional software tester let me be the first to say noooooooooooo !
This doesn't fix the problem. It makes the chances of exploitation a bit smaller, on a "per-try" basis.
Back in the old days, some daemons or setuid programs would do insecure things with /tmp. So the hacker would make a program:
target = "/tmp/somefile";
while (1) {
unlink (target);
link ("/etc/passwd", target);
unlink (target);
link ("/tmp/myfile", target);
}
The daemon would check access permissions of the "target", hopefully after the last line in the loop, then open and write the target, hopefully after the second line inside the loop. Leave this running, trigger the target app, and you get the target app to write somewhere where it shouldn't (in this case /etc/passwd. Get it to add "\nmyroot::0:0::::\n" to make the system allow you to login as root without a password....)
The same applies to these stack/compiler randomization tricks: The hacker first tries at a slow pace, but instead of hacking your system, fails to get in because he's crashing your service daemon. You notice your service going down every day or so. Buggy software. Stupid randomization! No time to fix, and you make the daemon restart automatically. And bingo! Now the hacker can try thousands of times!
In cryptography, care has been taken that you can't figure out one of the "bits" of the key by a simple search. So that the exponential search (find the key among 2^256 possible keys) does not become "256 times: find bit n". To guarantee that no "bit leaking" will happen in a buggy program is very, very difficult: The designers of the program don't know where the bug is, the compiler doesn't know where the bug is, but the attacker does!
So... if this goes mainstream, the hackers will find a way to extract little bits of knowledge of the randomization, determine what the actual randomization was, and then attack the service as usual.
Of course, there will be cases where, say, the time for the attack is increased beyond the attack-detection time. So instead of the attack being successful, the attack might be detected and averted.
Anyway, I much rather have something that actually WORKS instead of "has a chance of working". But maybe that's just me.
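The arithmetic behind the two worries above can be sketched in a few lines. The assumptions are mine: the randomization is treated as a secret of `bits` bits, a "bit leak" is modelled as one probe per bit, and the auto-restarting daemon keeps the same layout between crashes so the attacker never repeats a guess. The function names are invented for this example.

```python
def tries_exhaustive(bits):
    """Worst-case guesses when the whole layout must be guessed at once."""
    return 2 ** bits


def tries_bitwise(bits):
    """Probes needed when each bit of the layout can be learned separately
    (assumed: one probe reveals one bit)."""
    return bits


def success_probability_fixed(variants, attempts):
    """Chance of a hit after `attempts` distinct guesses against a layout
    that stays the same across daemon restarts."""
    return min(1.0, attempts / variants)


exhaustive_256 = tries_exhaustive(256)   # search space for a 256-bit secret
bitwise_256 = tries_bitwise(256)         # cost if bits leak one at a time
quarter_way = success_probability_fixed(1000, 250)
```

Under these assumptions a service that restarts automatically gives the attacker a steadily rising success chance, and any per-bit leak collapses the exponential search to a linear one.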
huh. this sounds very similar to the theoretical virus designs i came up with many years ago. yes, you heard right: turn it round. instead of the programs on the computer being randomised so that they are resistant to malware attacks, randomise the *malware* so that it is resistant to *anti-virus* detection. the model is basically the flu or common cold virus.
here's where it gets interesting: comparing the use of randomisation in malware vs randomisation in defense against malware, it's probably going to start being used in malware before it gets used in defending against malware. why? because malware attackers have nothing to lose. unfortunately, they are likely to keep their compilers secret. even *more* unfortunately, successful creation of anti-malware randomising compilers means that the malware attackers can use them as well.
but, that is just a risk that has to be taken, and make sure a decent job is done of it.
"Have you tried recompiling it?"
I don't use Windows, neither for desktop nor for my servers.
"IT Department. Have you tried randomizing your compiler?"
I mean how the costs don't outweigh the benefits. Dammit, I always proof-read what I think I wrote, not what I actually wrote.
Me too. That is when I bother to proofread. :-)
On /. too, years ago (2005) with SELF-CHECKING executables (very easy to do & yes: It works - compressed/packed exe + sizecheck @ startup technique)-> http://it.slashdot.org/comment...
Every single one of my programs since 1997 have done this & yes it works... even this offering of mine lately also:
APK Hosts File Engine 9.0++ 32/64-bit:
http://start64.com/index.php?o...
APK
P.S.=> Simply by having an app essentially check its size (or CRC32 etc.) @ startup (various routines for that exist in most std. Win32/64 PE's) DOWN TO THE BYTE-SIZE LEVEL & IF IT CHANGES EVEN BY 1 BYTE - stop the program + signal the user of this change OR WHATEVER YOU CHOOSE AS THE APPROPRIATE MEASURE in that case (potentially created by malware odds are, attaching to the .exe file itself)!
THUS, you can STOP traditional viruses from EVER taking hold (by altering jump tables & attaching code @ the tail-end of an .exe), period...
... apk
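For readers wondering what a size/CRC32 startup check looks like, here is a minimal sketch. It is not the poster's code: the function names are invented, and the expected size and checksum are passed in as parameters because the post does not say where they are stored. The demo writes a throwaway file, records its size and CRC32, then appends one byte to mimic code attached to the tail of an executable.

```python
import os
import tempfile
import zlib


def crc32_of_file(path, chunk_size=65536):
    """CRC32 of a file's contents, read in chunks."""
    crc = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def self_check(path, expected_size, expected_crc):
    """True when the file still matches the recorded size and checksum."""
    if os.path.getsize(path) != expected_size:
        return False
    return crc32_of_file(path) == expected_crc


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "app.bin")
    with open(path, "wb") as handle:
        handle.write(b"original program bytes")
    size, crc = os.path.getsize(path), crc32_of_file(path)
    intact_before = self_check(path, size, crc)
    with open(path, "ab") as handle:
        handle.write(b"!")  # simulate one byte appended to the tail
    intact_after = self_check(path, size, crc)
```

The check only helps if the recorded values and the checking code themselves are out of the modifier's reach, which this sketch does not address.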
On /. in 2005: SELF-CHECKING executables (very easy to do & yes: It works - compressed/packed exe + sizecheck @ startup technique)-> http://it.slashdot.org/comment...
Every single one of my programs since 1996 have done this & yes it works... even this offering of mine lately also:
---
APK Hosts File Engine 9.0++ 32/64-bit:
http://start64.com/index.php?o...
---
* This technique would greatly assist in stalling this part of the malicious code problem if every executable did this to itself (possibly eliminating the need for antivirus software altogether).
APK
P.S.=> Simply by having an app check its size (or CRC32 etc.) @ startup (various routines for that exist in most std. Win32/64 PE's) DOWN TO THE BYTE-SIZE LEVEL & IF IT CHANGES EVEN BY 1 BYTE - stop the program + signal the user of this change OR WHATEVER YOU CHOOSE AS THE APPROPRIATE MEASURE in that case (potentially created by malware odds are, attaching to the .exe file itself)!
THUS, you can STOP traditional viruses from EVER taking hold (by altering jump tables & attaching code @ the tail-end of an .exe), period...
... apk
Would each instance (ie: download) of the software need to be signed using the private key by the author? Wouldn't that make the whole process more expensive and cumbersome?