Slashdot Mirror


DARPA Delving Into the Black Art of Super Secure Software Obfuscation

coondoggie writes Given enough computer power, desire, brains, and luck, the security of most systems can be broken. But there are cryptographic and algorithmic security techniques, ideas and concepts out there that add a level of algorithmic mystification that could be built into programs that would make them close to unbreakable. That's what the Defense Advanced Research Projects Agency (DARPA) wants for a new program called "Safeware." From DARPA: “The goal of the SafeWare research effort is to drive fundamental advances in the theory of program obfuscation and to develop highly efficient and widely applicable program obfuscation methods with mathematically proven security properties.”

17 of 124 comments (clear)

  1. Good luck with that. by jcochran · · Score: 3, Insightful

    The objective of "mathematically proven security properties" via program obfuscation is definitely not achievable. After all, it's a given security principle of "security through obfuscation" is unsupportable. If an adversary is capable of obtaining the executable of a program, they can also reverse engineer that same executable. It may take a lot of effort, but it is always achievable.

    1. Re:Good luck with that. by roman_mir · · Score: 2, Insightful

      OTOH all security is by obscurity, what is a password if not a piece of data that is obscured from most people and supposedly is only known by the one that owns it?

    2. Re:Good luck with that. by Racemaniac · · Score: 2

      How sure are you? If you can execute the program, that still doesn't mean you can predict exactly what it does and understand everything it could possibly do.
      A simple example of checking a password: you can see that the program hashes it and checks it against the hash it should be, doesn't mean you know what the right password is to get beyond it.
      Even if you can execute the program, triggering every possible machine state to analyze it is impossible for non trivial programs. And i'm wondering what they could come up with to obfuscate the program logic that would make it computationally very hard to look into the program :).

    3. Re:Good luck with that. by Fwipp · · Score: 5, Informative

      Well, something that is obscure is just something that's hard to read. A password is supposed to be hidden, and not seen at all. "Security through obscurity" is the idea that they'll be able to see your algorithms, just not figure it out.

    4. Re:Good luck with that. by IamTheRealMike · · Score: 5, Informative

      The objective of "mathematically proven security properties" via program obfuscation is definitely not achievable. After all, it's a given security principle of "security through obfuscation" is unsupportable. If an adversary is capable of obtaining the executable of a program, they can also reverse engineer that same executable. It may take a lot of effort, but it is always achievable.

      That is the standard consensus view in the software industry, yes. I'm afraid to tell you though, that it's wrong.

      Last year there was a mathematical breakthrough in the field of what is called "indistinguishability obfuscation". This is a mathematical approach to program obfuscation which has sound theoretical foundations. This line of work could in theory yield programs whose functioning cannot be understood no matter how skilled the reverse engineer is.

      It is important to note here a few important caveats. The first is that iO (to use the cryptographers name) is presently a theoretical technique. A new paper came out literally 5 days ago that claims to discuss an implementation of the technique but I haven't read it yet. Will do so after posting this comment. Indeed, it seems nobody is quite sure how to make it work with practical performance at this time.

      The second caveat is that the most well explored version of it only applies to circuits which can be seen as a kind of pure functional program only. Actually a circuit is closer to a mathematical formula than a real program e.g. you cannot write circuits in C or any other programming language we mortals are familiar with. Researchers are now starting to look at the question of obfuscating "RAM programs" i.e. programs that look like normal imperative programs written in dialects of, say, C. But this work is still quite early.

      The third caveat is that because the techniques apply to pure functions only, they cannot do input or output. This makes them somewhat less than useful for obfuscation of the sort of programs that are processed with commercial obfuscators today like video games.

      Despite those caveats the technique is very exciting and promising for many reasons, none of which have to do with DRM. For example iO could provide a unifying framework for all kinds of existing cryptographic techniques, and enable cryptographic capabilities that were hereto only conjectured. For example timelock crypto can be implemented using and iO obfuscator and Bitcoin.

    5. Re:Good luck with that. by Fwipp · · Score: 3, Informative

      Fortunately, Merriam Webster is not the final and complete authority on the connotations of words, nor on how they are used within specialized disciplines.

    6. Re:Good luck with that. by IamTheRealMike · · Score: 5, Informative

      OK, I read the paper.

      The money quote is at the end:

      The evaluation results from Section 4 show that work still needs to be done before program obfuscation is usable in practice. In fact, the most complex function we obfuscate with meaningful security is a 16-bit point function, which contains just 15 AND gates. Even such a simple function requires about 9 hours to obfuscate and results in an obfuscation of 31.1 GB. Perhaps more importantly (since the obfuscation time is a one-time cost), evaluating the 16-bit point obfuscation on a single input takes around 3.3 hours. However, it is important to note that the fact that we can produce any “useful” obfuscations at all is surprising. Also, both obfuscation and evaluation are embarrassingly parallel and thus would run significantly faster with more cores (the largest machine we experimented on had 32 cores).

      Translated into programmer English, a "16 bit point function" is basically a mathematical function that yields either true or false depending on the input. It would correspond to the following C++ function prototype:

      bool point_function(short input);

      In other words you can hide a 16-bit "password" inside such a function and discover if you got a match or not. Obviously, obfuscating such a function is just a toy to experiment with. "SHA256(x) == y" is also a point function and one that can be implemented in any programming language with ease - short of brute forcing it, there is no way to break such an "obfuscated point function". Thus using this technique doesn't presently make a whole lot of sense. However, it's a great base to build on.

      I should note that the reference to AND gates above doesn't mean that the program is an arbitrary circuit - it means that the "program" which is being obfuscated is in fact a boolean formula. Now you can translate boolean circuits into boolean formulas, but often only at great cost. And regular programs can only be translated into circuits at also a great cost. So you can see how far away from practicality we are. Nonetheless, just last year the entire idea that you could do this at all seemed absurd, so to call the progress so far astonishing would be an understatement. Right now the field of iO is developing so fast that the paper's authors note that whilst they were writing it, new optimisations were researched and published, so there are plenty of improvements left open for future work.

    7. Re:Good luck with that. by disambiguated · · Score: 2

      Without ASLR, return to x exploits are trivial for all x. All that is needed is the address of a function (or any code -- it doesn't have to be the 'official' entry point of a function) that does something useful to the attacker, and a way to clobber the stack.

      This doesn't really have anything to do with libc, except that it is a rich source of well known addresses (without ASLR). So what in the hell are you talking about?

    8. Re:Good luck with that. by ShanghaiBill · · Score: 2

      I imagine that self-altering program code could become incredibly hard to analyze and unravel.

      There are many tricks to make it hard. Some CISC instruction sets have variable length instructions, and allow instructions to start at any byte offset. So you can have the same string of bytes execute a completely different sequence of instructions depending on the offset of the entry point. This can make dis-assembly very challenging. I have heard from my biologist friends that DNA sometimes does the same thing, with the same DNA sequence encoding different proteins depending on the offset of the start codon.

    9. Re:Good luck with that. by bluefoxlucid · · Score: 2

      ASLR changes the security issue from "trivial, undetectable remote access or privilege escalation" to "trivial, deafeningly loud denial-of-service."

      I can write shellcode that hijacks a function, spawns a thread, and creates a controlled jump back to an earlier function, simulating a successful return and allowing the program to continue--a separate thread downloads and mmap()s in a shared object, which provides all the exploit functions without even spawning a new process, even so far as to open a temporary file and then unlink() it and fill it and mmap() the fd so that malware code never goes into a bluntly visible file. This strategy allows httpd or proftpd or any other target with a remote code execution flaw to continue running uninterrupted, while itself hosting a malicious agent.

      If the program spawns fork()ed copies, I'd have to repeatedly crash the program--I'd return to the incorrect location, due to not knowing the stack or library or anonymous segment or heap or main executable base addresses--as I chewed iteratively through millions of connections. This is incredibly visible: dmesg shows millions of segfaults, and a smart HIDS could delay and restrict connections from addresses which had connected (initiated) to any socket handled by the crashing process (parent-child fork()-without-execve() relationship). 64-bit systems take this up to the order of 10^30, an infeasible number of attempts even if the system's HIDS and administrator are inattentive; while programs which replace workers after hundreds of executions are facing independent randomized events.

      ASLR turns failed security and perfect compromise into less-than-perfect security and less-than-perfect compromise. The cost is effectively null: the moment there's an unused CPU nanosecond, the system recovers from the slowdowns imposed by ASLR. It's on the order of pulling an additional number out of memory (a passively-gathered entropy pool kept in a ring buffer).

      One of the biggest benefits of ASLR, heap protection, and stack smash protection is the rapid exposure of security flaws: any attempt has an overwhelming probability of exposing the attack, directing eyes toward the bug. ASLR, double-free, and canary protections often trigger on harmless errors which allow the program to continue running under normal conditions, aren't an attack, and happen during normal work; ASLR is the least tended to do this (because let's face it: if you miss a JMP by a byte, you miss it by a hundred miles), while the other two similar guards are very prone to cause crashes when you fuck up. Likewise, banning the mapping of memory that is simultaneously writable and executable causes errant writes to do this a lot (and makes some exploits flat-out impossible, instead of impossibly unlikely).

  2. security through obscurity by JustNiz · · Score: 2

    I'm amazed that someone who supposedly knows what they are doing would even suggest this.
    Program obfuscation is completely the wrong approach. It is just another mechanism that relies on security through obscurity, which has been proven time and again to be a short-term solution at best.
    When something is actually secure, it's readability should be irrelevant.

    1. Re:security through obscurity by Anonymous Coward · · Score: 2, Informative

      Security through obscurity can work to a point. *IF* you make it hard enough.

      Take for example Raiden II. That game has only recently (in the past month) been 'cracked'. Even though only sorta. There is no encryption. It is all just bundled into a 'cop' chip.

      The point though with their 'security' was not to never be cracked. But just make it a big enough pain in the ass that the bootlegers didnt copy the game for a long time. You could argue it took nearly 20 years to crack. Not bad for security through obscurity.

      The entire emulation scene is basically cracking all those systems. To them a badge of honor is to actually accurately emulate the 'security'.

      With security it is *never* 100%. It is just a matter of time and money.

      Even if I come up with something that 100% provably secure. I am not going to show you how it works. I am not going to make it in any way easier. The obscurity is not for the security. It is mostly to waste your time. If that takes long enough the message may longer be worth decoding.

  3. waiving my consultancy fee today by ne0n · · Score: 2, Funny

    Pro tip for DARPA: use perl, hand out the source. Same end result but probably a few reverser suicides along the way.

    --
    $ :(){ :|:& };:
  4. Thats good by Sla$hPot · · Score: 2, Interesting

    But how do you scan code for back doors, trojans, viruses, malware, bots etc.?

  5. If a compiler can... by carlhaagen · · Score: 3, Insightful

    ...interpret the obfuscated source code, then why wouldn't a human be able to?

  6. Re:Not Security Through Obscurity by gweihir · · Score: 2

    And before I forget: These techniques are excellent to hide backdoors and such, and thereby make software much, much less secure. That may be the real intent. After all, you do not want some vigilante to find the secret government backdoors in everything.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  7. Re:In other word programmer pretty much are right by cryptizard · · Score: 2

    What they are trying to construct (and at least partially succeeding), though, is a cryptographic construct whereby you can feed input in one end and "iterate" the computation, but not know what computation you are actually doing. Imagine that every time you do any operation on two variables, you actually do all possible operations (i.e. multiply, add, shift, etc.) and only one output is stored. The trick is that which one is actually kept is hidden from you cryptographically. That is a very crude metaphor for what they are doing, I suggest reading the paper for the details. It's actually very well written. The point is, however, that this technique is much more complicated and more powerful than obfuscation that people are traditionally familiar with, and it really does have the potential to do what you describe as being impossible.