Slashdot Mirror


Computer Scientists Develop 'Mathematical Jigsaw Puzzles' To Encrypt Software

another random user writes "The claim here is that the encrypted software can be executed, but not reverse-engineered. To quote from the article: 'UCLA computer science professor Amit Sahai and a team of researchers have designed a system to encrypt software so that it only allows someone to use a program as intended while preventing any deciphering of the code behind it. According to Sahai, previously developed techniques for obfuscation presented only a "speed bump," forcing an attacker to spend some effort, perhaps a few days, trying to reverse-engineer the software. The new system, he said, puts up an "iron wall," making it impossible for an adversary to reverse-engineer the software without solving mathematical problems that take hundreds of years to work out on today's computers — a game-change in the field of cryptography.'"

5 of 245 comments (clear)

  1. I Call BS by MightyMartian · · Score: 5, Insightful

    I'm sure they can further obfuscate the actual code, but at the end of the day the processor is going to have to run machine code, and one way or the other you can tap the processor's activity to read the "decrypted" code. Beyond that, I imagine the performance penalties involved will be monstrous. Even normal obfuscation techniques have pretty heavy penalties.

    --
    The world's burning. Moped Jesus spotted on I50. Details at 11.
    1. Re:I Call BS by Anonymous Coward · · Score: 5, Interesting

      Jigsawing is not encryption, it's splitting things up into shared object libraries (.dll's) that the linker then has to "reassemble" as the program is loaded into memory where it is executable ie the program counter can be pointed at it and let run. No the best obsfuscation I have yet seen is X86 assembler and even that doesn't run without a ton of overhead but... disassemblers exist for it. /. is right to scoff A) the write up is contradictory B) the article makes no sense because in order to run the code the processor has to solve the puzzle so you just stick it in an emulator and hit record. C) this looks like a school and or corporate overlords hyping an unpublished paper which many of us recognize as one way peer reviewers are pressured into signing off on snake oil D) who claims that this is the first attempt at obsfuscating code? This thing: http://www.ioccc.org/ is on it's 22nd year.

    2. Re:I Call BS by greg1104 · · Score: 5, Informative

      You can make the algorithm as complex as you'd like, but at the end of the day, you have an input and output(s). You may decide to take a long time to get there, but at the end of the day, I know what you did when the code ran.

      This is referred to as "black-box" reverse engineering in the paper. You know what the code did for one input. And if you injected every possible input to the program, at the end you'd have worked out a complete specifications for the function. But how long will that take? It's not always "the end of the day". For functions with a wide range of inputs, it could take the life of the universe to map them all unless you know what you're looking for in advance.

      Right now, obfuscation approaches for software usually have some small chocking point to attack. It might be an encryption wrapper around the main binary. Bust that with a debugger, you can get to unobfuscated code for the main system, and then really start to work your way through the program.

      But if you have to fight this every step of the way, where all you do is inject inputs and get an output to figure out what the program does, it will take you forever to reverse engineer things. That's the claim of the paper. The code itself is so obfuscated that you can't read it straight and understand it, ever. It looks like random junk. All you can do is run it with an input and see what comes back, and from that reverse engineer what it does. Assemble enough of those and you can see how the program really works. But that black box teardown process--determining possible inputs, injecting them, collecting outputs, and then deriving the function behavior--is time intensive enough that it may not be practical to actually do it anymore. You don't learn anything from reversing any component that speeds handling the next; you have to attack them all like this.

      There's a great line from the seminal paper on this subjectOn the (Im)possibility of Obfuscating Programs: "Any programmer knows that total unintelligibility is the natural state of computer programs (and one must work hard in order to keep a program from deteriorating into this state)" Extracting meaning from source code, being able to predict what some lines of code will do, it's hard. Ideally you'll just be able to read the code, make sense of it, and then reverse from there. Most systems that are thoroughly cracked have some sections where it's hard, but once you get those the remaining code is straightforward to read. And in fact, it's impossible to make something where you cannot reverse it. The question though is how long it will take.

      If no sense is ever made of the code, you can only apply the "black-box" reverse engineering, where you inject inputs, watch outputs, and from there determine what the code does. You can't prevent that, but you can make the box so big that such work is impossible to do. That's what this technique tries to accomplish. You never find an easy part to the code you can read; all you ever find are ones where you have to map every input into an output to figure out what the code does.

  2. Re:Deciphering != Reverse Engineering by wierd_w · · Score: 5, Insightful

    One way around this (for reverse engineering) would simply be to run it inside a VM with a built in stack trace debugger, like Bochs.

    You can peek at the raw instructions percolating through the VM's emulated CPU that way. The application itself is then the decryption key, since the system has to be able to run the application.

    PITA, but I don't see how this is anything more than just a "bigger" speedbump, at least as far as disassembly goes. Now, making said decryption require nasty bits of hardware to thwart using a VM, and clinging tightly to treacherous computing with a TPM and a hardware hypervisor on top of this? That's more dicey.

    The question here is... why do this? Is preventing Cracker Jack from the warez scene from simply disabling your "authenticity checks" so horrifically important? That's the major application for this that I really see, besides making epically horrific malware. (Though if you ask me, they are both the same thing.)

    Seriously, other than making personal computing become something from communist russia, what is the benefit of this?

  3. What this means. I think. by Animats · · Score: 5, Interesting

    This is fascinating, but hard to understand. It's not clear how broad a result this is.

    This seems to apply to "circuits", which are loop-free arrangements of AND, OR and NOT gates. These take in some bit string with a fixed number of bits and emit another bit string with a fixed (but possibly different) number of bits. Many computations can be encoded into this form, but the "circuits" are typically very deep, since this is sort of like loop unwinding. Although this seems kind of limited, you could, for example, make a reasonably sized "circuit" to compute DES, which is a static pattern of Boolean operations.

    "Obfuscation" here means taking in some "circuit" and producing a bit for bit equivalent "circuit" from which little information can be extracted. The obfuscated circuit may be (will be?) more complex than the original, because additional logic has been inserted in a somewhat random way.

    The contribution in this paper seems to be that this might be useful for "functional encryption". This is where you have some values, such as A and B, which are encrypted and are to be protected, but the result of some function f(A,B) is not protected. The idea is to have an implementation of f which combines the decryption and the desired function, an implementation which cannot be reverse engineered to return decrypted versions of A or B. While this has been suggested as a possible feature for databases, it's hard to apply to useful functions, and so far, it's mostly a research idea there. It's been suggested for some complex access control schemes involving mutual mistrust, but that too is a research topic.

    Anyway, this doesn't mean you're going to be able to run your big executable program through some kind of magic obfuscator that will prevent someone from figuring out how it works. Not yet, anyway.