Computer Scientists Develop 'Mathematical Jigsaw Puzzles' To Encrypt Software

← Back to Stories (view on slashdot.org)

Computer Scientists Develop 'Mathematical Jigsaw Puzzles' To Encrypt Software

Posted by samzenpus on Wednesday July 31, 2013 @04:32PM from the looking-for-edges dept.

another random user writes "The claim here is that the encrypted software can be executed, but not reverse-engineered. To quote from the article: 'UCLA computer science professor Amit Sahai and a team of researchers have designed a system to encrypt software so that it only allows someone to use a program as intended while preventing any deciphering of the code behind it. According to Sahai, previously developed techniques for obfuscation presented only a "speed bump," forcing an attacker to spend some effort, perhaps a few days, trying to reverse-engineer the software. The new system, he said, puts up an "iron wall," making it impossible for an adversary to reverse-engineer the software without solving mathematical problems that take hundreds of years to work out on today's computers — a game-change in the field of cryptography.'"

20 of 245 comments (clear)

Min score:

Reason:

Sort:

I Call BS by MightyMartian · 2013-07-31 16:36 · Score: 5, Insightful

I'm sure they can further obfuscate the actual code, but at the end of the day the processor is going to have to run machine code, and one way or the other you can tap the processor's activity to read the "decrypted" code. Beyond that, I imagine the performance penalties involved will be monstrous. Even normal obfuscation techniques have pretty heavy penalties.

--
The world's burning. Moped Jesus spotted on I50. Details at 11.
1. Re:I Call BS by MightyMartian · 2013-07-31 16:46 · Score: 4, Insightful
  
  Unless the software can magically tlel that it's running in a debugger or a sandbox and refuse to execute, it's activity; from stack activity to system calls to memory allocations can be traced.
  
  --
  The world's burning. Moped Jesus spotted on I50. Details at 11.
2. Re:I Call BS by 0123456 · 2013-07-31 16:52 · Score: 4, Informative
  
  When I was writing Windows device drivers, we so, really loved stupid games that detected a debugger and refused to run.
  Fortunately most developers stopped doing that once they realised that their customers would have to wait for bug-fixes until they'd reported it to us and then we'd contacted the developers and they'd sent us a special version without the retarded code that stopped us debugging the problem.
3. Re:I Call BS by Anonymous Coward · 2013-07-31 17:04 · Score: 5, Interesting
  
  Jigsawing is not encryption, it's splitting things up into shared object libraries (.dll's) that the linker then has to "reassemble" as the program is loaded into memory where it is executable ie the program counter can be pointed at it and let run. No the best obsfuscation I have yet seen is X86 assembler and even that doesn't run without a ton of overhead but... disassemblers exist for it. /. is right to scoff A) the write up is contradictory B) the article makes no sense because in order to run the code the processor has to solve the puzzle so you just stick it in an emulator and hit record. C) this looks like a school and or corporate overlords hyping an unpublished paper which many of us recognize as one way peer reviewers are pressured into signing off on snake oil D) who claims that this is the first attempt at obsfuscating code? This thing: http://www.ioccc.org/ is on it's 22nd year.
4. Re:I Call BS by phantomfive · 2013-07-31 17:10 · Score: 4, Informative
  
  The title is wrong. It's not talking about encrypting the software, it's talking about obfuscating the software. They put the compiled code through a function that obfuscates it thoroughly. Their function is complex enough that de-obfuscating the code (that is, returning it to the original form) would be computationally expensive. The paper also talks about encryption, but that is a different part of the paper.
  
  Which isn't to say you can't look at some variable in RAM, realize it is the boolean for 'authorized,' and then overwrite it. So it's essentially a complex obfuscation technique, which may or may not be useful. (That is my understanding of the article and abstract, if I am wrong please correct me).
  
  --
  "First they came for the slanderers and i said nothing."
5. Re:I Call BS by MightyMartian · 2013-07-31 17:44 · Score: 4, Insightful
  
  That's because VMs really don't try to hide themselves from the guest. While it might be pretty hard to build a VM that did a good enough job to essentially fool software attempting to identify whether its physical or virtualized hardware that it's running on, we have the source for a number of VMs (ie. KVM and Xen) and if this kind of obfuscated software started showing up on the market, I'm sure there would be a much greater push to make a rock solid virtualized environment that mimicked physical hardware with much more fidelity.
  
  --
  The world's burning. Moped Jesus spotted on I50. Details at 11.
6. Re:I Call BS by MikeBabcock · 2013-07-31 17:59 · Score: 3, Insightful
  
  This exactly -- virus writers have been at the forefront of code that hides and obfuscates itself and VM type systems were developed to essentially run the code to determine its effects without actually running the code.
  So long as the code can be executed, it can be reverse-engineered.
  
  --
  - Michael T. Babcock (Yes, I blog)
7. Re:I Call BS by Half-pint+HAL · 2013-07-31 20:42 · Score: 4, Informative
  
  Not exactly. Reverse engineering starts with the analysis of a running program with the goal of obtaining enough data about the locations of executable code vs data, major variable locations/identifiers etc to allow you to start running automated disassembly of the total code. All you can get out of a run-time analysis is a specific execution path, which does not embody the full code.
  Now yes, "if your CPU can read it, then you can read it", but unfortunately in this situation, your CPU can only read it in certain circumstances, so you'll only be able to read it in those same circumtances: execution or simulated execution, leading back to the situation where you're stuck looking at traces of specific executions...
  
  --
  Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
8. Re:I Call BS by greg1104 · 2013-07-31 20:46 · Score: 5, Informative
  
  You can make the algorithm as complex as you'd like, but at the end of the day, you have an input and output(s). You may decide to take a long time to get there, but at the end of the day, I know what you did when the code ran.
  This is referred to as "black-box" reverse engineering in the paper. You know what the code did for one input. And if you injected every possible input to the program, at the end you'd have worked out a complete specifications for the function. But how long will that take? It's not always "the end of the day". For functions with a wide range of inputs, it could take the life of the universe to map them all unless you know what you're looking for in advance.
  Right now, obfuscation approaches for software usually have some small chocking point to attack. It might be an encryption wrapper around the main binary. Bust that with a debugger, you can get to unobfuscated code for the main system, and then really start to work your way through the program.
  But if you have to fight this every step of the way, where all you do is inject inputs and get an output to figure out what the program does, it will take you forever to reverse engineer things. That's the claim of the paper. The code itself is so obfuscated that you can't read it straight and understand it, ever. It looks like random junk. All you can do is run it with an input and see what comes back, and from that reverse engineer what it does. Assemble enough of those and you can see how the program really works. But that black box teardown process--determining possible inputs, injecting them, collecting outputs, and then deriving the function behavior--is time intensive enough that it may not be practical to actually do it anymore. You don't learn anything from reversing any component that speeds handling the next; you have to attack them all like this.
  There's a great line from the seminal paper on this subjectOn the (Im)possibility of Obfuscating Programs: "Any programmer knows that total unintelligibility is the natural state of computer programs (and one must work hard in order to keep a program from deteriorating into this state)" Extracting meaning from source code, being able to predict what some lines of code will do, it's hard. Ideally you'll just be able to read the code, make sense of it, and then reverse from there. Most systems that are thoroughly cracked have some sections where it's hard, but once you get those the remaining code is straightforward to read. And in fact, it's impossible to make something where you cannot reverse it. The question though is how long it will take.
  If no sense is ever made of the code, you can only apply the "black-box" reverse engineering, where you inject inputs, watch outputs, and from there determine what the code does. You can't prevent that, but you can make the box so big that such work is impossible to do. That's what this technique tries to accomplish. You never find an easy part to the code you can read; all you ever find are ones where you have to map every input into an output to figure out what the code does.
9. Re:I Call BS by msobkow · 2013-07-31 22:51 · Score: 4, Informative
  
  I know a fellow who worked on a system that would take IBM 360 machine code, convert it into graph format, and re-engineer it as C/C++ code. They made a killing on Y2K, because so many companies didn't have the source code for some of their software any more.
  So yes, I call BS as well. Computationally complex, yes, as graph theory often is. But far from impossible to reverse engineer.
  If it can execute, it can be reverse engineered. Only the most obtuse and purposefully warped hand-written assembler can even approach the ability to hide an algorithm from deep inspection, and even that can be largely overcome by applying graph theory to restructure the code graphs.
  
  --
  I do not fail; I succeed at finding out what does not work.
Deciphering != Reverse Engineering by Dynedain · 2013-07-31 16:37 · Score: 3, Informative

Deciphering/Decrypting is not the same thing as Reverse Engineering.
Reverse Engineering is duplicating the functionality without seeing the source code. That should still be possible if you have the ability to run the program.

--
I'm out of my mind right now, but feel free to leave a message.....
1. Re:Deciphering != Reverse Engineering by wierd_w · 2013-07-31 16:52 · Score: 5, Insightful
  
  One way around this (for reverse engineering) would simply be to run it inside a VM with a built in stack trace debugger, like Bochs.
  You can peek at the raw instructions percolating through the VM's emulated CPU that way. The application itself is then the decryption key, since the system has to be able to run the application.
  PITA, but I don't see how this is anything more than just a "bigger" speedbump, at least as far as disassembly goes. Now, making said decryption require nasty bits of hardware to thwart using a VM, and clinging tightly to treacherous computing with a TPM and a hardware hypervisor on top of this? That's more dicey.
  The question here is... why do this? Is preventing Cracker Jack from the warez scene from simply disabling your "authenticity checks" so horrifically important? That's the major application for this that I really see, besides making epically horrific malware. (Though if you ask me, they are both the same thing.)
  Seriously, other than making personal computing become something from communist russia, what is the benefit of this?
2. Re:Deciphering != Reverse Engineering by IamTheRealMike · 2013-07-31 22:30 · Score: 3, Informative
  
  Yes, it is robust. I read the paper a few days ago.
  All these comments about how you can "just look at the CPU instructions" are made by people who haven't been following developments in the field. The program never gets decrypted into CPU instructions. Heck, it was never even compiled into CPU instructions in the first place. It gets compiled into a form of boolean circuit, a mathematical equivalent of an electronic circuit that is composed of AND, NOT, OR, XOR gates and wires between them. Then that circuit is itself again transformed into a series of matrices and at that point I hit the limit of what I could understand without needing to read some of the cited papers.
  This is a very, very complicated technique that builds upon decades of cryptographic research. If they say it's secure in the cryptographic sense, I think it's very likely to be so.
Victory for virus writers by technosean · 2013-07-31 16:38 · Score: 4, Insightful

Who gains from this? Most of all? Virus writers.
Perhaps the real solution is to have the OS never execute code looking anything like this.
Re:Seems improbable by MightyMartian · 2013-07-31 16:44 · Score: 3, Insightful

Yup. At the end of the day, if this is at all useful and the hardware and OSs out there now, it;s still going to have to execute, and if it executes, you can run it through a debugger and watch it.

--
The world's burning. Moped Jesus spotted on I50. Details at 11.
The original paper by HatofPig · 2013-07-31 16:56 · Score: 4, Informative

The original paper is available here.
Cryptology ePrint Archive: Report 2013/451
Candidate Indistinguishability Obfuscation and Functional Encryption for all circuits
Sanjam Garg and Craig Gentry and Shai Halevi and Mariana Raykova and Amit Sahai and Brent Waters
Abstract: In this work, we study indistinguishability obfuscation and functional encryption for general circuits:
Indistinguishability obfuscation requires that given any two equivalent circuits C_0 and C_1 of similar size, the obfuscations of C_0 and C_1 should be computationally indistinguishable.
In functional encryption, ciphertexts encrypt inputs x and keys are issued for circuits C. Using the key SK_C to decrypt a ciphertext CT_x = Enc(x), yields the value C(x) but does not reveal anything else about x. Furthermore, no collusion of secret key holders should be able to learn anything more than the union of what they can each learn individually.
We give constructions for indistinguishability obfuscation and functional encryption that supports all polynomial-size circuits. We accomplish this goal in three steps:
- We describe a candidate construction for indistinguishability obfuscation for NC1 circuits. The security of this construction is based on a new algebraic hardness assumption. The candidate and assumption use a simplified variant of multilinear maps, which we call Multilinear Jigsaw Puzzles.
- We show how to use indistinguishability obfuscation for NC1 together with Fully Homomorphic Encryption (with decryption in NC1) to achieve indistinguishability obfuscation for all circuits.
- Finally, we show how to use indistinguishability obfuscation for circuits, public-key encryption, and non-interactive zero knowledge to achieve functional encryption for all circuits. The functional encryption scheme we construct also enjoys succinct ciphertexts, which enables several other applications.
Category / Keywords: public-key cryptography / Obfuscation, Functional Encryption, Multilinear Maps
Date: received 20 Jul 2013, last revised 21 Jul 2013
Contact author: amitsahai at gmail com
Available format(s): PDF | BibTeX Citation

--
Silicon & Charybdis McLuhan Kildall Papert Kay
What this means. I think. by Animats · 2013-07-31 18:22 · Score: 5, Interesting

This is fascinating, but hard to understand. It's not clear how broad a result this is.
This seems to apply to "circuits", which are loop-free arrangements of AND, OR and NOT gates. These take in some bit string with a fixed number of bits and emit another bit string with a fixed (but possibly different) number of bits. Many computations can be encoded into this form, but the "circuits" are typically very deep, since this is sort of like loop unwinding. Although this seems kind of limited, you could, for example, make a reasonably sized "circuit" to compute DES, which is a static pattern of Boolean operations.
"Obfuscation" here means taking in some "circuit" and producing a bit for bit equivalent "circuit" from which little information can be extracted. The obfuscated circuit may be (will be?) more complex than the original, because additional logic has been inserted in a somewhat random way.
The contribution in this paper seems to be that this might be useful for "functional encryption". This is where you have some values, such as A and B, which are encrypted and are to be protected, but the result of some function f(A,B) is not protected. The idea is to have an implementation of f which combines the decryption and the desired function, an implementation which cannot be reverse engineered to return decrypted versions of A or B. While this has been suggested as a possible feature for databases, it's hard to apply to useful functions, and so far, it's mostly a research idea there. It's been suggested for some complex access control schemes involving mutual mistrust, but that too is a research topic.
Anyway, this doesn't mean you're going to be able to run your big executable program through some kind of magic obfuscator that will prevent someone from figuring out how it works. Not yet, anyway.
Other aspects of the paper - health data by Badge+17 · 2013-07-31 18:27 · Score: 3, Interesting

I can't really comment on the slashdot summary, but take a look at the actual abstract: http://eprint.iacr.org/2013/451
"In functional encryption, ciphertexts encrypt inputs x and keys are issued for circuits C. Using the key SK_C to decrypt a ciphertext CT_x = Enc(x), yields the value C(x) but does not reveal anything else about x. Furthermore, no collusion of secret key holders should be able to learn anything more than the union of what they can each learn individually."
In other words, it seems that their technique allows you to encrypt some secret data, then (provably) only release the result of some arbitrary function of that data. It sounds like this means you could (in principle) release health data in encrypted form, then allow researchers to study some ensemble properties of it by giving out the appropriate keys. This aspect of it certainly seems pretty cool.
misunderstanding by Anonymous Coward · 2013-07-31 18:50 · Score: 3, Interesting

either you misunderstand or i do, but my understanding was that this system allows a function to be expressed so that the code will only execute with certain parameters, hence the function can only compute certain evaluations (lets say one for simplicity) of the original function, and the true function can be hidden. so that the function evaluated for other parameters cannot be computed with this expression.
basically a really complex way of designing a look up table ?
now as you say, you can never hide the value of the function evaluated at the parameters for allowable parameters, because you can stick it in an emulator but the emulator tells you nothing more about the function and so you might as well just 'run' the whole procedure cause you cant extract anything other than the answer, to the function evaluated at a predetermined set of values.
this is achieved by making the original universal (or with domain expanded over the single or fixed set or parameters) algorithm into the 'jigsaw' that will only assemble with the particular parameters which are 'allowed', as these parameters direct the fitting of the jigsaw.
so can the jigsaw be fit with the 'allowed' parameters and still evaluate the original algorithm for different values? i believe the point of the algorithm is to achieve the same result as a look up table, and if it doesnt do it more efficiently then there is no point to the algorithm.
Hurry up, Europe is hungry for your fines by c0lo · 2013-07-31 18:59 · Score: 4, Interesting

Sell a program protected like this in Europe and you may end paying hundreds of millions:

(14) A person having a right to use a computer program should not be prevented from performing acts necessary to observe, study or test the functioning of the program, provided that those acts do not infringe the copyright in the program.
(15) [...]Nevertheless, circumstances may exist when such a reproduction of the code and translation of its form are indispensable to obtain the necessary information to achieve the interoperability of an independently created program with other programs.
It has therefore to be considered that, in these limited circumstances only, performance of the acts of reproduction and translation by or on behalf of a person having a right to use a copy of the program is legitimate and compatible with fair practice and must therefore be deemed not to require the authorisation of the rightholder. An objective of this exception is to make it possible to connect all components of a computer system, including those of different manufacturers, so that they can work together. [...].

--
Questions raise, answers kill. Raise questions to stay alive.