Computer Scientists Develop 'Mathematical Jigsaw Puzzles' To Encrypt Software
another random user writes "The claim here is that the encrypted software can be executed, but not reverse-engineered. To quote from the article: 'UCLA computer science professor Amit Sahai and a team of researchers have designed a system to encrypt software so that it only allows someone to use a program as intended while preventing any deciphering of the code behind it. According to Sahai, previously developed techniques for obfuscation presented only a "speed bump," forcing an attacker to spend some effort, perhaps a few days, trying to reverse-engineer the software. The new system, he said, puts up an "iron wall," making it impossible for an adversary to reverse-engineer the software without solving mathematical problems that take hundreds of years to work out on today's computers — a game-change in the field of cryptography.'"
I'm sure they can further obfuscate the actual code, but at the end of the day the processor is going to have to run machine code, and one way or the other you can tap the processor's activity to read the "decrypted" code. Beyond that, I imagine the performance penalties involved will be monstrous. Even normal obfuscation techniques have pretty heavy penalties.
The world's burning. Moped Jesus spotted on I50. Details at 11.
Deciphering/Decrypting is not the same thing as Reverse Engineering.
Reverse Engineering is duplicating the functionality without seeing the source code. That should still be possible if you have the ability to run the program.
I'm out of my mind right now, but feel free to leave a message.....
Clearly the journalist had no idea what they were writing about.
Who gains from this? Most of all? Virus writers.
Perhaps the real solution is to have the OS never execute code looking anything like this.
If they really think it is so good, then they should put their money where their mouth is.
Make it into a crackme, issue a large award for solving it.
Post it online. I give it a few weeks max, if that.
And who is to say it can't still be manipulated once running?
Think of the performance cost.
Either way, I have no faith in an article with little details.
Yup. At the end of the day, if this is at all useful and the hardware and OSs out there now, it;s still going to have to execute, and if it executes, you can run it through a debugger and watch it.
The world's burning. Moped Jesus spotted on I50. Details at 11.
Call the kernel to access files, sockets, etc.
Also unless the developer is super 31337 and likes to write everything I expect shares library calls too.
By watching calls to those interfaces we can figure out what it does.
Cryptology ePrint Archive: Report 2013/451
Candidate Indistinguishability Obfuscation and Functional Encryption for all circuits
Sanjam Garg and Craig Gentry and Shai Halevi and Mariana Raykova and Amit Sahai and Brent Waters
Abstract: In this work, we study indistinguishability obfuscation and functional encryption for general circuits:
Indistinguishability obfuscation requires that given any two equivalent circuits C_0 and C_1 of similar size, the obfuscations of C_0 and C_1 should be computationally indistinguishable.
In functional encryption, ciphertexts encrypt inputs x and keys are issued for circuits C. Using the key SK_C to decrypt a ciphertext CT_x = Enc(x), yields the value C(x) but does not reveal anything else about x. Furthermore, no collusion of secret key holders should be able to learn anything more than the union of what they can each learn individually.
We give constructions for indistinguishability obfuscation and functional encryption that supports all polynomial-size circuits. We accomplish this goal in three steps:
- We describe a candidate construction for indistinguishability obfuscation for NC1 circuits. The security of this construction is based on a new algebraic hardness assumption. The candidate and assumption use a simplified variant of multilinear maps, which we call Multilinear Jigsaw Puzzles.
- We show how to use indistinguishability obfuscation for NC1 together with Fully Homomorphic Encryption (with decryption in NC1) to achieve indistinguishability obfuscation for all circuits.
- Finally, we show how to use indistinguishability obfuscation for circuits, public-key encryption, and non-interactive zero knowledge to achieve functional encryption for all circuits. The functional encryption scheme we construct also enjoys succinct ciphertexts, which enables several other applications.
Category / Keywords: public-key cryptography / Obfuscation, Functional Encryption, Multilinear Maps
Date: received 20 Jul 2013, last revised 21 Jul 2013
Contact author: amitsahai at gmail com
Available format(s): PDF | BibTeX Citation
Silicon & Charybdis McLuhan Kildall Papert Kay
This is fascinating, but hard to understand. It's not clear how broad a result this is.
This seems to apply to "circuits", which are loop-free arrangements of AND, OR and NOT gates. These take in some bit string with a fixed number of bits and emit another bit string with a fixed (but possibly different) number of bits. Many computations can be encoded into this form, but the "circuits" are typically very deep, since this is sort of like loop unwinding. Although this seems kind of limited, you could, for example, make a reasonably sized "circuit" to compute DES, which is a static pattern of Boolean operations.
"Obfuscation" here means taking in some "circuit" and producing a bit for bit equivalent "circuit" from which little information can be extracted. The obfuscated circuit may be (will be?) more complex than the original, because additional logic has been inserted in a somewhat random way.
The contribution in this paper seems to be that this might be useful for "functional encryption". This is where you have some values, such as A and B, which are encrypted and are to be protected, but the result of some function f(A,B) is not protected. The idea is to have an implementation of f which combines the decryption and the desired function, an implementation which cannot be reverse engineered to return decrypted versions of A or B. While this has been suggested as a possible feature for databases, it's hard to apply to useful functions, and so far, it's mostly a research idea there. It's been suggested for some complex access control schemes involving mutual mistrust, but that too is a research topic.
Anyway, this doesn't mean you're going to be able to run your big executable program through some kind of magic obfuscator that will prevent someone from figuring out how it works. Not yet, anyway.
I can't really comment on the slashdot summary, but take a look at the actual abstract: http://eprint.iacr.org/2013/451
"In functional encryption, ciphertexts encrypt inputs x and keys are issued for circuits C. Using the key SK_C to decrypt a ciphertext CT_x = Enc(x), yields the value C(x) but does not reveal anything else about x. Furthermore, no collusion of secret key holders should be able to learn anything more than the union of what they can each learn individually."
In other words, it seems that their technique allows you to encrypt some secret data, then (provably) only release the result of some arbitrary function of that data. It sounds like this means you could (in principle) release health data in encrypted form, then allow researchers to study some ensemble properties of it by giving out the appropriate keys. This aspect of it certainly seems pretty cool.
either you misunderstand or i do, but my understanding was that this system allows a function to be expressed so that the code will only execute with certain parameters, hence the function can only compute certain evaluations (lets say one for simplicity) of the original function, and the true function can be hidden. so that the function evaluated for other parameters cannot be computed with this expression.
basically a really complex way of designing a look up table ?
now as you say, you can never hide the value of the function evaluated at the parameters for allowable parameters, because you can stick it in an emulator but the emulator tells you nothing more about the function and so you might as well just 'run' the whole procedure cause you cant extract anything other than the answer, to the function evaluated at a predetermined set of values.
this is achieved by making the original universal (or with domain expanded over the single or fixed set or parameters) algorithm into the 'jigsaw' that will only assemble with the particular parameters which are 'allowed', as these parameters direct the fitting of the jigsaw.
so can the jigsaw be fit with the 'allowed' parameters and still evaluate the original algorithm for different values? i believe the point of the algorithm is to achieve the same result as a look up table, and if it doesnt do it more efficiently then there is no point to the algorithm.
(14) A person having a right to use a computer program should not be prevented from performing acts necessary to observe, study or test the functioning of the program, provided that those acts do not infringe the copyright in the program.
(15) [...]Nevertheless, circumstances may exist when such a reproduction of the code and translation of its form are indispensable to obtain the necessary information to achieve the interoperability of an independently created program with other programs.
It has therefore to be considered that, in these limited circumstances only, performance of the acts of reproduction and translation by or on behalf of a person having a right to use a copy of the program is legitimate and compatible with fair practice and must therefore be deemed not to require the authorisation of the rightholder. An objective of this exception is to make it possible to connect all components of a computer system, including those of different manufacturers, so that they can work together. [...].
Questions raise, answers kill. Raise questions to stay alive.
You are missing the point of code being data. What it means in this context is that there is a duality between functions and the data they operate on. In other words, you can think of data as operating on the code instead of code operating on data. So your data becomes your maps (in the mathematical sense of the word map) and your functions become the mapped objects.
Any guest worker system is indistinguishable from indentured servitude.
Remember, kids, everything that can be applied for good can be applied for ill. And code that is impossible to decipher and analyze is the holy grail of malware.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
OK as an example what if you have worked out a way to transform an arbitrary program in to a huge bunch of "if else", goto statements with "magic numbers" or worse (e.g.
setaddress(mod(magic1+sha256memoizer(magic2+parm1+parm2+...),dataend))=parm3;
linenumber=mod(magic3+sha256memoizer(magic4+parm5+parm6+...),maxlines);
goto linenumber;
) and some stores and loads so that it's still equivalent and does the same thing but just a bit slower. So when you disassemble it - it's still a big mess that makes it hard to figure out or "improve" - which is the whole purpose of at least some code obfuscation.
So perhaps this bunch have figured out how to do something like this (and maybe better).
Or maybe they're just converting the program to encrypted perl and then compiling it to run on a custom vm ;). Just because you can see the final instructions doesn't mean you'd be much closer to figuring out the code.
For example if someone made a new cryptographic hash function and obfuscates it in one of the ways I mentioned. Would it really be so easy to reverse engineer it, just by seeing the instructions given to the processor? I don't think it would be so simple. More so if you need to reverse engineer it to the extent that you can write a detailed enough description and specification so that someone else could create a clean room implementation.
It sounds like what they are trying to do here is to refactor the algorythm such that f(x)=y produces the same answer but it's not practical to modify because the new function is mathematically scrambled-- the effect of each component of the code is so obscured that it's not so easy to tell how it contributes to the results. It's like using a neural net to implement your solution, the contribution of each neural node to the overall result can be significantly obscure. Won't stop someone from stealing the algorythm as is, but hacking it to produce an alternate result set will be out of the question, as long as they can't just build a truth table of inputs to outputs...
I call BS on the notion that my CPU is smarter than me. Article claims that it takes "hundreds of years" to break this. Obviously then it must take the program "hundreds of years" to run. Otherwise the CPU is using a short-cut. All I have to do is figure out what the short cut is (hint: figure out what the CPU is doing, where it got its instructions from) and it's cracked. If your computer can run it, you can figure it out. It's that simple.
By this argument, PGP, RSA et al must be worthless too, because if it takes hundreds of years to break a 256-bit cypher it must take hundreds of years to run a 256-bit cypher. You know, if your computer can run it, you can figure it out.
I hope you can see the flaw in your argument now.
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
I implemented exactly this circa 1990 to protect a small database of disambiguation rules structured as a hash table. A random value obtained from the security dongle was intermixed into the hash function and hash check condition. This was not done once for each possible lookup as defined in a conventional database. It was done once for each feasible answer for each possible lookup. The code had a statistical model of feasible answers. For some queries the number of feasible answers was excessive (too many dongle interactions) so we created a heuristic that was correct 99% of the time and set aside the 1% for a second pass with an additional data structure. If the dongle wasn't present the set of feasible answers was incorrectly narrowed with the expected statistical distribution. The members of that distribution, however, were entirely wrong.
We built up more complex queries from smaller queries. We were actually building a tree where every path in the tree was a valid answer and the majority of leaf nodes were at depths 2-4. That we hit a leaf node was a bit of metadata from the hash table lookup, which would be wrong if the dongle wasn't installed.
How about a quick forward description. Start with an alphabet of 50 symbols and construct the tree of all strings of length one to six. Every node in the tree has a flag about whether that node terminates a valid string and some additional bits about the correct orthography of the string as expressed in the user input, when typed. Your database is a subtree of this tree with about 100,000 strings (problems were so much smaller 25 years ago) along with a couple of bits of metadata per leaf. It's pretty sparse compared to the 15 billion possible leaf nodes.
The database subtree is actually constructed by elimination. One dongle assisted hash probe tells you whether a descending edge from your current vertex leads to a non-empty subtree (further solutions with your current path in the tree as a proper prefix). In addition, the user input defines another subtree of everything that could possibly matter to the conversion being performed. What you are computing is the intersection of these two subtrees: the tree corresponding to the task at hand and the tree corresponding to all solutions possible. Because the hash table was decomposed on the principles of minimum description length, when the dongle was absent (or corrupted) you still get an answer with much of the expected statistical distribution.
Except for one thing. The hash check was imperfect and you would get some false positives. We set up the rate of false positives so that the set of false positives grew exponentially as you descended to deeper levels. We knew from the statistical structure of the user input that few elements of this phantom solution set would interact negativity in practice even though the phantom set vastly out-numbered the legitimate set. Further, if one tried to enumerate the tree exhaustively using an incorrect dongle hash function, the tree you would reconstruct had no depth limit. It grew exponentially in size forever. We knew there was a depth limit when correctly probed, but this was nowhere expressed in our program code. In fact, this could be used to reverse engineer the correct hash function: only the correct hash function enumerates to a finite set of 100,000 subtrees. Just iterate over the set of all possible hash functions, in some well-structured enumeration order, until you discover this condition. Bingo, you're done.
Not all of the phantom space was harmless
Besides, the NSA is holding all the keys anyway lol.
Seven puppies were harmed during the making of this post.