Computer Scientists Develop 'Mathematical Jigsaw Puzzles' To Encrypt Software
another random user writes "The claim here is that the encrypted software can be executed, but not reverse-engineered. To quote from the article: 'UCLA computer science professor Amit Sahai and a team of researchers have designed a system to encrypt software so that it only allows someone to use a program as intended while preventing any deciphering of the code behind it. According to Sahai, previously developed techniques for obfuscation presented only a "speed bump," forcing an attacker to spend some effort, perhaps a few days, trying to reverse-engineer the software. The new system, he said, puts up an "iron wall," making it impossible for an adversary to reverse-engineer the software without solving mathematical problems that take hundreds of years to work out on today's computers — a game-change in the field of cryptography.'"
I'm sure they can further obfuscate the actual code, but at the end of the day the processor is going to have to run machine code, and one way or the other you can tap the processor's activity to read the "decrypted" code. Beyond that, I imagine the performance penalties involved will be monstrous. Even normal obfuscation techniques have pretty heavy penalties.
The world's burning. Moped Jesus spotted on I50. Details at 11.
Deciphering/Decrypting is not the same thing as Reverse Engineering.
Reverse Engineering is duplicating the functionality without seeing the source code. That should still be possible if you have the ability to run the program.
I'm out of my mind right now, but feel free to leave a message.....
so..the next Stuxnet will be untraceable , cool
Clearly the journalist had no idea what they were writing about.
Who gains from this? Most of all? Virus writers.
Perhaps the real solution is to have the OS never execute code looking anything like this.
If they really think it is so good, then they should put their money where their mouth is.
Make it into a crackme, issue a large award for solving it.
Post it online. I give it a few weeks max, if that.
And who is to say it can't still be manipulated once running?
Think of the performance cost.
Either way, I have no faith in an article with little details.
I can't imagine any circumstance where an execution trace could not be used to reconstruct the original behavior of the code, no matter how obfuscated it is.
and programs will continue to run like it's 1987
Or is there an original unscrambled source code retained somewhere?
From page 7 of the paper: "While our current obfuscation construction runs in polynomialtime, it is likely too inefficient for most practical problems. An important objective is to improve the efficiency for use in practical applications"
Call the kernel to access files, sockets, etc.
Also unless the developer is super 31337 and likes to write everything I expect shares library calls too.
By watching calls to those interfaces we can figure out what it does.
Cryptology ePrint Archive: Report 2013/451
Candidate Indistinguishability Obfuscation and Functional Encryption for all circuits
Sanjam Garg and Craig Gentry and Shai Halevi and Mariana Raykova and Amit Sahai and Brent Waters
Abstract: In this work, we study indistinguishability obfuscation and functional encryption for general circuits:
Indistinguishability obfuscation requires that given any two equivalent circuits C_0 and C_1 of similar size, the obfuscations of C_0 and C_1 should be computationally indistinguishable.
In functional encryption, ciphertexts encrypt inputs x and keys are issued for circuits C. Using the key SK_C to decrypt a ciphertext CT_x = Enc(x), yields the value C(x) but does not reveal anything else about x. Furthermore, no collusion of secret key holders should be able to learn anything more than the union of what they can each learn individually.
We give constructions for indistinguishability obfuscation and functional encryption that supports all polynomial-size circuits. We accomplish this goal in three steps:
- We describe a candidate construction for indistinguishability obfuscation for NC1 circuits. The security of this construction is based on a new algebraic hardness assumption. The candidate and assumption use a simplified variant of multilinear maps, which we call Multilinear Jigsaw Puzzles.
- We show how to use indistinguishability obfuscation for NC1 together with Fully Homomorphic Encryption (with decryption in NC1) to achieve indistinguishability obfuscation for all circuits.
- Finally, we show how to use indistinguishability obfuscation for circuits, public-key encryption, and non-interactive zero knowledge to achieve functional encryption for all circuits. The functional encryption scheme we construct also enjoys succinct ciphertexts, which enables several other applications.
Category / Keywords: public-key cryptography / Obfuscation, Functional Encryption, Multilinear Maps
Date: received 20 Jul 2013, last revised 21 Jul 2013
Contact author: amitsahai at gmail com
Available format(s): PDF | BibTeX Citation
Silicon & Charybdis McLuhan Kildall Papert Kay
If the goal is to make it impractical for someone WITHOUT the ability to monitor the computer including the CPU "from the inside" they may be on to something.
If I can monitor the system at a level of detail where I understand what each "step" does, then I can just put the pieces together and I'll know what's going on. If I'm a debugger that can monitor the CPU and the rest of the system in a way that isn't visible to the code I'm trying to snoop on, it's pretty much game over, I win.
Here's where it gets interesting:
If two computers are collaborating on a task and I have debugger-access to one but I see the other as a "black box" you could design a system in which the fact that I have perfect knowledge of what is going on in computer A doesn't give me a huge amount of insight into what task the two computers are collaborating on and how they are doing it.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
So long as we're able to see the instructions being given to the processor (hint: we pretty much always can), you can reverse engineer the code. Performance penalties or not, the entire thing is pointless if it doesn't achieve what it's set out to do, and you don't have to be an expert in this area to see the hole that's big enough to steer a panamax-class cargo carrier through. Despite that, the problem went wholly unaddressed in the summary and I'm not seeing any comments here indicating otherwise, which means that the article isn't worth consideration except if I have some time to kill (I don't).
I just scanned the original paper, reserving its detailed lecture for another moment. But it is one of those things that make me think: "Damn ! Wish I had thought of this myself..."
Religous speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace
Yeah I agree, there is something fundamentally wrong with the claims being made. If I have byte code, I can rebuild loops and conditionals with a decompiler. Sure, I don't have comments or var names, but those can be guessed or worked out in something less than 'several hundred years'.
Suppose you build something that throws in a lot of crazy random jmp calls to make this harder, and I cant be bothered to re-construct a program I want to steal. At some point a single Boolean decision hits the call stack that says, 'Is the app unlocked, yes or no?', and that conditional jmp can be replaced so it is cracked. Unless this guy has come up with a super magic method to keep me out of my own computer, I can crack whatever crackpot protection scheme he cooks up.
DRM doesn't work, on a theoretical level, because you cannot keep people from having access to the data at every single point in the delivery chain. Code is just data. So......QUACK QUACK QUACK.
HA! I just wasted some of your bandwidth with a frivolous sig!
[T]he article isn't worth consideration except if I have some time to kill (I don't).
TL;DR therefore you are able to dismiss it out of hand. Gotcha.
I'm not seeing any comments here indicating otherwise
Now why do you think might that be? I believe you are telling the truth. You really haven't looked at the actual paper.
You run the executable...
You ask kernel to stop executing it...
You dump the memory...
Voila - you have the unencrypted executable...
This process, including writing the tools for it, will take a person who knows what hes doing around 5 minutes... (if the program is large, it might take longer due to disk write speeds)...
Yes, they can obfuscate the assembly, but it still will be the assembly - perfectly human readable.
It might be pain to reverse engineer the whole program, but it can be done. But in most cases I've seen the hacker doesn't want to reverse engineer the whole program, he just wants to alter it a little / extract some crucial information from it (i.e. private keys). Obfuscation doesn't make this harder at all - You find some interesting OS level calls (i.e. socket creation - you cant obfuscate that...) and using debugger/stack traces/assembly/hooks you poke around a bit to find the part that is interesting to you...
From security point of view, assembly encryption (no matter how good it is) is comparable to covering your house with packing paper to prevent thieves from entering...
encryption used for DVDs was believed to be unbreakable, until it was broken. How many companies have released encryption schemes, for DRM or otherwise and it gets cracked within hours of actual release. more than once Microsoft encryption has been cracked before its official release.
Though I haven't studied this particular one, my general impression is that it was not designed by cryptography experts and then vetted the way well known cryptographic algorithms are vetted by other experts.
Eventually those who don't understand maths will be freed of being slaves to those that do. Could this be the first step?
Time is what keeps everything from happening all at once.
It's called Forth
Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
Still doesn't change the fact ...
You assume it's a fact. When I was reading science at university (late 70s/early 80s) a fact one of my lecturers impressed upon us was that it would never be possible for computers ever to decipher handwriting or even in anything other than rigorously standardised print, because we all wrote letter ever so slight differently.
The claim here is that there is a way to fool a decompiler and still have a functionally useful program. And the paper will show you how. You can refute it all of course, all you need do is to show us the maths.
This is fascinating, but hard to understand. It's not clear how broad a result this is.
This seems to apply to "circuits", which are loop-free arrangements of AND, OR and NOT gates. These take in some bit string with a fixed number of bits and emit another bit string with a fixed (but possibly different) number of bits. Many computations can be encoded into this form, but the "circuits" are typically very deep, since this is sort of like loop unwinding. Although this seems kind of limited, you could, for example, make a reasonably sized "circuit" to compute DES, which is a static pattern of Boolean operations.
"Obfuscation" here means taking in some "circuit" and producing a bit for bit equivalent "circuit" from which little information can be extracted. The obfuscated circuit may be (will be?) more complex than the original, because additional logic has been inserted in a somewhat random way.
The contribution in this paper seems to be that this might be useful for "functional encryption". This is where you have some values, such as A and B, which are encrypted and are to be protected, but the result of some function f(A,B) is not protected. The idea is to have an implementation of f which combines the decryption and the desired function, an implementation which cannot be reverse engineered to return decrypted versions of A or B. While this has been suggested as a possible feature for databases, it's hard to apply to useful functions, and so far, it's mostly a research idea there. It's been suggested for some complex access control schemes involving mutual mistrust, but that too is a research topic.
Anyway, this doesn't mean you're going to be able to run your big executable program through some kind of magic obfuscator that will prevent someone from figuring out how it works. Not yet, anyway.
I can't really comment on the slashdot summary, but take a look at the actual abstract: http://eprint.iacr.org/2013/451
"In functional encryption, ciphertexts encrypt inputs x and keys are issued for circuits C. Using the key SK_C to decrypt a ciphertext CT_x = Enc(x), yields the value C(x) but does not reveal anything else about x. Furthermore, no collusion of secret key holders should be able to learn anything more than the union of what they can each learn individually."
In other words, it seems that their technique allows you to encrypt some secret data, then (provably) only release the result of some arbitrary function of that data. It sounds like this means you could (in principle) release health data in encrypted form, then allow researchers to study some ensemble properties of it by giving out the appropriate keys. This aspect of it certainly seems pretty cool.
You don't need to emulate the hardware to see a program's output, you can just look at your screen. There is little point in hiding the output of a program from the user who's running it so I don't see that as the point (if one cannot observe the output of a program, it is of little utility).
Which instructions a given program executes depends on the inputs to said program. For any given input, most programs only execute a tiny portion of their code. Therefore, in order to completely reverse engineer a program, you would have to observe the output for all possible inputs. This is almost certainly intractable for any sufficiently complex program. Suppose you wrote a program to iterate over all possible inputs observing the output of each input; things get interesting if you liken this to the halting problem.
either you misunderstand or i do, but my understanding was that this system allows a function to be expressed so that the code will only execute with certain parameters, hence the function can only compute certain evaluations (lets say one for simplicity) of the original function, and the true function can be hidden. so that the function evaluated for other parameters cannot be computed with this expression.
basically a really complex way of designing a look up table ?
now as you say, you can never hide the value of the function evaluated at the parameters for allowable parameters, because you can stick it in an emulator but the emulator tells you nothing more about the function and so you might as well just 'run' the whole procedure cause you cant extract anything other than the answer, to the function evaluated at a predetermined set of values.
this is achieved by making the original universal (or with domain expanded over the single or fixed set or parameters) algorithm into the 'jigsaw' that will only assemble with the particular parameters which are 'allowed', as these parameters direct the fitting of the jigsaw.
so can the jigsaw be fit with the 'allowed' parameters and still evaluate the original algorithm for different values? i believe the point of the algorithm is to achieve the same result as a look up table, and if it doesnt do it more efficiently then there is no point to the algorithm.
You just know that some crack DRM whore like UBISoft will incorporate this as part of their next game release.
Donte Alistair Anderson Roberts - hi son!
Karma: Chameleon
But UCLA doesn't get to claim the credit. MIT was first to present homomorphic encryption: http://web.mit.edu/newsoffice/2013/algorithm-solves-homomorphic-encryption-problem-0610.html
Roughly speaking, homomorphism is a map which preservers the properties pertinent to a category. Now think of data as acting on code instead of code acting on data. Since "acting" is a mapping, acting in a homomorphic way would produce program results which are equivalent but without the decrypting step.
Any guest worker system is indistinguishable from indentured servitude.
can any explain how exactly this will be helpful ....
(14) A person having a right to use a computer program should not be prevented from performing acts necessary to observe, study or test the functioning of the program, provided that those acts do not infringe the copyright in the program.
(15) [...]Nevertheless, circumstances may exist when such a reproduction of the code and translation of its form are indispensable to obtain the necessary information to achieve the interoperability of an independently created program with other programs.
It has therefore to be considered that, in these limited circumstances only, performance of the acts of reproduction and translation by or on behalf of a person having a right to use a copy of the program is legitimate and compatible with fair practice and must therefore be deemed not to require the authorisation of the rightholder. An objective of this exception is to make it possible to connect all components of a computer system, including those of different manufacturers, so that they can work together. [...].
Questions raise, answers kill. Raise questions to stay alive.
Remember, kids, everything that can be applied for good can be applied for ill. And code that is impossible to decipher and analyze is the holy grail of malware.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
Well, DUH!
Be honest. Imagine you have something like that. Where would you present it?
At a tech conference, where techs are present who can't throw money at you but can crack it?
Or at a management conference where the second part of the above statement is inverted?
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
OK as an example what if you have worked out a way to transform an arbitrary program in to a huge bunch of "if else", goto statements with "magic numbers" or worse (e.g.
setaddress(mod(magic1+sha256memoizer(magic2+parm1+parm2+...),dataend))=parm3;
linenumber=mod(magic3+sha256memoizer(magic4+parm5+parm6+...),maxlines);
goto linenumber;
) and some stores and loads so that it's still equivalent and does the same thing but just a bit slower. So when you disassemble it - it's still a big mess that makes it hard to figure out or "improve" - which is the whole purpose of at least some code obfuscation.
So perhaps this bunch have figured out how to do something like this (and maybe better).
Or maybe they're just converting the program to encrypted perl and then compiling it to run on a custom vm ;). Just because you can see the final instructions doesn't mean you'd be much closer to figuring out the code.
For example if someone made a new cryptographic hash function and obfuscates it in one of the ways I mentioned. Would it really be so easy to reverse engineer it, just by seeing the instructions given to the processor? I don't think it would be so simple. More so if you need to reverse engineer it to the extent that you can write a detailed enough description and specification so that someone else could create a clean room implementation.
It sounds like what they are trying to do here is to refactor the algorythm such that f(x)=y produces the same answer but it's not practical to modify because the new function is mathematically scrambled-- the effect of each component of the code is so obscured that it's not so easy to tell how it contributes to the results. It's like using a neural net to implement your solution, the contribution of each neural node to the overall result can be significantly obscure. Won't stop someone from stealing the algorythm as is, but hacking it to produce an alternate result set will be out of the question, as long as they can't just build a truth table of inputs to outputs...
I call BS on the notion that my CPU is smarter than me. Article claims that it takes "hundreds of years" to break this. Obviously then it must take the program "hundreds of years" to run. Otherwise the CPU is using a short-cut. All I have to do is figure out what the short cut is (hint: figure out what the CPU is doing, where it got its instructions from) and it's cracked. If your computer can run it, you can figure it out. It's that simple.
Seven puppies were harmed during the making of this post.
I give it a month before someone cracks it.
Seven puppies were harmed during the making of this post.
Where can we buy/download the package?
So long as we're able to see the instructions being given to the processor (hint: we pretty much always can), you can reverse engineer the code.
True. As long as you are able to follow every possible path of execution of the code. Which puts us in the realms of "theoretically possible, but unfeasible in practice".
How do we normally reverse engineer code? We run the code through a debugger/logger to identify contiguous areas of code and data, and to highlight common destination points for branch conditions and data addresses frequently read from/written to. That information is used to inform the automated stages of disassembly. Probablistic models see something that looks like a branch to a common destination, and assume that it is one. They see what looks like a write to a common destination and assume it is one. Encrypt your addresses and hash them to appear different and the hints are removed from the object code, and only identifiable during execution or simulated execution, which brings us back to unfeasible-in-practice exhaustive execution.
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
I call BS on the notion that my CPU is smarter than me. Article claims that it takes "hundreds of years" to break this. Obviously then it must take the program "hundreds of years" to run. Otherwise the CPU is using a short-cut. All I have to do is figure out what the short cut is (hint: figure out what the CPU is doing, where it got its instructions from) and it's cracked. If your computer can run it, you can figure it out. It's that simple.
By this argument, PGP, RSA et al must be worthless too, because if it takes hundreds of years to break a 256-bit cypher it must take hundreds of years to run a 256-bit cypher. You know, if your computer can run it, you can figure it out.
I hope you can see the flaw in your argument now.
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
You're claiming to be able to reverse engineer an entire cookbook just by chemically analysing a cookie from one recipe in that book. You'd have to make every single recipe in the book and analyse it before you had a complete representation of all the recipes.
which is totally what she said
yeah well release a working proof and make 5 billions.
I'm not shitting you, if you can really do even a simple proof of concept that does that then you could bag a lot of money next week.
I can accept that you could do it this system so that it will spit out right answers for right predetermined inputs but nothing else, and even then you could see it do it's thing with that input while it might not provide you the outputs for other inputs, but that's pointless as a program for most any uses a computer program has, except as for hiding data.
it's an extraordinary claim so it needs quite ordinary proof: a regular fucking program that implements the method. conceptually the method should work with any language anyways, so might just as well be in basic.
world was created 5 seconds before this post as it is.
The obivious difference between the obfuscated code and an an PGP-Message is that in the PGP case you need a key to decrypt it. ...
Unless they have a secret key baked into the CPU there is no advantage for the CPU whatsoever
How is this different? The public key for a machine-readable PGP message isn't "baked into the CPU", but that doesn't make PGP trivially easy to break -- far from it.
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
I do see major difference. The patent you cite is based upon DES, the authors' work is not, as they develop their own encoding technique, on the basis of multilinear maps. Moreover, in the patent you cite it is assumed that one function can have n outputs, whereas in the authors' work, each function has only 1 output. I am still going over both the original peer-reviewed paper and the US patent you cite. The latter seems too amateuristic to be taken seriously by any corporation willing to implement an obfuscator, whereas the authors of this paper are researchers renowned in their field.
Religous speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace
RTFA is shallow on details, but it sounds very much like the "phantom" self-mutating MS-DOS viruses from the later 90. (The ones which forced anti-virus vendors to effectively implement a debugger/a interpreter for the code to detect the malicious code not by the look but by the actual work it does.)
Otherwise, the main problems with such intrusive obfuscations is that code is never 100% bug free and bugs sometimes trigger undefined/unspecified behaviors. It might work before obfuscation - but would break (in unexplainable, obfuscated fashion) afterwards. (Ditto after system updates.)
All hope abandon ye who enter here.
Back in the days OS's ran off of floppies (Amiga OS for example) there were programs that would compress an executable to save space. This made me think of that using encryption instead. Now in that case on 7Mhz machines. The speed of decompression in that case was close to the time saved by loading less data from the floppy drive. Now with Ghz of speed instead of Mhz and it needing to be run once, I am not convinced the speed loss would be that great. However with VM's and the like it's too easy to stop and look at the executable that way - nothing can stop that.
Besides, the NSA is holding all the keys anyway lol.
Seven puppies were harmed during the making of this post.
No, what you're describing is looking at a finished product and inferring the process of how it was created. That's very difficult.
What GP is referring to is watching someone follow the recipe (without seeing the recipe itself), and then inferring the recipe. Much easier.
By GP's argument, for the CPU to decrypt the message so quickly, it must be using a shortcut. In your example, the shortcut is that the CPU knows the secret key.
Do you honestly think that someone watching the CPU decrypt the message can't access the message themselves? http://en.wikipedia.org/wiki/DeCSS
Normally obfuscation is bad in cryptography - it means that the system is theoretically broken, but that the way to break it is quite well hidden.
This refers to cryptographically secure obfuscation. This is an entirely new field, and hasn't been possible till now. This paper doesn't prove how to do it, but proves that it is possible for a certain subset of operations.
Basicly it boils down to the fact it is possible to make a computer program that, for a given set of inputs a) generates a set of outputs b) in such a way that it wouldn't be possible to modify the program to make it generate a particular output without doing an exhaustive search (ie. try every possible input).
It's similar in principle to a "designer" hash algorithm - ie. I can choose the output for a given input, but after compilation it will not be possible to find out the mapping without knowing the input.
This type of algorithm could enable things people hate (DRM), but also many other new fields of computing, in particular doing computation on untrusted processors.
The difference is that I can very easily decrypt PGP myself which is why people are saying these claims aren't true.
It runs slower.
In the case of PGP / RSA / etc, but short cut is the key. If the CPU can decrypt the message, it has the decryption key.....and therefore, you can obtain the key and have the same shortcut.....and thus, decrypt the message in a lot less than "years".
The problem you present is trying to break encryption without the shortcut.....or trying to put together the jigsaw by looking at the backs.
Yes, for example caches. ;) I think one would more likely run this software on a vm and log syscalls, maybe even inspect registers.
Finally. People have been asking for years for software to become even harder to maintain and debug. For too long, we have tolerated too-high quality and reliability, too-low prices, and projects completing absurdly ahead of deadline. I'm glad to see that someone is finally doing something about it.
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
All you have to do is outsource your project to India. The code you'll get back will be virtually indecipherable already.
You might want to think about that again. That I can decrypt a specific message using a public key doesn't mean I have full unfettered access to the cryptosystem. You have to think of execution as communicating withe the program, and therefore think of the version the program executed as a chain of non-independant messages, and I can only decrypt them in the context established by earlier ones.
Either that, or some anonymous coward on slashdot has once again managed to see the "obvious" flaw in rigorously-researched and peer-reviewed cutting-edge advanced information theory that a bunch of mutiple-PhDs have been studying in-depth for years...
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
But we're not trying to "decrypt a message", but to "break the cipher".
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
"decrypt" != "break encryption"
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
You only need to trace whatever you're interested in. If someone releases a revolutionary video coder encrypted with this algorithm, you run it through your tracer, record (and compress!) the raw instruction stream, and you'll have the actual algorithm that the CPU is executing. You're not going to decode all possible failure modes, nor any code that isn't actually executed, but you will get the algorithm that just encoded your video. The compression might even get you individual subroutines..
But it won't be pretty. :)
c++;
I can see the advantages but I can also see great disadvantages - mainly copyrights and being locked out of your own system.
Great! Another genius harvested from a dead culture screaming, "eureka!" OK, without wining, how is this going to:
fix my tractor?
food?
cure some, currently, uncurable illness?
put a new roof on my home?
clothing?
No comment? I figured as much.
Number of (behaviorally distinct) programs possible in X Kb of machine code. Number of intelligibly represented programs using a comparable amount of source code in a typical C like language. I suspect the former is much greater than the latter, and this can be exploited for the purpose of creating a program that works but is hard to reverse engineer.
Another thing to think about is various notions of approximation. If only a small number of inputs to a function are actually used, and this set of inputs can be mathematically determined, then the function can be replaced with something weird which gives the right output on desired inputs but various other stuff for other inputs. If an attacker cannot determine the actual inputs used, the function when analysed will look confusing.
John_Chalisque
OK as an example what if you have worked out a way to transform an arbitrary program in to a huge bunch of "if else", goto statements with "magic numbers" or worse (e.g. setaddress(mod(magic1+sha256memoizer(magic2+parm1+parm2+...),dataend))=parm3; linenumber=mod(magic3+sha256memoizer(magic4+parm5+parm6+...),maxlines); goto linenumber; ) and some stores and loads so that it's still equivalent and does the same thing but just a bit slower. So when you disassemble it - it's still a big mess that makes it hard to figure out or "improve" - which is the whole purpose of at least some code obfuscation.
All this means is that someone will then write a program that will work with a virtualized environment (e.g VMware, VirtualBox, etc) - may be even a device driver that snoops memory - that will be able to capture the run of the program and recypher it to the original, non-obfuscated code during run-time. They'll then use that in order to reverse engineer the program, reapply the cypher and continue on their merry way of hacking the software in the way the TFA's author is trying to prevent.
Now, given the amount of time it would take for the obfuscated program to run in the manner you are speaking, the aforementioned cracking software probably has plenty of CPU cycles to figure out each step and generate an intercept function to overcome the jigsaw obfuscation.
In then end, it just means the cracking tools become better.
Truth is like the sun. You can shut it out for a time, but it ain't goin' away. - Elvis Presley (source: imdb.com)
Just skimmed the paper...
The scope of this is very limited. It applies primarily to functions that encapsulate a secret (not a general piece of code). From reading the paper, it appears that the goal is to attempt to frustrate folks that that want to disassemble software routines to look for small secrets buried in the code.
Example: Let's say you have a function that decrypts a message from a server with a key and you want to say embed that code in javascript. You don't, however, want someone figure out the key by just looking at the javascript, you want it to be a black box.
This technique allegedly can obfuscate the code so that it is just as hard to figure out how the secret is used as it is to infer the secret from sending in values and building a lookup table. Of course you could just cut-out the obfuscated code and use it as an oracle (because you can still run it), but it continues to effectively hide the secret as best as possible.
As I understand it, the basic techique is to create a function (from a class of functions with certain properties) that cuts the input domain into a large number sets and for each set finds a cheap multi-linear function that maps the correct output values for that small set of the input values but is don't care for the rest of the input values. The "trick" is that the class functions that cut the input domain are not fixed, but depend on the input so it is a hard problem to figure out which maps are used without just searching them all. Thus every obfucation instance of a function that encapsulates the same secret could look different and still compute the same result, or conversely, encapsulate different secrets, yet look very similar, but produce the correct result.
The downside of course is that small multi-linear functions aren't the best implementation for actual encryption functions. For example, something very simple like 128-bit AES, might be considered a function with 2^128 entry 128bit table. Unless I'm mistaken, it will be very hard to chop up a 2^128 bit table using their method (even the authors concede the fact that thier construction isn't efficient enough for practical problems).
However, they make a "bold" assertion, that they can apply their technique to stuff like crippleware by obfuscating the function that enables/disables various program features. I'll go out on a limb and say most folks that hack stuff like that don't bother to figure out ways to turn on/off features, but just go for attempting to bypass calls to such a function or disable the function altogether.
The other "bold" assertion they make is that it might be used by software vendors to deploy a patch without tipping off the black-hats about the nature of the vunerablity before all systems can be patched. This is wishful thinking as functions are generally not vunerable (since they by definiton have no-side effects), but procedures (the function-like modules that render side-effects) are the modules that tend to have the bugs. This technique of theirs really only applies to functions, not procedures so it's hard to see how this would help in many circumstances.
that will be able to capture the run of the program and recypher it to the original, non-obfuscated code during run-time.
Assuming "full" obfuscation if you do that you'd only capture the "program" for generating that one hash alone - one input and its corresponding output. The path would be different for different data/inputs. For a 256 bit hash you'd have to run it 2^256 times.
Of course there might not be full obfuscation. But I see no evidence so far that full obfuscation would be impossible or that difficult. Maybe someone could prove it impractical.
that will be able to capture the run of the program and recypher it to the original, non-obfuscated code during run-time.
Assuming "full" obfuscation if you do that you'd only capture the "program" for generating that one hash alone - one input and its corresponding output. The path would be different for different data/inputs. For a 256 bit hash you'd have to run it 2^256 times.
Of course there might not be full obfuscation. But I see no evidence so far that full obfuscation would be impossible or that difficult. Maybe someone could prove it impractical.
If the point is to merely obfuscate the program and not allow the user to actually run it, then yes you'd be right. That would also be pretty useless.
If the point is to run the program, then you are wrong as the program that must be run must at some point pass through the CPU to do its function. A capture program would be capturing the data as well, and would be able to note what portions of memory were being read/written/manipulated as well, so you'd also capture the any changes that temporarily decoded a section of memory, then executed that block before re-encoding it.
What you reference only works for encrypt/decrypt. However, between those two points something else has to happen - a portion of the real program has to run. The cracking program only needs to decipher where the boundaries between the encryption/decryption are so as to capture the exposed real program while it is operating - which has to happen for the user to actually be able to use the program, and monitoring the CPU instructions and cache would prove to be a very effective and efficient way to do it - e.g. they may be relying on tricks in the CPU to mask instructions before running them - e.g. XOR/AND/NOT etc - whereby it stays in the CPU cache only - however, even that is fully capturable by the kind of program/cracking I'm specifying be run in conjuction with a virtualized environment.
In other words, as I and many others have pointed out, as long as the program - not the obfuscator - actually has to run through some kind of execution unit, it can be captured. If it doesn't, then it's no better than putting a program into a ZIP file and saying is obfuscated.
Truth is like the sun. You can shut it out for a time, but it ain't goin' away. - Elvis Presley (source: imdb.com)
That's an entirely useless statement in this context. If the intention is to keep the data out of the hands of the end user then it doesn't matter if they decrypt "as the designer intended" and snoop the data or if they break the encryption algorithm. There is still a failure in the process.
You seem to be assuming a naive method of obfuscation. In theory you can obfuscate a program so there is no single "exposed real program" with just a few possible paths. There would be an exposed obfuscated program with say 2^256 possible different paths depending on your inputs.
Capturing one run will give you a different sequence of instructions from capturing another run, and each run the instructions you capture will only work for a particular set of inputs and state.
If you are copying the entire obfuscated program to support those 2^256 different paths you're not reverse engineering it.
To give an example to help make it clearer imagine converting a function to one that has zillions of "if then else" statements. The path and sequence of instructions will be different depending on what the parameters and state are. Figuring out the "real" source function from those statements would be tricky if it is a nontrivial function.
A real obfuscated program probably won't have zillions of statements - it's just an example to try to help you understand.
As for saying it only works of encrypt/decrypt, such obfuscation in conjunction with copyright law is good enough for preventing unauthorized 3rd party clients/software from interoperating with your products. If the only way they can create or verify signed messages to/from your services/other software is by copying your obfuscated signing module then they breach copyright law.
Maybe some countries have laws making it illegal to do such obfuscation to prevent interoperability, but not all of them.
As for saying it only works of encrypt/decrypt, such obfuscation in conjunction with copyright law is good enough for preventing unauthorized 3rd party clients/software from interoperating with your products. If the only way they can create or verify signed messages to/from your services/other software is by copying your obfuscated signing module then they breach copyright law.
Interoperability is a recognized reason by US Copyright Law for reverse engineering, no matter how much obfuscation or encyprtion is done.
Truth is like the sun. You can shut it out for a time, but it ain't goin' away. - Elvis Presley (source: imdb.com)