Computer Scientists Develop 'Mathematical Jigsaw Puzzles' To Encrypt Software

I Call BS by MightyMartian · 2013-07-31 16:36 · Score: 5, Insightful

I'm sure they can further obfuscate the actual code, but at the end of the day the processor is going to have to run machine code, and one way or the other you can tap the processor's activity to read the "decrypted" code. Beyond that, I imagine the performance penalties involved will be monstrous. Even normal obfuscation techniques have pretty heavy penalties.

--
The world's burning. Moped Jesus spotted on I50. Details at 11.

Re:I Call BS by MightyMartian · 2013-07-31 16:46 · Score: 4, Insightful

Unless the software can magically tlel that it's running in a debugger or a sandbox and refuse to execute, it's activity; from stack activity to system calls to memory allocations can be traced.

--
The world's burning. Moped Jesus spotted on I50. Details at 11.
Re:I Call BS by 0123456 · 2013-07-31 16:52 · Score: 4, Informative

When I was writing Windows device drivers, we so, really loved stupid games that detected a debugger and refused to run.
Fortunately most developers stopped doing that once they realised that their customers would have to wait for bug-fixes until they'd reported it to us and then we'd contacted the developers and they'd sent us a special version without the retarded code that stopped us debugging the problem.
Re:I Call BS by Darinbob · 2013-07-31 17:01 · Score: 2

You slice off the top of the chip and probe it. Sure it's expensive, but the people this level of security is intended for have the resources.
Re:I Call BS by Anonymous Coward · 2013-07-31 17:04 · Score: 5, Interesting

Jigsawing is not encryption, it's splitting things up into shared object libraries (.dll's) that the linker then has to "reassemble" as the program is loaded into memory where it is executable ie the program counter can be pointed at it and let run. No the best obsfuscation I have yet seen is X86 assembler and even that doesn't run without a ton of overhead but... disassemblers exist for it. /. is right to scoff A) the write up is contradictory B) the article makes no sense because in order to run the code the processor has to solve the puzzle so you just stick it in an emulator and hit record. C) this looks like a school and or corporate overlords hyping an unpublished paper which many of us recognize as one way peer reviewers are pressured into signing off on snake oil D) who claims that this is the first attempt at obsfuscating code? This thing: http://www.ioccc.org/ is on it's 22nd year.
Re:I Call BS by phantomfive · 2013-07-31 17:10 · Score: 4, Informative

The title is wrong. It's not talking about encrypting the software, it's talking about obfuscating the software. They put the compiled code through a function that obfuscates it thoroughly. Their function is complex enough that de-obfuscating the code (that is, returning it to the original form) would be computationally expensive. The paper also talks about encryption, but that is a different part of the paper.

Which isn't to say you can't look at some variable in RAM, realize it is the boolean for 'authorized,' and then overwrite it. So it's essentially a complex obfuscation technique, which may or may not be useful. (That is my understanding of the article and abstract, if I am wrong please correct me).

--
"First they came for the slanderers and i said nothing."
Re:I Call BS by MightyMartian · 2013-07-31 17:30 · Score: 2

Yeah, but once they add the GUI interface using VB, it'll be uncrackable!

--
The world's burning. Moped Jesus spotted on I50. Details at 11.
Re:I Call BS by MightyMartian · 2013-07-31 17:44 · Score: 4, Insightful

That's because VMs really don't try to hide themselves from the guest. While it might be pretty hard to build a VM that did a good enough job to essentially fool software attempting to identify whether its physical or virtualized hardware that it's running on, we have the source for a number of VMs (ie. KVM and Xen) and if this kind of obfuscated software started showing up on the market, I'm sure there would be a much greater push to make a rock solid virtualized environment that mimicked physical hardware with much more fidelity.

--
The world's burning. Moped Jesus spotted on I50. Details at 11.
Re:I Call BS by MikeBabcock · 2013-07-31 17:59 · Score: 3, Insightful

This exactly -- virus writers have been at the forefront of code that hides and obfuscates itself and VM type systems were developed to essentially run the code to determine its effects without actually running the code.
So long as the code can be executed, it can be reverse-engineered.

--
- Michael T. Babcock (Yes, I blog)
Re:I Call BS by Half-pint+HAL · 2013-07-31 20:42 · Score: 4, Informative

Not exactly. Reverse engineering starts with the analysis of a running program with the goal of obtaining enough data about the locations of executable code vs data, major variable locations/identifiers etc to allow you to start running automated disassembly of the total code. All you can get out of a run-time analysis is a specific execution path, which does not embody the full code.
Now yes, "if your CPU can read it, then you can read it", but unfortunately in this situation, your CPU can only read it in certain circumstances, so you'll only be able to read it in those same circumtances: execution or simulated execution, leading back to the situation where you're stuck looking at traces of specific executions...

--
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
Re:I Call BS by greg1104 · 2013-07-31 20:46 · Score: 5, Informative

You can make the algorithm as complex as you'd like, but at the end of the day, you have an input and output(s). You may decide to take a long time to get there, but at the end of the day, I know what you did when the code ran.
This is referred to as "black-box" reverse engineering in the paper. You know what the code did for one input. And if you injected every possible input to the program, at the end you'd have worked out a complete specifications for the function. But how long will that take? It's not always "the end of the day". For functions with a wide range of inputs, it could take the life of the universe to map them all unless you know what you're looking for in advance.
Right now, obfuscation approaches for software usually have some small chocking point to attack. It might be an encryption wrapper around the main binary. Bust that with a debugger, you can get to unobfuscated code for the main system, and then really start to work your way through the program.
But if you have to fight this every step of the way, where all you do is inject inputs and get an output to figure out what the program does, it will take you forever to reverse engineer things. That's the claim of the paper. The code itself is so obfuscated that you can't read it straight and understand it, ever. It looks like random junk. All you can do is run it with an input and see what comes back, and from that reverse engineer what it does. Assemble enough of those and you can see how the program really works. But that black box teardown process--determining possible inputs, injecting them, collecting outputs, and then deriving the function behavior--is time intensive enough that it may not be practical to actually do it anymore. You don't learn anything from reversing any component that speeds handling the next; you have to attack them all like this.
There's a great line from the seminal paper on this subjectOn the (Im)possibility of Obfuscating Programs: "Any programmer knows that total unintelligibility is the natural state of computer programs (and one must work hard in order to keep a program from deteriorating into this state)" Extracting meaning from source code, being able to predict what some lines of code will do, it's hard. Ideally you'll just be able to read the code, make sense of it, and then reverse from there. Most systems that are thoroughly cracked have some sections where it's hard, but once you get those the remaining code is straightforward to read. And in fact, it's impossible to make something where you cannot reverse it. The question though is how long it will take.
If no sense is ever made of the code, you can only apply the "black-box" reverse engineering, where you inject inputs, watch outputs, and from there determine what the code does. You can't prevent that, but you can make the box so big that such work is impossible to do. That's what this technique tries to accomplish. You never find an easy part to the code you can read; all you ever find are ones where you have to map every input into an output to figure out what the code does.
Re:I Call BS by msobkow · 2013-07-31 22:51 · Score: 4, Informative

I know a fellow who worked on a system that would take IBM 360 machine code, convert it into graph format, and re-engineer it as C/C++ code. They made a killing on Y2K, because so many companies didn't have the source code for some of their software any more.
So yes, I call BS as well. Computationally complex, yes, as graph theory often is. But far from impossible to reverse engineer.
If it can execute, it can be reverse engineered. Only the most obtuse and purposefully warped hand-written assembler can even approach the ability to hide an algorithm from deep inspection, and even that can be largely overcome by applying graph theory to restructure the code graphs.

--
I do not fail; I succeed at finding out what does not work.
Re:I Call BS by rgbatduke · 2013-07-31 23:38 · Score: 2

Even better than the grocery list is Searle's Chinese Room. A man sits in a sealed room. He doesn't speak or read chinese, but every now and then somebody pokes a slip of paper in under the door with some Chinese on it. The man picks it up and goes to a vast filing cabinet, matches the symbols on the paper, and pulls out a paper with other symbols on it which he pushes out the door. In this way the entire room can act like a "speaker of Chinese" even though the man in the room doesn't recognize a word of it:
http://en.wikipedia.org/wiki/Chinese_room
The interesting question is whether or not one can LEARN Chinese if one is the man in the room by monitoring the messages. And this, in turn, is a matter of information theory and complexity -- to be able to learn Chinese by monitoring a flow of symbols like this utterly without context might well be very, very difficult. The cabinet drawers, after all, might have not a single message in them but a stack of messages of similar form but equivalent meaning, shuffled. The man might have to look up the drawer to open not just on the basis of the current message but on the basis of the last ten, or hundred, messages. The drawers themselves might periodically rearrange themselves so that the man has to constantly do a tedious calculation in order to determine the current location of the right drawer in order to get the current top of stack message on the basis of the last twenty exchanges.
Now imagine that the actual conversation isn't one such room, it is thirty, or fifty, or five hundred. The first man gets the input message, goes to a drawer and gets a response, but that is not the REAL response, it is just a message to the man in the second room. That man looks up a message for the third room, etc, and it is only the man in the last room that looks up the actual response and returns it.
One can imagine lots of crypto-class transformations one can insert into the lookup process -- each man can use e.g. a one way hash of the message they receive (known to the sender as well) to permute the cabinet drawers after each response; the next message even with the same "content" might then be directions to a completely different drawer. Even a comparatively few rooms filled with many constantly whirling drawers would make it pretty hard to learn Chinese, especially the Chinese associated with rarely used drawers or contexts. And learning it at all would require SOME ability to connect at least the input and output symbols with actual contexts. Even if one mapped the location and content of every drawer and reverse engineered the entire algorithm for doing the shuffling, one still has to assemble the result into a full semantic map. Not easy, although as noted not impossible.
rgb

--
Even when the experts all agree, they may well be mistaken. --- Bertrand Russell.

Deciphering != Reverse Engineering by Dynedain · 2013-07-31 16:37 · Score: 3, Informative

Deciphering/Decrypting is not the same thing as Reverse Engineering.

Reverse Engineering is duplicating the functionality without seeing the source code. That should still be possible if you have the ability to run the program.

--
I'm out of my mind right now, but feel free to leave a message.....

Re:Deciphering != Reverse Engineering by wierd_w · 2013-07-31 16:52 · Score: 5, Insightful

One way around this (for reverse engineering) would simply be to run it inside a VM with a built in stack trace debugger, like Bochs.
You can peek at the raw instructions percolating through the VM's emulated CPU that way. The application itself is then the decryption key, since the system has to be able to run the application.
PITA, but I don't see how this is anything more than just a "bigger" speedbump, at least as far as disassembly goes. Now, making said decryption require nasty bits of hardware to thwart using a VM, and clinging tightly to treacherous computing with a TPM and a hardware hypervisor on top of this? That's more dicey.
The question here is... why do this? Is preventing Cracker Jack from the warez scene from simply disabling your "authenticity checks" so horrifically important? That's the major application for this that I really see, besides making epically horrific malware. (Though if you ask me, they are both the same thing.)
Seriously, other than making personal computing become something from communist russia, what is the benefit of this?
Re:Deciphering != Reverse Engineering by IamTheRealMike · 2013-07-31 22:30 · Score: 3, Informative

Yes, it is robust. I read the paper a few days ago.
All these comments about how you can "just look at the CPU instructions" are made by people who haven't been following developments in the field. The program never gets decrypted into CPU instructions. Heck, it was never even compiled into CPU instructions in the first place. It gets compiled into a form of boolean circuit, a mathematical equivalent of an electronic circuit that is composed of AND, NOT, OR, XOR gates and wires between them. Then that circuit is itself again transformed into a series of matrices and at that point I hit the limit of what I could understand without needing to read some of the cited papers.
This is a very, very complicated technique that builds upon decades of cryptographic research. If they say it's secure in the cryptographic sense, I think it's very likely to be so.

Gotta love articles without details by Anonymous Coward · 2013-07-31 16:37 · Score: 2, Funny

Clearly the journalist had no idea what they were writing about.

Victory for virus writers by technosean · 2013-07-31 16:38 · Score: 4, Insightful

Who gains from this? Most of all? Virus writers.

Perhaps the real solution is to have the OS never execute code looking anything like this.

Re:Victory for virus writers by MightyMartian · 2013-07-31 16:42 · Score: 2

Heh. No fucking shit. It would be a malware writer's holy grail; software that has no identifiable signature.
I still think it's BS, at least on any kind of processor or operating system we're running now.

--
The world's burning. Moped Jesus spotted on I50. Details at 11.
Re:Victory for virus writers by TheLink · 2013-07-31 19:57 · Score: 2

100% AV/malware detection is arguably harder than solving the halting problem - since it's not certain you're given the full program and its inputs ;). In practice malware is not so obfuscated and while it seems to be a losing battle, I don't have to outrun the "bear", I just have to outrun the average user...

That said it would be safer if OS makers solved it by better sandboxing and better sandboxing infra/UI. Sandboxing is like "solving" the halting problem by setting a time limit.

In fact you're still in a better position even if you allow the program itself to request/suggest the limits upfront for its sandbox. Because if the program asks for too much you know it's up to no good.

Halting problem analogy: if a program asks for a time limit of infinity you know it might not halt (it could still halt) whereas if it asks for a time limit of 60 seconds whether it halts or not is immaterial since the OS is going to limit it to a max of 60 seconds ;).

3rd parties could audit the sandbox request templates and sign them, so you could lock down a machine as much as you want or not at all. You could set up a machine so that Aunt May could install any harmless software (no access to private data, network, etc), or any software signed by a trusted party.
--
- Too many replies beneath your current threshold

make a crackme by ZeroNullVoid · 2013-07-31 16:41 · Score: 2, Insightful

If they really think it is so good, then they should put their money where their mouth is.
Make it into a crackme, issue a large award for solving it.
Post it online. I give it a few weeks max, if that.
And who is to say it can't still be manipulated once running?
Think of the performance cost.

Either way, I have no faith in an article with little details.

Re:Seems improbable by MightyMartian · 2013-07-31 16:44 · Score: 3, Insightful

Yup. At the end of the day, if this is at all useful and the hardware and OSs out there now, it;s still going to have to execute, and if it executes, you can run it through a debugger and watch it.

--
The world's burning. Moped Jesus spotted on I50. Details at 11.

The program will have to DO something by Sean · 2013-07-31 16:53 · Score: 2

Call the kernel to access files, sockets, etc.

Also unless the developer is super 31337 and likes to write everything I expect shares library calls too.

By watching calls to those interfaces we can figure out what it does.

The original paper by HatofPig · 2013-07-31 16:56 · Score: 4, Informative

The original paper is available here.

Cryptology ePrint Archive: Report 2013/451

Candidate Indistinguishability Obfuscation and Functional Encryption for all circuits

Sanjam Garg and Craig Gentry and Shai Halevi and Mariana Raykova and Amit Sahai and Brent Waters

Abstract: In this work, we study indistinguishability obfuscation and functional encryption for general circuits:

Indistinguishability obfuscation requires that given any two equivalent circuits C_0 and C_1 of similar size, the obfuscations of C_0 and C_1 should be computationally indistinguishable.

In functional encryption, ciphertexts encrypt inputs x and keys are issued for circuits C. Using the key SK_C to decrypt a ciphertext CT_x = Enc(x), yields the value C(x) but does not reveal anything else about x. Furthermore, no collusion of secret key holders should be able to learn anything more than the union of what they can each learn individually.

We give constructions for indistinguishability obfuscation and functional encryption that supports all polynomial-size circuits. We accomplish this goal in three steps:

- We describe a candidate construction for indistinguishability obfuscation for NC1 circuits. The security of this construction is based on a new algebraic hardness assumption. The candidate and assumption use a simplified variant of multilinear maps, which we call Multilinear Jigsaw Puzzles.

- We show how to use indistinguishability obfuscation for NC1 together with Fully Homomorphic Encryption (with decryption in NC1) to achieve indistinguishability obfuscation for all circuits.

- Finally, we show how to use indistinguishability obfuscation for circuits, public-key encryption, and non-interactive zero knowledge to achieve functional encryption for all circuits. The functional encryption scheme we construct also enjoys succinct ciphertexts, which enables several other applications.

Category / Keywords: public-key cryptography / Obfuscation, Functional Encryption, Multilinear Maps

Date: received 20 Jul 2013, last revised 21 Jul 2013

Contact author: amitsahai at gmail com

Available format(s): PDF | BibTeX Citation

--
Silicon & Charybdis McLuhan Kildall Papert Kay

What this means. I think. by Animats · 2013-07-31 18:22 · Score: 5, Interesting

This is fascinating, but hard to understand. It's not clear how broad a result this is.

This seems to apply to "circuits", which are loop-free arrangements of AND, OR and NOT gates. These take in some bit string with a fixed number of bits and emit another bit string with a fixed (but possibly different) number of bits. Many computations can be encoded into this form, but the "circuits" are typically very deep, since this is sort of like loop unwinding. Although this seems kind of limited, you could, for example, make a reasonably sized "circuit" to compute DES, which is a static pattern of Boolean operations.

"Obfuscation" here means taking in some "circuit" and producing a bit for bit equivalent "circuit" from which little information can be extracted. The obfuscated circuit may be (will be?) more complex than the original, because additional logic has been inserted in a somewhat random way.

The contribution in this paper seems to be that this might be useful for "functional encryption". This is where you have some values, such as A and B, which are encrypted and are to be protected, but the result of some function f(A,B) is not protected. The idea is to have an implementation of f which combines the decryption and the desired function, an implementation which cannot be reverse engineered to return decrypted versions of A or B. While this has been suggested as a possible feature for databases, it's hard to apply to useful functions, and so far, it's mostly a research idea there. It's been suggested for some complex access control schemes involving mutual mistrust, but that too is a research topic.

Anyway, this doesn't mean you're going to be able to run your big executable program through some kind of magic obfuscator that will prevent someone from figuring out how it works. Not yet, anyway.

Other aspects of the paper - health data by Badge+17 · 2013-07-31 18:27 · Score: 3, Interesting

I can't really comment on the slashdot summary, but take a look at the actual abstract: http://eprint.iacr.org/2013/451

"In functional encryption, ciphertexts encrypt inputs x and keys are issued for circuits C. Using the key SK_C to decrypt a ciphertext CT_x = Enc(x), yields the value C(x) but does not reveal anything else about x. Furthermore, no collusion of secret key holders should be able to learn anything more than the union of what they can each learn individually."

In other words, it seems that their technique allows you to encrypt some secret data, then (provably) only release the result of some arbitrary function of that data. It sounds like this means you could (in principle) release health data in encrypted form, then allow researchers to study some ensemble properties of it by giving out the appropriate keys. This aspect of it certainly seems pretty cool.

misunderstanding by Anonymous Coward · 2013-07-31 18:50 · Score: 3, Interesting

either you misunderstand or i do, but my understanding was that this system allows a function to be expressed so that the code will only execute with certain parameters, hence the function can only compute certain evaluations (lets say one for simplicity) of the original function, and the true function can be hidden. so that the function evaluated for other parameters cannot be computed with this expression.

basically a really complex way of designing a look up table ?

now as you say, you can never hide the value of the function evaluated at the parameters for allowable parameters, because you can stick it in an emulator but the emulator tells you nothing more about the function and so you might as well just 'run' the whole procedure cause you cant extract anything other than the answer, to the function evaluated at a predetermined set of values.

this is achieved by making the original universal (or with domain expanded over the single or fixed set or parameters) algorithm into the 'jigsaw' that will only assemble with the particular parameters which are 'allowed', as these parameters direct the fitting of the jigsaw.

so can the jigsaw be fit with the 'allowed' parameters and still evaluate the original algorithm for different values? i believe the point of the algorithm is to achieve the same result as a look up table, and if it doesnt do it more efficiently then there is no point to the algorithm.

Hurry up, Europe is hungry for your fines by c0lo · 2013-07-31 18:59 · Score: 4, Interesting

Sell a program protected like this in Europe and you may end paying hundreds of millions:

(14) A person having a right to use a computer program should not be prevented from performing acts necessary to observe, study or test the functioning of the program, provided that those acts do not infringe the copyright in the program.

(15) [...]Nevertheless, circumstances may exist when such a reproduction of the code and translation of its form are indispensable to obtain the necessary information to achieve the interoperability of an independently created program with other programs.
It has therefore to be considered that, in these limited circumstances only, performance of the acts of reproduction and translation by or on behalf of a person having a right to use a copy of the program is legitimate and compatible with fair practice and must therefore be deemed not to require the authorisation of the rightholder. An objective of this exception is to make it possible to connect all components of a computer system, including those of different manufacturers, so that they can work together. [...].

--
Questions raise, answers kill. Raise questions to stay alive.

Re:Just doesn't work... by superwiz · 2013-07-31 19:07 · Score: 2

You are missing the point of code being data. What it means in this context is that there is a duality between functions and the data they operate on. In other words, you can think of data as operating on the code instead of code operating on data. So your data becomes your maps (in the mathematical sense of the word map) and your functions become the mapped objects.

--
Any guest worker system is indistinguishable from indentured servitude.

Malware in 3, 2, 1... by Opportunist · 2013-07-31 19:20 · Score: 2

Remember, kids, everything that can be applied for good can be applied for ill. And code that is impossible to decipher and analyze is the holy grail of malware.

--
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.

Re:Attack of the D-K Zombies by TheLink · 2013-07-31 19:40 · Score: 2

OK as an example what if you have worked out a way to transform an arbitrary program in to a huge bunch of "if else", goto statements with "magic numbers" or worse (e.g. setaddress(mod(magic1+sha256memoizer(magic2+parm1+parm2+...),dataend))=parm3; linenumber=mod(magic3+sha256memoizer(magic4+parm5+parm6+...),maxlines); goto linenumber;) and some stores and loads so that it's still equivalent and does the same thing but just a bit slower. So when you disassemble it - it's still a big mess that makes it hard to figure out or "improve" - which is the whole purpose of at least some code obfuscation.

So perhaps this bunch have figured out how to do something like this (and maybe better).

Or maybe they're just converting the program to encrypted perl and then compiling it to run on a custom vm ;). Just because you can see the final instructions doesn't mean you'd be much closer to figuring out the code.

For example if someone made a new cryptographic hash function and obfuscates it in one of the ways I mentioned. Would it really be so easy to reverse engineer it, just by seeing the instructions given to the processor? I don't think it would be so simple. More so if you need to reverse engineer it to the extent that you can write a detailed enough description and specification so that someone else could create a clean room implementation.

--

Too many replies beneath your current threshold

Encrypting the algorithm, not the code? by Kazoo+the+Clown · 2013-07-31 20:09 · Score: 2

It sounds like what they are trying to do here is to refactor the algorythm such that f(x)=y produces the same answer but it's not practical to modify because the new function is mathematically scrambled-- the effect of each component of the code is so obscured that it's not so easy to tell how it contributes to the results. It's like using a neural net to implement your solution, the contribution of each neural node to the overall result can be significantly obscure. Won't stop someone from stealing the algorythm as is, but hacking it to produce an alternate result set will be out of the question, as long as they can't just build a truth table of inputs to outputs...

Re:Attack of the D-K Zombies by Half-pint+HAL · 2013-07-31 21:06 · Score: 2

I call BS on the notion that my CPU is smarter than me. Article claims that it takes "hundreds of years" to break this. Obviously then it must take the program "hundreds of years" to run. Otherwise the CPU is using a short-cut. All I have to do is figure out what the short cut is (hint: figure out what the CPU is doing, where it got its instructions from) and it's cracked. If your computer can run it, you can figure it out. It's that simple.

By this argument, PGP, RSA et al must be worthless too, because if it takes hundreds of years to break a 256-bit cypher it must take hundreds of years to run a 256-bit cypher. You know, if your computer can run it, you can figure it out.

I hope you can see the flaw in your argument now.

--
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'

Re:Just doesn't work... by epine · 2013-07-31 22:37 · Score: 2

Great, so all you have to do is replace that conditional so it always evaluates to true, no? When you actually do this, the program happily writes an answer to the screen every time. The only problem is, if you provided an invalid security key at the beginning, the answer it writes is complete nonsense. You see, it's secretly already tested the security key, and if it was wrong, the answer ends up being wrong too.

I implemented exactly this circa 1990 to protect a small database of disambiguation rules structured as a hash table. A random value obtained from the security dongle was intermixed into the hash function and hash check condition. This was not done once for each possible lookup as defined in a conventional database. It was done once for each feasible answer for each possible lookup. The code had a statistical model of feasible answers. For some queries the number of feasible answers was excessive (too many dongle interactions) so we created a heuristic that was correct 99% of the time and set aside the 1% for a second pass with an additional data structure. If the dongle wasn't present the set of feasible answers was incorrectly narrowed with the expected statistical distribution. The members of that distribution, however, were entirely wrong.

We built up more complex queries from smaller queries. We were actually building a tree where every path in the tree was a valid answer and the majority of leaf nodes were at depths 2-4. That we hit a leaf node was a bit of metadata from the hash table lookup, which would be wrong if the dongle wasn't installed.

How about a quick forward description. Start with an alphabet of 50 symbols and construct the tree of all strings of length one to six. Every node in the tree has a flag about whether that node terminates a valid string and some additional bits about the correct orthography of the string as expressed in the user input, when typed. Your database is a subtree of this tree with about 100,000 strings (problems were so much smaller 25 years ago) along with a couple of bits of metadata per leaf. It's pretty sparse compared to the 15 billion possible leaf nodes.

The database subtree is actually constructed by elimination. One dongle assisted hash probe tells you whether a descending edge from your current vertex leads to a non-empty subtree (further solutions with your current path in the tree as a proper prefix). In addition, the user input defines another subtree of everything that could possibly matter to the conversion being performed. What you are computing is the intersection of these two subtrees: the tree corresponding to the task at hand and the tree corresponding to all solutions possible. Because the hash table was decomposed on the principles of minimum description length, when the dongle was absent (or corrupted) you still get an answer with much of the expected statistical distribution.

Except for one thing. The hash check was imperfect and you would get some false positives. We set up the rate of false positives so that the set of false positives grew exponentially as you descended to deeper levels. We knew from the statistical structure of the user input that few elements of this phantom solution set would interact negativity in practice even though the phantom set vastly out-numbered the legitimate set. Further, if one tried to enumerate the tree exhaustively using an incorrect dongle hash function, the tree you would reconstruct had no depth limit. It grew exponentially in size forever. We knew there was a depth limit when correctly probed, but this was nowhere expressed in our program code. In fact, this could be used to reverse engineer the correct hash function: only the correct hash function enumerates to a finite set of 100,000 subtrees. Just iterate over the set of all possible hash functions, in some well-structured enumeration order, until you discover this condition. Bingo, you're done.

Not all of the phantom space was harmless

Re:Attack of the D-K Zombies by Dunbal · 2013-07-31 22:57 · Score: 2

Besides, the NSA is holding all the keys anyway lol.

--
Seven puppies were harmed during the making of this post.

Slashdot Mirror

Computer Scientists Develop 'Mathematical Jigsaw Puzzles' To Encrypt Software

35 of 245 comments (clear)