Building Deception Into Encryption Software
holy_calamity writes "MIT Technology Review reports on a new cryptosystem designed to protect stolen data against attempts to break encryption by brute force guessing of the password or key. Honey Encryption serves up plausible fake data in response to every incorrect guess of the password. If the attacker does eventually guess correctly, the real data should be lost amongst the crowd of spoof data. Ari Juels, who invented the technique and was previously chief scientist at RSA, is working on software to protect password managers using the technique."
I speak in Navajo with a Southern accent.
Voyager episode on honey from 1998
http://en.wikipedia.org/wiki/H...
TFA was murky, but generating bogus data? If one is brute forcing a data blob, how can it make stuff up? Authentication is another story.
Are they meaning to make a system similar to Phonebookfs? This is an interesting filesystem used with FUSE. You have different layers over the same directory, so one encryption key may allow you to grab one set of files, another key, a different set. Then there is chaff present that cannot be decrypted under any circumstances and provides plausible deniability.
Is something like phonebookfs what they are intending?
So you decrypt something and it *looks* like real data.
So it would have to be a function that produces 'good' results and 'bad results' but the bad results look like good ones.
Would have to be careful that the 'bad' results do not do things like open the lock though. For instant in the case of login list breaches.
TFA was murky, but generating bogus data? If one is brute forcing a data blob, how can it make stuff up?
Actually, it wasn't murky. That it cannot work for arbitrary data types is spelled out towards the end. This is for data of which the encryption system knows the data type well enough to fake it, and the encryption system has to be built to target the specific data type. The examples given are credit card numbers or passwords.
For instance imagine a password manager that, for every decryption attempt with a wrong master password, returns a different set of bogus but plausible passwords. How would a brute force attack automatically determine which one is the "real" set of passwords of the user, even if it can guess the right password?
Great so long as he isn't also willing to take the NSA paycheck.
I guess it DOES have some benefit, huh?
TFA was murky, but generating bogus data? If one is brute forcing a data blob, how can it make stuff up? Authentication is another story.
It didn't seem all that murky:
. But he notes that not every type of data will be easy to protect this way since it’s not always possible to know the encrypted data in enough detail to produce believable fakes. “Not all authentication or encryption systems yield themselves to being ‘honeyed.’”
So it only works with data where it can generate believable fake data -- like credit card numbers or passwords.
I'd been looking into this in a slightly different context. Recently, at Hacker Dojo, someone demonstrated an Android mod to me which dealt with applications that demand too many privileges. It has the usual "disable privileges" option, but for apps that won't run with privileges disabled, it sends fake info.
The demo showed generation of fake phone serial numbers and such. That's easy. Apps that improperly try to upload your address book, though, require generation of a plausible, but fake, address book. That's wasn't in the demo, but it's worth doing. Location data should probably be sent as a random walk from some random starting point.
If enough people do this, it will garbage marketing databases.
This works provided you don't have a known cleartext to test against. So if I had a known credit card or password in the database (by signing up legitimately for a website that uses th is) then I have a method of determining the dataset to be decrypted.
If the software is detecting that the key is bad then all the attacker has to do is use software that doesn't do this. This assumes that the attacker has direct access to the file. If not, then well known throttling techniques apply and the new wrinkle doesn't buy much.
Making bogus data come out without requiring specific software for decryption seems like a very hard problem. Every data type will need, not just unique software but unique encryption algorithms that are both secure and not trivial extensions to known algorithms.
Why would an attacker be using the enemy-provided 'honey' program to try to brute force the decryption?
Surely he'd use a program that isn't known for serving up fake results.
No sig today...
The crooks would use their own decoder to get at the internal encryption algorithm, skipping the "oop, fail, generate plausible password" wrapper.
(-1: Post disagrees with my already-settled worldview) is not a valid mod option.
Are you talking about [Open]PDroid? I ask because if there's another mod that does the same thing, I might want to look into it. :)
You could set up the encryption so that any form of operation returns a viable result, ie. base the encryption on a dictionary of words or phrases, markov chains etc. I think of it in terms of this. https://www.cs.cmu.edu/~odonnell/hits09/gentry-homomorphic-encryption.pdf or ECC
There is XPrivacy, which uses the XPosed framework. That doesn't disable permissions, but rather sends fake data to the app.
Sure I sold you robot insurance. But you were attacked by a cyborg. Not covered.
I do lots of similar work when generating test data. It's pretty common to have libraries for things like "make up a plausible address" or "randomly generate a credit card number." Extreme cases can generate whole narratives, even intentionally injecting spelling and grammar errors at varying rates in order to fool packages that use lexical analysis to detect robot text.
The work that spammers have done attempting to fool email filters is probably directly applicable to this effort.
Oh, that looks interesting. If my Nexus 5 ever gets here (stupid winter storms...) I'll definitely want to be read up on that.
Thanks!
I see it as a plausible approach, since a automated brute force attack will stop whenever it thinks it hits the right pw, requiring human intervention to determine it is bogus. After a few thousand bogus hits, I think that human might just get tired of hitting the NEXT button.
No, the idea is that the protection is built into the algorithm itself. Rolling your own decryptor would spit out the same fake info for the same key. To balance this out, the algorithm works only for limited types of data.
If you are setting it up, you could have discreet records individually encrypted. One row correctly encrypted and (some number) of rows of fake data encrypted with a false key. Without knowing which row is good and which row is bad, an attacker would get (some number) of potential keys that return realistic but wrong data. They wouldn't need to run my decryption routines to generate these false positives.
I think the point is that the encryption algorithms themselves are incapable of producing anything that does not look like a 'real' result. For instance, if you have a credit card number you could encypt is as just a series of characters. But that makes it easy to determine which keys are wrong, because decrypting with them would return something other than a string of 16 digits. But what if you treated those 16 digits as a number and encrypted it as such? Then, no matter what key was tried, you would also get back a valid number which could be represented as 16 digits, so you have no way of knowing which is the real answer.
The goal stated in the article is similar to that of deniable encryption. Whereas "honey encryption" works through a piece of dedicated software, deniable encryption works by constructing a block of ciphertext in such a way that different plausible plaintexts can be recovered depending on which symmetric key is used for decryption. Of course, only the user knows how many different plaintexts are actually buried in the ciphertext, and under duress (rubber hose, point of a gun, etc.) he can relinquish the non-incriminating plaintext and claim innocence.
Your assumption also means that the underlying encryption is an unknown algorithm and you have no access to the actual encrypted data. There has to be something else to keep you from attempting to decrypt the data without using the password manager it was made with.
Consider a case of a credit card number. A CC# consists of 15 digits plus a check digit for 16 digits total.
Now, in encrypting, validate the check digit and then drop it. Take the remaining 15 digits and express them as a binary value. It should be around 50 bits. XOR it against a 50-bit mask, and that will be your ciphertext value.
To decrypt, XOR against that same value and recompute the check digit.
Any incorrect value will produce a number that passes basic validation (as long as it doesn't exceed 2^15).
For bonus points, you can probably encode the first digit in only 2 bits, because most cards begin with 3, 4, 5 or 6, depending on the issuer.
Now, is this a good encryption scheme? Maybe not, but it does at least demonstrate the concept.
www.wavefront-av.com
Many a years ago I had a phone that included a password storage application. You gave it a 4 digit pin and it would show you a checkword, then list all your passwords (key->value). If the pin was wrong, it would still give you a checkword, but different from what your correct word get and then list all the same keys, but different passwords.
Was a pretty nice application, but can't remember the make of the phone, probably a Sony-Ericsson.
Verbing of nouns weirds the language.
Excuse me, but please get off my Pennisetum Clandestinum, eh!
No. Consider this: today encryption algorithms work on binary data (bytes). Suppose I generate a random block of binary data, and encrypt is using whatever well-known algorithm you tell me to use. I give you the encrypted output. Can you tell what key was used to perform the encryption, or tell me what the original data was? No, because no matter what key you use you will always get back a random block of data, so which is the 'correct' data?
Now suppose, instead of using an algorithm that can encrypt (and thus decrypt) and random binary data I use an algorithm that can only encrypt/decrypt a credit card number or a password. No matter what key you try to use to decode, you will always get something that looks like a credit card number or password. You can know the algoritm, and you can have the encrypted data, and you still have no way of knowing which key is correct because all the results look the same.
Fake data is an old idea: "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
Excuse me, but please get off my Pennisetum Clandestinum, eh!
Bad form, I know, to respond to your own posting. But it occurs to me that data specific compression could accomplish the goal. Credit card numbers have structure. If you create a mapping between only valid card numbers and the minimum number of bits then encrypt that then it doesn't matter how the data is decrypted. It always produces valid looking credit card numbers. The catch, though, is that the bit mapping needs to be exact. If the total possible credit card numbers is not a power of 2 then there will always be decryption failures that produces detectably invalid results.
> Making bogus data come out without requiring specific software for decryption seems like a very hard problem.
I can think of ways right off. First, you can encrypt decoy data along with the actual plaintext. It's not that the decryption software CREATES the decoy data, the decoy is already there. Decryption uses part of the key to separate the real data from the decoy.
Given that the data format is known, there's another easy way - packing. Assume the proper format is like social security numbers:
000-00-0000
What you want to avoid is letting the attacker recognize from the format that 571-22-5557 is correct, while jdksfs8fgh is not. To do that, before encryption, pack the data. That means you strip out the hyphens and save the numbers as one byte per pair of digits (or even 6 bits per digit). So the actual decryption returns unformatted, random looking bytes. Those are then formatted using something like printf() to make the actual source plaintext. If the right key is used, they'll be unformatted bytes until printf is run. If the wrong key is used, again you get unformatted bytes until printf formats them. The output comes out in the same format whether the key is correct or not. In incorrect key will result in invalid data that looks just like valid data.
Possibly with a credit card a 16 digit number all decryption returns a 16 digit number however credit cards aren't random numbers there are rules to how they are formed. Just like alpha numeric passwords are still likely to resemble a word or phrase.
The focus of that research is to allow operations on data that remain encrypted, and where the actual content of the manipulated data is not explicitly known.
That might work for something composed of tables of numbers, bank data, Phone call pen register logs, or passwords as the GP suggests, but not for text.
Humans are very good at determining gibberish from prose, or fragments of color from images. Plausible, but bogus, is a tough nut to crack
where human evaluation is involved.
Sig Battery depleted. Reverting to safe mode.
Because awesome.
We know where leadership by an anti-intellectual "strongman" who scapegoats minorities and likes boisterous rallies goes
"Honey Encryption" to Bamboozle Attackers with Fake Secrets
Tom Simonite writes at MIT Technology Review that security researcher Ari Juels says that trickery is the missing component from the cryptography protecting sensitive data and proposes a new encryption system with a devious streak. It gives encrypted data an additional layer of protection by serving up fake data in response to every incorrect guess of the password or encryption key. If the attacker does eventually guess correctly, the real data should be lost amongst the crowd of spoof data. The new approach could be valuable given how frequently large encrypted stashes of sensitive data fall into the hands of criminals. Some 150 million usernames and passwords were taken from Adobe servers in October 2013, for example. If an attacker uses software to make 10,000 attempts to decrypt a credit card number, for example, they would get back 10,000 different fake credit card numbers. "Each decryption is going to look plausible," says Juels. "The attacker has no way to distinguish a priori which is correct." Juels previously worked with Ron Rivest, the "R" in RSA, to develop a system called Honey Words to protect password databases by also stuffing them with false passwords. Juels says that by now enough password dumps have leaked online to make it possible to create fakes that accurately mimic collections of real passwords and is currently working on creating the fake password vault generator needed for Honey Encryption to be used to protect password managers. This generator will draw on data from a small collection of leaked password manager vaults, several large collections of leaked passwords, and a model of real-world password use built into a powerful password cracker. "Honeywords and honey-encryption represent some of the first steps toward the principled use of decoys, a time-honored and increasingly important defense in a world of frequent, sophisticated, and damaging security breaches."
One way we do it is to return a "fake" only occasionally. The person who gets their password wrong is very unlikely to see a fake. On the other hand, a bad guy who is trying out 100,000 possible keys will get 50 fakes.
This works especially well if the bad guy doesn't know it's designed to occasionally generate fakes. He thinks he actually did decrypt passwords, but the list he has is nolonger valid. Maybe it's out of date, he thinks, or maybe they are stored backward, or maybe we KNOW he stole the list and therefore we've changed all of the passwords. It was entertaining to read the cracker message boards when we first introduced that feature.
Now, the crackers who keep themselves informed know that we generate fakes, and it annoys them greatly. They don't yet know that we do TWO levels of fakes. A certain percentage of the fakes pass their extra level of checking they now have to do to weed out the fakes. In other words, they THINK they are weeding out the fakes, but they are actually only weeding out the level 1 fakes.
As was pointed out, this requires the encryption/decryption process to know how a credit card works and what makes one valid or invalid.
...and/or for the cleartext to include many more fake but valid-looking results than valid results, among which only the correct key picks the valid results and invalid keys pick incorrect results. That requires that invalid but plausible results are either easy to fabricate or can be fabricated offline rather than in realtime. Under certain circumstances, partial exposure of correct results, mixed in fake results, might even be desireable
Someone had to do it.
Troffle blent murper humph flempto gretch fooma irf pwenty eb wertoo bakk empo flbe ilffy fez mulok.
Table-ized A.I.
That is not the point though. The point is that domain-specific encryption could be harder to brute-force than generic encryption that can produce results which are obviously false. So a credit card isn't just 16 random digits - so what? What is not random - the issuer ID and the check digit (there well be more, it does not matter). So treat those 'non-random' parts differently. Don't even bother storing the checksum, recalculate it when returing the result. The checksum will always be right then. There are more issuer IDs than issuers? So store the issuer ID as something other than the actual number.
As for passwords, this is not to prevent poor password choices. If good passwords are enforced, then the 'likely to resemble a word or phrase' doesn't apply.
There are no 'fakes'. Every decryption produces a number that could be a credit card number (ie has the right format and passes the tests for number correctness). This is not a pre-generated 'fake', it is just the result of running the decryption algorithm with the wrong key.
As an example I gave elsewhere - generate a random block of data (/dev/random or whatever). Encrypt it with whatever algorithm you want. Now decrypt it with the wrong key. Can you tell (without looking at the cleartext) that the decryption used the wrong key? No, of course not, random data looks like random data. That is because no matter what key you use valid-looking data will be returned. Now use an algorithm that can only encrypt/decrypt credit card numbers, and you have the same thing. Every decryption will look like a credit card number.
Note that some of the generated numbers may actually BE valid credit card numbers. That does not matter as long as it is not the credit card number associated with the name on the card, etc.
But even that requires that the human have an easy way to tell whether the result returned was valid. In the case of a password, this may be simple, but relatively time consuming...or it may be difficult. Of course, in other circumstances, it could automatically try the result against a test, and quickly determine whether or not it was correct.
I think we've pushed this "anyone can grow up to be president" thing too far.
As a broad theory it works but in practice.... there is a lot to take into account we will just have to wait and see.
That might work for something composed of tables of numbers, bank data, Phone call pen register logs, or passwords as the GP suggests, but not for text.
Humans are very good at determining gibberish from prose, or fragments of color from images. Plausible, but bogus, is a tough nut to crack
where human evaluation is involved.
The whole point of brute-forcing is that you don't need human evaluation. Are you really planning on a human evaluating the results from all 2^128 possible keys? How many universe lifetimes do you have to crack this thing?
"City hall" in German is "Rathaus" Kinda explains a few things......
But don't you suppose that computer systems that can distinguish gibberish composed of valid words from meaningful sentences?
There is more than a little research in this area.
Sig Battery depleted. Reverting to safe mode.
I see only one problem - Assuming I got the PIN wrong, how am I supposed to make out if it is showing me the 'wrong'/'fake' passwords ?
The only cases in which this approach can work are those where the distribution of plaintext is known in advance.
Since the algorithms used to generate CCN are largely public one can map the class of apparently valid CCNs (suppose it has n members) bijectively into the integers mod n and assuming the CCNs are uniformly distributed over the apparently valid CCNs (likely) their images in the integers mod n are uniformly distributed. Assume that ENC_k is any standard encryption function (public or private) with key k operating on inputs from the integers mod z >= n-1 (usually a power of 2 for a symmetric encryption function). Given a CCN c map it to a value c' in mod n arithmetic and generate a random value 0 = r z. We can now encrypt our CCN as the pair r + c' mod n, ENC_k(r).
This will ensure that no statistical test will be able to distinguish a correct choice of the key from an incorrect one.
This is useless, however, for data like english text or names which don't have an easily describable distribution. The construction above relied on our ability to select an information theoretic optimal compression function for CCNs, i.e., one which bijectively maps the message into a uniform distribution on the integers mod n for some n. This is impossible for things like proper names or english text.
If you liked this thought maybe you would find my blog nice too:
Think of it as ROT-n encryption of random data, where n is the key
If you choose the wrong n, you'll still get blob of random data back, just not the correct one.
Now, the tricky part is in making sure the incorrect keys returns data that's hard to distinguish (meaning it can't be done automatically and/or quickly) from the correct plain text, when the plain text ISN'T random looking, but something like passwords, SSN, credit card numbers.
Be wary of any facts that confirm your opinion.
The checkword, you put in your pin when you set the application up, it shows you "banana" (or car or pink or whatever maps to your pin - obviously the wordlist needs to be shuffled during install so no two installs has the same mapping).
Next time you use it and you put in a wrong pin, it will say "apple". You know it's supposed to say banana so you know you put your pin wrong, your adversary (the guy just stole your phone), doesn't know the checkword so he wont know if it's right or wrong.
OK, that's a good idea.
It doesn't need to stand up to human scrutiny to foil a brute force or dictionary attack though. The whole point of those attacks is to try a large number of keys quickly, and human oversight would make the process too slow to be of any use.
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
... but I have a life.. ^_^ but i've used it already :(
Couple of points: (I'm thinking less in an individual file encryption issue than to a larger set of encrypted data structure)
In a totally non-knowledgeable fashion would the algorithm or the program access the algorithm create the bogus data?
To me one flaw would be random generation of data. If it was random then using the same false password twice would result in 2 different results (so that password can be ignored) defeating the point of the bogus data.
The way to work it is to use procedural generation of some type, It should be easy to generate a random looking but believable indevidual files or directory structure (and fill it with plausible files related to the folder names *i.e. avi,mov,mpg in video folder and doc,xls,xdoc etc... in Documents folder.) You could go as far as to generate appropriate file herders and meta data to fool some automated checking.
With procedural generation the same false password would generate the same false data every time so it is harder to differentiate between bogus and actual data.
Laters Sol "Have you found the secrets of the universe? Asked Zebade "I'm sure I left them here somewhere"
The two solutions (algorithmically ensconced or via companion fake data) are basically equivalent, because everything that is not part of the true cleartext is part of the algorithm. This is an implementation detail.
Someone had to do it.
I put up the oversimplified example because people were not even getting the undlying concept behind the undlying concept behind the undlying concept behind the undlying concept.
www.wavefront-av.com