Building Deception Into Encryption Software
holy_calamity writes "MIT Technology Review reports on a new cryptosystem designed to protect stolen data against attempts to break encryption by brute force guessing of the password or key. Honey Encryption serves up plausible fake data in response to every incorrect guess of the password. If the attacker does eventually guess correctly, the real data should be lost amongst the crowd of spoof data. Ari Juels, who invented the technique and was previously chief scientist at RSA, is working on software to protect password managers using the technique."
Voyager episode on honey from 1998
http://en.wikipedia.org/wiki/H...
TFA was murky, but generating bogus data? If one is brute forcing a data blob, how can it make stuff up? Authentication is another story.
Are they meaning to make a system similar to Phonebookfs? This is an interesting filesystem used with FUSE. You have different layers over the same directory, so one encryption key may allow you to grab one set of files, another key, a different set. Then there is chaff present that cannot be decrypted under any circumstances and provides plausible deniability.
Is something like phonebookfs what they are intending?
TFA was murky, but generating bogus data? If one is brute forcing a data blob, how can it make stuff up?
Actually, it wasn't murky. That it cannot work for arbitrary data types is spelled out towards the end. This is for data of which the encryption system knows the data type well enough to fake it, and the encryption system has to be built to target the specific data type. The examples given are credit card numbers or passwords.
For instance imagine a password manager that, for every decryption attempt with a wrong master password, returns a different set of bogus but plausible passwords. How would a brute force attack automatically determine which one is the "real" set of passwords of the user, even if it can guess the right password?
So you decrypt something and it *looks* like real data.
So it would have to be a function that produces 'good' results and 'bad' results, but the bad results look like good ones.
Would have to be careful that the 'bad' results do not do things like open the lock though. For instance, in the case of login list breaches.
If randomly generated "fake" data matches someone else's password (or whatever is being encrypted), that other person didn't use a strong enough password. This system is just acting like a hash function -- criminal tries password A and he decrypts the data to some string, then he tries password B and the data gets decrypted to another string. If those randomly generated strings happen to match someone else's password on the system, the criminal could have saved himself some time by generating the password guesses himself.
TFA was murky, but generating bogus data? If one is brute forcing a data blob, how can it make stuff up? Authentication is another story.
It didn't seem all that murky:
But he notes that not every type of data will be easy to protect this way since it’s not always possible to know the encrypted data in enough detail to produce believable fakes. “Not all authentication or encryption systems yield themselves to being ‘honeyed.’”
So it only works with data where it can generate believable fake data -- like credit card numbers or passwords.
I'd been looking into this in a slightly different context. Recently, at Hacker Dojo, someone demonstrated an Android mod to me which dealt with applications that demand too many privileges. It has the usual "disable privileges" option, but for apps that won't run with privileges disabled, it sends fake info.
The demo showed generation of fake phone serial numbers and such. That's easy. Apps that improperly try to upload your address book, though, require generation of a plausible, but fake, address book. That wasn't in the demo, but it's worth doing. Location data should probably be sent as a random walk from some random starting point.
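A minimal Python sketch of the random-walk idea, just to make it concrete. The function name fake_track, the step size, and the starting latitude bounds are all assumptions for illustration, not anything taken from the Android mod in the demo.

```python
import random

def fake_track(steps, step_deg=0.0005, rng=None):
    """Return a list of (lat, lon) points forming a random walk.

    step_deg is a hypothetical maximum move per step, in degrees.
    """
    if rng is None:
        rng = random.Random()
    # Random starting point; latitude kept away from the poles (an arbitrary choice).
    lat = rng.uniform(-60.0, 60.0)
    lon = rng.uniform(-180.0, 180.0)
    track = [(lat, lon)]
    for _ in range(steps):
        # Nudge latitude and clamp it to the valid range.
        lat = max(-90.0, min(90.0, lat + rng.uniform(-step_deg, step_deg)))
        # Nudge longitude and wrap it around the date line.
        lon = (lon + rng.uniform(-step_deg, step_deg) + 180.0) % 360.0 - 180.0
        track.append((lat, lon))
    return track
```

A real faker would probably want speed limits, pauses, and movement along roads before the track looked plausible to anyone who examined it.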
If enough people do this, it will garbage marketing databases.
This works provided you don't have a known cleartext to test against. So if I had a known credit card or password in the database (by signing up legitimately for a website that uses this) then I have a method of determining the dataset to be decrypted.
If the software is detecting that the key is bad then all the attacker has to do is use software that doesn't do this. This assumes that the attacker has direct access to the file. If not, then well-known throttling techniques apply and the new wrinkle doesn't buy much.
Making bogus data come out without requiring specific software for decryption seems like a very hard problem. Every data type will need, not just unique software but unique encryption algorithms that are both secure and not trivial extensions to known algorithms.
Why would an attacker be using the enemy-provided 'honey' program to try to brute force the decryption?
Surely he'd use a program that isn't known for serving up fake results.
I guess it DOES have some benefit, huh?
People misunderstand what "security through obscurity" means. Most (all?) encryption relies on security through obscurity at some level.
Hiding your house key under a loose floorboard in your back deck is the kind of security through obscurity that can really work, assuming that there are no other clues that lead to the hiding place. However, hiding the prybar that you use to pry up the floorboard under the belief that hiding the method of access makes your key safer is not the kind of obscurity that works because if the attacker can find your hiding place, he can figure another way to get to the key.
Similarly, hiding or not writing down your password is security through obscurity that works. But trying to hide the implementation details of your cipher algorithm does not, because cryptanalysis can break your encryption even without access to your encryption algorithm.
So, obscuring your real password among an endless number of fake passwords is the kind of obscurity that can work -- even if the attacker knows that your password is somewhere among the billions of fake ones, unless he has some clue to tell him what your real password looks like, just knowing that fakes are there doesn't help him.
The crooks would use their own decoder to get at the internal encryption algorithm, skipping the "oop, fail, generate plausible password" wrapper.
(-1: Post disagrees with my already-settled worldview) is not a valid mod option.
Are you sure it isn't Southern with a Navajo accent? They sound very similar.
Sleep your way to a whiter smile...date a dentist!
There is XPrivacy, which uses the XPosed framework. That doesn't disable permissions, but rather sends fake data to the app.
Sure I sold you robot insurance. But you were attacked by a cyborg. Not covered.
So, obscuring your real password among an endless number of fake passwords is the kind of obscurity that can work -- even if the attacker knows that your password is somewhere among the billions of fake ones, unless he has some clue to tell him what your real password looks like, just knowing that fakes are there doesn't help him.
Like hiding a needle in a needlestack.
I, for one, like the concept, and am anxious to see what impact it could have on modern cryptography.
An enigma, wrapped in a riddle, shrouded in bacon and cheese
No, the idea is that the protection is built into the algorithm itself. Rolling your own decryptor would spit out the same fake info for the same key. To balance this out, the algorithm works only for limited types of data.
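A toy Python sketch of how that can work, loosely following the seed-encoding idea as I understand it. The password list, the weights, and the SHA-256 pad are all made up for illustration. This is not the actual Honey Encryption construction and is not secure as written.

```python
import hashlib
import secrets

# Hypothetical message space: a few passwords with made-up popularity weights.
WEIGHTS = [("123456", 6), ("password", 4), ("qwerty", 3), ("letmein", 2), ("dragon", 1)]
SEED_SPACE = sum(w for _, w in WEIGHTS)  # every seed maps to some message

def encode(message):
    """Map a message to a random seed inside that message's interval."""
    start = 0
    for msg, weight in WEIGHTS:
        if msg == message:
            return start + secrets.randbelow(weight)
        start += weight
    raise ValueError("message is outside the modeled space")

def decode(seed):
    """Map any seed in range(SEED_SPACE) back to a message."""
    start = 0
    for msg, weight in WEIGHTS:
        if seed < start + weight:
            return msg
        start += weight
    raise ValueError("seed out of range")

def pad(key):
    # Assumed key derivation; a real design would use a salted, slow KDF.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest, "big") % SEED_SPACE

def encrypt(message, key):
    return (encode(message) + pad(key)) % SEED_SPACE

def decrypt(ciphertext, key):
    # No "wrong key" branch exists: every key yields some message.
    return decode((ciphertext - pad(key)) % SEED_SPACE)
```

There is no failure path for an attacker to strip out. A wrong key lands on some other seed, which decodes to a plausible message, and the same wrong key always gives the same answer.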
What's the goal here - to make the returned data "not my data", or "incorrect data"? There is a world of difference between these two. "Not my data" is a simple thing to generate, but could still be correct data. IE, if the data protected is a card number, and the generated number matches someone else's card, then do we care or not? The criminal doesn't care, as long as their goal is met (get a valid card - it doesn't have to be yours). If we're talking about "invalid" data, then we need some mechanism to validate the generated data before it's returned. While this wouldn't meet the criminal's goal, it could open a possible DDOS attack vector on the validation service (ie, a brute force becomes a magnified reflection attack).
They aren't going to store a big database of valid credit card numbers so they can return someone else's card number, they'll just generate a random number that looks like it could be a real credit card number and passes the checksum test.
Yes, a criminal could take the credit card numbers from each decryption attempt and test them, but if he's willing to test millions of card numbers to look for a valid one, he could just generate the card numbers directly and not attempt the decryption in the first place.
I think the point is that the encryption algorithms themselves are incapable of producing anything that does not look like a 'real' result. For instance, if you have a credit card number you could encrypt it as just a series of characters. But that makes it easy to determine which keys are wrong, because decrypting with them would return something other than a string of 16 digits. But what if you treated those 16 digits as a number and encrypted it as such? Then, no matter what key was tried, you would always get back a valid number which could be represented as 16 digits, so you have no way of knowing which is the real answer.
The criminal doesn't care, as long as their goal is met (get a valid card - it doesn't have to be yours). If we're talking about "invalid" data, then we need some mechanism to validate the generated data before it's returned.
If you are worried about a random credit card generating algorithm generating real credit card numbers via this method, you should be just as worried about attackers using the same random number generator on their own!
Consider a case of a credit card number. A CC# consists of 15 digits plus a check digit for 16 digits total.
Now, in encrypting, validate the check digit and then drop it. Take the remaining 15 digits and express them as a binary value. It should be around 50 bits. XOR it against a 50-bit mask, and that will be your ciphertext value.
To decrypt, XOR against that same value and recompute the check digit.
Any incorrect value will produce a number that passes basic validation (as long as it stays below 10^15, so that it still fits in 15 digits).
For bonus points, you can probably encode the first digit in only 2 bits, because most cards begin with 3, 4, 5 or 6, depending on the issuer.
Now, is this a good encryption scheme? Maybe not, but it does at least demonstrate the concept.
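A rough Python sketch of the scheme described above, with one swap: addition modulo 10^15 instead of XOR, so that every guess stays within 15 digits. The check digit is assumed to be the usual Luhn digit, and the SHA-256 mask derivation and the function names are assumptions for illustration only.

```python
import hashlib

DIGITS = 15
MODULUS = 10 ** DIGITS

def luhn_check_digit(payload):
    """Luhn check digit for a digit string that excludes the check digit."""
    total = 0
    # Walk right to left, doubling every second digit starting with the rightmost.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10

def luhn_valid(number):
    return number.isdigit() and luhn_check_digit(number[:-1]) == int(number[-1])

def mask_from_password(password):
    # Assumed derivation; a real system would use a salted, slow KDF.
    digest = hashlib.sha256(password.encode()).digest()
    return int.from_bytes(digest, "big") % MODULUS

def encrypt(card, password):
    if len(card) != DIGITS + 1 or not luhn_valid(card):
        raise ValueError("expected a 16-digit number with a valid check digit")
    # Drop the check digit and mask the remaining 15 digits.
    return (int(card[:-1]) + mask_from_password(password)) % MODULUS

def decrypt(ciphertext, password):
    payload = str((ciphertext - mask_from_password(password)) % MODULUS).zfill(DIGITS)
    # Recompute the check digit, so every guess passes basic validation.
    return payload + str(luhn_check_digit(payload))
```

With the well-known test number 4111111111111111, decrypting with the right password returns the original, and any other password returns a different 16-digit string that still passes the check. The leading digits of a wrong guess can still look implausible for a real issuer, which is where the first-digit trick would come in.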
Many years ago I had a phone that included a password storage application. You gave it a 4 digit pin and it would show you a checkword, then list all your passwords (key->value). If the pin was wrong, it would still give you a checkword, but a different one from what your correct pin gets, and then list all the same keys, but with different passwords.
Was a pretty nice application, but can't remember the make of the phone, probably a Sony-Ericsson.
No. Consider this: today encryption algorithms work on binary data (bytes). Suppose I generate a random block of binary data, and encrypt it using whatever well-known algorithm you tell me to use. I give you the encrypted output. Can you tell what key was used to perform the encryption, or tell me what the original data was? No, because no matter what key you use you will always get back a random block of data, so which is the 'correct' data?
Now suppose, instead of using an algorithm that can encrypt (and thus decrypt) any random binary data I use an algorithm that can only encrypt/decrypt a credit card number or a password. No matter what key you try to use to decode, you will always get something that looks like a credit card number or password. You can know the algorithm, and you can have the encrypted data, and you still have no way of knowing which key is correct because all the results look the same.
The focus of that research is to allow operations on data that remain encrypted, and where the actual content of the manipulated data is not explicitly known.
That might work for something composed of tables of numbers, bank data, phone call pen register logs, or passwords as the GP suggests, but not for text.
Humans are very good at determining gibberish from prose, or fragments of color from images. Plausible, but bogus, is a tough nut to crack where human evaluation is involved.
One way we do it is to return a "fake" only occasionally. The person who gets their password wrong is very unlikely to see a fake. On the other hand, a bad guy who is trying out 100,000 possible keys will get 50 fakes.
This works especially well if the bad guy doesn't know it's designed to occasionally generate fakes. He thinks he actually did decrypt passwords, but the list he has is no longer valid. Maybe it's out of date, he thinks, or maybe they are stored backward, or maybe we KNOW he stole the list and therefore we've changed all of the passwords. It was entertaining to read the cracker message boards when we first introduced that feature.
Now, the crackers who keep themselves informed know that we generate fakes, and it annoys them greatly. They don't yet know that we do TWO levels of fakes. A certain percentage of the fakes pass their extra level of checking they now have to do to weed out the fakes. In other words, they THINK they are weeding out the fakes, but they are actually only weeding out the level 1 fakes.
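To put numbers on the "only occasionally" part: 50 fakes out of 100,000 wrong keys implies a fake rate of 0.05% per wrong attempt. A quick Python sketch of that arithmetic follows. The function names and the three-typo example are my own assumptions, and it treats each wrong attempt as independent.

```python
# Rate implied by the figures in the comment: 50 fakes per 100,000 wrong keys.
FAKE_RATE = 50 / 100_000

def expected_fakes(wrong_attempts):
    """Average number of fakes returned over a run of wrong attempts."""
    return wrong_attempts * FAKE_RATE

def chance_of_any_fake(wrong_attempts):
    """Probability of seeing at least one fake, assuming independent attempts."""
    return 1 - (1 - FAKE_RATE) ** wrong_attempts

print(expected_fakes(100_000))   # the brute-forcer's haul of fakes
print(chance_of_any_fake(3))     # a user who mistypes three times
```

A user who mistypes three times has roughly a 0.15% chance of ever seeing a fake, while the brute-forcer collects about 50 of them.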