Distinguishing Encrypted Data From Random Data?

Re:iieorjoeghoiuhtr by Anonymous Coward · 2010-09-19 07:53 · Score: 4, Funny

Trick question! It is random text that's been encrypted!

Ignore the person holding the phone book. by Suki+I · 2010-09-19 07:55 · Score: 2, Insightful

After a few whacks on the head with the NYC Yellow Pages (old school, print edition) I think someone could find out which file is encrypted and which is garbage.

--
Home of The Suki Series

Re:Ignore the person holding the phone book. by parlancex · 2010-09-19 08:14 · Score: 5, Insightful

I think you're missing the point. Of course after they know that you have some encrypted data on your disk the strength of the encryption becomes moot because they can just drug / beat you until you tell them the key, but what this question is about is hiding encrypted data in unencrypted data so prying eyes can't tell if anything is even there at all.

For example, there may come a day when airport security could demand you disclose your passwords when they find you are carrying storage with encrypted content using the aforementioned techniques, but they aren't going to drug / beat every single person coming onto an airplane or going across a border. If your jpgs look like everybody else's jpgs both visually and under close analytical scrutiny they aren't going to bother you. Another example is there may come a day when any traffic on the Internet that cannot be positively identified as a common protocol with statistically "normal" contents is simply rejected. Maybe not here, maybe not right now, but this kind of idea is still very useful.
Re:Ignore the person holding the phone book. by sjames · 2010-09-19 08:34 · Score: 2, Interesting

That's why the deniability matters. They only have so many people available to whack people with the NYC Yellow Pages. You want them to believe there is a low probability that you have any secret to give up under "questioning".
Re:Ignore the person holding the phone book. by Suki+I · 2010-09-19 08:43 · Score: 2, Funny

I see a market in automated phone book whacking gadgets! Look for them soon on ThinkGeek.

--
Home of The Suki Series
Re:Ignore the person holding the phone book. by John+Hasler · 2010-09-19 08:45 · Score: 3, Insightful

Try to get your head around the idea that they might have possession of your hard disk but not have possession of you. Or they don't even know who you are. Or they are honest cops, trying to determine if you have violated the rules. They've asked you if there is encrypted data on the laptop, you said no, and they are doing a routine check to verify that. Contrary to popular opinion, "The Man" is not always ready, willing, and able to administer a beating.
Then there is the possibility that your opponent is not "the Man" but some sort of furtive criminal...

--
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
Re:Ignore the person holding the phone book. by sjames · 2010-09-19 08:47 · Score: 2, Funny

They will need to give it a significant civilian use, so it should come with an attachment that lets you beat the marketing department and PHBs to death with a paper towel roller.
Re:Ignore the person holding the phone book. by dcollins · 2010-09-19 08:50 · Score: 3, Insightful

"Did I miss the point or do we need the drugs and wrench?"
You missed the point. The primary question of the OP is this: "...is it possible to prove there is encrypted data where you claim there's not?"
Hint: Include the likelihood of false-positives and false-negatives in your "wrench-based" analysis.

--
We know where leadership by an anti-intellectual "strongman" who scapegoats minorities and likes boisterous rallies goes
Re:Ignore the person holding the phone book. by M.+Baranczak · 2010-09-19 09:36 · Score: 5, Funny

they aren't going to drug / beat every single person coming onto an airplane
If you fly US Airways, there's a $25 service charge if you want to get beaten and drugged before boarding. I remember when that shit used to be included in the base ticket price.
Re:Ignore the person holding the phone book. by Jeremi · 2010-09-19 09:54 · Score: 4, Funny

If your jpgs look like everybody else's jpgs both visually and under close analytical scrutiny they aren't going to bother you.
I've developed a fascinating algorithm for encoding hidden data by slightly modulating breast sizes, but this comment is too small to contain it.

--

I don't care if it's 90,000 hectares. That lake was not my doing.
Re:Ignore the person holding the phone book. by John+Hasler · 2010-09-19 10:06 · Score: 3, Informative

Hell, when you're traveling between nations you're, in legal terms, outside of all law...
Not true. You are subject to the jurisdiction of the nation of registry of your craft.

--
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.

It's all about entropy by cpghost · 2010-09-19 07:58 · Score: 5, Insightful

Encrypted files have maximum entropy, just like absolutely random files. Basically, you can't tell which one is which. However, absolute random noise on a disk isn't all that usual, so any encrypted file (or pure random file) will stand out like a sore thumb: it will be highly visible. But, again, you can't tell the difference.
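
A minimal sketch of the kind of entropy measurement this argument rests on, assuming a simple byte-frequency (order-0) Shannon estimate. The function name and both sample buffers are made up for illustration. Note that this estimate only shows how "noisy" a buffer is; it cannot tell encrypted bytes from random ones, which is the point being made.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the byte-frequency Shannon entropy of data, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical samples: repetitive English text versus bytes from the OS random source.
text_sample = b"the quick brown fox jumps over the lazy dog " * 1000
random_sample = os.urandom(len(text_sample))

print(f"text:   {shannon_entropy(text_sample):.3f} bits/byte")
print(f"random: {shannon_entropy(random_sample):.3f} bits/byte")
```

Properly encrypted output should land as close to 8 bits per byte as the random sample does, while ordinary text sits far below that.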

--
cpghost at Cordula's Web.

Re:It's all about entropy by Omnifarious · 2010-09-19 08:05 · Score: 4, Informative

Doesn't compressed data look random?
As an ideal, yes. But compressed data is still pretty distinguishable from random data. In particular, many compression formats have small markers in various places so that the decompressor can attempt to recover a corrupted file. Also, no compression technique is perfect, so even without these the data is still distinguishable.
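
As a rough illustration of the "small markers" point, here is a sketch that checks for the two-byte header RFC 1950 defines for zlib streams. The function name is hypothetical, and zlib is just one convenient format to pick; other compressors have their own signatures.

```python
import zlib


def looks_like_zlib_stream(blob: bytes) -> bool:
    """Check the two-byte zlib header described in RFC 1950.

    The low nibble of the first byte must be 8 (deflate), and the first two
    bytes, read as a big-endian integer, must be a multiple of 31.
    """
    if len(blob) < 2:
        return False
    cmf, flg = blob[0], blob[1]
    return (cmf & 0x0F) == 8 and ((cmf << 8) | flg) % 31 == 0


# Hypothetical input: some ordinary, repetitive text.
compressed = zlib.compress(b"some perfectly ordinary text " * 200)
print(looks_like_zlib_stream(compressed))
```

A two-byte check like this will also pass on roughly 0.2% of random prefixes, so on its own it is a hint rather than proof; real tools combine it with checks on the structure that follows.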

--
Need a Python, C++, Unix, Linux develop
Re:It's all about entropy by mlyle · 2010-09-19 08:17 · Score: 5, Insightful

Not exactly.
The problem with steg'ing inside known container formats, compressed container formats, is this:
Each implementation of the compression algorithm has its nuances. If the majority of an MP3 looks like it was compressed by the iTunes implementation, but then there's a range of output iTunes would not generate (particularly if the input file is known), that's very suspect. Ditto if things like PSNR change, even subtly, for the portion where steganography is in play. Even though compressed data has a great deal of entropy, it IS significantly constrained over random data in that A) known decompression programs must return specified output from it, and B) known compression programs generated this data as output from possibly-known input data.
If your adversary is the local police or one of your buddies, this stuff doesn't matter. If it's intelligence agencies or research organizations, good luck. Steganography is hard.
Re:It's all about entropy by bytesex · 2010-09-19 08:18 · Score: 4, Interesting

Make it compressed header-less audio. Give 'em a decoder (which will produce noise), and claim you're a scientist and this is you recording Jupiter.

--
Religion is what happens when nature strikes and groupthink goes wrong.
Re:It's all about entropy by LihTox · 2010-09-19 08:22 · Score: 2, Interesting

Isn't the point of steganography that you add the encrypted data on top of some other data, like a photograph or video, so that it looks like normal noise? I agree that carrying a thumb drive around, filled with random 0s and 1s, would be rather suspicious....
Re:It's all about entropy by biryokumaru · 2010-09-19 08:26 · Score: 2, Funny

From now on, whenever I go on a flight I'm bringing several DVDs of random data.

--
When you're afraid to download music illegally in your own home, then the terrorists have won!
Re:It's all about entropy by gnasher719 · 2010-09-19 08:27 · Score: 2, Interesting

Each implementation of the compression algorithm has its nuances. If the majority of an MP3 looks like it was compressed by the iTunes implementation, but then there's a range of output iTunes would not generate (particularly if the input file is known), that's very suspect.

Record some LPs to uncompressed audio files. That recording will be pretty unique. Encrypt your data any way you like, then store the encrypted data in the lowest bit of the 16 bit samples. Compress with Apple Lossless or FLAC or whatever you have.
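
A minimal sketch of the lowest-bit trick, assuming signed 16-bit samples held in a Python `array` and an already-encrypted payload. The function names and the synthetic samples are made up for illustration, and the lossless compression step is left out.

```python
import array
import os


def embed_lsb(samples: array.array, payload: bytes) -> array.array:
    """Return a copy of samples with one payload bit stored in each sample's lowest bit."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("payload is too large for this many samples")
    out = array.array("h", samples)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # clear the lowest bit, then set it to the payload bit
    return out


def extract_lsb(samples: array.array, n_bytes: int) -> bytes:
    """Read n_bytes back out of the lowest bits, most significant bit first."""
    out = bytearray()
    for i in range(n_bytes):
        byte = 0
        for sample in samples[8 * i: 8 * i + 8]:
            byte = (byte << 1) | (sample & 1)
        out.append(byte)
    return bytes(out)


# Hypothetical stand-ins: synthetic "audio" and random bytes in place of ciphertext.
audio = array.array("h", [(i * 37) % 2000 - 1000 for i in range(1024)])
secret = os.urandom(16)
stego = embed_lsb(audio, secret)
print(extract_lsb(stego, len(secret)) == secret)
```

Each sample changes by at most 1, which is why the scheme depends on the lowest bit of the original recording already being noise.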
Re:It's all about entropy by v1 · 2010-09-19 09:20 · Score: 5, Insightful

However, absolute random noise on a disk isn't all that usual,
Actually, nowadays, it's extremely unusual. Blocks are all zero'd from the factory, and anything you save over them that's later marked free will almost certainly be far from random. (like pieces of pictures, documents, applications, etc)
Really, statistically speaking, if you wanted to look on a hard drive for encrypted data, your best bet would be to go looking for blocks of high entropy data.
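
A sketch of what "go looking for blocks of high entropy data" could look like, assuming a raw image already read into memory. The 4096-byte block size, the 7.5 bits/byte threshold, and both function names are assumptions picked for illustration, not values from any forensic tool.

```python
import math
import os
from collections import Counter
from typing import List


def block_entropy(block: bytes) -> float:
    """Byte-frequency Shannon entropy of one non-empty block, in bits per byte."""
    total = len(block)
    return -sum((n / total) * math.log2(n / total) for n in Counter(block).values())


def high_entropy_blocks(image: bytes, block_size: int = 4096,
                        threshold: float = 7.5) -> List[int]:
    """Return the indexes of full-size blocks whose entropy is at or above threshold."""
    flagged = []
    for offset in range(0, len(image), block_size):
        block = image[offset:offset + block_size]
        if len(block) == block_size and block_entropy(block) >= threshold:
            flagged.append(offset // block_size)
    return flagged


# Hypothetical three-block image: zeroed block, text block, random block.
image = bytes(4096) + (b"hello world " * 400)[:4096] + os.urandom(4096)
print(high_entropy_blocks(image))
```

Compressed media would likely trip the same threshold, so a scan like this narrows down where to look rather than proving anything is encrypted.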
The only defense against this would be if you did a random wipe of your hard drive when you bought it, and then reinstalled, and patched your OS to automatically random-wipe files before deleting or updating/moving them. But then you get into the area of "this person is obviously going to a lot of work to make it easy to hide something from us", which by itself raises an eyebrow.
And on that note, I'm a little surprised now that I think about it, that I can't come up with a single example anywhere of a native or add-on OS feature for any OS, that does random-wipe-on-delete. OS X has "erase free space" built into disk utility, and you can find an app to do this for other OSs, but obviously zero'd blocks are not what we need to be creating. And the fact that you have to do this step manually, and it takes HOURS to run usually, is also surprising. I don't know offhand if OS X's "secure empty trash" zeros or randoms, but you're not likely to do that for EVERYTHING you throw away since it takes time, and since a lot of files get moved/deleted by the OS automatically without doing this. (end problem: anyone with a clue knows you can't hide anything in a bunch of zero'd blocks)

--
I work for the Department of Redundancy Department.
Re:It's all about entropy by melikamp · 2010-09-19 09:42 · Score: 5, Interesting

I've been working on this very problem for a while now. An easier version, even: how to encrypt a single file in a way that makes it indistinguishable from random data? The algorithm must allow for a short password (dozens of bytes), and should be able to encrypt very large files. Optimally, an attacker may see the algorithm and may suspect correctly what the plaintext is, but should still be unable to prove that the given cyphertext is the output of the algorithm. That is, the only way to "prove" that should be by a brute-force password search, whereas finding a working password of a few dozen bytes is proof enough. This is good enough because a brute-force search over 60^30 passwords is kind of slow.
I further simplified the problem by saying that the size of a file need not be hidden: it's a separate task, and a much easier one.
I have a reason to approach the problem this way. If I have on my computer a file named "one-time-pad.bin", and it looks like a one time pad, then it must be a one time pad. The very existence of an encrypted partition should be enough to convince anyone that there is encrypted data. If a multi-sheaf algorithm is used, then there is a reasonable suspicion that there are multiple sheafs. Either way, the owner seems to be hiding something. Burying data in JPG and similar tricks are also sketchy, as it is almost certainly possible to distinguish (statistically) a benign JPG from the one steganographically altered, although this can be avoided by hiding very little data in very large files. Here, at least, there is an expensive solution.
I can think of at least one other way to do it, here goes my original description on the internet. Say, we want to use passwords with length up to B bits and encrypt files with length up to M bits. Fix forever B random binary strings of length M each, call them N = {n_1, n_2, ... , n_B}. The set of 2^B passwords is in a bijective correspondence with the set of subsets of N, for example a password like 110101... will select the subset {n_1, n_2, n_4, n_6, ...}. Treat n_i in that subset as integers and add them. Treat the plaintext as an integer and add it to (or XOR with) the result. One can think of it as constructing a one time pad (one of 2^B) and XORing with it. Even if the attacker knows n_i for each i, and the plaintext (without loss of generality, all zero), and the cyphertext, she still has to decompose the cyphertext as a sum of a subset of N, and even deciding whether or not it can be done is NP-hard. The complexity will be exponential as long as both M and B are large, which they are in expected applications.
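
A toy sketch of the construction described above, using the XOR variant for both the pad and the final combination. All names are hypothetical and the sizes are tiny so it runs instantly. One caveat worth labeling loudly: with XOR everywhere the scheme is linear over GF(2), so it does not inherit the hardness of integer subset sum and should be read as an illustration only.

```python
import secrets
from typing import List


def make_pads(b_bits: int, m_bytes: int) -> List[bytes]:
    """Fix B random strings of M bytes each (the set N in the description)."""
    return [secrets.token_bytes(m_bytes) for _ in range(b_bits)]


def build_pad(password_bits: int, pads: List[bytes]) -> bytes:
    """XOR together the pads selected by the set bits of the password."""
    acc = 0
    for i, pad in enumerate(pads):
        if (password_bits >> i) & 1:
            acc ^= int.from_bytes(pad, "big")
    return acc.to_bytes(len(pads[0]), "big")


def xor_crypt(data: bytes, password_bits: int, pads: List[bytes]) -> bytes:
    """Encrypt or decrypt (the operation is its own inverse) by XORing with the pad."""
    pad = build_pad(password_bits, pads)
    if len(data) > len(pad):
        raise ValueError("data is longer than the fixed pad length M")
    return bytes(a ^ b for a, b in zip(data, pad))


# Hypothetical parameters: B = 16 password bits, M = 32 bytes.
pads = make_pads(16, 32)
password = 0b1011000000001101
ciphertext = xor_crypt(b"attack at dawn", password, pads)
print(xor_crypt(ciphertext, password, pads))
```

The all-zero password selects the empty subset and leaves the plaintext untouched, which is the trivial case the description excludes.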
The nicest feature here is that with a non-trivial password, the cyphertext will look as random as they get! It will be a sum of carefully pre-selected random numbers, padded with the plaintext.
One obvious limitation is that each password can only be used once, since similar plaintexts will produce similar cyphertexts, but that could be remedied. A bigger problem, IMHO, is that this algorithm requires B random binary strings of length M each to be built-in. Just to give you an idea, if you want to encrypt files of size up to 1 GiB with passwords of size up to 512 bits, then you need to keep around 512 GiB of pad. Either that, or be able to generate really really fast 512 random reals (random here meaning, the same every time, but completely unrelated), which is very sketchy: the reals could easily be so related that the subset sum will allow for a sub-exponential solution.
I would be very interested to hear from anyone about this idea.
I may have another way of solving the same hiding problem, and it has to do with a completely different, yet, IMHO, also very fascinating way of turning a short binary string into a very long and random-looking binary string in a one-way fashion. I decided that I won't implement the subset sum solution unless I am totally sure that I cannot find something more elegant, so feel free to steal my idea above and code it in.
Re:It's all about entropy by Kjella · 2010-09-19 09:44 · Score: 4, Insightful

Well, the problem is that it doesn't really apply to compressed data. Compression schemes try packing things as efficiently as possible, so there's relatively little you can add without making it obvious the compression is tampered with. You could try embedding it as some sort of watermark into the photo/video before compression, but that too is difficult and won't hide very much. And most people don't carry tons of BMPs, WAVs and uncompressed AVIs..
So far it seems most people agree the best way to hide encrypted data is within other encrypted data. You don't have to be super-paranoid to use encryption, my last workplace used full disk encryption and I don't think anyone can seriously accuse you of anything if you just say that "I feared my computer would get stolen, and I could be exposed to identity theft or have my family photos posted online" or something like that.
The best solutions I have seen work like this:
1) If you enter both your "normal" password and your "secret password" => access to the normal disk and it'll seamlessly move around any secret data as long as there is room.
2) If you enter only your "secret" password => access to your secret data.
3) If you're under duress, you give just the "normal" password and you get just the normal disk. Your hidden data can get overwritten since the encryption software doesn't know about it, but there's no way to prove that there is a secret container or a secret password.

--
Live today, because you never know what tomorrow brings
Re:It's all about entropy by parlancex · 2010-09-19 09:53 · Score: 2, Interesting

Steganography is hard if you demand high density. That is, a higher ratio of your content vs the content it is being inserted into. It really depends how much encrypted data you need to hide and how much unencrypted data you have to hide it in. If you're hiding less than a kilobyte of encrypted high entropy data in a 6MB high entropy mp3, and your algorithm is intelligent enough to distribute it evenly into many areas of the file, that's much harder to detect. That said, I didn't say modifying a complex compressed file format while leaving it functionally intact would be easy, that's the hard part.
Re:It's all about entropy by AK+Marc · 2010-09-19 10:17 · Score: 2, Insightful

It is both irrelevant and directly addresses the point. The answer is: "There is no such thing as random data that isn't encrypted, so the question need never be asked. If it's 'random' it is encrypted, even if indistinguishable from truly random data." That may not be true in all cases, but true enough for law enforcement to make your life a living hell for having wiped your HDD with a randomizer.

--
Learn to love Alaska
Re:It's all about entropy by linuxrocks123 · 2010-09-19 10:17 · Score: 3, Informative

You (and the submitter) might want to have a look at http://eprint.iacr.org/2009/531 which talks about "known-key attacks" on "AES-like permutations". The goal of these types of attacks is to distinguish AES-encrypted data from random noise. Right now, all they can do is break 8 rounds of AES-128, so the answer to OP's question is "right now, AES-encrypted data is indistinguishable from noise".
---linuxrocks123

--
vi ~/.emacs # I'm probably going to Hell for this.
Re:It's all about entropy by AK+Marc · 2010-09-19 10:20 · Score: 2, Funny

When they discover you aren't a scientist, live in your mother's basement, and have never held a job, they'll arrest you for obstruction of justice.

--
Learn to love Alaska
Re:It's all about entropy by Anonymous Coward · 2010-09-19 10:31 · Score: 2, Informative

And on that note, I'm a little surprised now that I think about it, that I can't come up with a single example anywhere of a native or add-on OS feature for any OS, that does random-wipe-on-delete. OS X has "erase free space" built into disk utility
Open the finder then go to the finder preferences; click on the advanced button and check 'empty trash securely'.
Re:It's all about entropy by xombo · 2010-09-19 10:38 · Score: 3, Informative

"Secure Empty Trash overwrites your data with digital gibberish"
From Apple's site about the Secure Empty Trash feature @ http://www.apple.com/pro/tips/empty_trash.html
Re:It's all about entropy by moonbender · 2010-09-19 10:56 · Score: 2, Insightful

I'm not so sure high entropy data is all that rare. While the container format makes them distinguishable from completely random data, compressed audio and video files do have very high entropy, I think. And much of the space of a drive will probably be used for movies and music.

--
Switch back to Slashdot's D1 system.
Re:It's all about entropy by melikamp · 2010-09-19 11:00 · Score: 2, Insightful

I won't make a prediction about a proportion, but it seems to me that orphaned blocks of compressed files would seem pretty darn random, and almost everyone has those.
Also, in GNU/Linux at least, there is the shred utility that does what it sounds like: overwrites files with patterns (optionally, with zeroes) before erasing them. Maybe it works on OS X too?
Re:It's all about entropy by tftp · 2010-09-19 12:39 · Score: 3, Interesting

When they discover you aren't a scientist
The GP should record some LP, at 24 bits per sample, at 96 kSa/s, in stereo. It wouldn't be too unusual, especially if he picks well-known music. Classical music will be particularly good here. A typical opera, in .WAV, will be about 4 GB, and there will be at least 8 lower bits that are yours to play with (they are noise from the turntable.)
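
The numbers above can be checked with a few lines of arithmetic. This sketch assumes "4 GB" means 4 x 10^9 bytes and ignores the small WAV header; the variable names are made up for illustration.

```python
BITS_PER_SAMPLE = 24
SAMPLE_RATE = 96_000        # samples per second, per channel
CHANNELS = 2
NOISE_BITS = 8              # low bits per sample assumed to be turntable noise
FILE_BYTES = 4 * 10**9      # assumption: "4 GB" taken as 4e9 bytes

# Raw PCM data rate: 3 bytes per sample, 96000 samples per second, 2 channels.
bytes_per_second = BITS_PER_SAMPLE // 8 * SAMPLE_RATE * CHANNELS

# How much music fits in the file, and how much of the file is usable noise.
duration_hours = FILE_BYTES / bytes_per_second / 3600
hidden_bytes = FILE_BYTES * NOISE_BITS // BITS_PER_SAMPLE

print(bytes_per_second, round(duration_hours, 2), hidden_bytes)
```

So a 4 GB file holds a little under two hours of audio at that rate, and a third of it (about 1.3 GB) would be available if all 8 low bits are really expendable.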
Re:It's all about entropy by chgros · 2010-09-19 14:00 · Score: 3, Informative

I suggest you read up on cryptography.
Encryption, in general, is attempting exactly what you're attempting: make plaintext look random.
What you're trying to defend against is known as a "known-plaintext attack".
You can use any standard cryptographic approach such as AES-CBC as suggested above.
For a password-based approach, there are also standard key generation algorithms such as PKCS #5.
Note that your claim that your approach gives "as random as it gets" data is not true; once you've fixed for all time a set of random numbers, they're no longer "random".
As for generating random-like numbers deterministically, that's what stream ciphers (e.g. RC4) do.
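
A small sketch of the two building blocks named above: PBKDF2 (from PKCS #5) to turn a password into a key, and RC4 as an example of a stream cipher that expands a short key into a long random-looking keystream. The iteration count, key length, function names, and sample password are assumptions for illustration. RC4 appears only because it is the example given here; it is considered broken today and would not be a sensible choice in practice.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes, length: int = 16) -> bytes:
    """Derive a key from a password with PBKDF2-HMAC-SHA256 (iteration count is an assumption)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 200_000, dklen=length)


def rc4(key: bytes, data: bytes) -> bytes:
    """XOR data with the RC4 keystream for key; the same call encrypts and decrypts."""
    # Key-scheduling algorithm: permute the state using the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm: one keystream byte per data byte.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


# Hypothetical usage: the random salt has to be stored next to the ciphertext.
salt = os.urandom(16)
key = derive_key("correct horse battery staple", salt)
ciphertext = rc4(key, b"nothing to see here")
print(rc4(key, ciphertext))
```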
Re:It's all about entropy by Anonymous Coward · 2010-09-19 15:40 · Score: 3, Insightful

As another poster pointed out, what you're talking about is the entire point of encryption. And yes, with good cryptography, it's pretty much impossible to decrypt except by brute forcing with all possible key combinations.
Good cryptography is hard, though, which is why it's generally best to leave it up to the experts. Not saying you can't have fun thinking about it, but realize that you're probably not going to come up with anything really new in the field.
I'd suggest starting with Applied Cryptography for a primer.
Re:It's all about entropy by Menkhaf · 2010-09-19 19:02 · Score: 2, Funny

Better yet: Make it compressed headerless video. Claim you're recording Uranus.

--
A proud member of the Onion-in-Hand alliance

No by trigeek · 2010-09-19 07:59 · Score: 2, Interesting

Properly encrypted data is indistinguishable from random data. However, just the presence of random files on the system could be incriminating. Perhaps it's better to hide the data in another type of file? Perhaps using the lsb of a bitmap file?

--
Sometimes I doubt your committment to SparkleMotion!

Re:No by sjames · 2010-09-19 08:06 · Score: 2, Insightful

It would be best to precondition the media by writing random data over the entire thing. For added fun, encrypt the text of various childrens books and write the result to the drive.
Re:No by owski · 2010-09-19 09:10 · Score: 2, Interesting

Or, actually use the encrypted data as a one time pad and then when pressed use it to decrypt some ordinary data.
Re:No by butlerm · 2010-09-19 10:22 · Score: 3, Informative

There is no question it is computationally difficult, just not computationally impossible. The reason for that lies in computational complexity theory. You can get a basic summary of the theory here.
In summary, if the data being encrypted is compressible by practically any algorithm whatsoever, it has computational complexity less than its bit length, i.e. a smaller bit string can be used to recover the larger one. Likewise, the computational complexity of any encryption key is at most the length of the key in bits.
Suppose you are encrypting a 256K bit string that can be compressed by a factor of two by an _ideal_ compressor. And then you have a maximally random 1K bit key. The maximum Kolmogorov complexity of any finite deterministic function of those two inputs that is known to the attacker is 129 Kbits. Where the maximum computational complexity of a truly random string of the same length as the input is 256 Kbits.
The difference between those figures opens the door to a statistical attack, because the data is not _really_ random. It just looks that way, sort of like the output of a pseudo random number generator, which isn't really random at all. If you encrypt a string of zeroes with a significantly shorter key, the output of a pseudo random number generator is exactly what you will get, a pattern that is maximally vulnerable to attack.
The lesson to be learned here is if you want to minimize the risk of attacks from folks with far more computer power than you have available, compress it using the best available compression algorithm first. Then the computational complexity of the input string will approach the theoretical maximum for a string of that length, depending on how good the compressor is. Throughout I mean complexity relative to what the attacker knows (such as the encryption algorithm) of course.
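
Kolmogorov complexity cannot be computed directly, but the size of a compressed copy gives an upper bound on it. The sketch below uses zlib as a stand-in for the "ideal compressor", which is an assumption, and also reproduces the 129 Kbit figure from the example. All names are made up for illustration.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of data after zlib compression at the highest level.

    This is only an upper bound on the true complexity of the data.
    """
    return len(zlib.compress(data, 9))


# Arithmetic from the example: a 256 Kbit input that an ideal compressor
# halves, plus a 1 Kbit key, bounds the complexity of the ciphertext.
input_kbits = 256
key_kbits = 1
ciphertext_bound_kbits = input_kbits // 2 + key_kbits

# Hypothetical inputs: a run of zeroes versus bytes from the OS random source.
zeros = bytes(32 * 1024)
noise = os.urandom(32 * 1024)

print(ciphertext_bound_kbits, compressed_size(zeros), compressed_size(noise))
```

The zeroes shrink to a handful of bytes while the random buffer does not shrink at all, which is the gap between "looks random" and "is random" that the argument relies on.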

Re:Well by Kjella · 2010-09-19 08:00 · Score: 5, Insightful

As far as I know finding patterns in the output is tightly linked to reducing the number of possible keys, so good encryption algorithms should not create patterns. Of course if your encryption software writes some kind of header - which wouldn't affect the security of the encrypted contents - then it will be obvious to anyone looking that you have an encrypted container. So this is 99% about implementation and 1% about encryption algorithms.

--
Live today, because you never know what tomorrow brings

It depends.... by TrumpetPower! · 2010-09-19 08:00 · Score: 4, Insightful

If I give someone one file containing random data and another containing data encrypted with AES, will he be able to tell which is which?

Does the person to whom you give these two files have a rubber hose? Is he a member of the “extraordinary rendition” team?

The point of steganography is to not get caught in the first place. If you need plausible deniability, you’ve already lost.

Cheers,

b&

--
All but God can prove this sentence true.

Re:Lifting the Lid on the Guilty Yid by ian(at)union.io · 2010-09-19 08:01 · Score: 2, Funny

Let me guess... Random!.. No, wait, too obvious. Encrypted!

Wrong question to ask? by mclearn · 2010-09-19 08:02 · Score: 2, Interesting

Perhaps the question is incorrect. If I have a volume with data and a volume with encrypted data, then the encrypted data can be discerned from the non-encrypted data by virtue that there will be patterns detectable in the non-encrypted volume. So technically if you have a drive and there is random data on it but no discernible patterns, then there is either encrypted data on it, or it is an empty drive. It is likely not even factory default since that is likely to have some structure imposed upon it as well. What is the point of carrying around an encrypted volume with the ability for plausible deniability if that plausible deniability requires you to have random data as a volume? The existence of random data will render your plausible deniability claim useless since, by definition, your claim is no longer plausible.

Re:iieorjoeghoiuhtr by bennomatic · 2010-09-19 08:08 · Score: 2, Funny

Nice. "All your base are belong". You purposely left off the last two words to give a smaller sample to review and potentially recognize patterns.

--
The CB App. What's your 20?

Re:Well by bennomatic · 2010-09-19 08:10 · Score: 5, Funny

Weird. I guess there's a bug in my ROT13 implementation. If I run my text through twice, I just get the original message.

--
The CB App. What's your 20?

Shouldn't by dachshund · 2010-09-19 08:13 · Score: 4, Informative

AES is designed to be a pseudo-random function (meaning it's evaluated against that criterion). What this means is that /when used properly/ AES encrypted data should be indistinguishable from random data, at least for a distinguisher running in bounded time. If anyone discovers an efficient algorithm that can distinguish this, it'll be a big nail in AES's coffin (and yes, at the very theoretical level I realize that there already are some known weaknesses in AES, but for the moment you're in good shape).

Re:Shouldn't by dachshund · 2010-09-19 08:36 · Score: 2, Informative

Erp, I meant Pseudo-Random Permutation, which is indistinguishable from a PRF if the amount of data is realistic.
Re:Shouldn't by linuxrocks123 · 2010-09-19 10:13 · Score: 2, Insightful

Yes. People do.
We know you can brute-force AES. We also know that if you had a computer the size of the Earth where every piece of matter the size of a grain of sand was an ALU, you wouldn't be able to do it in thousands of years. The only hope attackers have is more sophisticated cryptanalysis techniques. This may or may not happen within 30 years.
---linuxrocks123

--
vi ~/.emacs # I'm probably going to Hell for this.
Re:Shouldn't by ras · 2010-09-19 14:15 · Score: 3, Insightful

AES encrypted data should be indistinguishable from random data
Nope. This assertion has been made here over and over again, and it is out and out wrong. See: http://opensource.dyc.edu/random-vs-encrypted
In essence, encrypted data sticks out like dogs balls because of its high entropy, yet there are enough patterns in it to make it obvious to an expert it isn't just random data. Even if it did look like random data who in the hell is going to believe you are carrying around gigs of data you can trivially generate as needed from /dev/urandom? Nobody.
So, the problem you have to solve is how you are going to plausibly explain away gigs of what is clearly encrypted crap. Forget TrueCrypt, or any special tools that don't normally come with your Operating System. Their very presence screams "liar!". Forget large encrypted files that don't have any conceivable use, even if they aren't named "my-porn-collection.zip.gpg". After all, it's your laptop so a program you use must have put them there, so some program should break if you move them out of the way.
And finally, once you come up with a way of hiding your encrypted crap, don't go blasting it over the internet. If it became common knowledge the men with rubber hoses may hear of it, rendering your lovely invention useless.
Some evidently don't agree with this last piece of advice because they have posted their solutions to the problem right here, on one of the largest megaphones on the 'net. Fortunately for them, Slashdot has in typical Slashdot fashion come to their rescue. Unlike the piece of misinformation I am responding to which is rated "5, informative", these insightful and informative posts are rated 1. Probably because they necessarily involve long complex commands which are utterly beyond your average slashdotter, which probably means they will rarely be used, which probably means they are right - my last piece of advice is alarmist.
Re:Shouldn't by dachshund · 2010-09-20 14:11 · Score: 2, Interesting

He dismisses LUKS and TrueCrypt because they don't offer plausible deniability - because of the headers, as you say. He then moves onto Linux's loopback device in crypto mode. It doesn't write headers. He then comes up with a technique of comparing the raw encrypted data with random text. Turns out using his techniques it is easy to spot the difference. And that is the point of the paper: even without headers or any other tell tale signs, there is no way to hide the fact that you have encrypted data on your disk.
From what I can tell, the paper makes several points:
1. Implementations that write headers leak info on what's encrypted
2. Poor crypto implementations (i.e., not using modes of operation correctly) leak info on what's encrypted
3. Encrypted data stands out when compared with non-random background
4. If you can look at the way a filesystem changes over time, you'll spot people writing encrypted data
I think (1) and (2) are fairly obvious and irrelevant --- if you do things wrong, then of course you'll get caught. I think we've addressed (3) already in this thread. So it remains for us both to agree that you can learn things by observing the way a filesystem changes over time. That doesn't exactly correspond to "using his techniques it is easy to spot the difference [between encrypted data and random text]". In fact, it sounds to me like a whole different flavor of attack, though certainly a powerful one that people should be careful of.
Here you are playing with words. "Encrypted data looks like random data" in this case means "looks identical to the novice, but an expert will find it easy to distinguish the two". But no one would take that meaning from your post. It was poor communication at best.
I would argue that the attacker's expertise is irrelevant to this discussion. If an attacker can obtain periodic snapshots of a user's system, they don't have to be an expert --- they're just blessed with an unusually rich amount of data. No cryptographic technique (short of completely re-randomizing the noisy portions of the disk) is going to hide the fact that you're making suspicious changes to the data. I would further claim that, absent such a history, encrypted data does look like random data. But you are welcome to disagree.
It's been a good long time since I took part in a nice, angry Slashdot flame war, particularly in my PhD subject area. It's been fun.

Re:Well by bytesex · 2010-09-19 08:16 · Score: 5, Insightful

It depends what you call an 'encryption algorithm'. If you mean 'DES', then no - DES is nowadays considered a weaker algorithm. If you mean 'AES-256', then still no - you need to *apply* AES-256 before it's any good, because AES is a block-cipher and will re-encrypt identical blocks of plain-text with the same key to identical blocks of ciphertext. If you mean 'AES-256 in CBC mode with random IV and SHA-256 HMAC authentication', then that's an algorithm that can be safely used. Under certain real-world circumstances.

--
Religion is what happens when nature strikes and groupthink goes wrong.

Re:At the risk of doing someone's homework in a fo by mysidia · 2010-09-19 08:17 · Score: 2, Informative

Perhaps. But if you use cryptsetup with LUKS, there is a readable header for the encrypted file, you don't need the key to determine encryption has been used. In fact, you can set multiple passphrases that have the authority to decrypt the partition.

GPG Encrypted data is also distinguishable, regardless of whether you use ASCII armoring or binary .GPG files. There are headers in the encrypted output that can be recognized without having the key to decrypt anything.

Now if you run 'openssl' from the command line, choose 'aes-256-cbc', supply a truly random key, and enter data bits interspersed with random 'padding bits', it will probably be impossible for anyone to determine from the output whether there are any data bits or not, without knowing the key.
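
To make the header point concrete, here is a minimal sketch that checks a blob for a few well-known leading byte strings. The function name and the table are my own illustration, not part of any tool. The list is deliberately tiny, and binary OpenPGP packets are not covered. As far as I know, the `Salted__` prefix only appears when OpenSSL's enc derives the key from a passphrase with a salt.

```python
# Illustrative only: a few well-known magic prefixes, not a complete detector.
KNOWN_PREFIXES = {
    b"LUKS\xba\xbe": "LUKS volume header",
    b"-----BEGIN PGP MESSAGE-----": "ASCII-armored OpenPGP message",
    b"Salted__": "OpenSSL enc output (passphrase-derived key with salt)",
}


def identify_header(data: bytes):
    """Return a description if data starts with a known prefix, else None."""
    for prefix, description in KNOWN_PREFIXES.items():
        if data.startswith(prefix):
            return description
    return None


print(identify_header(b"LUKS\xba\xbe\x00\x01rest-of-header"))
print(identify_header(bytes(32)))  # no recognizable prefix
```

Getting None back only means none of these particular prefixes matched; it says nothing about whether the rest of the data is encrypted.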

crypto is hard by Tom · 2010-09-19 08:21 · Score: 3, Informative

Hard to say from your question, but if you haven't done so already, get yourself some crypto knowledge. Crypto is hard; there is a reason that you are laughed out of the room if you say you've invented a new crypto algorithm and you don't already have strong credentials.

Randomness is one of the harder computer problems. Especially in steganography, many implementations have been defeated by creating not enough or too much randomness. If you want to hide your message in something, it doesn't matter if your output is distinguishable from randomness, it matters if it is distinguishable from what should be there. Simple approaches like LSB tricks have often fallen because those happen to be not random in many input data.

--
Assorted stuff I do sometimes: Lemuria.org

Re:Well by SeanTobin · 2010-09-19 08:31 · Score: 5, Funny

Weird. I guess there's a bug in my ROT13 implementation. If I run my text through twice, I just get the original message.

Just do what they did with DES... use 3rot13 and you're much more secure than the original implementation.

--
Karma: SELECT `karma` FROM `users` WHERE `userid`=138474;

chaining is essential by pedantic+bore · 2010-09-19 08:32 · Score: 3, Informative

If you use AES in ECB mode, then the answer is that it's usually painfully obvious that the original data was structured.

If you do use chaining (CBC, or something similar), then it will look quite random.

Excellent example here: http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation#Electronic_codebook_.28ECB.29
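
A rough sketch of the same effect in code. Big assumption: `toy_block_encrypt` is a stand-in keyed function (truncated HMAC-SHA256), not AES and not even invertible. It only shares the one property that matters here: the same key and the same input block always give the same output block. All names are hypothetical, and the plaintext length must be a multiple of the block size (no padding is done).

```python
import hashlib
import hmac
import os

BLOCK = 16  # bytes per block, matching AES's block size


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Stand-in for a block cipher: deterministic keyed function (NOT AES).
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def split_blocks(data: bytes):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # ECB: every block is encrypted independently.
    return b"".join(toy_block_encrypt(key, b) for b in split_blocks(plaintext))


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    # CBC: each block is XORed with the previous ciphertext block first.
    out, prev = [], iv
    for b in split_blocks(plaintext):
        mixed = bytes(x ^ y for x, y in zip(b, prev))
        prev = toy_block_encrypt(key, mixed)
        out.append(prev)
    return b"".join(out)


def repeated_blocks(ciphertext: bytes) -> int:
    # Number of blocks that duplicate an earlier block.
    blocks = split_blocks(ciphertext)
    return len(blocks) - len(set(blocks))


key = os.urandom(32)
iv = os.urandom(BLOCK)
plaintext = b"A" * BLOCK * 64  # highly structured: 64 identical blocks

ecb_repeats = repeated_blocks(ecb_encrypt(key, plaintext))
cbc_repeats = repeated_blocks(cbc_encrypt(key, iv, plaintext))
print("ECB repeated blocks:", ecb_repeats)
print("CBC repeated blocks:", cbc_repeats)
```

With ECB, all 64 output blocks are identical, so 63 of them are repeats. With chaining, a repeat would require a chance collision on a 128-bit value.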

--
Am I part of the core demographic for Swedish Fish?

Re:At the risk of doing someone's homework in a fo by butlerm · 2010-09-19 08:47 · Score: 2, Informative

You cannot distinguish between the two.

This is categorically not true, unless the key is as long or longer than the data file (and never used again). There is indeed an attack vector against any encrypted data file if the key length is small by comparison. Statistical analysis plus the slightest idea of what type of data is being encrypted is more than adequate to mount a successful attack (given sufficient computational resources) unless the key is _much_ longer than what is typical today. The lack of computational resources is the only thing that keeps typical encrypted data secure.

Re:iieorjoeghoiuhtr by Volante3192 · 2010-09-19 08:54 · Score: 3, Funny

Looks Welsh...

Unfoilable Steganography in LSB Plane of Imagery by mirkurius · 2010-09-19 08:56 · Score: 5, Interesting

Steganographic attempts are considered foiled if someone can detect that there is a secret message, they don't need to be able to retrieve the message in order for the attempt to be considered a failure. I did my Master's project on hiding data in the least significant bitplane of imagery. The trick is to "randomly" scatter your secret message throughout this plane. I showed methods that would allow you to do this so that the data was indistinguishable. You should always encrypt your secret message first so that it looks random, or better yet, shape the statistics of your encoded message to match the noise characteristics that were in the original LSB plane. If you use an image created from a very noisy source, such as a digital camera, and you encrypt the embedded message and scatter it using a reversible algorithm, and iteratively ensure that the statistics of the altered LSB plane look the same as the original LSB plane, I proved that it is not possible for someone to tell that there is a secret message hidden there. However, you need to be careful to use an original image you created yourself, and to destroy the original, because if someone ever compared the original to the one with the embedded message, they could definitely tell there was something altered by comparing the LSB planes.
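
A minimal sketch of the "scatter it using a reversible algorithm" step, under loud assumptions: the cover is just a byte string standing in for raw pixel data, the scatter is a seeded shuffle from Python's `random` module (which is NOT cryptographically secure), and the function names and seed parameter are hypothetical. It does not do the statistics-matching step described above, and the message should already be encrypted before it gets here.

```python
import os
import random


def _positions(seed: int, cover_len: int, n_bits: int):
    # Keyed pseudo-random scatter; the same seed reproduces the same positions.
    rng = random.Random(seed)
    return rng.sample(range(cover_len), n_bits)


def embed(cover: bytes, message: bytes, seed: int) -> bytes:
    """Write the message bits into the LSBs of scattered cover bytes."""
    bits = [(byte >> i) & 1 for byte in message for i in range(8)]
    if len(bits) > len(cover):
        raise ValueError("cover too small for message")
    stego = bytearray(cover)
    for pos, bit in zip(_positions(seed, len(cover), len(bits)), bits):
        stego[pos] = (stego[pos] & 0xFE) | bit
    return bytes(stego)


def extract(stego: bytes, n_bytes: int, seed: int) -> bytes:
    """Read n_bytes back from the same scattered positions."""
    positions = _positions(seed, len(stego), n_bytes * 8)
    bits = [stego[pos] & 1 for pos in positions]
    return bytes(
        sum(bits[i * 8 + j] << j for j in range(8)) for i in range(n_bytes)
    )


cover = os.urandom(4096)           # stand-in for noisy image bytes
secret = b"already-encrypted"      # placeholder for ciphertext
stego = embed(cover, secret, seed=42)
print(extract(stego, len(secret), seed=42) == secret)
```

Only the lowest bit of any cover byte changes, which is why comparing against the original cover gives the game away.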

Actually, you've spotted the problem by bagofbeans · 2010-09-19 09:12 · Score: 3, Interesting

All the investigators need to do is run some fake but seemingly complex program that looks at the file under inspection and says "yes, steganography in use". Then the full weight of the law comes down, because now the suspect has to prove the negative - impossible of course.

So actually what is needed is a suspect's right that investigators prove any assertion that files have been hidden if that assertion/analysis is used as evidence in court.

Statistics by Spazmania · 2010-09-19 09:14 · Score: 3, Interesting

If you find a file on my hard drive with data you can't readily decode, is it:

A) Compressed with an unknown compressor
B) Encrypted with an unknown encryptor
C) Random bytes used for an encryption process
D) Random bytes used for something else

I can't prove that answer D is wrong... but I don't have to because I know that 99% of the time, it's one of the other answers.

If you want to hide your data, the file must ostensibly have some other purpose... something that isn't obviously a lie. That's what steganography is about. For example, you might download as much of the 1 meter-resolution Google Maps satellite image as fits on your hard disk, save it uncompressed and then store encrypted data in the low-order bit of each byte (3 bytes to a pixel). Coupled with a map application that can display the imagery, it would appear to be one thing (a map) while really being another (a container for encrypted information).

At that point, unless you capture the encryption software it becomes hard to suspect that there is encrypted data, let alone prove it.

--
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.

NSA? Bah. by denzacar · 2010-09-19 09:20 · Score: 3, Funny

I don't work for any 3-letter agency and even I could easily get the information needed.
With the right tools.

--
Mit der Dummheit kämpfen Götter selbst vergebens

Re:NSA? Bah. by Shakrai · 2010-09-19 18:24 · Score: 2, Insightful

See, the "drug him" part I have an issue with. I have personal experience with both mind altering drugs and truecrypt. Let me assure you that drugs do not help you to remember a complex pass phrase.... ;)

--
I want peace on earth and goodwill toward man.
We are the United States Government! We don't do that sort of thing.
Re:NSA? Bah. by denzacar · 2010-09-19 22:24 · Score: 2, Insightful

Not that kind of drug.
Also, drugs are to be used in sequence with the beatings - not simultaneously.
No point in beating up someone who can't feel anything just for the sake of beating him up.
Leave the personal enjoyment for later.

--
Mit der Dummheit kämpfen Götter selbst vergebens

Re:perhaps... by Goaway · 2010-09-19 09:23 · Score: 2, Informative

Exactly. If you do not already know the answer to this question, there is no way on earth you will write a program that is at all secure.

Back to the books and study.

Re:One more level... by netsharc · 2010-09-19 09:34 · Score: 2, Interesting

Presumably a simple XOR would make them be able to come up with that sentence... hell, any sentence thinkable in the world! "Look, if we apply these bytes, the secret message says [...]!"

--
What time is it/will be over there? Check with my iPhone app!

Re:iieorjoeghoiuhtr by flyingkillerrobots · 2010-09-19 09:40 · Score: 2, Interesting

Neither. It's readily visible the way you just mashed your keyboard, in a rather nonrandom fashion. Dividing the left handed keystrokes from the right handed ones, you get: erg ergerg erergerg greg erererg and jpoijpoij hoihoiuh nnuhoihh poiuhiuhoihh hhoiuhih. The 'erg' pattern is near universal with slight variations, and the combination of poiujh (in that order), usually missing one or two of the letters, describes well the vast majority of the keystrokes with your right hand.

--
"It is a good thing for an uneducated man to read books of quotations..." -Winston Churchill

Have a look at Rubberhose by PhunkySchtuff · 2010-09-19 09:44 · Score: 3, Informative

Rubberhose (Pronounced Marutukku) is transparently deniable encryption, developed by (among others) Julian Assange.
This seems to do exactly what you're trying to do, so even if you want to go ahead and implement it yourself from scratch, it's worth reading up on what they've done to get some ideas and avoid some potential pitfalls.

Rubberhose is a computer program which both transparently encrypts data on a storage device, such as a hard drive, and allows you to hide that encrypted data. Unlike conventional disk encryption systems, Rubberhose is the first successful, freely available, practical program of deniable cryptography in the world. It was released in an earlier form in 1997, but has undergone significant changes since that time. The design goal has been to make Rubberhose the most efficient conventional disk encryption system, while also offering the new feature of information hiding.
Rubberhose is a type of deniable cryptography package. Deniable cryptography gives a person not wanting to disclose the plaintext data corresponding to their encrypted material the ability to show that there is more than one interpretation of the encrypted data. What deniable crypto means in the Rubberhose context is this: if someone grabs your Rubberhose-encrypted hard drive, he or she will know there is encrypted material on it, but not how much -- thus allowing you to hide the existence of some of your data.

--
Specialist Mac support for creative pros, Melbourne

Re:iieorjoeghoiuhtr by fucket · 2010-09-19 09:49 · Score: 2

Looks more Qwghlmian to me...

No, you ALL miss the point. by Doctor+O · 2010-09-19 09:51 · Score: 2, Informative

No, you ALL miss the point. How are you going to explain having a HDD or partition full of "garbage"? Nobody with half a brain will believe you that there's nothing encrypted in the noise.

(Yeah, an entropy file would be easy to explain, but entropy files usually don't come in sizes big enough to hide data in, PLUS, who apart from us here understands what an entropy file is? A judge sure doesn't.)

Steganography, OTOH, would be very useful. I have around 50 GB of family photos on my machine, that would make for a nice data storage.

--
Who is General Failure and why is he reading my hard disk?

Re:No, you ALL miss the point. by cetialphav · 2010-09-19 13:00 · Score: 5, Insightful

You tell them you just visited your cousin Jim, who had an old hard drive he didn't want anymore, and you needed a spare so he gave it to you, but not before he ran "dd if=/dev/urandom of=/dev/sda1" because he didn't want you having his old tax documents.
And now you have just fallen victim to a classic interrogation technique. They have just gotten you to tell a story that they can then investigate to determine its credibility. They will talk to your cousin Jim; they will look for signs of an OS installation at the date and time you said. They then ask more follow up questions (for which they already know the true answer) to get you to dig a bigger grave for yourself. Then they show you that they know you are lying and inform you of the penalty for that crime and offer you a "deal" to tell the truth.
The fact is that when you are dealing with good interrogators, you cannot lie your way out of it. If you have a huge file full of random data, that is suspicious and there is nothing you can say to change that. The whole point of steganography is to hide the data in something innocent so that no one ever asks you anything. The goal is to blend in and give them no reason to give you a second thought.
Re:No, you ALL miss the point. by pipedwho · 2010-09-19 13:01 · Score: 2, Funny

You tell them you just visited your cousin Jim, who had an old hard drive he didn't want anymore, and you needed a spare so he gave it to you, but not before he ran "dd if=/dev/urandom of=/dev/sda1" because he didn't want you having his old tax documents. All you've done with it since is install the OS...
...and a copy of Truecrypt into Program Files.
Re:No, you ALL miss the point. by sco08y · 2010-09-19 23:32 · Score: 2, Interesting

Well of course you would want to *have* a cousin Jim who is willing to say he had given you a spare hard drive. Or replace "cousin Jim" with "friend Steve" or whoever. It's not *that* hard.
And you're going to rehearse the stories together? Now you're conspiring, so you've added another charge. And there are more points of failure, as all your stories have to match, which they won't.

Re:Well by swillden · 2010-09-19 10:08 · Score: 3, Informative

DES is nowadays considered a weaker algorithm

DES is considered too weak for many uses due to its small key size.

Nonetheless, if you can find a way to reliably distinguish DES output from random bits, without knowledge of the key and with remotely-practical efficiency, you can publish a paper that will gain you substantial name recognition among the world's cryptographic elite.

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.

Re:Well by nospam007 · 2010-09-19 10:21 · Score: 2, Interesting

There is no file with random data.
Why would one want to create a file with random data?
The point of creating a file is that it ain't random data what you are saving, otherwise you wouldn't save it.

group effort can provide plausible deniability by John_Sauter · 2010-09-19 10:41 · Score: 3, Interesting

If you find a file on my hard drive with data you can't readily decode, is it:

A) Compressed with an unknown compressor
B) Encrypted with an unknown encryptor
C) Random bytes used for an encryption process
D) Random bytes used for something else

I can't prove that answer D is wrong... but I don't have to because I know that 99% of the time, it's one of the other answers.....

OK, let's, as a community, add an (E). Everyone create a file on your laptop, in your home directory, named random.bin, as follows:

dd if=/dev/urandom of=random.bin bs=4096 count=10000

The actual value of the count isn't important, as long as it is large enough to create lots of random bits. If lots of people do this, we have “(E) Random bytes because Slashdot told me to”, providing plausible deniability for anyone who needs to use that file to encrypt something important.

Re:Well by Omnifarious · 2010-09-19 10:49 · Score: 2

Nonetheless, if you can find a way to reliably distinguish DES output from random bits, without knowledge of the key and with remotely-practical efficiency, you can publish a paper that will gain you substantial name recognition among the world's cryptographic elite.

If you have enough DES output you can do this. Someone already mentioned that if you use even a strong cipher such as AES-256 in ECB (electronic codebook) mode, the output is nearly trivially distinguishable, because repeated plaintext patterns the size of the cipher's block will encrypt to identical ciphertexts.

Even if you use CTR mode or CBC mode, patterns in the plaintext show up in the output if you encrypt enough data.

For example, if you by chance end up with the same ciphertext output block in CBC mode you can obtain the XOR of the corresponding plaintext blocks by XORing the two immediately preceding ciphertext blocks. If you encrypt enough blocks, the laws of statistics favor two blocks ending up being identical by chance. And this XOR equality allows you to determine if the data is encrypted because it's generally relatively easy to tell if the XOR is the result of XORing two pieces of unencrypted data.
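
The CBC relation can be checked on a toy scale. This sketch assumes a made-up 16-bit Feistel cipher (my own construction for illustration, nowhere near secure) so that collisions show up after a couple of thousand blocks instead of billions. It is a true permutation, which is what the argument needs: equal ciphertext blocks imply equal cipher inputs.

```python
import hashlib
import os


def _round(key: bytes, rnd: int, half: int) -> int:
    # Round function: one byte derived from key, round number and half-block.
    return hashlib.sha256(key + bytes([rnd, half])).digest()[0]


def toy_encrypt(key: bytes, block: int) -> int:
    """4-round Feistel network on a 16-bit block: a permutation, but a TOY."""
    left, right = block >> 8, block & 0xFF
    for rnd in range(4):
        left, right = right, left ^ _round(key, rnd, right)
    return (left << 8) | right


def cbc_encrypt(key: bytes, iv: int, blocks):
    out, prev = [], iv
    for p in blocks:
        prev = toy_encrypt(key, p ^ prev)
        out.append(prev)
    return out


def find_collision(cipher_blocks):
    """Return (i, j) with i < j and equal ciphertext blocks, or None."""
    first_seen = {}
    for j, c in enumerate(cipher_blocks):
        if c in first_seen:
            return first_seen[c], j
        first_seen[c] = j
    return None


key = os.urandom(16)
iv = 0x1234
plain = [int.from_bytes(os.urandom(2), "big") for _ in range(2000)]
cipher = cbc_encrypt(key, iv, plain)
chain = [iv] + cipher  # chain[i] is the block that precedes cipher[i]

hit = find_collision(cipher)
if hit is not None:
    i, j = hit
    leaked_xor = chain[i] ^ chain[j]  # computed from ciphertext alone
    print("leak matches plaintext XOR:", leaked_xor == (plain[i] ^ plain[j]))
```

With only 65536 possible block values, 2000 blocks make a collision all but certain; a real 64-bit or 128-bit block just pushes the same effect out to much larger volumes.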

CTR mode has a different sort of relationship that can be exploited. You know that the XOR of any two ciphertext blocks is not equal to the XOR of the corresponding two plaintext blocks. This can eventually leak information about the plaintext with enough blocks to work with. But using the inequality to determine if the bunch of data is truly random or encrypted data takes a lot fewer blocks.

And, of course, if your blocks are larger you need many more of them for the statistics to work out in your favor for attacking the cipher. This is why block ciphers should be re-keyed periodically when encrypting a lot of data. It's also why it's much easier to write a distinguisher for DES (64-bit blocks) than AES (128-bit blocks) that distinguishes between encrypted and random data.
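
A back-of-the-envelope version of that block-size argument, using the usual birthday estimate that collisions become likely after roughly 2^(n/2) blocks for an n-bit block. The function name is made up for this sketch, and the estimate is an order of magnitude, not an exact threshold.

```python
def birthday_bound_bytes(block_bits: int) -> int:
    """Data volume at which a block collision becomes likely (~2^(n/2) blocks)."""
    blocks = 2 ** (block_bits // 2)
    return blocks * (block_bits // 8)


des_bytes = birthday_bound_bytes(64)    # 2**32 blocks * 8 bytes each
aes_bytes = birthday_bound_bytes(128)   # 2**64 blocks * 16 bytes each

print(des_bytes // 2**30, "GiB for a 64-bit block")    # 32 GiB
print(aes_bytes // 2**60, "EiB for a 128-bit block")   # 256 EiB
```

So a 64-bit block cipher gets into trouble at disk-sized volumes, while a 128-bit block needs an amount of data nobody is storing on a laptop.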

--
Need a Python, C++, Unix, Linux develop

Re:One more level... by lgw · 2010-09-19 11:40 · Score: 4, Funny

But a good defense attorney would apply the same principle to show that the prosecution's legal submissions were really steganography hiding insults to the judge's mother.

--
Socialism: a lie told by totalitarians and believed by fools.

On The Practical Side by BoRegardless · 2010-09-19 11:57 · Score: 2, Insightful

What happens if you use the old "torn sheet of paper" routine?

Each drive or device moving from A to B goes with a different courier/ISP/method and no "piece" contains enough information to be identifiable or usable.

All the pieces need to arrive at the destination to be able to be re-constructed back into usable form.

Any time you send a complete message in one burp, one hard drive or one CD or one image, there is a chance for decryption by any number of accidents or threat of death to all your family members one person at a time while you watch.

No encryption was used in the creation of this message...thus I have deniability.

Re:iieorjoeghoiuhtr by Mitchell314 · 2010-09-19 13:07 · Score: 5, Funny

Dammit, I finally get Cthulhu back to sleep and some jackass wakes him up again.

--
I read TFA and all I got was this lousy cookie

Re:One more level... by Mitchell314 · 2010-09-19 13:12 · Score: 4, Interesting

It's super easy to make up a key. XOR = key.
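
Spelled out as a sketch: given any blob and any message of the same length that you want it to "decrypt" to, the key is just their XOR. The helper name and the sample message are made up for illustration.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


claimed = b"insults to judges mother"   # whatever you want to "find"
blob = os.urandom(len(claimed))         # any data at all, random or not

fake_key = xor_bytes(blob, claimed)     # manufactured after the fact
print(xor_bytes(blob, fake_key) == claimed)
```

Since a key exists for every possible message of that length, producing one proves nothing about what the blob originally held.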

--
I read TFA and all I got was this lousy cookie
