Distinguishing Encrypted Data From Random Data?

Posted by timothy on Sunday September 19, 2010 @07:50AM from the is-this-a-solved-problem? dept.

gust5av writes "I'm working on a little script to provide very simple and easy to use steganography. I'm using bash together with cryptsetup (without LUKS), and the plausible deniability lies in writing to different parts of a container file. On decryption you specify the offset of the hidden data. Together with a dynamically expanding filesystem, this makes it possible to have an arbitrary number of hidden volumes in a file. It is infeasible to reveal the encrypted data without the password, but is it possible to prove there is encrypted data where you claim there's not? If I give someone one file containing random data and another containing data encrypted with AES, will he be able to tell which is which?"
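A toy model of the layout described above, assuming a container file pre-filled with random bytes and hidden chunks written at chosen offsets. It is a sketch only: an XOR keystream from `hashlib.shake_256` stands in for AES via cryptsetup, and the names `hide`/`reveal`, the offsets, and the passwords are all hypothetical.

```python
import hashlib
import os

CONTAINER_SIZE = 64 * 1024  # hypothetical container size for the demo


def keystream(password: bytes, length: int) -> bytes:
    # Toy stand-in for a real cipher: SHAKE-256 of the password, stretched to length.
    # Real tools use AES plus a slow key-derivation function; do not use this for secrets.
    return hashlib.shake_256(b"toy-keystream:" + password).digest(length)


def hide(container: bytearray, offset: int, password: bytes, data: bytes) -> None:
    """XOR data with the password's keystream and write it at offset."""
    ks = keystream(password, len(data))
    container[offset:offset + len(data)] = bytes(d ^ k for d, k in zip(data, ks))


def reveal(container: bytes, offset: int, password: bytes, length: int) -> bytes:
    """Read length bytes at offset and undo the XOR; wrong inputs just yield noise."""
    ks = keystream(password, length)
    chunk = container[offset:offset + length]
    return bytes(c ^ k for c, k in zip(chunk, ks))


container = bytearray(os.urandom(CONTAINER_SIZE))  # the whole file starts as noise
hide(container, 4096, b"first secret", b"volume A")
hide(container, 30000, b"second secret", b"volume B")
print(reveal(container, 4096, b"first secret", 8))  # b'volume A'
```

Each password writes exactly once here; reusing one keystream for two different writes would leak their XOR, which is one of the reasons real tools derive per-sector keys and IVs.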

22 of 467 comments

Re:iieorjoeghoiuhtr by Anonymous Coward · 2010-09-19 07:53 · Score: 4, Funny

Trick question! It is random text that's been encrypted!
It's all about entropy by cpghost · 2010-09-19 07:58 · Score: 5, Insightful

Encrypted files have maximum entropy, just like absolutely random files. Basically, you can't tell which one is which. However, absolute random noise on a disk isn't all that usual, so any encrypted file (or purely random file) will stick out like a sore thumb: it will be highly visible. But, again, you can't tell the difference between the two.
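A first-order entropy estimate is the usual quick way to see what "maximum entropy" means in practice. This sketch uses `os.urandom` output as a stand-in for ciphertext (Python's standard library has no AES); a score near 8 bits per byte only says the byte histogram is flat, not that no other test could tell the data apart.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """First-order (byte-histogram) entropy estimate in bits per byte; at most 8.0."""
    if not data:
        return 0.0
    n = len(data)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(data).values())


random_bytes = os.urandom(1 << 20)                      # stand-in for ciphertext
text_bytes = b"the quick brown fox jumps over the lazy dog " * 20000
zero_bytes = bytes(1 << 20)                             # a freshly zeroed region

for name, sample in [("random", random_bytes), ("text", text_bytes), ("zeros", zero_bytes)]:
    print(f"{name:>6}: {shannon_entropy(sample):.4f} bits/byte")
```

Random and encrypted data both land just under 8.0, while ordinary text and zeroed blocks score far lower, which is exactly why a high-entropy region "sticks out" even though its contents can't be identified.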

--
cpghost at Cordula's Web.
1. Re:It's all about entropy by Omnifarious · 2010-09-19 08:05 · Score: 4, Informative
  
  Doesn't compressed data look random?
  As an ideal, yes. But compressed data is still fairly easy to distinguish from random data. In particular, many compression formats have small markers in various places so that the decompressor can attempt to recover a corrupted file. Also, no compression technique is perfect, so even without those markers the output is still statistically distinguishable.
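The crudest form of this distinguishability is the header: most compressed formats announce themselves in their first few bytes. A minimal sketch, checking a handful of well-known magic numbers plus the zlib header rule from RFC 1950 (the function name and the table are illustrative, not exhaustive):

```python
import gzip
import os
import zlib

# A few well-known leading "magic" byte sequences; not an exhaustive list.
KNOWN_MAGIC = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
    b"PK\x03\x04": "zip",
    b"\xfd7zXZ\x00": "xz",
}


def guess_container(data: bytes) -> str:
    for magic, name in KNOWN_MAGIC.items():
        if data.startswith(magic):
            return name
    if len(data) >= 2:
        cmf, flg = data[0], data[1]
        # zlib: method 8 (deflate), window field <= 7, and (CMF*256 + FLG) % 31 == 0.
        # Random data passes this by chance about once in ~500 tries, hence "probably".
        if cmf & 0x0F == 8 and cmf >> 4 <= 7 and (cmf * 256 + flg) % 31 == 0:
            return "zlib (probably)"
    return "unknown"


print(guess_container(gzip.compress(b"hello" * 100)))  # gzip
print(guess_container(zlib.compress(b"hello" * 100)))  # zlib (probably)
print(guess_container(os.urandom(64)))                 # usually "unknown"
```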
  
  --
  Need a Python, C++, Unix, Linux develop
2. Re:It's all about entropy by mlyle · 2010-09-19 08:17 · Score: 5, Insightful
  
  Not exactly.
  The problem with steg'ing inside known container formats, especially compressed ones, is this:
  Each implementation of the compression algorithm has its nuances. If the majority of an MP3 looks like it was compressed by the iTunes implementation, but there's a range of output iTunes would never generate (particularly if the input file is known), that's very suspect. Ditto if things like PSNR change, even subtly, for the portion where steganography is in play. Even though compressed data has a great deal of entropy, it IS significantly constrained compared with random data, in that A) known decompression programs must produce the specified output from it, and B) known compression programs generated it as output from possibly-known input data.
  If your adversary is the local police or one of your buddies, this stuff doesn't matter. If it's intelligence agencies or research organizations, good luck. Steganography is hard.
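PSNR (peak signal-to-noise ratio) is 10·log10(MAX² / MSE) between a reference signal and a modified one. A minimal sketch of the formula for 8-bit samples; how an analyst obtains the reference (re-encoding, a known source file, neighbouring regions) is outside the scope of this sketch, and the sample values are made up.

```python
import math
from typing import Sequence


def psnr(original: Sequence[int], altered: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences."""
    if len(original) != len(altered) or not original:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, altered)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


cover = [10, 20, 30, 40]
stego = [11, 21, 31, 41]  # every sample's least significant bit flipped
print(f"{psnr(cover, stego):.2f} dB")  # 48.13 dB
```

Even a one-unit change everywhere gives a finite, measurable PSNR; the comment's point is that a region whose PSNR (or other statistics) differs from the rest of the file is itself a tell.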
3. Re:It's all about entropy by bytesex · 2010-09-19 08:18 · Score: 4, Interesting
  
  Make it compressed, header-less audio. Give 'em a decoder (which will produce noise), and claim you're a scientist and this is your recording of Jupiter.
  
  --
  Religion is what happens when nature strikes and groupthink goes wrong.
4. Re:It's all about entropy by v1 · 2010-09-19 09:20 · Score: 5, Insightful
  
  However, absolute random noise on a disk isn't all that usual,
  Actually, nowadays, it's extremely unusual. Blocks are all zeroed at the factory, and anything you save over them that's later marked free will almost certainly be far from random (like pieces of pictures, documents, applications, etc.).
  Really, statistically speaking, if you wanted to look on a hard drive for encrypted data, your best bet would be to go looking for blocks of high-entropy data.
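A minimal sketch of that search: read a disk image in fixed-size blocks and flag the ones whose byte-histogram entropy is close to 8 bits per byte. The 4096-byte block size and the 7.9 threshold are assumptions; note that compressed files (JPEGs, ZIPs) will be flagged too, so a real analysis would then discount blocks that belong to known file types.

```python
import math
from collections import Counter
from typing import BinaryIO, Iterator, Tuple

BLOCK_SIZE = 4096  # assumed block size; real filesystems vary


def block_entropy(block: bytes) -> float:
    """Byte-histogram entropy in bits per byte."""
    n = len(block)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(block).values())


def high_entropy_blocks(f: BinaryIO, threshold: float = 7.9) -> Iterator[Tuple[int, float]]:
    """Yield (byte offset, entropy) for every full block at or above the threshold."""
    offset = 0
    while True:
        block = f.read(BLOCK_SIZE)
        if len(block) < BLOCK_SIZE:
            break  # ignore a trailing partial block
        h = block_entropy(block)
        if h >= threshold:
            yield offset, h
        offset += BLOCK_SIZE
```

Usage would be along the lines of `with open("disk.img", "rb") as f: hits = list(high_entropy_blocks(f))`, where `disk.img` is a hypothetical raw image.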
  The only defense against this would be if you did a random wipe of your hard drive when you bought it, and then reinstalled, and patched your OS to automatically random-wipe files before deleting or updating/moving them. But then you get into the area of "this person is obviously going to a lot of work to make it easy to hide something from us", which by itself raises an eyebrow.
  And on that note, I'm a little surprised, now that I think about it, that I can't come up with a single example of a native or add-on feature for any OS that does random-wipe-on-delete. OS X has "erase free space" built into Disk Utility, and you can find an app to do this for other OSs, but obviously zeroed blocks are not what we need to be creating. And the fact that you have to do this step manually, and that it usually takes HOURS to run, is also surprising. I don't know offhand whether OS X's "secure empty trash" writes zeros or random data, but you're not likely to use it for EVERYTHING you throw away since it takes time, and a lot of files get moved/deleted by the OS automatically without it anyway. (The end problem: anyone with a clue knows you can't hide anything in a bunch of zeroed blocks.)
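A minimal sketch of "random-wipe-on-delete" for a single file: overwrite its current contents with `os.urandom` bytes, force them to disk, then unlink. The function name is hypothetical. On SSDs (wear levelling), journaling or copy-on-write filesystems, and anything with snapshots, the overwrite may land on different physical blocks than the original data, so this is not a guarantee that the old contents are gone.

```python
import os


def random_wipe_and_delete(path: str, chunk_size: int = 1 << 20) -> None:
    """Overwrite a file's contents with random bytes, fsync, then remove it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:  # open in place, without truncating
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(os.urandom(n))
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the random bytes to the device before unlinking
    os.remove(path)
```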
  
  --
  I work for the Department of Redundancy Department.
5. Re:It's all about entropy by melikamp · 2010-09-19 09:42 · Score: 5, Interesting
  
  I've been working on this very problem for a while now. An easier version, even: how to encrypt a single file in a way that makes it indistinguishable from random data? The algorithm must allow for a short password (dozens of bytes), and should be able to encrypt very large files. Ideally, an attacker may see the algorithm and may correctly suspect what the plaintext is, but should still be unable to prove that the given ciphertext is the output of the algorithm. That is, the only way to "prove" that should be a brute-force password search; finding a working password of a few dozen bytes would be proof enough. This is good enough because a brute-force search over 60^30 passwords is kind of slow.
  I further simplified the problem by saying that the size of a file does not need to be hidden: that's a separate task, and a much easier one.
  I have a reason to approach the problem this way. If I have on my computer a file named "one-time-pad.bin", and it looks like a one-time pad, then it must be a one-time pad. The very existence of an encrypted partition should be enough to convince anyone that there is encrypted data. If a multi-sheaf algorithm is used, then there is a reasonable suspicion that there are multiple sheaves. Either way, the owner seems to be hiding something. Burying data in JPGs and similar tricks is also sketchy, as it is almost certainly possible to distinguish (statistically) a benign JPG from one that has been steganographically altered, although this can be avoided by hiding very little data in very large files. Here, at least, there is an expensive solution.
  I can think of at least one other way to do it; here goes my original description on the internet. Say we want to use passwords with length up to B bits and encrypt files with length up to M bits. Fix forever B random binary strings of length M each; call them N = {n_1, n_2, ..., n_B}. The set of 2^B passwords is in a bijective correspondence with the set of subsets of N; for example, a password like 110101... will select the subset {n_1, n_2, n_4, n_6, ...}. Treat the n_i in that subset as integers and add them. Treat the plaintext as an integer and add it to (or XOR it with) the result. One can think of this as constructing a one-time pad (one of 2^B) and XORing with it. Even if the attacker knows n_i for each i, the plaintext (without loss of generality, all zeros), and the ciphertext, she still has to decompose the ciphertext as a sum of a subset of N, and even deciding whether or not that can be done is NP-hard. The complexity will be exponential as long as both M and B are large, which they are in expected applications.
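A minimal sketch of the subset-sum pad, under assumptions the comment leaves open: the subset sum is reduced modulo 2^M, the plaintext is XORed rather than added, bit i of the password (least significant first) selects n_{i+1}, and M and B are toy sizes. The n_i come from `secrets.randbits`. NP-hardness is a worst-case statement, so this sketch says nothing about how hard this particular construction is to break.

```python
import secrets

M_BITS = 256   # toy maximum plaintext length in bits (the comment's M)
B_BITS = 64    # toy password length in bits (the comment's B)
MASK = (1 << M_BITS) - 1

# "Fix forever B random binary strings of length M": generated once here and
# assumed to be stored and shipped alongside the software.
N = [secrets.randbits(M_BITS) for _ in range(B_BITS)]


def pad_from_password(password_bits: int) -> int:
    """Sum the n_i selected by the password's bits, reduced mod 2^M (an assumption)."""
    total = 0
    for i in range(B_BITS):
        if (password_bits >> i) & 1:
            total += N[i]
    return total & MASK


def encrypt(plaintext: bytes, password_bits: int) -> int:
    p = int.from_bytes(plaintext, "big")
    if p.bit_length() > M_BITS:
        raise ValueError("plaintext longer than M bits")
    return p ^ pad_from_password(password_bits)


def decrypt(ciphertext: int, password_bits: int, length: int) -> bytes:
    return (ciphertext ^ pad_from_password(password_bits)).to_bytes(length, "big")
```

As the comment itself notes, a given password yields the same pad every time, so it must not be reused across files.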
  The nicest feature here is that with a non-trivial password, the ciphertext will look as random as they get! It will be a sum of carefully pre-selected random numbers, padded with the plaintext.
  One obvious limitation is that each password can only be used once, since similar plaintexts will produce similar ciphertexts, but that could be remedied. A bigger problem, IMHO, is that this algorithm requires B random binary strings of length M each to be built in. Just to give you an idea, if you want to encrypt files of size up to 1 GiB with passwords of size up to 512 bits, then you need to keep around 512 GiB of pad (512 strings of 1 GiB each). Either that, or be able to generate, really really fast, 512 random reals (random here meaning the same every time, but completely unrelated), which is very sketchy: the reals could easily be so related that the subset sum will allow for a sub-exponential solution.
  I would be very interested to hear from anyone about this idea.
  I may have another way of solving the same hiding problem, and it has to do with a completely different, yet, IMHO, also very fascinating way of turning a short binary string into a very long and random-looking binary string in a one-way fashion. I decided that I won't implement the subset sum solution unless I am totally sure that I cannot find something more elegant, so feel free to steal my idea above and code it in.
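For comparison, one standard, well-studied way to turn a short string into a long, random-looking one deterministically is an extendable-output function such as SHAKE-256 (`hashlib.shake_256`). This is not the commenter's unpublished method; it is a swapped-in illustration of how the B strings could be regenerated on demand instead of stored, under the usual assumption that SHAKE-256 behaves like a random oracle. Whether subset sums over such strings stay hard is a separate question this sketch does not address.

```python
import hashlib
from typing import List


def expand(seed: bytes, n_bytes: int) -> bytes:
    """Stretch a short seed into n_bytes of deterministic, random-looking output."""
    return hashlib.shake_256(seed).digest(n_bytes)


def derive_strings(seed: bytes, count: int, n_bytes: int) -> List[bytes]:
    # Append a 4-byte index to the seed so each string comes from a distinct
    # input, instead of every string being a prefix of one long output.
    return [expand(seed + i.to_bytes(4, "big"), n_bytes) for i in range(count)]
```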
6. Re:It's all about entropy by Kjella · 2010-09-19 09:44 · Score: 4, Insightful
  
  Well, the problem is that it doesn't really apply to compressed data. Compression schemes try packing things as efficiently as possible, so there's relatively little you can add without making it obvious the compression has been tampered with. You could try embedding it as some sort of watermark into the photo/video before compression, but that too is difficult and won't hide very much. And most people don't carry tons of BMPs, WAVs and uncompressed AVIs.
  So far it seems most people agree the best way to hide encrypted data is within other encrypted data. You don't have to be super-paranoid to use encryption; my last workplace used full-disk encryption, and I don't think anyone can seriously accuse you of anything if you just say "I feared my computer would get stolen, and I could be exposed to identity theft or have my family photos posted online" or something like that.
  The best solutions I have seen work like this:
  1) If you enter both your "normal" password and your "secret" password => access to the normal disk, and it'll seamlessly move around any secret data as long as there is room.
  2) If you enter only your "secret" password => access to your secret data.
  3) If you're under duress, you give just the "normal" password and you get just the normal disk. Your hidden data can get overwritten since the encryption software doesn't know about it, but there's no way to prove that there is a secret container or a secret password.
  
  --
  Live today, because you never know what tomorrow brings
Re:Well by Kjella · 2010-09-19 08:00 · Score: 5, Insightful

As far as I know, finding patterns in the output is tightly linked to reducing the number of possible keys, so good encryption algorithms should not create patterns. Of course, if your encryption software writes some kind of header (which wouldn't affect the security of the encrypted contents), then it will be obvious to anyone looking that you have an encrypted container. So this is 99% about implementation and 1% about encryption algorithms.
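LUKS is a concrete example of such a header: a LUKS1 or LUKS2 volume starts with the 6-byte magic `LUKS\xba\xbe`, so one read identifies it. A minimal sketch (function names hypothetical). Plain dm-crypt mode, which is what the original question uses (cryptsetup without LUKS), writes no on-disk header, so this check finds nothing there.

```python
LUKS_MAGIC = b"LUKS\xba\xbe"  # first 6 bytes of a LUKS1/LUKS2 header


def looks_like_luks(data: bytes) -> bool:
    return data.startswith(LUKS_MAGIC)


def file_looks_like_luks(path: str) -> bool:
    with open(path, "rb") as f:
        return looks_like_luks(f.read(len(LUKS_MAGIC)))
```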

--
Live today, because you never know what tomorrow brings
It depends.... by TrumpetPower! · 2010-09-19 08:00 · Score: 4, Insightful

If I give someone one file containing random data and another containing data encrypted with AES, will he be able to tell which is which?
Does the person to whom you give these two files have a rubber hose? Is he a member of the “extraordinary rendition” team?
The point of steganography is to not get caught in the first place. If you need plausible deniability, you’ve already lost.
Cheers,
b&

--
All but God can prove this sentence true.
Re:Well by bennomatic · 2010-09-19 08:10 · Score: 5, Funny

Weird. I guess there's a bug in my ROT13 implementation. If I run my text through it twice, I just get the original message.

--
The CB App. What's your 20?
Shouldn't by dachshund · 2010-09-19 08:13 · Score: 4, Informative

AES is designed to be a pseudorandom permutation, which for a limited number of queries is indistinguishable from a pseudorandom function (that is the criterion it is evaluated against). What this means is that /when used properly/ AES-encrypted data should be indistinguishable from random data, at least for a distinguisher running in bounded time. If anyone discovers an efficient algorithm that can make this distinction, it'll be a big nail in AES's coffin (and yes, at the very theoretical level I realize that there already are some known weaknesses in AES, but for the moment you're in good shape).
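A "distinguisher" is just a test that tries to tell the two sources apart. A minimal sketch of one very weak distinguisher, a Pearson chi-square test of the byte histogram against a uniform distribution; good ciphertext should pass it just as `os.urandom` output does. The cutoff is an approximate 99.9th percentile (Wilson-Hilferty approximation), and real evaluations run whole batteries of tests (for example NIST SP 800-22 or dieharder), so passing one test is weak evidence.

```python
import os
from collections import Counter

# Approximate 99.9th percentile of the chi-square distribution with 255 degrees
# of freedom; treat it as a rough cutoff, not an exact critical value.
CRITICAL_255_DF = 330.5


def chi_square_bytes(data: bytes) -> float:
    """Pearson chi-square statistic of the byte histogram against a uniform distribution."""
    if not data:
        raise ValueError("need at least one byte")
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts[b] - expected) ** 2 / expected for b in range(256))


def looks_uniform(data: bytes) -> bool:
    # Returns False about 0.1% of the time even on truly random input.
    return chi_square_bytes(data) < CRITICAL_255_DF


print(looks_uniform(os.urandom(1 << 20)))            # almost always True
print(looks_uniform(b"plain English text " * 5000))  # False
```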
Re:Ignore the person holding the phone book. by parlancex · 2010-09-19 08:14 · Score: 5, Insightful

I think you're missing the point. Of course after they know that you have some encrypted data on your disk the strength of the encryption becomes moot because they can just drug / beat you until you tell them the key, but what this question is about is hiding encrypted data in unencrypted data so prying eyes can't tell if anything is even there at all.

For example, there may come a day when airport security could demand you disclose your passwords when they find you are carrying storage with encrypted content using the aforementioned techniques, but they aren't going to drug / beat every single person coming onto an airplane or going across a border. If your jpgs look like everybody else's jpgs both visually and under close analytical scrutiny, they aren't going to bother you. Another example: there may come a day when any traffic on the Internet that cannot be positively identified as a common protocol with statistically "normal" contents is simply rejected. Maybe not here, maybe not right now, but this kind of idea is still very useful.
Re:Well by bytesex · 2010-09-19 08:16 · Score: 5, Insightful

It depends what you call an 'encryption algorithm'. If you mean 'DES', then no: DES is nowadays considered weak. If you mean 'AES-256', then still no: you need to *apply* AES-256 properly before it's any good, because AES is a block cipher and, used naively (ECB mode), will encrypt identical blocks of plaintext under the same key to identical blocks of ciphertext. If you mean 'AES-256 in CBC mode with a random IV and SHA-256 HMAC authentication', then that's an algorithm that can be safely used. Under certain real-world circumstances.
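The ECB problem is easy to test for: count 16-byte blocks that repeat. A minimal sketch that takes ciphertext produced elsewhere (Python's standard library has no AES); the demo uses synthetic blocks as a stand-in for ECB output of repeated plaintext. Properly chained or counter-mode ciphertext essentially never repeats a block.

```python
from collections import Counter

BLOCK = 16  # AES block size in bytes


def repeated_blocks(ciphertext: bytes, block_size: int = BLOCK) -> int:
    """Count how many full blocks are duplicates of another block."""
    blocks = [ciphertext[i:i + block_size]
              for i in range(0, len(ciphertext) - block_size + 1, block_size)]
    return sum(c - 1 for c in Counter(blocks).values() if c > 1)


a, b = b"A" * 16, b"B" * 16   # stand-ins for two distinct ECB ciphertext blocks
print(repeated_blocks(a + b + a + a))  # 2
```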

--
Religion is what happens when nature strikes and groupthink goes wrong.
Re:Well by SeanTobin · 2010-09-19 08:31 · Score: 5, Funny

Weird. I guess there's a bug in my ROT13 implementation. If I run my text through it twice, I just get the original message.
Just do what they did with DES... use 3rot13 and you're much more secure than the original implementation.
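The joke works because ROT13 is its own inverse: rotating by 13 twice is a rotation by 26, and three times is a rotation by 39, which is 13 modulo 26. A quick check using Python's built-in `rot13` text codec:

```python
import codecs


def rot13(text: str) -> str:
    return codecs.encode(text, "rot13")


msg = "Attack at dawn"
print(rot13(msg))                                # Nggnpx ng qnja
assert rot13(rot13(msg)) == msg                  # applying it twice undoes it
assert rot13(rot13(rot13(msg))) == rot13(msg)    # so "3rot13" is just ROT13
```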

--
Karma: SELECT `karma` FROM `users` WHERE `userid`=138474;
Unfoilable Steganography in LSB Plane of Imagery by mirkurius · 2010-09-19 08:56 · Score: 5, Interesting

Steganographic attempts are considered foiled if someone can detect that there is a secret message, they don't need to be able to retrieve the message in order for the attempt to be considered a failure. I did my Master's project on hiding data in the least significant bitplane of imagery. The trick is to "randomly" scatter your secret message throughout this plane. I showed methods that would allow you to do this so that the data was indistinguishable. You should always encrypt your secret message first so that it looks random, or better yet, shape the statistics of your encoded message to match the noise characteristics that were in the original LSB plane. If you use an image created from a very noisy source, such as a digital camera, and you encrypt the embedded message and scatter it using a reversible algorithm, and iteratively ensure that the statistics of the altered LSB plane look the same as the original LSB plane, I proved that it is not possible for someone to tell that there is a secret message hidden there. However, you need to be careful to use an original image you created yourself, and to destroy the original, because if someone ever compared the original to the one with the embedded message, they could definitely tell there was something altered by comparing the LSB planes.
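A minimal sketch of the "scatter" step only: message bits are written into the least significant bits of pixels chosen by a keyed pseudo-random permutation, and the same key recovers them. Assumptions: the cover is a flat list of 8-bit grayscale values, the message is already encrypted, and `random.Random` is used purely for illustration (it is not cryptographically secure; a real tool would use a keyed CSPRNG). It does not do the statistics-shaping the comment describes.

```python
import random
from typing import List


def _positions(key: str, n_pixels: int, n_bits: int) -> List[int]:
    # Keyed choice of distinct pixel indices (illustration only, not secure).
    rng = random.Random(key)
    return rng.sample(range(n_pixels), n_bits)


def embed(pixels: List[int], message: bytes, key: str) -> List[int]:
    bits = [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("message too long for cover")
    out = list(pixels)
    for pos, bit in zip(_positions(key, len(pixels), len(bits)), bits):
        out[pos] = (out[pos] & ~1) | bit  # replace only the LSB
    return out


def extract(pixels: List[int], length: int, key: str) -> bytes:
    positions = _positions(key, len(pixels), length * 8)
    bits = [pixels[p] & 1 for p in positions]
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[j:j + 8]))
        for j in range(0, len(bits), 8)
    )
```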
Re:Ignore the person holding the phone book. by M.+Baranczak · 2010-09-19 09:36 · Score: 5, Funny

they aren't going to drug / beat every single person coming onto an airplane
If you fly US Airways, there's a $25 service charge if you want to get beaten and drugged before boarding. I remember when that shit used to be included in the base ticket price.
Re:Ignore the person holding the phone book. by Jeremi · 2010-09-19 09:54 · Score: 4, Funny

If your jpgs look like everybody else's jpgs both visually and under close analytical scrutiny, they aren't going to bother you.
I've developed a fascinating algorithm for encoding hidden data by slightly modulating breast sizes, but this comment is too small to contain it.

--

I don't care if it's 90,000 hectares. That lake was not my doing.
Re:One more level... by lgw · 2010-09-19 11:40 · Score: 4, Funny

But a good defense attorney would apply the same principle to show that the prosecution's legal submissions were really steganography hiding insults to the judge's mother.

--
Socialism: a lie told by totalitarians and believed by fools.
Re:No, you ALL miss the point. by cetialphav · 2010-09-19 13:00 · Score: 5, Insightful

You tell them you just visited your cousin Jim, who had an old hard drive he didn't want anymore, and you needed a spare so he gave it to you, but not before he ran "dd if=/dev/urandom of=/dev/sda1" because he didn't want you having his old tax documents.
And now you have just fallen victim to a classic interrogation technique. They have just gotten you to tell a story that they can then investigate to determine its credibility. They will talk to your cousin Jim; they will look for signs of an OS installation at the date and time you said. They then ask more follow-up questions (to which they already know the true answer) to get you to dig a bigger grave for yourself. Then they show you that they know you are lying, inform you of the penalty for that crime, and offer you a "deal" to tell the truth.
The fact is that when you are dealing with good interrogators, you cannot lie your way out of it. If you have a huge file full of random data, that is suspicious and there is nothing you can say to change that. The whole point of steganography is to hide the data in something innocent so that no one ever asks you anything. The goal is to blend in and give them no reason to give you a second thought.
Re:iieorjoeghoiuhtr by Mitchell314 · 2010-09-19 13:07 · Score: 5, Funny

Dammit, I finally get Cthulhu back to sleep and some jackass wakes him up again.

--
I read TFA and all I got was this lousy cookie
Re:One more level... by Mitchell314 · 2010-09-19 13:12 · Score: 4, Interesting

It's super easy to make up a key: XOR the ciphertext with whatever innocent text you like, and the result is your "key".
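This only works for a one-time pad (a key as long as the message, combined by XOR): for any ciphertext and any decoy plaintext of the same length, ciphertext XOR decoy is a key that "decrypts" to the decoy. With a short-key cipher like AES, finding a key that maps a given ciphertext to a chosen plaintext is infeasible. A minimal sketch with made-up messages:

```python
import os


def xor(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("inputs must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


secret = b"the real plan..."             # 16 bytes
real_key = os.urandom(len(secret))       # a genuine one-time pad
ciphertext = xor(secret, real_key)

decoy = b"grocery list...."              # any plaintext of the same length
fake_key = xor(ciphertext, decoy)        # "XOR = key"
print(xor(ciphertext, fake_key))         # b'grocery list....'
```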

--
I read TFA and all I got was this lousy cookie