Distinguishing Encrypted Data From Random Data?
gust5av writes "I'm working on a little script to provide very simple and easy-to-use steganography. I'm using bash together with cryptsetup (without LUKS), and the plausible deniability lies in writing to different parts of a container file. On decryption you specify the offset of the hidden data. Together with a dynamically expanding filesystem, this makes it possible to have an arbitrary number of hidden volumes in a file. It is infeasible to reveal the encrypted data without the password, but is it possible to prove there is encrypted data where you claim there is none? If I give someone one file containing random data and another containing data encrypted with AES, will he be able to tell which is which?"
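A quick way to see what the question is asking: a simple byte-frequency (chi-square) test cannot tell well-encrypted data from random bytes, while it flags plaintext immediately. The Python sketch below uses a toy SHA-256 keystream as a stand-in for AES (it is not AES and is not meant for real use). Passing one test like this is not proof of indistinguishability; that is the property a good cipher is designed to provide against any efficient test.

```python
import hashlib
import os
from collections import Counter


def chi_square_uniform(data: bytes) -> float:
    """Pearson chi-square statistic of the byte histogram against a uniform one.

    With 256 bins there are 255 degrees of freedom, so uniformly random input
    should score somewhere near 255.
    """
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


def toy_stream_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy stand-in for a real cipher (NOT AES, do not use for real data):
    XOR the plaintext with SHA-256(key || byte offset). Applying it twice decrypts."""
    out = bytearray()
    for offset in range(0, len(plaintext), 32):
        keystream = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = plaintext[offset:offset + 32]
        out.extend(p ^ k for p, k in zip(chunk, keystream))
    return bytes(out)


if __name__ == "__main__":
    size = 1 << 20  # 1 MiB samples
    plain = b"Attack at dawn. " * (size // 16)  # highly repetitive plaintext
    samples = {
        "random": os.urandom(size),
        "encrypted": toy_stream_encrypt(b"hypothetical password", plain),
        "plaintext": plain,
    }
    for name, data in samples.items():
        print(f"{name:10s} chi-square = {chi_square_uniform(data):12.1f}")
```

The random and encrypted samples should both land near 255. The plaintext, which uses only ten distinct byte values, scores above a million.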
Make it compressed, headerless audio. Give 'em a decoder (which will produce noise) and claim you're a scientist and this is your recording of Jupiter.
Steganographic attempts are considered foiled if someone can detect that there is a secret message; they don't need to be able to retrieve the message for the attempt to count as a failure. I did my Master's project on hiding data in the least significant bitplane of imagery. The trick is to "randomly" scatter your secret message throughout this plane. I showed methods that would allow you to do this so that the embedded data was indistinguishable from the original noise. You should always encrypt your secret message first so that it looks random, or better yet, shape the statistics of your encoded message to match the noise characteristics of the original LSB plane. If you use an image created from a very noisy source, such as a digital camera, encrypt the embedded message, scatter it using a reversible algorithm, and iteratively ensure that the statistics of the altered LSB plane look the same as those of the original LSB plane, I proved that it is not possible for someone to tell that there is a secret message hidden there. However, you need to be careful to use an original image you created yourself, and to destroy the original, because if someone ever compared the original to the one with the embedded message, they could definitely tell that something had been altered by comparing the LSB planes.
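The "scatter it using a reversible algorithm" step can be sketched as keyed LSB embedding: a secret key picks which bytes of the cover carry message bits, and the same key recovers them. This is a minimal Python sketch under several assumptions. The message is taken to be already encrypted. `embed`, `extract` and the integer key are hypothetical names. Python's `random.Random` stands in for a proper keyed permutation, though it is not cryptographically secure. The iterative statistics-matching step described above is not implemented.

```python
import os
import random


def _positions(key: int, cover_len: int, n_bits: int) -> list[int]:
    """Key-dependent, non-repeating byte positions in the cover.
    random.Random is NOT cryptographically secure; it stands in for a keyed PRP."""
    if n_bits > cover_len:
        raise ValueError("cover too small for message")
    return random.Random(key).sample(range(cover_len), n_bits)


def embed(cover: bytes, message: bytes, key: int) -> bytearray:
    """Write the message bits (least significant bit of each byte first) into the
    LSBs of key-selected cover bytes; every other bit of the cover is untouched."""
    bits = [(byte >> i) & 1 for byte in message for i in range(8)]
    stego = bytearray(cover)
    for pos, bit in zip(_positions(key, len(cover), len(bits)), bits):
        stego[pos] = (stego[pos] & 0xFE) | bit
    return stego


def extract(stego: bytes, n_bytes: int, key: int) -> bytes:
    """Regenerate the same positions from the key and reassemble the bytes."""
    positions = _positions(key, len(stego), n_bytes * 8)
    bits = [stego[pos] & 1 for pos in positions]
    return bytes(sum(bits[j * 8 + i] << i for i in range(8)) for j in range(n_bytes))


if __name__ == "__main__":
    cover = os.urandom(4096)  # stands in for the bytes of a noisy camera image
    secret = b"already-encrypted bytes"
    stego = embed(cover, secret, key=1234)
    print(extract(stego, len(secret), key=1234) == secret)
```

Each cover byte changes by at most one, and without the key an observer does not know which bytes carry the message.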
All the investigators need to do is run some fake but seemingly complex program that looks at the file under inspection and says "yes, steganography in use". Then the full weight of the law comes down, because now the suspect has to prove a negative, which is of course impossible.
So what is actually needed is a right for the suspect to have investigators prove any assertion that files have been hidden, if that assertion or analysis is used as evidence in court.
If you find a file on my hard drive with data you can't readily decode, is it:
A) Compressed with an unknown compressor
B) Encrypted with an unknown encryptor
C) Random bytes used for an encryption process
D) Random bytes used for something else
I can't prove that answer D is wrong... but I don't have to because I know that 99% of the time, it's one of the other answers.
If you want to hide your data, the file must ostensibly have some other purpose, something that isn't obviously a lie. That's what steganography is about. For example, you might download as much of the 1-meter-resolution Google Maps satellite imagery as fits on your hard disk, save it uncompressed, and then store encrypted data in the low-order bit of each byte (3 bytes to a pixel). Coupled with a map application that can display the imagery, it would appear to be one thing (a map) while really being another (a container for encrypted information).
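The capacity arithmetic for this scheme is simple. With one hidden bit per colour byte, the payload is one eighth of the raw image size. Here is a small Python helper. The function name and the 10000 x 10000 tile size are hypothetical examples.

```python
def lsb_capacity_bytes(width: int, height: int,
                       bytes_per_pixel: int = 3, bits_per_byte: int = 1) -> int:
    """Hidden payload capacity of an uncompressed image when the lowest
    `bits_per_byte` bits of every colour byte carry ciphertext."""
    return width * height * bytes_per_pixel * bits_per_byte // 8


if __name__ == "__main__":
    # A hypothetical 10000 x 10000 RGB tile: 300,000,000 bytes of raw pixels.
    print(lsb_capacity_bytes(10000, 10000), "bytes of hidden capacity")
```

For that tile, 37,500,000 bytes of ciphertext fit into the low-order bits.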
At that point, unless you capture the encryption software it becomes hard to suspect that there is encrypted data, let alone prove it.
I've been working on this very problem for a while now, or rather on an easier version of it: how do you encrypt a single file in a way that makes it indistinguishable from random data? The algorithm must allow for a short password (dozens of bytes) and should be able to encrypt very large files. Ideally, an attacker may see the algorithm and may correctly suspect what the plaintext is, but should still be unable to prove that the given cyphertext is the output of the algorithm. That is, the only way to "prove" it should be a brute-force password search, and finding a working password of a few dozen bytes would be proof enough. This is good enough because a brute-force search over 60^30 passwords is kind of slow.
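For scale, the brute-force figure can be converted to bits: 60^30 is roughly 2^177. This Python snippet takes a 60-symbol alphabet and 30-character passwords from the text. The attacker rate of 10^12 guesses per second is an arbitrary assumption.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def password_space_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of a uniformly chosen password."""
    return length * math.log2(alphabet_size)


def brute_force_years(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    """Expected years to find the password, searching half the space on average."""
    return alphabet_size ** length / 2 / guesses_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{password_space_bits(60, 30):.1f} bits")
    print(f"{brute_force_years(60, 30, 1e12):.2e} years at a hypothetical 1e12 guesses/s")
```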
I further simplified the problem by assuming that the size of a file need not be hidden: that is a separate task, and a much easier one.
I have a reason to approach the problem this way. If I have a file named "one-time-pad.bin" on my computer, and it looks like a one-time pad, then it must be a one-time pad. The very existence of an encrypted partition should be enough to convince anyone that there is encrypted data. If a multi-sheaf algorithm is used, then there is a reasonable suspicion that there are multiple sheaves. Either way, the owner seems to be hiding something. Burying data in JPG files and similar tricks is also sketchy, as it is almost certainly possible to distinguish (statistically) a benign JPG from a steganographically altered one, although this can be avoided by hiding very little data in very large files. Here, at least, there is an expensive solution.
I can think of at least one other way to do it; here is the description I originally posted online. Say we want to use passwords of up to B bits and encrypt files of up to M bits. Fix, forever, B random binary strings of M bits each, and call them N = {n_1, n_2, ..., n_B}. The set of 2^B passwords is in bijective correspondence with the set of subsets of N; for example, a password like 110101... selects the subset {n_1, n_2, n_4, n_6, ...}. Treat the n_i in that subset as integers and add them. Treat the plaintext as an integer and add it to (or XOR it with) the result. One can think of this as constructing a one-time pad (one of 2^B) and XORing the plaintext with it. Even if the attacker knows every n_i, the plaintext (without loss of generality, all zeros), and the cyphertext, she still has to decompose the cyphertext as a sum of a subset of N, and even deciding whether that can be done is NP-hard. The complexity will be exponential as long as both M and B are large, which they are in the expected applications.
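Here is a toy Python sketch of the subset-sum pad. It uses tiny sizes (M = 64, B = 16) and makes two choices that the description leaves open. First, the sum is reduced modulo 2^M so the pad stays M bits long. Second, the public strings are generated from a seed, which only keeps the example short. `random.Random` is not a cryptographic generator, and the text itself warns that generated strings may be dangerously related. In this sketch, bit i of the password (least significant first) selects n_{i+1}. All function names are hypothetical.

```python
import random

M = 64  # toy message length in bits (the text considers files up to 1 GiB)
B = 16  # toy password length in bits (the text considers 512)


def make_public_strings(seed: int, count: int, bits: int) -> list[int]:
    """The fixed-forever strings n_1..n_B, derived from a seed for illustration only."""
    rng = random.Random(seed)
    return [rng.getrandbits(bits) for _ in range(count)]


def pad_from_password(password_bits: int, strings: list[int], bits: int) -> int:
    """Sum the strings selected by the password's 1-bits, reduced mod 2**bits
    (the reduction is an assumption; the original only says 'add them')."""
    total = 0
    for i, n_i in enumerate(strings):
        if (password_bits >> i) & 1:
            total += n_i
    return total % (1 << bits)


def encrypt(plaintext: int, password_bits: int, strings: list[int], bits: int) -> int:
    """XOR variant: the cyphertext is the plaintext XOR the subset-sum pad."""
    return plaintext ^ pad_from_password(password_bits, strings, bits)


decrypt = encrypt  # XORing with the same pad again restores the plaintext


if __name__ == "__main__":
    N = make_public_strings(seed=2024, count=B, bits=M)
    password = 0b1011_0010_1100_0101
    c = encrypt(0x0123456789ABCDEF, password, N, M)
    print(hex(c), hex(decrypt(c, password, N, M)))
```

An all-zero password selects the empty subset and gives a zero pad, which is why a non-trivial password is required.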
The nicest feature here is that, with a non-trivial password, the cyphertext will look as random as it gets: it is a sum of carefully pre-selected random numbers, combined with the plaintext.
One obvious limitation is that each password can only be used once, since similar plaintexts will produce similar cyphertexts, but that could be remedied. A bigger problem, IMHO, is that this algorithm requires B random binary strings of M bits each to be built in. Just to give you an idea, if you want to encrypt files of up to 1 GiB with passwords of up to 512 bits, then you need to keep around 512 GiB of pad. Either that, or be able to generate, really fast, 512 pseudo-random strings (random here meaning the same every time, but completely unrelated), which is very sketchy: the strings could easily be related in a way that allows a sub-exponential solution to the subset sum.
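Two of the points above can be checked quickly in Python. The storage cost is one maximum-size string per password bit. Reusing a password reuses the pad, so XORing two cyphertexts cancels it and leaves the XOR of the plaintexts, which is zero wherever they agree. The helper names and sample messages are hypothetical.

```python
import os

GiB = 1 << 30


def pad_storage_bytes(password_bits: int, max_file_bytes: int) -> int:
    """B strings of M bits each: one maximum-size string per password bit."""
    return password_bits * max_file_bytes


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


if __name__ == "__main__":
    print(pad_storage_bytes(512, GiB) // GiB, "GiB")  # prints: 512 GiB
    pad = os.urandom(16)  # the same password always yields the same pad
    p1, p2 = b"meet me at noon!", b"meet me at nine!"
    c1, c2 = xor(p1, pad), xor(p2, pad)
    # The pad cancels out: c1 XOR c2 equals p1 XOR p2.
    print(xor(c1, c2) == xor(p1, p2))
```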
I would be very interested to hear from anyone about this idea.
I may have another way of solving the same hiding problem, based on a completely different, yet IMHO also very fascinating, way of turning a short binary string into a very long, random-looking binary string in a one-way fashion. I have decided not to implement the subset-sum solution unless I am totally sure that I cannot find something more elegant, so feel free to steal my idea above and code it up.
OK, let's, as a community, add an (E). Everyone create a file on your laptop, in your home directory, named random.bin, as follows:
dd if=/dev/urandom of=random.bin bs=4096 count=10000
The actual value of the count isn't important, as long as it is large enough to create lots of random bits. If lots of people do this, we have “(E) Random bytes because Slashdot told me to”, providing plausible deniability for anyone who needs to use that file to encrypt something important.
When they discover you aren't a scientist
The GP should record an LP at 24 bits per sample, 96 kSa/s, in stereo. It wouldn't be too unusual, especially if he picks well-known music; classical music will be particularly good here. A typical opera in .WAV will be about 4 GB, and there will be at least 8 lower bits that are yours to play with (they are noise from the turntable).
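The numbers hold up on a back-of-the-envelope check. This Python sketch assumes raw PCM and ignores the WAV header. Following the post, it also assumes that the lowest 8 of every 24 bits are free for ciphertext. The function names are hypothetical.

```python
def bytes_per_second(rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Raw PCM data rate, ignoring the WAV header."""
    return rate_hz * bits_per_sample * channels // 8


def hidden_capacity(file_bytes: int, bits_per_sample: int, hidden_bits: int) -> int:
    """Bytes available when the lowest `hidden_bits` of each sample carry ciphertext."""
    return file_bytes * hidden_bits // bits_per_sample


if __name__ == "__main__":
    rate, depth, channels, hidden = 96_000, 24, 2, 8
    opera = 4 * 10**9  # "about 4 GB", per the post
    audio_rate = bytes_per_second(rate, depth, channels)
    print(audio_rate, "bytes/s of audio")
    print(round(opera / audio_rate / 3600, 2), "hours")
    print(hidden_capacity(opera, depth, hidden), "bytes of hidden capacity")
```

At 576,000 bytes per second, 4 GB is roughly 1.9 hours of audio. About one third of it, some 1.33 GB, would be available for hidden data.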
It's super easy to make up a key. XOR = key.