Distinguishing Encrypted Data From Random Data?
gust5av writes "I'm working on a little script to provide very simple and easy to use steganography. I'm using bash together with cryptsetup (without LUKS), and the plausible deniability lies in writing to different parts of a container file. On decryption you specify the offset of the hidden data. Together with a dynamically expanding filesystem, this makes it possible to have an arbitrary number of hidden volumes in a file. It is infeasible to reveal the encrypted data without the password, but is it possible to prove there is encrypted data where you claim there's not? If I give someone one file containing random data and another containing data encrypted with AES, will he be able to tell which is which?"
Trick question! It is random text that's been encrypted!
After a few whacks on the head with the NYC Yellow Pages (old school, print edition) I think someone could find out which file is encrypted and which is garbage.
Home of The Suki Series
Well-encrypted files have near-maximum entropy, just like truly random files. Basically, you can't tell which one is which. However, pure random noise on a disk isn't all that usual, so any encrypted file (or pure random file) will stand out like a sore thumb: it will be highly visible. But, again, you can't tell the difference.
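To make the entropy point concrete, here is a rough Python sketch of a byte-frequency (Shannon) entropy estimate. The name `byte_entropy` is made up for illustration, and `os.urandom` stands in for both "encrypted" and "random" data, on the assumption that good ciphertext scores the same. Compressed files also score high, so at best this flags "random-looking", not "encrypted".

```python
import math
import os
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 2000
    noise = os.urandom(len(text))  # stand-in for ciphertext or random data
    print(f"plain text : {byte_entropy(text):.3f} bits/byte")
    print(f"random data: {byte_entropy(noise):.3f} bits/byte")
```

Ordinary text lands well below 8 bits per byte, while the random buffer sits very close to 8, which is why such a file is conspicuous even though nobody can say what it is.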
cpghost at Cordula's Web.
Properly encrypted data is indistinguishable from random data. However, just the presence of random files on the system could be incriminating. Perhaps it's better to hide the data in another type of file? Perhaps using the least significant bits (LSBs) of a bitmap file?
Sometimes I doubt your commitment to SparkleMotion!
As far as I know finding patterns in the output is tightly linked to reducing the number of possible keys, so good encryption algorithms should not create patterns. Of course if your encryption software writes some kind of header - which wouldn't affect the security of the encrypted contents - then it will be obvious to anyone looking that you have an encrypted container. So this is 99% about implementation and 1% about encryption algorithms.
Live today, because you never know what tomorrow brings
Does the person to whom you give these two files have a rubber hose? Is he a member of the “extraordinary rendition” team?
The point of steganography is to not get caught in the first place. If you need plausible deniability, you’ve already lost.
Cheers,
b&
All but God can prove this sentence true.
Let me guess... Random!.. No, wait, too obvious. Encrypted!
Perhaps the question is incorrect. If I have a volume with data and a volume with encrypted data, then the encrypted data can be discerned from the non-encrypted data by virtue of the fact that there will be detectable patterns in the non-encrypted volume. So technically if you have a drive and there is random data on it but no discernible patterns, then there is either encrypted data on it, or it is an empty drive. It is likely not even factory default, since that is likely to have some structure imposed upon it as well. What is the point of carrying around an encrypted volume with the ability for plausible deniability if that plausible deniability requires you to have random data as a volume? The existence of random data will render your plausible deniability claim useless since, by definition, your claim is no longer plausible.
Nice. "All your base are belong". You purposely left off the last two words to give a smaller sample to review and potentially recognize patterns.
The CB App. What's your 20?
Weird. I guess there's a bug in my ROT13 implementation. If I run my text through twice, I just get the original message.
The CB App. What's your 20?
AES is designed to be a pseudo-random function (meaning it's evaluated against that criterion). What this means is that /when used properly/ AES-encrypted data should be indistinguishable from random data, at least for a distinguisher running in bounded time. If anyone discovers an efficient algorithm that can distinguish this, it'll be a big nail in AES's coffin (and yes, at the very theoretical level I realize that there already are some known weaknesses in AES, but for the moment you're in good shape).
It depends what you call an 'encryption algorithm'. If you mean 'DES', then no - DES is nowadays considered a weaker algorithm. If you mean 'AES-256', then still no - you need to *apply* AES-256 before it's any good, because AES is a block-cipher and will re-encrypt identical blocks of plain-text with the same key to identical blocks of ciphertext. If you mean 'AES-256 in CBC mode with random IV and SHA-256 HMAC authentication', then that's an algorithm that can be safely used. Under certain real-world circumstances.
Religion is what happens when nature strikes and groupthink goes wrong.
Perhaps. But if you use cryptsetup with LUKS, there is a readable header for the encrypted file, so you don't need the key to determine that encryption has been used. In fact, you can set multiple passphrases that have the authority to decrypt the partition.
GPG-encrypted data is also distinguishable, regardless of whether you use ASCII armoring or binary .GPG files.
There are headers in the encrypted output that can be recognized without having the key to decrypt anything.
Now if you run 'openssl' from the command line, choose 'aes-256-cbc', supply a truly random key, and enter data bits interspersed with random 'padding bits', it will probably be impossible for anyone to determine from the output whether there are any data bits or not, without knowing the key.
Hard to say from your question, but if you haven't done so already, get yourself some crypto knowledge. Crypto is hard; there is a reason that you are laughed out of the room if you say you've invented a new crypto algorithm and you don't already have strong credentials.
Randomness is one of the harder computer problems. Especially in steganography, many implementations have been defeated by creating too little or too much randomness. If you want to hide your message in something, it doesn't matter if your output is distinguishable from randomness, it matters if it is distinguishable from what should be there. Simple approaches like LSB tricks have often fallen because the least significant bits happen not to be random in many kinds of input data.
Assorted stuff I do sometimes: Lemuria.org
Weird. I guess there's a bug in my ROT13 implementation. If I run my text through twice, I just get the original message.
Just do what they did with DES... use 3rot13 and you're much more secure than the original implementation.
Karma: SELECT `karma` FROM `users` WHERE `userid`=138474;
If you use AES in ECB mode, then the answer is that it's usually painfully obvious that the original data was structured.
If you do use chaining (CBC, or something similar), then it will look quite random.
Excellent example here: http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation#Electronic_codebook_.28ECB.29
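The ECB leak is easy to reproduce in a Python sketch without a crypto library. Big caveat: `toy_block` below is NOT AES, just a keyed HMAC-SHA256 truncated to 16 bytes, standing in for "same key, same input block, same output block", which is the only property the leak depends on. It isn't even invertible, and all function names are invented for the illustration. Input length is assumed to be a multiple of 16 bytes.

```python
import hashlib
import hmac
import os
from collections import Counter

BLOCK = 16  # AES block size in bytes


def toy_block(key: bytes, block: bytes) -> bytes:
    """Stand-in for one block cipher call: deterministic and keyed, NOT real AES."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def ecb_like(key: bytes, data: bytes) -> bytes:
    """Transform every block independently, as ECB does."""
    return b"".join(toy_block(key, data[i:i + BLOCK])
                    for i in range(0, len(data), BLOCK))


def cbc_like(key: bytes, data: bytes) -> bytes:
    """XOR each block with the previous output block first, as CBC does."""
    prev = os.urandom(BLOCK)  # random IV, emitted as the first output block
    out = [prev]
    for i in range(0, len(data), BLOCK):
        mixed = bytes(a ^ b for a, b in zip(data[i:i + BLOCK], prev))
        prev = toy_block(key, mixed)
        out.append(prev)
    return b"".join(out)


def repeated_blocks(data: bytes) -> int:
    """Count blocks that duplicate an earlier block."""
    counts = Counter(data[i:i + BLOCK] for i in range(0, len(data), BLOCK))
    return sum(n - 1 for n in counts.values())


if __name__ == "__main__":
    key = os.urandom(32)
    structured = bytes(BLOCK * 1000)  # e.g. the zeroed part of a disk image
    print("ECB-like repeats:", repeated_blocks(ecb_like(key, structured)))
    print("CBC-like repeats:", repeated_blocks(cbc_like(key, structured)))
    print("random repeats  :", repeated_blocks(os.urandom(BLOCK * 1000)))
```

Counting repeated 16-byte blocks is about the cheapest test an examiner could run: structured input through the ECB-like path repeats heavily, while the chained path and real random data essentially never do.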
Am I part of the core demographic for Swedish Fish?
You cannot distinguish between the two.
This is categorically not true, unless the key is as long or longer than the data file (and never used again). There is indeed an attack vector against any encrypted data file if the key length is small by comparison. Statistical analysis plus the slightest idea of what type of data is being encrypted is more than adequate to mount a successful attack (given sufficient computational resources) unless the key is _much_ longer than what is typical today. The lack of computational resources is the only thing that keeps typical encrypted data secure.
Looks Welsh...
Steganographic attempts are considered foiled if someone can detect that there is a secret message, they don't need to be able to retrieve the message in order for the attempt to be considered a failure. I did my Master's project on hiding data in the least significant bitplane of imagery. The trick is to "randomly" scatter your secret message throughout this plane. I showed methods that would allow you to do this so that the data was indistinguishable. You should always encrypt your secret message first so that it looks random, or better yet, shape the statistics of your encoded message to match the noise characteristics that were in the original LSB plane. If you use an image created from a very noisy source, such as a digital camera, and you encrypt the embedded message and scatter it using a reversible algorithm, and iteratively ensure that the statistics of the altered LSB plane look the same as the original LSB plane, I proved that it is not possible for someone to tell that there is a secret message hidden there. However, you need to be careful to use an original image you created yourself, and to destroy the original, because if someone ever compared the original to the one with the embedded message, they could definitely tell there was something altered by comparing the LSB planes.
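For anyone who hasn't seen LSB embedding, the bare mechanics look roughly like this in Python. This is only a minimal sketch with made-up function names: it writes the bits sequentially into a raw byte buffer, and it leaves out everything described above that makes the scheme hard to detect (encrypting first, scattering with a reversible algorithm, matching the statistics of the original LSB plane).

```python
import os


def embed_lsb(cover: bytes, message: bytes) -> bytes:
    """Write the message bits into the least significant bit of each cover byte."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover too small for message")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set it to the bit
    return bytes(stego)


def extract_lsb(stego: bytes, length: int) -> bytes:
    """Read `length` bytes back out of the least significant bits."""
    out = bytearray()
    for i in range(length):
        value = 0
        for bit_index in range(8):
            value = (value << 1) | (stego[i * 8 + bit_index] & 1)
        out.append(value)
    return bytes(out)


cover = os.urandom(4096)  # stand-in for noisy pixel data from a camera
stego = embed_lsb(cover, b"secret")
print(extract_lsb(stego, 6))  # prints b'secret'
```

Each cover byte changes by at most one, which is why the result looks the same to the eye, and also why comparing against the original image gives the game away immediately.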
All the investigators need to do is run some fake but seemingly complex program that looks at the file under inspection and says "yes, steganography in use". Then the full weight of the law comes down, because now the suspect has to prove the negative - impossible of course.
So actually what is needed is a suspect's right that investigators prove any assertion that files have been hidden if that assertion/analysis is used as evidence in court.
If you find a file on my hard drive with data you can't readily decode, is it:
A) Compressed with an unknown compressor
B) Encrypted with an unknown encryptor
C) Random bytes used for an encryption process
D) Random bytes used for something else
I can't prove that answer D is wrong... but I don't have to because I know that 99% of the time, it's one of the other answers.
If you want to hide your data, the file must ostensibly have some other purpose... something that isn't obviously a lie. That's what steganography is about. For example, you might download as much of the 1 meter-resolution Google Maps satellite image as fits on your hard disk, save it uncompressed and then store encrypted data in the low-order bit of each byte (3 bytes to a pixel). Coupled with a map application that can display the imagery, it would appear to be one thing (a map) while really being another (a container for encrypted information).
At that point, unless you capture the encryption software it becomes hard to suspect that there is encrypted data, let alone prove it.
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
I don't work for any 3-letter agency and even I could easily get the information needed.
With the right tools.
Mit der Dummheit kämpfen Götter selbst vergebens
Exactly. If you do not already know the answer to this question, there is no way on earth you will write a program that is at all secure.
Back to the books and study.
Presumably a simple XOR would make them be able to come up with that sentence... hell, any sentence thinkable in the world! "Look, if we apply these bytes, the secret message says [...]!"
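That trick takes about three lines of Python. A sketch, with `xor_bytes` as a made-up helper and `os.urandom` standing in for whatever random-looking file was found: pick the message you want to "find", XOR it with the file, and call the result the key.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


claimed = b"Meet me at the docks at midnight"
blob = os.urandom(len(claimed))     # stand-in for the suspicious file
fake_key = xor_bytes(blob, claimed)  # the "key" is derived from the desired answer

# "Decrypting" the file with the fabricated key yields the chosen message.
print(xor_bytes(blob, fake_key))
```

Since this works for any message of the same length, a "decryption" produced this way proves nothing about what the file actually contains.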
What time is it/will be over there? Check with my iPhone app!
Neither. It's readily visible the way you just mashed your keyboard, in a rather nonrandom fashion. Dividing the left handed keystrokes from the right handed ones, you get: erg ergerg erergerg greg erererg and jpoijpoij hoihoiuh nnuhoihh poiuhiuhoihh hhoiuhih The 'erg' pattern is near universal with slight variations, and the combination of poiujh (in that order), usually missing one or two of the letters, describes well the vast majority of the keystrokes with your right hand.
"It is a good thing for an uneducated man to read books of quotations..." -Winston Churchill
Rubberhose (also known by its development codename, Marutukku) is transparently deniable encryption, developed by (among others) Julian Assange.
This seems to do exactly what you're trying to do, so even if you want to go ahead and implement it yourself from scratch, it's worth reading up on what they've done to get some ideas and avoid some potential pitfalls.
Specialist Mac support for creative pros, Melbourne
Looks more Qwghlmian to me...
No, you ALL miss the point. How are you going to explain having a HDD or partition full of "garbage"? Nobody with half a brain will believe you that there's nothing encrypted in the noise.
(Yeah, an entropy file would be easy to explain, but entropy files usually don't come in sizes big enough to hide data in, PLUS, who apart from us here understands what an entropy file is? A judge sure doesn't.)
Steganography, OTOH, would be very useful. I have around 50 GB of family photos on my machine, that would make for a nice data storage.
Who is General Failure and why is he reading my hard disk?
DES is nowadays considered a weaker algorithm
DES is considered too weak for many uses due to its small key size.
Nonetheless, if you can find a way to reliably distinguish DES output from random bits, without knowledge of the key and with remotely-practical efficiency, you can publish a paper that will gain you substantial name recognition among the world's cryptographic elite.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
There is no file with random data.
Why would one want to create a file with random data?
The point of creating a file is that it ain't random data what you are saving, otherwise you wouldn't save it.
If you find a file on my hard drive with data you can't readily decode, is it:
A) Compressed with an unknown compressor
B) Encrypted with an unknown encryptor
C) Random bytes used for an encryption process
D) Random bytes used for something else
I can't prove that answer D is wrong... but I don't have to because I know that 99% of the time, it's one of the other answers.....
OK, let's, as a community, add an (E). Everyone create a file on your laptop, in your home directory, named random.bin, as follows:
dd if=/dev/urandom of=random.bin bs=4096 count=10000
The actual value of the count isn't important, as long as it is large enough to create lots of random bits. If lots of people do this, we have “(E) Random bytes because Slashdot told me to”, providing plausible deniability for anyone who needs to use that file to encrypt something important.
Nonetheless, if you can find a way to reliably distinguish DES output from random bits, without knowledge of the key and with remotely-practical efficiency, you can publish a paper that will gain you substantial name recognition among the world's cryptographic elite.
If you have enough DES output you can do this. Someone already mentioned that even with a strong cipher such as AES-256, ECB (electronic codebook) mode makes the output nearly trivially distinguishable, because repeated plaintext patterns the size of the cipher's block will encrypt to identical ciphertext blocks.
Even if you use CTR mode or CBC mode, patterns in the plaintext show up in the output if you encrypt enough data.
For example, if you by chance end up with the same ciphertext output block twice in CBC mode, you can obtain the XOR of the corresponding plaintext blocks by XORing the two immediately preceding ciphertext blocks. If you encrypt enough blocks, the laws of statistics favor two blocks ending up identical by chance. And this XOR equality allows you to determine whether the data is encrypted, because it's generally relatively easy to tell if the XOR is the result of XORing two pieces of unencrypted data.
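The CBC relation can be checked in a Python sketch with a deliberately tiny cipher. Everything here is an assumption made for the demo: `toy_encrypt` is a 4-round Feistel permutation on 16-bit blocks (not DES, not AES), chosen so that collisions show up after a few thousand blocks instead of billions. If C_i = C_j, then P_i XOR P_j = C_(i-1) XOR C_(j-1), because the cipher is a permutation and equal outputs force equal inputs.

```python
import hashlib
import random


def _round(key: bytes, rnd: int, half: int) -> int:
    """Feistel round function: one byte derived from key, round index and input."""
    return hashlib.sha256(key + bytes([rnd, half])).digest()[0]


def toy_encrypt(key: bytes, block: int) -> int:
    """4-round Feistel permutation on 16-bit blocks. A toy, not a real cipher."""
    left, right = block >> 8, block & 0xFF
    for rnd in range(4):
        left, right = right, left ^ _round(key, rnd, right)
    return (left << 8) | right


def cbc_encrypt(key: bytes, iv: int, blocks: list) -> list:
    """Return [IV, C1, C2, ...] where C_i = E(P_i XOR C_(i-1))."""
    out = [iv]
    for plain in blocks:
        out.append(toy_encrypt(key, plain ^ out[-1]))
    return out


def colliding_pairs(cipher: list) -> list:
    """Index pairs (i, j), 1 <= i < j, whose ciphertext blocks are equal."""
    first_seen, pairs = {}, []
    for idx in range(1, len(cipher)):
        if cipher[idx] in first_seen:
            pairs.append((first_seen[cipher[idx]], idx))
        else:
            first_seen[cipher[idx]] = idx
    return pairs


rng = random.Random(0)
plain = [rng.randrange(1 << 16) for _ in range(3000)]
cipher = cbc_encrypt(b"demo key", rng.randrange(1 << 16), plain)
for i, j in colliding_pairs(cipher):
    # plain[i - 1] is P_i because cipher[0] is the IV.
    assert plain[i - 1] ^ plain[j - 1] == cipher[i - 1] ^ cipher[j - 1]
print(len(colliding_pairs(cipher)), "collisions, XOR relation held for all")
```

With a real 64-bit or 128-bit block the same relation holds, but you need vastly more ciphertext before the first collision turns up.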
CTR mode has a different sort of relationship that can be exploited. You know that the XOR of any two ciphertext blocks is not equal to the XOR of the corresponding two plaintext blocks. This can eventually leak information about the plaintext with enough blocks to work with. But using the inequality to determine if the bunch of data is truly random or encrypted data takes a lot fewer blocks.
And, of course, if your blocks are larger you need many more of them for the statistics to work out in your favor for attacking the cipher. This is why block ciphers should be re-keyed periodically when encrypting a lot of data. It's also why it's much easier to write a distinguisher for DES (64-bit blocks) than AES (128-bit blocks) that distinguishes between encrypted and random data.
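To put rough numbers on the block-size point, here is a back-of-the-envelope Python sketch using the usual birthday approximation: with n-bit blocks, a collision becomes likely after about 2^(n/2) blocks. The function name is made up, and the figures are orders of magnitude, not exact thresholds.

```python
def birthday_bound_bytes(block_bits: int) -> int:
    """Approximate data volume at which a block collision becomes likely:
    about 2**(block_bits / 2) blocks of block_bits / 8 bytes each."""
    blocks = 2 ** (block_bits // 2)
    return blocks * (block_bits // 8)


for name, bits in (("DES", 64), ("AES", 128)):
    gib = birthday_bound_bytes(bits) / 2 ** 30
    print(f"{name}: ~2^{bits // 2} blocks, about {gib:,.0f} GiB under one key")
```

For DES that is 2^32 blocks of 8 bytes, i.e. 32 GiB, which fits on an ordinary disk. For AES it is 2^64 blocks of 16 bytes, far more than anyone stores under one key.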
Need a Python, C++, Unix, Linux develop
But a good defense attorney would apply the same principle to show that the prosecution's legal submissions were really steganography hiding insults to the judge's mother.
Socialism: a lie told by totalitarians and believed by fools.
What happens if you use the old "torn sheet of paper" routine?
Each drive or device moving from A to B goes with a different courier/ISP/method and no "piece" contains enough information to be identifiable or usable.
All the pieces need to arrive at the destination to be able to be re-constructed back into usable form.
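One way to get the "torn sheet" property in software is plain XOR secret splitting, sketched below in Python. This is a swapped-in standard technique, not necessarily what the poster has in mind, and the function names are invented. Every piece but the last is pure random bytes, and the last is the secret XORed with all of them, so any incomplete set of pieces is statistically just noise.

```python
import os
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_secret(secret: bytes, pieces: int) -> list:
    """Split into `pieces` shares; all of them are needed to rebuild the secret."""
    if pieces < 2:
        raise ValueError("need at least two pieces")
    shares = [os.urandom(len(secret)) for _ in range(pieces - 1)]
    last = reduce(xor_bytes, shares, secret)  # secret XOR share1 XOR share2 ...
    return shares + [last]


def join_secret(shares: list) -> bytes:
    """XOR all shares together to recover the secret."""
    return reduce(xor_bytes, shares)


shares = split_secret(b"attack at dawn", 4)  # one share per courier
print(join_secret(shares))  # prints b'attack at dawn'
```

The downside is the one already noted: lose a single courier and the message is gone for good.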
Any time you send a complete message in one burp, one hard drive or one CD or one image, there is a chance for decryption by any number of accidents or threat of death to all your family members one person at a time while you watch.
No encryption was used in the creation of this message...thus I have deniability.
Dammit, I finally get Cthulhu back to sleep and some jackass wakes him up again.
I read TFA and all I got was this lousy cookie
It's super easy to make up a key. XOR = key.
I read TFA and all I got was this lousy cookie