Distinguishing Encrypted Data From Random Data?
gust5av writes "I'm working on a little script to provide very simple and easy to use steganography. I'm using bash together with cryptsetup (without LUKS), and the plausible deniability lies in writing to different parts of a container file. On decryption you specify the offset of the hidden data. Together with a dynamically expanding filesystem, this makes it possible to have an arbitrary number of hidden volumes in a file. It is implausible to reveal the encrypted data without the password, but is it possible to prove there is encrypted data where you claim there's not? If I give someone one file containing random data and another containing data encrypted with AES, will he be able to tell which is which?"
erjpgoijpoij erghoiehrgoiuh ernnerughoiehrgh poiuhgriuhpoihegh erherhoiuerhgih.
Was that random or encrypted?
If he works for the NSA
After a few whacks on the head with the NYC Yellow Pages (old school, print edition) I think someone could find out which file is encrypted and which is garbage.
Home of The Suki Series
Encrypted files have maximum entropy, just like absolutely random files. Basically, you can't tell which one is which. However, absolute random noise on a disk isn't all that usual, so any encrypted file (or pure random file) will stand out like a sore thumb: it will be highly visible. But, again, you can't tell the difference.
cpghost at Cordula's Web.
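To make the "maximum entropy" point concrete, here is a minimal sketch that measures Shannon entropy in bits per byte. The function name `byte_entropy` and the sample inputs are made up for illustration. A reading near 8.0 only says the bytes look uniform; it cannot say whether they are ciphertext or plain noise.

```python
import math
import os
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    random_blob = os.urandom(1 << 20)  # 1 MiB from the OS random source
    text_blob = b"the quick brown fox jumps over the lazy dog " * 20000
    print(f"random: {byte_entropy(random_blob):.4f} bits/byte")
    print(f"text:   {byte_entropy(text_blob):.4f} bits/byte")
```

Ordinary files (text, executables, uncompressed images) score well below 8, which is why a large high-entropy blob stands out on a disk even though its contents stay opaque.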
Properly encrypted data is indistinguishable from random data. However, just the presence of random files on the system could be incriminating. Perhaps it's better to hide the data in another type of file? Perhaps using the lsb of a bitmap file?
Sometimes I doubt your commitment to SparkleMotion!
As far as I know finding patterns in the output is tightly linked to reducing the number of possible keys, so good encryption algorithms should not create patterns. Of course if your encryption software writes some kind of header - which wouldn't affect the security of the encrypted contents - then it will be obvious to anyone looking that you have an encrypted container. So this is 99% about implementation and 1% about encryption algorithms.
Live today, because you never know what tomorrow brings
You're not the best one to write this kind of software if you don't know the answer. Start here:
http://www.amazon.com/Applied-Cryptography-Protocols-Algorithms-Source/dp/0471117099
I think this is the software version of the ADE 651 http://en.wikipedia.org/wiki/ADE_65.
Crunch, Crunch, beep, bop, boop.... yup, our software says that is encrypted data right there. Now give us the decryption key.
What, you can't? Looks like you're just trying to hide terrorist information from us.
Does the person to whom you give these two files have a rubber hose? Is he a member of the “extraordinary rendition” team?
The point of steganography is to not get caught in the first place. If you need plausible deniability, you’ve already lost.
Cheers,
b&
All but God can prove this sentence true.
Let me guess... Random!.. No, wait, too obvious. Encrypted!
No. You cannot distinguish between the two. If you could, you would have an attack vector against the encryption. The trick, once you have a key, is to have authentication-strings in your data structure, so you can see whether the key you used is actually correct, and the decrypted data is actually useful. An attack based on this authentication string, is one of the many, many possible attack vectors against encryption. Also, 'random' is not always very 'random'. In the world of cryptography, we need serious random. C'mon dude, how far are you with this ?
Religion is what happens when nature strikes and groupthink goes wrong.
Perhaps the question is incorrect. If I have a volume with data and a volume with encrypted data, then the encrypted data can be discerned from the non-encrypted data by virtue of the patterns detectable in the non-encrypted volume. So technically, if you have a drive and there is random data on it but no discernible patterns, then there is either encrypted data on it, or it is an empty drive. It is likely not even factory default, since that is likely to have some structure imposed upon it as well. What is the point of carrying around an encrypted volume with the ability for plausible deniability if that plausible deniability requires you to have random data as a volume? The existence of random data will render your plausible deniability claim useless since, by definition, your claim is no longer plausible.
If everything is perfect? No.
If you have two identical plaintext blocks and encrypt them using ECB mode (which takes no IV), then the two would turn out as identical cipher blocks. That would make it trivial to see which was the encrypted one.
So basically the answer is; it depends.
If you need to ask the question, do more research before you continue your work. This is stuff you really should understand before you embark on such a project.
Terje Elde
LUKS has a header. When I run cryptsetup on an uninitialized volume:
# cryptsetup luksOpen /dev/sdb1 foo
Device /dev/sdb1 is not a valid LUKS device.
That means it can tell if the header is valid without a password. So now every offset needs a valid LUKS header. Once you've done that, just make a bunch of perfectly valid encrypted volumes. Put real data in them. Install a working operating system that looks used.
I was always baffled by this. If you run your encryption algorithm on the encrypted data again (n times), wouldn't that produce something indistinguishable from random data at the end?
Do you think that 'I'm just sending random garbage' is going to offer any kind of plausible deniability in any situation where you could expose yourself to prosecution simply for using encryption?
Unless you're using a properly constructed one time pad, which the poster is not, encrypted data is distinguishable from random. The more of it you have, the more distinct it is. With a good encryption algorithm it's not easy, and investigators would probably use other techniques, but it is possible.
Weird. I guess there's a bug in my ROT13 implementation. If I run my text through twice, I just get the original message.
The CB App. What's your 20?
The files on employees' computers containing seemingly random data can be assumed to be just that unless they're driving brand new Porsches and have vacation condos in Whistler.
Have gnu, will travel.
AES is designed to be a pseudo-random function (meaning it's evaluated against that criterion). What this means is that /when used properly/ AES-encrypted data should be indistinguishable from random data, at least for a distinguisher running in bounded time. If anyone discovers an efficient algorithm that can distinguish this, it'll be a big nail in AES's coffin (and yes, at the very theoretical level I realize that there already are some known weaknesses in AES, but for the moment you're in good shape).
It depends what you call an 'encryption algorithm'. If you mean 'DES', then no - DES is nowadays considered a weaker algorithm. If you mean 'AES-256', then still no - you need to *apply* AES-256 before it's any good, because AES is a block-cipher and will re-encrypt identical blocks of plain-text with the same key to identical blocks of ciphertext. If you mean 'AES-256 in CBC mode with random IV and SHA-256 HMAC authentication', then that's an algorithm that can be safely used. Under certain real-world circumstances.
Perhaps. But if you use cryptsetup with LUKS, there is a readable header for the encrypted file, so you don't need the key to determine that encryption has been used. In fact, you can set multiple passphrases that have the authority to decrypt the partition.
GPG Encrypted data is also distinguishable, regardless of whether you use ASCII armoring or binary .GPG files.
There are headers in the encrypted output that can be recognized without having the key to decrypt anything.
Now if you run 'openssl' from the command line, choose 'aes-256-cbc', supply a true random key, and enter data bits interspersed with random 'padding bits', it will probably be impossible for anyone to determine from the output whether there are any data bits or not, without knowing the key.
Maybe you should design it so the encrypted data has some patterns in it (ie. interleaved with the ciphertext)
No sig today...
Data encrypted with a one-time pad looks completely random. Provably. Indeed, if you were given the pad and the encrypted data, you'd not be able to say which is which.
The Tao of math: The numbers you can count are not the real numbers.
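The symmetry described above (pad and ciphertext are interchangeable) is easy to see in a few lines. This is a minimal sketch; `xor_bytes` and the sample message are illustrative names, and the pad here comes from the OS random source rather than a true hardware source.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("pad and message must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


message = b"meet at the usual place"
pad = os.urandom(len(message))  # must be used once and never again
ciphertext = xor_bytes(message, pad)

# Decryption is the same XOR operation.
recovered = xor_bytes(ciphertext, pad)

# The roles are symmetric: XORing ciphertext with the message yields the pad,
# so neither blob is intrinsically "the key" or "the data".
rebuilt_pad = xor_bytes(ciphertext, message)
```

The guarantee only holds if the pad is as long as the message and is never reused; reusing a pad lets an observer XOR two ciphertexts and cancel it out.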
Hard to say from your question, but if you haven't done already, get yourself some crypto knowledge. Crypto is hard, there is a reason that you are laughed out of the room if you say you've invented a new crypto algorithm and you don't already have strong credentials.
Randomness is one of the harder computer problems. Especially in steganography, many implementations have been defeated by creating not enough or too much randomness. If you want to hide your message in something, it doesn't matter if your output is distinguishable from randomness, it matters if it is distinguishable from what should be there. Simple approaches like LSB tricks have often fallen because those happen to be not random in many input data.
Assorted stuff I do sometimes: Lemuria.org
Weird. I guess there's a bug in my ROT13 implementation. If I run my text through twice, I just get the original message.
Just do what they did with DES... use 3rot13 and you're much more secure than the original implementation.
Karma: SELECT `karma` FROM `users` WHERE `userid`=138474;
If you use AES in ECB mode, then the answer is that it's usually painfully obvious that the original data was structured.
If you do use chaining (CBC, or something similar), then it will look quite random.
Excellent example here: http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation#Electronic_codebook_.28ECB.29
Am I part of the core demographic for Swedish Fish?
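One cheap way to see the ECB problem is to count repeated 16-byte blocks. The sketch below assumes nothing beyond the Python standard library, so it does not run real AES: `toy_ecb` is a hypothetical stand-in that maps each block independently and deterministically (using HMAC), which is the only property of ECB that matters for this test.

```python
import hashlib
import hmac
import os
from collections import Counter

BLOCK = 16  # AES block size in bytes


def repeated_blocks(data: bytes, block_size: int = BLOCK) -> int:
    """Count full blocks that are exact copies of an earlier block."""
    blocks = [
        data[i:i + block_size]
        for i in range(0, len(data) - block_size + 1, block_size)
    ]
    return sum(n - 1 for n in Counter(blocks).values())


def toy_ecb(key: bytes, plaintext: bytes) -> bytes:
    """Stand-in for ECB mode: same block in, same block out.

    NOT a real cipher (it cannot be decrypted); it only mimics the way
    ECB treats every block independently of its position.
    """
    out = bytearray()
    for i in range(0, len(plaintext), BLOCK):
        block = plaintext[i:i + BLOCK]
        out += hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]
    return bytes(out)


if __name__ == "__main__":
    key = os.urandom(32)
    structured = b"\x00" * 4096  # e.g. an empty region of a filesystem
    print("ECB-like output:", repeated_blocks(toy_ecb(key, structured)))
    print("random data:    ", repeated_blocks(os.urandom(4096)))
```

In random data, or in the output of a chained or tweaked mode, a repeated 16-byte block is astronomically unlikely, so even a single repeat is a strong hint that something structured was encrypted block by block.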
Math.
Be sure to use some math and it'll all be good
But what if my giant porn collection is exactly the information I'm trying to hide? ;-)
What you want is plausible deniability, and that is not easy to achieve, as some states have started to pass laws allowing them to hold you hostage if you do not provide a decryption key to an encrypted container (which, with your method, would be corrupted). Have a look at TrueCrypt's technical details behind its plausible deniability feature: http://www.truecrypt.org/docs/?s=plausible-deniability
Arthur Conan Doyle wrote that the best place to hide something was right under the nose of the person looking for it - if someone spots an encrypted file, no matter how well hidden, it says you have something to hide. Instead you could simply print out your secret file in the form of a glossy magazine, something no one in the security world would really be interested in flicking through. Gardeners World or Philosophy Today. And then just leave it lying around, literally. It will be the last place they look. And the last thing they will confiscate.
You cannot distinguish between the two.
This is categorically not true, unless the key is as long or longer than the data file (and never used again). There is indeed an attack vector against any encrypted data file if the key length is small by comparison. Statistical analysis plus the slightest idea of what type of data is being encrypted is more than adequate to mount a successful attack (given sufficient computational resources) unless the key is _much_ longer than what is typical today. The lack of computational resources is the only thing that keeps typical encrypted data secure.
While the data within the encrypted volume should be indistinguishable from randomness, the metadata headers are quite distinguishable. It's pretty obvious if something is a LUKS volume, but within that you shouldn't be able to tell.
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
Steganographic attempts are considered foiled if someone can detect that there is a secret message, they don't need to be able to retrieve the message in order for the attempt to be considered a failure. I did my Master's project on hiding data in the least significant bitplane of imagery. The trick is to "randomly" scatter your secret message throughout this plane. I showed methods that would allow you to do this so that the data was indistinguishable. You should always encrypt your secret message first so that it looks random, or better yet, shape the statistics of your encoded message to match the noise characteristics that were in the original LSB plane. If you use an image created from a very noisy source, such as a digital camera, and you encrypt the embedded message and scatter it using a reversible algorithm, and iteratively ensure that the statistics of the altered LSB plane look the same as the original LSB plane, I proved that it is not possible for someone to tell that there is a secret message hidden there. However, you need to be careful to use an original image you created yourself, and to destroy the original, because if someone ever compared the original to the one with the embedded message, they could definitely tell there was something altered by comparing the LSB planes.
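A minimal sketch of the scatter-in-the-LSB-plane idea, under stated assumptions: the cover is a flat buffer of pixel bytes, the payload is already encrypted, and the positions come from Python's `random.Random` seeded with a shared secret (not a cryptographic generator; a real tool would want a keyed CSPRNG). The names `embed` and `extract` are made up, and the statistics-matching step described above is not implemented here.

```python
import random


def _positions(seed: int, cover_len: int, nbits: int) -> list:
    """Keyed pseudo-random choice of distinct byte positions."""
    if nbits > cover_len:
        raise ValueError("cover too small for payload")
    return random.Random(seed).sample(range(cover_len), nbits)


def embed(cover: bytes, payload: bytes, seed: int) -> bytes:
    """Write payload bits into the least significant bit of scattered bytes."""
    out = bytearray(cover)
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]
    for pos, bit in zip(_positions(seed, len(cover), len(bits)), bits):
        out[pos] = (out[pos] & 0xFE) | bit
    return bytes(out)


def extract(stego: bytes, nbytes: int, seed: int) -> bytes:
    """Read nbytes back from the same scattered positions."""
    bits = [stego[pos] & 1 for pos in _positions(seed, len(stego), nbytes * 8)]
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[j:j + 8]))
        for j in range(0, len(bits), 8)
    )
```

Usage would look like `stego = embed(pixels, ciphertext, seed)` followed later by `extract(stego, len(ciphertext), seed)`. Each cover byte changes by at most one, but as the comment above warns, anyone holding the original image can still diff the LSB planes.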
I thought the point was there is one encrypted file and one random file.
With steganography, one is encrypted and the other normal data. But both are usable - say a JPG image, where both load. So who is to say one has encrypted data, and they weren't just compressed using different settings? The question is, could anyone tell that (though with any given file format it seems like sophisticated enough analysis might be able to tell you that).
"There is more worth loving than we have strength to love." - Brian Jay Stanley
What about some sort of web service where you can download tidbits of random data to overwrite unused blocks on your drive? It doesn't even have to be random all the time; it could be pieces of non-copyrighted material, for instance. So if anyone asks why you have random data, you can say 'oh, that's just some stuff from the internet I downloaded'. That would be better than leaving your current unused blocks with pieces of your old deleted data too. If this became common, completely random data would be unsuspicious. And on top of that, it could be used as part of disk-checking software to test your disk, much like memcheck. /2cents
Can I light a sig ?
It is my understanding that if you start with a random one-time pad of 1s and 0s, it should be impossible to determine whether or not you have XORed a file of any entropy against it. If any background communication has surplus bits that are random 1s and 0s, that is your ideal steganography sub-channel. So any test for encrypted data is really a test for a channel of random bits being transmitted.
The other half of the question is whether a test for random data is sufficient to detect a steganography channel. The smallest encrypted message that can be sent is a single bit - basically just a flare going up. And from a single bit, you can grow that channel over time to any arbitrary length. So that means your algorithm would be required to detect the existence of any pattern of random (unexplained) bits in a file or communication channel that the receiver could be flagging to read as a string of bits.
That's as far as I get ... I think that it probably reduces to being able to put a statistical upper bound on the size of a file that could be hidden given the number of random/unexplained bits in a data stream, but I don't think you could "prove" steganography wasn't happening.
( standard disclaimer: I am not a $profession )
Unless you're Bruce (Schneier).
It seems rubberhose is dead, but look at it and especially the fundamental ideas in it if you really wish to pursue this (I like the idea of having N encrypted volumes and the fact that you cannot prove that you have fully co-operated [and they cannot prove that you're not], of course you need some interesting data on the "bait" volumes as well).
The problem with properly used encryption being indistinguishable from random data is that you need a lot of good quality random data to hide your encrypted data in, because it will be distinguishable from the not-so-random data that you get out of /dev/urandom.
If you are in a situation where you will actually need encryption (especially the deniable sort) then don't trust your own code. As they say: A lawyer who represents himself has a fool for a client. (Don't trust someone else's code either unless it has actually been reviewed by more than two people who actually know how to do cryptanalysis.)
A lot of people have been beating about the bush. Put simply:
(1) Well-encrypted data, by itself (i.e., without any other container or header) is pretty much indistinguishable from random data or "noise".
(2) If you are actually planning to use steganography (hiding the resulting data by modifying another file, like say a photo image), you should be aware that sloppy steganography can often be detected, even if the "hidden" data is random. It depends a lot on the type of file in which you are hiding the data, and the method of encoding.
You should also be aware that effective steganography requires files that are very much larger than the actual data you are trying to hide.
The real advantage of steganography is misdirection: the resulting file appears innocuous. Don't count on it hiding your data well, all by itself.
All the investigators need to do is run some fake but seemingly complex program that looks at the file under inspection and says "yes, steganography in use". Then the full weight of the law comes down, because now the suspect has to prove the negative - impossible of course.
So actually what is needed is a suspect's right that investigators prove any assertion that files have been hidden if that assertion/analysis is used as evidence in court.
If you find a file on my hard drive with data you can't readily decode, is it:
A) Compressed with an unknown compressor
B) Encrypted with an unknown encryptor
C) Random bytes used for an encryption process
D) Random bytes used for something else
I can't prove that answer D is wrong... but I don't have to because I know that 99% of the time, it's one of the other answers.
If you want to hide your data, the file must ostensibly have some other purpose... something that isn't obviously a lie. That's what steganography is about. For example, you might download as much of the 1 meter-resolution Google Maps satellite image as fits on your hard disk, save it uncompressed and then store encrypted data in the low-order bit of each byte (3 bytes to a pixel). Coupled with a map application that can display the imagery, it would appear to be one thing (a map) while really being another (a container for encrypted information).
At that point, unless you capture the encryption software it becomes hard to suspect that there is encrypted data, let alone prove it.
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
Imagine if investigators simply state that their analysis, not disclosable under state secrets privilege, shows hidden text saying "We'll bomb the Eiffel Tower on Thursday". The suspect is now stuffed with no defense.
I don't work for any 3-letter agency and even I could easily get the information needed.
With the right tools.
Mit der Dummheit kämpfen Götter selbst vergebens
LUKS can be easily detected.
The specifications for the on-disk format are published online.
http://code.google.com/p/cryptsetup/wiki/Specification.
What I would recommend and personally employ... First, fill the disk with a random background:
# cryptsetup --cipher=aes-xts-essiv:sha256 -s 256 --key-file=/dev/random create mapper1 /dev/sdz
# cryptsetup --cipher=twofish-xts-essiv:sha256 -s 256 --key-file=/dev/random create mapper2 /dev/mapper/mapper1
# dd if=/dev/zero of=/dev/mapper/mapper2 bs=512
Don't bother creating a partition table or anything else. Leave the entire disk full of this background data.
Then create an encrypted volume using a hash for key material, with the offset and skip sector counts taken from the digit runs in the hash string (here 4839 and 06, then 2045 and 60):
# echo "secret_password@drive_serial_number" | sha512sum
4839eeac06a2045d606dbf519ba5e9[...]e312009896441a5
# cryptsetup --cipher=twofish-xts-essiv:sha256 -s 256 -o 483906 -p 204560 create encrypted /dev/sdz
Password:
# pvcreate /dev/mapper/encrypted
# vgcreate
# lvcreate
If questioned I would respond with nothing, no words, and just chill there.
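The offset/skip derivation above can be sketched as follows. This is my reading of the example, not a documented scheme: the assumption is that the decimal digits of the SHA-512 hex digest are read in order, the first six forming the offset and the next six the skip count. The function names are hypothetical.

```python
import hashlib


def digits_to_counts(hexdigest: str, digits: int = 6):
    """Turn the leading decimal digits of a hex digest into (offset, skip)."""
    decimal = [c for c in hexdigest if c in "0123456789"]
    if len(decimal) < 2 * digits:
        raise ValueError("digest has too few decimal digits")
    offset = int("".join(decimal[:digits]))
    skip = int("".join(decimal[digits:2 * digits]))
    return offset, skip


def offset_and_skip(passphrase: str, serial: str, digits: int = 6):
    """Hash 'passphrase@serial' the way `echo ... | sha512sum` would."""
    # `echo` appends a newline, so it is included to match the shell pipeline.
    material = f"{passphrase}@{serial}\n".encode()
    return digits_to_counts(hashlib.sha512(material).hexdigest(), digits)


if __name__ == "__main__":
    offset, skip = offset_and_skip("secret_password", "drive_serial_number")
    print(f"-o {offset} -p {skip}")
```

Feeding the visible prefix of the digest from the comment, `4839eeac06a2045d606dbf519ba5e9`, into `digits_to_counts` gives the same 483906 and 204560 used on the cryptsetup command line. Note that the offset has to fit inside the device, so a real script would reduce it modulo the usable sector count.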
Ok, I think this guy's question has been answered (almost before he asked it, honestly). Uhhh, no, the random data and encrypted data will be indistinguishable; however, random data does not occur without human intervention (intent). Meaning, if your plan is the ruse of carrying random data to confuse and supply deniability, think of another plan.
My question is, what are you trying to hide? You don't have to be overly specific, but it helps to hide similar data within a similar structure. Steganography works well for hiding pictures within other pictures, sound within other sound, etc. Why? Because even the encrypted data looks like errors in the image file that might just be due to a corrupted file. If you're thinking of hiding entire filesystem contents within a single image then you're about as dumb as a post and will be hiding nothing from anyone who knows what they are doing, let alone what they are looking for.
So, think about what you're hiding and why. If you're hiding financial or other personal information and the like, why steganography? Just use TrueCrypt to lock it down. If you're transmitting the data, encrypt it as a file, encrypt the connection, and physically send the key with a carrier that uses a tracking number. If you're doing something legally questionable, think again. You just outed your intentions on a global public forum and no amount of random data will save you from prosecution.
Exactly. If you do not already know the answer to this question, there is no way on earth you will write a program that is at all secure.
Back to the books and study.
If you're smart, the feds or whoever can only look at the in-between. So say your entire comms link between safehouse -> HQ is an IPsec tunnel. The feds are left with simply a frequency factor. That is to say, they can't decrypt comms between the alleged Taliban safehouse and HQ, but typically there are only 1-2 connections a day, and suddenly there are 10 in a day. Something's going to happen soon! However, if you randomly pipe /dev/urandom over it for no reason at all, then they don't even have that.
How do you tell the difference between encrypted random data and encrypted data? pretty unlikely.
Actually, for something emulating a block device you would not run one CBC chain across the whole device, since rewriting one block would mean re-encrypting everything chained after it. Usually each sector is encrypted on its own, with some kind of block offset mixed into the IV, so the same data stored in block 12543 and block 46424 looks different. But yes, I agree that post was inaccurate: if you use ECB mode then there are plenty of patterns.
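The "block offset mixed into the IV" idea can be sketched like this. It is an illustration only: the hypothetical `sector_iv` keys a hash with the sector number, whereas real schemes such as ESSIV encrypt the sector number with a block cipher keyed by a hash of the volume key.

```python
import hashlib
import hmac
import struct


def sector_iv(key: bytes, sector: int) -> bytes:
    """Derive a 16-byte per-sector IV from the volume key and sector number.

    Illustrative construction, not ESSIV itself: the sector number is packed
    as a 64-bit little-endian integer and run through HMAC-SHA256.
    """
    return hmac.new(key, struct.pack("<Q", sector), hashlib.sha256).digest()[:16]


if __name__ == "__main__":
    key = bytes(32)  # placeholder key for the demo only
    print(sector_iv(key, 12543).hex())
    print(sector_iv(key, 46424).hex())
```

Because the IV differs per sector, identical plaintext written to two sectors encrypts to different ciphertext, while any single sector can still be read or rewritten without touching its neighbours.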
However, no matter how well your secret files are hidden, there's one thing which may still reveal you: The very existence of the steganography program on your hard disk. Therefore your steganography executable should be hidden, too. One possible way to hide it would be to have the same executable do something completely different, unless you give a special, secret option to turn it into the steganography program.
's obvious. Start a conspiracy investigation to hide your porn in.
Rubberhose (Pronounced Marutukku) is transparently deniable encryption, developed by (among others) Julian Assange.
This seems to do exactly what you're trying to do, so even if you want to go ahead and implement it yourself from scratch, it's worth reading up on what they've done to get some ideas and avoid some potential pitfalls.
Specialist Mac support for creative pros, Melbourne
No, you ALL miss the point. How are you going to explain having a HDD or partition full of "garbage"? Nobody with half a brain will believe you there's nothing encrypted in the noise.
(Yeah, an entropy file would be easy to explain, but entropy files usually don't come in sizes big enough to hide data in, PLUS, who apart from us here understands what an entropy file is? A judge sure doesn't.)
Steganography, OTOH, would be very useful. I have around 50 GB of family photos on my machine, that would make for a nice data storage.
Who is General Failure and why is he reading my hard disk?
Please remember that encrypting data isn't only to protect it from authoritarian governments. It isn't even to protect it from governments at all. There are plenty of other reasons to encrypt your data. We encrypt things at work to keep them safe and private, but not from the government. If they show up with a subpoena, I'll decrypt whatever they need. However, they aren't the ones we are worried about. We are worried about hackers and the like. I'm not very worried about a hacker, even a determined one, trying to come and hit me on the head. Crossing into the physical exposes you to a lot of risks. While I personally might not be a threat to them, the campus police and city SWAT would be, and they'd come when called for that.
In regards to that, there are plenty of legit questions, like can they identify encrypted data, or an encrypted stream, or not? Let's say we toss some HDDs. Some are DBAN'd. Others got missed, but are encrypted. Can they tell which is which? If so, they can focus their efforts. If not, they will waste a lot of time on dead ends.
It is a legit question, even ignoring governments.
In principle, it should be impossible to tell the difference between encrypted data and a random oracle. In practice, many encryption algorithms leave tell-tale signs. (Suggest taking a look at the 2DEM encryption mode paper on the NIST website for an example.) One-Time Pads, if the encryption key is truly random, are guaranteed to yield data that is indistinguishable from random, but that is the only time you have a guarantee of that.
Modern cryptography is divided over the issue, but my understanding is that if you heavily compress data first and then encrypt it, you will get something much closer to the "ideal" of appearing random.
In steganography, the problem is slightly different. Image data isn't truly random. You can analyze the level of randomness and see if that level is within the bounds you would expect to find in an image. Your problem, then, is not to produce something that looks like a random oracle, but rather to produce something that looks like a natural oracle. I would imagine you could space the encrypted data out a bit and inject garbage that artificially kept the level of randomness within the bounds you would expect to find in an image. Do a bit of interpolation, come up with something that would make some sort of sense if it were all natural image data. Remember, the analysis will be done by computers and computers won't look at the aesthetics or the plausibility of the image, they'll only be looking for whether some algorithmically-defined metric falls inside or outside some given bounds. Algorithms are great for "what" questions, not so good for "why" questions.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
If you need to ask this question, you shouldn't be developing a crypto tool. Seriously, don't.
There are a million ways to get something like this wrong. Doing it right requires deep domain knowledge, which it seems you don't have.
(To answer the question, the definition of a secure encryption function E(k, m) is that, when k is random, E(k, m) is indistinguishable from random. If you believe that AES CBC mode is secure, then you believe that an attacker can't distinguish AES-encrypted text from random text.)
It's very possible that either or both of your random number generator or AES has some artifacts that make them stand out. Even if you can't find anyone with such knowledge, you shouldn't be too sure that they don't exist. So, just do this:
Run your random number generator numbers through your AES encryption to create your "random" blocks. The encrypted real data and the encrypted random data will be absolutely indistinguishable.
DES is nowadays considered a weaker algorithm
DES is considered too weak for many uses due to its small key size.
Nonetheless, if you can find a way to reliably distinguish DES output from random bits, without knowledge of the key and with remotely-practical efficiency, you can publish a paper that will gain you substantial name recognition among the world's cryptographic elite.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Merely saying "it's not an encrypted dataset, it's just random numbers" cuts no ice, simply because Occam's razor kicks in. Which is more likely: some "suspicious" guy has a 100MB encrypted file, or he's just created 100MB of random noise just for the hell of it?
You will rapidly discover that all this "innocent until proven guilty" nonsense is merely the stuff of TV drama and that in the real world the burden of proof will be in your lap. The reason is that the security people (and any jury, if it came to that) would all assume that the data is encrypted. So you're left with the philosophical knot of proving that your data is not encrypted and the only way to do that is to exhaustively demonstrate that no key will decrypt it.
Good luck with that!
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
If I give someone one file containing random data and another containing data encrypted with AES, will he be able to tell which is which?"
The answer is no, sort of. With a properly chosen key, and a sufficiently strong (read "good") encryption algorithm, it should not be possible to distinguish random data from encrypted data. Obviously, there are caveats to this, but in general, it should not be possible.
The reason for this is that in a true random (or near random) stream (or block) of data, the following two conditions hold (or should hold):
1. The symbols in the data have a uniform distribution, meaning they are equally likely to occur at any position in the stream.
2. The probability of a symbol appearing at a position X in the stream is independent of all symbols that have occurred prior to X.
That is, a random stream has maximal information capacity (or entropy), according to Shannon's theory of information. It is impossible to predict what value will occur at a given location in the stream by analyzing the values that have occurred before (which is how you search for patterns).
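Both conditions can be checked crudely with a few lines of standard-library Python. This is a sketch with made-up function names: a chi-square statistic against a uniform byte distribution for condition 1, and the correlation between each byte and its successor as a very weak probe of condition 2. Passing both says nothing about whether the bytes are ciphertext or noise, only that these two simple patterns are absent.

```python
import os
from collections import Counter


def chi_square_bytes(data: bytes) -> float:
    """Pearson chi-square statistic of byte counts vs. a uniform distribution."""
    if not data:
        raise ValueError("need at least one byte")
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))


def serial_correlation(data: bytes) -> float:
    """Correlation between each byte and the next; near 0 if independent."""
    if len(data) < 2:
        raise ValueError("need at least two bytes")
    xs, ys = data[:-1], data[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("constant input has no defined correlation")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    sample = os.urandom(1 << 16)
    print("chi-square:", chi_square_bytes(sample))  # 255 degrees of freedom
    print("serial correlation:", serial_correlation(sample))
```

For uniform bytes the chi-square statistic should hover around 255 (its number of degrees of freedom); values far above that indicate a skewed distribution, as you would see in text or most file formats.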
A properly secured encryption algorithm (with a suitable key and conditions) attempts - via substitution and permutation - to reduce the statistics of a plain text down to symbols exhibiting uniform or near uniform distribution. Once you do that, the statistical characteristics of the plain text (which is what you use for pattern searching) are no longer there. The information is unrecoverable, maximum entropy, maximum information potential.
You would need the original key (and original conditions) to reverse the uniform distribution of the ciphertext symbols.
Without the key, it is impossible to find patterns in the ciphertext without knowing something about the plain text, or having a sample ciphertext other than the one you are trying to attack. There are attack vectors that can be used, but then this is no longer about trying to distinguish patterns in the ciphertext, but about reversing (usually by brute force) the encryption process.
The reliability of an encryption algorithm is inversely proportional to the number of potentially recognizable patterns that remain in the ciphertext. With AES, the answer to your question should be no.
Having said that, experiment, play with it. Just don't use it in real-world products, though :)
Technically it's possible to do what you want to do, but to avoid leaking information and get good performance you have to use non-trivial cipher modes. I suggest you have a look at the documentation for TrueCrypt, which covers most of the mathematics:
http://www.truecrypt.org/docs/
The problems you are likely to run into are related to the handling of identical data in several files. A naive implementation will leak a lot of information.
The steganographic payload has to be protected from damage. If it's randomly scattered on a partition, it has to be marked as used blocks, or it will get overwritten by the OS. If it's not part of a partition at all, then it's immediately suspect. So, it has to be embedded in a file. Same issue here -- the data has to make sense in terms of the file format. Some image formats like TIFF have internal pointers that allow you to make unused areas, but it's painfully easy to read the header and find that you did this.
The only time it really works is when you have a single, unchanging payload. Stick it in a carefully crafted container and go on your way. But data that regularly gets used can't easily be hidden.
Unless someone looks at his plaintext shell history that is.
There is no file with random data.
Why would one want to create a file with random data?
The point of creating a file is that it ain't random data what you are saving, otherwise you wouldn't save it.
If you are talking about an encrypted block of storage in an otherwise intelligible file it will be patently obvious that there is something there that is not like the rest of the file.
However, if you had an encrypted block of storage in a file that was itself stored encrypted, you might be able to plausibly deny the existence of the secret block.
Let's assume you had an encrypted item: movie, book, etc. Call this the innocuous payload.
Embedded in that you add a secret payload encrypted with a totally different key.
You might be able to develop an encryption system that simply ignored these blocks, and decrypted the innocuous payload when provided the key. Pass it off as your DRM mechanism. Provide the key to the innocuous payload when pressed to do so.
If the secret payload was small relative to the innocuous payload, and perhaps scattered, most investigators would simply assume it's encryption padding and ignore it once they decrypted the innocuous payload.
The secret portion would look largely similar to the innocuous portion in its encrypted form.
Sig Battery depleted. Reverting to safe mode.
Information entropy. In other words, yes.
I think this is the software version of the ADE 651 http://en.wikipedia.org/wiki/ADE_65.
Your link seems only weakly encrypted to me.
Did you mean: http://en.wikipedia.org/wiki/ADE_651 ??
Sig Battery depleted. Reverting to safe mode.
Steganographic attempts are considered foiled if someone can detect that there is a secret message, they don't need to be able to retrieve the message in order for the attempt to be considered a failure. I did my Master's project on hiding data in the least significant bitplane of imagery. The trick is to "randomly" scatter your secret message throughout this plane. I showed methods that would allow you to do this so that the data was indistinguishable. You should always encrypt your secret message first so that it looks random, or better yet, shape the statistics of your encoded message to match the noise characteristics that were in the original LSB plane. If you use an image created from a very noisy source, such as a digital camera, and you encrypt the embedded message and scatter it using a reversible algorithm, and iteratively ensure that the statistics of the altered LSB plane look the same as the original LSB plane, I proved that it is not possible for someone to tell that there is a secret message hidden there. However, you need to be careful to use an original image you created yourself, and to destroy the original, because if someone ever compared the original to the one with the embedded message, they could definitely tell there was something altered by comparing the LSB planes.
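For anyone who wants to experiment, here is a rough sketch of the "scatter through the LSB plane with a reversible algorithm" part only. It is my own guess at the simplest version, not the method from the Master's project: the names `embed` and `extract` are invented, the cover is just a flat sequence of 8-bit samples, and Python's `random.Random` stands in for a keyed generator (it is not cryptographically secure). It does not do the statistics-matching step, and the message should already be encrypted before it goes in.

```python
import random


def _positions(key: int, n_samples: int, n_bits: int) -> list[int]:
    # Keyed pseudo-random choice of distinct sample indices.
    # Assumption: random.Random is only a stand-in for a proper keyed PRNG.
    return random.Random(key).sample(range(n_samples), n_bits)


def embed(samples: bytes, message: bytes, key: int) -> bytes:
    """Write the message bits into the least significant bits of scattered samples."""
    bits = [(byte >> i) & 1 for byte in message for i in range(8)]
    if len(bits) > len(samples):
        raise ValueError("message too long for this cover")
    out = bytearray(samples)
    for pos, bit in zip(_positions(key, len(samples), len(bits)), bits):
        out[pos] = (out[pos] & 0xFE) | bit
    return bytes(out)


def extract(samples: bytes, n_bytes: int, key: int) -> bytes:
    """Recover n_bytes of message using the same key and cover length."""
    positions = _positions(key, len(samples), n_bytes * 8)
    bits = [samples[pos] & 1 for pos in positions]
    return bytes(sum(bits[8 * k + i] << i for i in range(8)) for k in range(n_bytes))
```

The receiver needs the key and the message length; both `embed` and `extract` regenerate the same positions from them, so nothing about the layout is stored in the image itself.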
Would you be kind enough to provide Slashdot readers with a pointer to your master's thesis? I would like to experiment with this.
If you find a file on my hard drive with data you can't readily decode, is it:
A) Compressed with an unknown compressor
B) Encrypted with an unknown encryptor
C) Random bytes used for an encryption process
D) Random bytes used for something else
I can't prove that answer D is wrong... but I don't have to because I know that 99% of the time, it's one of the other answers.....
OK, let's, as a community, add an (E). Everyone create a file on your laptop, in your home directory, named random.bin, as follows:
dd if=/dev/urandom of=random.bin bs=4096 count=10000
The actual value of the count isn't important, as long as it is large enough to create lots of random bits. If lots of people do this, we have “(E) Random bytes because Slashdot told me to”, providing plausible deniability for anyone who needs to use that file to encrypt something important.
With cryptsetup, make sure you use essiv:sha256 or AES LRW. There are watermarking attacks against the earlier versions of AES-CBC with unprotected block-based IVs used by cryptsetup and truecrypt.
Nonetheless, if you can find a way to reliably distinguish DES output from random bits, without knowledge of the key and with remotely-practical efficiency, you can publish a paper that will gain you substantial name recognition among the world's cryptographic elite.
If you have enough DES output you can do this. Someone already mentioned that even if you use a strong cipher such as AES-256 in ECB (electronic codebook) mode, the output is nearly trivially distinguishable, because repeated plaintext patterns the size of the cipher's block will encrypt to identical ciphertexts.
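The ECB check is easy to sketch. This is a minimal illustration under my own assumptions: the function name `repeated_blocks` is made up, and the block size defaults to 16 bytes (AES); use 8 for DES. It just counts aligned blocks that repeat an earlier block.

```python
from collections import Counter


def repeated_blocks(data: bytes, block_size: int = 16) -> int:
    """Count aligned blocks that duplicate an earlier block in data."""
    blocks = [
        data[i:i + block_size]
        for i in range(0, len(data) - block_size + 1, block_size)
    ]
    counts = Counter(blocks)
    # Every occurrence beyond the first of a given block is a repeat.
    return sum(n - 1 for n in counts.values())
```

In a file of truly random bytes of any realistic size, a repeated 16-byte block is astronomically unlikely, so a non-zero count is a strong hint of ECB over structured data (or of a file that isn't random at all).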
Even if you use CTR mode or CBC mode, patterns in the plaintext show up in the output if you encrypt enough data.
For example, if you by chance end up with the same ciphertext output block twice in CBC mode, you can obtain the XOR of the corresponding plaintext blocks by XORing the two immediately preceding ciphertext blocks. If you encrypt enough blocks, the laws of statistics favor two blocks ending up being identical by chance. And this XOR equality allows you to determine if the data is encrypted, because it's generally relatively easy to tell if the XOR is the result of XORing two pieces of unencrypted data.
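Here is a toy demonstration of that CBC relation. To be clear about the assumptions: the "cipher" is a keyed shuffle of the values 0..255 (an 8-bit block, purely so collisions happen quickly), and `toy_cbc_encrypt` and `check_collision_leak` are names I made up. The algebra is the same for a real block cipher: if c[i] == c[j] then p[i] ^ c[i-1] == p[j] ^ c[j-1], so p[i] ^ p[j] == c[i-1] ^ c[j-1].

```python
import random


def toy_cbc_encrypt(plaintext: bytes, key: int, iv: int) -> bytes:
    """CBC mode over a toy 8-bit block cipher (a keyed permutation of 0..255)."""
    perm = list(range(256))
    random.Random(key).shuffle(perm)  # the toy "cipher": one secret permutation
    out = []
    prev = iv  # iv must be in 0..255 for this toy block size
    for p in plaintext:
        prev = perm[p ^ prev]  # c[i] = E(p[i] XOR c[i-1])
        out.append(prev)
    return bytes(out)


def check_collision_leak(plaintext: bytes, ciphertext: bytes) -> int:
    """Count repeated ciphertext blocks, verifying the XOR relation on each one."""
    first_seen = {}
    collisions = 0
    for i in range(1, len(ciphertext)):  # skip block 0, whose predecessor is the IV
        c = ciphertext[i]
        if c in first_seen:
            j = first_seen[c]
            # The attacker computes the right-hand side from ciphertext alone.
            assert plaintext[i] ^ plaintext[j] == ciphertext[i - 1] ^ ciphertext[j - 1]
            collisions += 1
        else:
            first_seen[c] = i
    return collisions
```

With an 8-bit block you get collisions after a few dozen bytes; with a 64-bit or 128-bit block you need vastly more data, but the leak has the same shape.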
CTR mode has a different sort of relationship that can be exploited. You know that the XOR of any two ciphertext blocks is not equal to the XOR of the corresponding two plaintext blocks. This can eventually leak information about the plaintext with enough blocks to work with. But using the inequality to determine if the bunch of data is truly random or encrypted data takes a lot fewer blocks.
And, of course, if your blocks are larger you need many more of them for the statistics to work out in your favor for attacking the cipher. This is why block ciphers should be re-keyed periodically when encrypting a lot of data. It's also why it's much easier to write a distinguisher for DES (64-bit blocks) than AES (128-bit blocks) that distinguishes between encrypted and random data.
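To put rough numbers on the block size argument, here is a back-of-the-envelope sketch using the standard birthday approximation. The function name `blocks_for_collision` is mine, and the model (blocks behaving like independent uniform values) is an idealization, so treat the output as an order of magnitude only.

```python
import math


def blocks_for_collision(block_bits: int, probability: float = 0.5) -> float:
    """Approximate number of random blocks before a repeat occurs with the given probability."""
    # Birthday approximation: n ~ sqrt(2 * N * ln(1 / (1 - p))), with N = 2**block_bits
    return math.sqrt(2.0 * (2.0 ** block_bits) * math.log(1.0 / (1.0 - probability)))


for name, bits in (("DES", 64), ("AES", 128)):
    n = blocks_for_collision(bits)
    print(f"{name}: ~2^{math.log2(n):.1f} blocks, ~2^{math.log2(n * bits / 8):.1f} bytes")
```

For a 64-bit block that works out to a little over 2^32 blocks, i.e. tens of gigabytes under one key, which is within reach of a single disk. For a 128-bit block it is a little over 2^64 blocks, which is not.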
In Q2 2009, a company called Forensic Innovations, Inc. claimed to be able to distinguish random data from TrueCrypt containers. See http://www.forensicinnovations.com/blog/?p=7 However, I downloaded their tool (Version 2.23 of File Investigator TOOLS) and it identified truly random data as a TrueCrypt container... In fact it identified any container of truly random data bigger than a certain size as encrypted data. Therefore, their $249 tool (or this particular version at least), which claims "Detects Encrypted Files, including TrueCrypt", is not performing as advertised. Try it for yourself.
This is in fact a major design principle for modern ciphers, so forget it.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
DES is nowadays considered a weaker algorithm
DES is considered too weak for many uses due to its small key size.
Nonetheless, if you can find a way to reliably distinguish DES output from random bits, without knowledge of the key and with remotely-practical efficiency, you can publish a paper that will gain you substantial name recognition among the world's cryptographic elite.
Agreed, but his point is well-taken regarding the identical encryption of duplicate blocks and the need for a rotating key schedule.
One could buy a random wiping program and use it regularly; perhaps by a scheduled task.
Later, when asked what all the random data is, one points to the random wiper.
C//
What if the encryption algorithm, or at least an implementation, inserts some kind of headers or other marking so that anyone can tell that it's not random data? Can any actual security experts chime in, does anything like this ever happen or is an encrypted file just pure encrypted bits start to finish?
Then hide it in secret conspiracy information.
A cipher is basically just a random sequence generator seeded with the key and an initialization vector (so the same key can generate different sequences), so the answer is no; you can't differentiate a data stream that has been XORed with a random sequence, from a random sequence. In practice though, if you see a random sequence it's a safe bet that it's either encrypted or a decoy. In some cases you can detect handshakes and key/iv exchanges (like diffie-hellman).
As far as I know finding patterns in the output is tightly linked to reducing the number of possible keys, so good encryption algorithms should not create patterns.
And that's the fundamental problem. It's hard to have a convincing explanation for random data on your hard drive. Steganography tends to work with the "noise bits" in images or music, but those bits normally create patterns, so you can distinguish cryptography. It's possible to produce crypto tuned specifically to the bit distributions in whatever you're using to mask your payload, but it's a hard problem by crypto standards (which is saying something).
I'm sure the NSA can do it. I'm sure I don't trust anyone else to. I'm also sure I'm going to get in trouble one of these days, because I wipe drives with TrueCrypt before I use them (even unencrypted), as a handy way of seeing if the drive is going to fail early on me, and there's no way I could ever convince someone that my unencrypted drive really isn't hiding anything in those unused areas filled with "random" bits.
Socialism: a lie told by totalitarians and believed by fools.
Encrypted files have maximum entropy, just like absolutely random files. Basically, you can't tell which one is which. However, absolute random noise on a disk isn't all that usual, so any encrypted file (or pure random file) will stand like a sore thumb: it will be highly visible. But, again, you can't tell the difference.
Absolutely correct. Any "investigator" who finds a pure random file will immediately suppose that it contains encrypted data. I mean, what else? Compressed files are not random, and there is no real good reason to store gobs of random data on a disk. OK, maybe you can come up with a good reason, such as doing research on random numbers, but that will be highly suspicious.
On the other hand, it is possible to systematically add entropy to a file. One very simple way may be to consider the random bits as codes in a variable length alphabet, much like a Huffman code. You can then "decompress" the random file using the variable length code. Voila, a larger file with the desired entropy/redundancy. It will look like binary data, not encrypted data.
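A toy version of that "decompress the random bits" trick, as I understand it. Everything here is an assumption for illustration: the five-symbol code table `CODE` is invented, and a real version would use a Huffman table built from the kind of data you want to imitate. Because the code is prefix-free and complete, any bit string decodes, and re-encoding the output gives the original bits back (minus any incomplete trailing codeword).

```python
def bits_of(data: bytes):
    """Yield the bits of data, most significant bit first."""
    for byte in data:
        for i in range(7, -1, -1):
            yield (byte >> i) & 1


# Hypothetical prefix-free code: shorter codewords produce more frequent symbols.
CODE = {"0": "e", "10": "t", "110": "a", "1110": "o", "1111": " "}


def expand(data: bytes) -> str:
    """'Decompress' arbitrary bits into symbols with a skewed distribution."""
    out, word = [], ""
    for bit in bits_of(data):
        word += str(bit)
        if word in CODE:
            out.append(CODE[word])
            word = ""
    return "".join(out)  # an incomplete trailing codeword is dropped


def compress(text: str) -> str:
    """Inverse of expand: map symbols back to the original bit string."""
    inverse = {sym: word for word, sym in CODE.items()}
    return "".join(inverse[ch] for ch in text)
```

The output is several times larger than the input and has far less than 8 bits of entropy per byte, so it no longer stands out as "pure noise". Whether it looks like any plausible real file format is a separate problem.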
We've discussed this on MetaOptimize.
Short answer: Download an empirical testing script like dieharder, and see if the encrypted output looks "random" under this battery of tests.
What happens if you use the old "torn sheet of paper" routine?
Each drive or device moving from A to B goes with a different courier/ISP/method and no "piece" contains enough information to be identifiable or usable.
All the pieces need to arrive at the destination to be able to be re-constructed back into usable form.
Any time you send a complete message in one burp, one hard drive or one CD or one image, there is a chance for decryption by any number of accidents or threat of death to all your family members one person at a time while you watch.
No encryption was used in the creation of this message...thus I have deniability.
as far as I know you can calculate the randomness of a sequence of 0's and 1's.
it has to do with the distribution of the length of strings with the same value.
taking the TrueCrypt model as an example (hidden partition inside an encrypted partition): for the hidden encrypted part to stay hidden, it should at least be indistinguishable from the non-hidden encrypted part, so they should be equally random. also unused space in both partitions should have the same amount of randomness as the used space.
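The run-length idea mentioned above is usually done as a runs test. Here is a small sketch of the Wald-Wolfowitz version on the bits of a buffer; the function name `runs_z_score` is made up, and the normal approximation it relies on is only reasonable for large inputs. A z-score far from zero (say beyond 3 or 4 in either direction) means too many or too few runs for a random sequence.

```python
import math


def runs_z_score(data: bytes) -> float:
    """Wald-Wolfowitz runs test on the bits of data; returns a z-score."""
    bits = [(byte >> i) & 1 for byte in data for i in range(8)]
    n1 = sum(bits)
    n0 = len(bits) - n1
    if n0 == 0 or n1 == 0:
        return float("-inf")  # all bits equal (or no data): treated as one long run
    # A new run starts wherever two neighbouring bits differ.
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    n = n0 + n1
    mean = 2.0 * n0 * n1 / n + 1.0
    var = 2.0 * n0 * n1 * (2.0 * n0 * n1 - n) / (n * n * (n - 1.0))
    return (runs - mean) / math.sqrt(var)
```

Run it on the hidden region, the outer encrypted region, and the free space separately: if all three are good ciphertext or good random fill, the scores should all be small and none should stand out.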
Privacy is terrorism.
Of course if your encryption software writes some kind of header - which wouldn't affect the security of the encrypted contents - then it will be obvious to anyone looking that you have an encrypted container. So this is 99% about implementation and 1% about encryption algorithms.
TrueCrypt implements a headerless encryption protocol for just this reason. Its ciphertext files should be indistinguishable from a sequence of random bytes. So all you have to do is hide those bytes steganographically, or have a plausible excuse for sending/storing a bunch of random bytes.
"This algorithm runs in constant time. Come on, 2,147,483,648 is a constant..."
If you can distinguish encrypted data from random noise without knowing the encryption key, you've found a weakness in the encryption algorithm! That doesn't happen too often these days. I'm sure the job offers will be lining up once you publish.
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
Advanced steganography is indistinguishable from propaganda.
Computer memory is just fancy paper, CPUs just fancy pens with fancy erasers; the 'net is just a fancy backyard fence.
I've used recurrence plot analysis and surrogate data testing for this. Both are more suited to time series analysis, but can be used with any data. In principle these examine compressibility (and any form of compression could be adapted to give you a yes/no), but these give you meaningful statistical analyses (if yes, then how much). Be aware that your random isn't really, and so will give you a non-zero result. But you'll get very close to the same very small result, whereas with surrogate it'll be different and some data sets will actually improve with scrambling. But you should be able to tell yes/no and how different from random.
"I may be synthetic, but I'm not stupid." -- Bishop 341-B
In cryptography, reinventing the wheel is very bad - you usually end up with something that's broken in a way that isn't at all obvious. It's better to go with something that's been peer reviewed. In this case, the product you're looking for is TrueCrypt. TrueCrypt volumes look like random data, and you can have multiple encrypted volumes inside the same container. Each volume has either random data in its free space, or another volume that looks like random data in there. So you make one or more decoy volumes with your tax/banking information, non-incriminating diary, etc, and then put the thing you're really trying to hide in another volume. Since resizing TrueCrypt volumes is very inconvenient, you have a plausible reason for making it too big.
Also note that regardless of how well hidden your steganographic data is, the fact that you have steganography software installed (which you can't effectively hide if you want to use it) is enough to damn you if you're trying to make it look like you don't have any encrypted data.
Sounds good!
What's the ratio of JPEG file size to encrypted payload size?
Or, the other way around: to encrypt and hide 1 MB, how many MB of JPEGs would be needed?
(I am guessing that you are saying the ratio is less than 1 to 8)
Also did you look at audio files?
Stephan
http://stephan.sugarmotor.org
Why does OP talk of offsets? Because he wants to hide more than one partition! Give up two to police in a pinch. Is there a third? They can't know...
wouldn't it be more useful to mod your laptop so it's got two drives, the goodies being on the drive that only becomes accessible when you put the little magnet in the right place to energize the reed relay that powers up the dark drive? No seeum drive in directory, no drive there, Kemo Sabe.
is it possible to prove there is encrypted data where you claim there's not?
prove 100%, no; prove beyond reasonable doubt, sure -- unless you happen to be really good at convincing juries that carrying around a hard drive full of noise is common practice
I mod down anyone who says "I will be modded down for this", regardless of the rest of their comment
If you use AES in ECB mode, then the answer is that it's usually painfully obvious that the original data was structured.
If you do use chaining (CBC, or something similar), then it will look quite random.
Is there painfully obvious structure in data encrypted with AES in a counter mode, where the key is some function of the IV and the offset? Counter mode is easier to parallelize across multiple cores than chaining modes.
Cryptography is hard, very hard, to get right. If you don't know if you can distinguish random data vs encrypted data, then either (1) you are a crypto expert and this is an unsolved question like P=NP, or (2) you know very little about crypto.
Well, since you are asking HERE, my bet is (2) is the actual situation.
So my answer to you would be: STOP, don't write this program. You are not qualified, and anyone using your program (including yourself) will be given a false sense of security which may end up worse than knowing your data is not hidden.
If you insist on proceeding, then first plan a couple years of self education before you start. And /. is NOT the place to begin.
Oliver.
The op wants to know if the output is distinguishable from random garbage. He doesn't ask about the difficulty of decrypting.
One of the goals of most crypto systems is to generate output that approaches random garbage.
The question isn't whether the file with the AES'd data can be decoded, it's whether a third party can detect which file is which. To that end, I would say the odds are fairly low. Especially if the OP is embedding the encrypted data inside a block of noise.
The GP is dead on, AES was designed with exactly this in mind.
Your arguments about compression aren't precisely true. It's true for archives, but not true for a wide array of today's general purpose compression algorithms that use high order arithmetic encoding and are bijective.
The bijective class of compressors are ones where any arbitrary stream is a valid input for the decompressor.
If you think that you can find entropy that isn't squeezed out of text with algorithms like PAQ, then you can also eliminate it, improve PAQ, and win some cash offered by the Hutter Prize. We are talking about algorithms that compress English text to about 1 bit per character for large inputs.
"His name was James Damore."
Also consider the entropy of your "random" data. The encrypted data could be distinguishable by being *too* random compared to other samples which maintain some sort of detectable pattern or base. Random usually isn't with respect to computers (hard to guess or predict != random), and folks go to great length to make it as unpredictable as possible when it counts the most. If an encryption algorithm, which is by design patternless, is compared to a bunch of randomly inserted data, it's possible that it could be detected by as much of what it isn't as what it is.
I think you'd be better off inserting random data which has been encrypted by the same routine and key size. If it's a good, patternless method, you will end up with many samples of data only one of which is useful. Anyone not knowing the offset of the target data could waste a bunch of time trying to get into the other pieces.
Why not do a couple passes of pure 0s after the random wiping? Clears out the random bits and gives you an additional load test.
"Evil will always triumph because good is dumb." -- Dark Helmet
Steganographic attempts are considered foiled if someone can detect that there is a secret message, they don't need to be able to retrieve the message in order for the attempt to be considered a failure. I did my Master's project on hiding data in the least significant bitplane of imagery. The trick is to "randomly" scatter your secret message throughout this plane. I showed methods that would allow you to do this so that the data was indistinguishable. You should always encrypt your secret message first so that it looks random, or better yet, shape the statistics of your encoded message to match the noise characteristics that were in the original LSB plane. If you use an image created from a very noisy source, such as a digital camera, and you encrypt the embedded message and scatter it using a reversible algorithm, and iteratively ensure that the statistics of the altered LSB plane look the same as the original LSB plane, I proved that it is not possible for someone to tell that there is a secret message hidden there. However, you need to be careful to use an original image you created yourself, and to destroy the original, because if someone ever compared the original to the one with the embedded message, they could definitely tell there was something altered by comparing the LSB planes.
That's a fascinating idea.
Lots of 1600 ISO images from your camera. Feed them into the software so it can analyze the noise characteristics for imitation later, then it loads the image into RAM, modifies them, and overwrites the old image on the harddrive.
Gotta love slashdot for posts like yours.
> Gotta love slashdot for posts like yours.
Yeah, uh, me too. Thanks masters dude.
Do daemons dream of electric sleep()?
Red herrings. Don't make them on your computer. Spreading them via automated means would be another topic though.
You mean like http://www.truecrypt.org/ already does?
The GP should record some LP, at 24 bits per sample, at 96 kSa/s, in stereo. It wouldn't be too unusual, especially if he picks a well-known piece of music. Classical music will be particularly good here.
How about this one? http://www.youtube.com/watch?v=Nz0b4STz1lo
In general, it is safe and legal to kill your children. -- POSIX Programmer's Guide
Useful data are not random. If it looks random, and you're devoting disk space to it, the investigator will assume that it is encrypted, or is key material for encryption (e.g. a one-time pad). Why else would you have lots of random data around?
I ran across a website years ago where you could type in your plain text, e.g. "I mow the lawn every Thursday." and it would encrypt it to some other plain-language sentence or paragraph, e.g. "The merry-go-round is painted yellow with red stripes. Neighborhood children rake cans from the park."
Anybody know what this is called or where that website can be found?
Or randomize the encrypted data...
The hard drive may have been freshly bought off ebay (leave a record!) or a garage sale or a gift (make sure the person that provides you cover is willing to back your story and is beyond the legal reach of the goons from the country you are dealing with). A disk whose earlier life was being a part of encrypted RAID is fairly likely to look like random garbage. Alternatively, you can claim it is YOUR disk, which contained sensitive data and you did a complete DoD-grade wipe with /dev/random last pass before shipping. (Or, better, used a software that does it for you.)
The best alternative would be modding the hard drive firmware so the drive looks empty.
If you have a clean room, you can disconnect the heads or simulate some other kind of failure, and claim the disk is faulty and is going in for data recovery. Without a clean room, remove the circuit board from the disk, cover the right contacts with insulating tape, and mount the board back; the disk will behave as if it is not spinning up or has broken heads. A technician carrying a broken disk is a good "legend".
In my iTunes collection I have many GB of audio and podcasts, and I assume that the first bytes of all the blocks would be different. If only I could just get the files to fragment in such a way that when reading the raw device the first character of each sector was the value I wanted... I could even use RC4 or some other stream cipher to generate block offsets, giving a password and further defying analysis. As long as I don't defrag the disk, my data is safely hidden in files that just look like files.
Just hope I don't need to update the data often, as read and write time would be rather shabby...
What if you had an encrypted partition with multiple passwords- each password decrypts different files from that partition. You could go even further and set it up to generate random files and passwords, along with some parts that are just random data that do not correspond to a password. If someone asks about the encrypted partition, you type in a password and the corresponding files come up with no indication there were more passwords. If the size of the files you showed them compared to the partition makes them suspicious enough (or some other indicator), you can use more passwords, but they would have no way to know whether you've used all your passwords. Of course they would realize your security is at paranoia-level, so if you explained the random passwords, you understandably would have no way to actually show them everything on the partition.
On the up-side, instead of trying to convince someone that encrypted partition is just random data, you actually decrypt stuff for them- plus with the random passwords they can't prove if you're lying. On the downside, you can never show them everything (unlike a 1-password system), so there is no "cut your losses" route. This leads to two questions: could this plausibly work, and how bad could it be for you should law enforcement ask to see what's on your computer?
Other encryption methods leave you dead in the water should someone figure out you've got encryption- by being out in the open it may be more dangerous (it always raises suspicion), but it possibly mitigates the damage that suspicion can do.
My webcomic
Give yourself a cookie. You've just invented steganography.
"What interests me is that it recorded approximately eighteen hours of static."
http://www.imdb.com/title/tt0118884/quotes?qt0379375
Atari rules... ermm... ruled.
Hasn't this whole subject already been addressed by Truecrypt?
they probably won't bother to hit you with the wrench until you tell them.
That is - it won't hurt THEM.
And besides, a little preemptive beating with a wrench never did kill anyone.
Anyone doing the beating that is.
Mit der Dummheit kämpfen Götter selbst vergebens
OK, if I wanted to try to find out if there were encrypted data at some offset in a chunk of random data, I'd start with Knuth's tests for randomness. I'd break the thing up into decent sized chunks (1 meg or so) and run a bunch of different randomness tests on each chunk and on the whole data set and see if any patterns emerge.
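As a starting point for that kind of per-chunk scan, here is a sketch using just one test, a chi-square frequency test on byte values. The function names `chi_square` and `scan` are made up, and the 1 MiB default chunk size is just the "1 meg or so" from above. For uniform random bytes the statistic has 255 degrees of freedom, so values should hover around 255; a chunk that sits well away from its neighbours is the thing to look at.

```python
from collections import Counter


def chi_square(chunk: bytes) -> float:
    """Pearson chi-square statistic of byte counts against a uniform distribution."""
    expected = len(chunk) / 256.0
    counts = Counter(chunk)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))


def scan(data: bytes, chunk_size: int = 1 << 20) -> list[tuple[int, float]]:
    """Return (offset, chi-square) for each full chunk of data."""
    return [
        (off, chi_square(data[off:off + chunk_size]))
        for off in range(0, len(data) - chunk_size + 1, chunk_size)
    ]
```

A single test like this will not separate good ciphertext from good random fill; it only catches regions where one of the two is not as random as it should be.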
The thing is, even if the encrypted data looks pretty random, it's likely to look DIFFERENT than the surrounding random data.
The worse problem is that if you have someone who's asking you if there is encrypted data, and they find some bogus pattern in the random noise, then you've got a problem because you can't prove that there ISN'T any data there. If you are being prosecuted in a normal US court, you might get away with this (if they can't prove that you've got anything encrypted, it may be hard to hold you in contempt trying to get you to give up the keys), but if you fall under the sway of some intelligence agency that doesn't like the look of you, it's not likely that they'll just let you go because you claim there isn't any data.
A thousand pounds of wood moving at 300 feet per minute. Don't get in the way.
When you use full disk encryption, only the crypt header reveals that you're using it. Using: cryptsetup luksDump /dev/device
You can get the length of the header to make a backup:
dd if=/dev/device of=./backupfile count=length
Then you can overwrite just the header (not the whole device) with:
dd if=/dev/urandom of=/dev/device count=length
Now your disk cannot be recognized as encrypted. Storing the crypt header on a micro-SD card allows you to quickly eat it when necessary.
You can also carry a very strong neodymium magnet; this allows you to wipe your hard drive in a second.
What if a few popular distros simply had, turned on by default, a low priority process that wrote random data to free blocks?
Then you've got a perfectly legitimate excuse for a drive full of random data. You really can't beat, "it's a security measure enabled by default on Debian," as excuses go.
why bother with crypto on HDs? Simply do as many businesses do. Move your drive image to a server, wipe the drive, reinstall the base OS, and once you're at your destination, open an ssh channel to the server and rsync your data back to the drive. Of course, you might have to rent a colo for a day to get access to a pipe big enough to move that data in a reasonable length of time.
If crypto use is illegal at your destination, what are you doing there?
Tech Public Policy stuff
Did you save these in binary, no ascii formatting or line numbers? Negative numbers as well as positive, covering the entire range of bits in the integer (or other) length? I'm assuming they were integers, since floats have a specific format. Any sort of formatting would have constituted a non-random pattern.
Just as with your laptop. Create a fake login.
So when someone beats you with a phone book or a tirewrench, you can say "the login is jdoe, password 123!" and they'll login and see your not so important files. When actually your login is janedoe password abc.
The same applies to encrypted partitions in your setup. Have a partition A at index N and a partition B at index M. A,N is the fake one, complete with recently modified files (.bashrc and cron will help with that). And B,M is your normal secure partition.
In a moderately large file, it would be very suspicious if no two blocks were the same.
I'm a Programmer. That's one level above Software Engineer and one level below Engineer.
You might encrypt and steg away, and then use a bit of software to remove the record of said file(s) from the fat/mft and mark those sectors as "bad". Email yourself the backup fat/mft, travel, and reverse the process at your destination.
Or you might (in a rush) encrypt and steg away on a partition, then mark the partition as unallocated while you go through security. Won't fool a reasonably competent computer tech, which is not going to be a problem in most countries but might be a risk in America, seeing as how we've offshored enough IT that your airport security grunts might just be very "technical", indeed.
Orwell: "In a Time of Universal Deceit, telling the Truth is a Revolutionary Act"
Couldn't you just establish a code using the content of the images? Something using the names or hair colors or whatnot of the subjects. Then just make a slideshow of family photos in a specific order and name it TOP SECRET MESSAGE.
Don't worry about hiding the data - there are many ways of doing that. Worry about hiding the software that accesses it. The thing that gets most folks using steganography caught is the `investigator' finding steg. software on their machine. After that it's just a question of searching through each of the formats it does or threatening them with obstruction of justice / other crimes until they tell you what they used it for. Or, at least, that's what I learnt from the high tech crime squad...
By stuffing files with encrypted random data (indistinguishable from encrypted compromising data until decrypted) you dilute the attempts of the Other Side.
Even with weak systems such as DES, decryption without the key is enormously more time consuming than encryption.
If the Other Side requires a 10,000 core server farm to break 1 message a minute, and you can keep them busy by generating 1 false message a minute with a single core, and still have cpu cycles to spare, you are ahead of the game.
A second reason for encrypting random data: Thwart operational traffic analysis. A darknet that spends free cycles moving encrypted random chunks around makes it more difficult (impossible?) to figure out who is talking to who.
Third Career: Tree Farmer Second Career: Computer Geek First Career: Teacher, Outdoor Instructor, Photographer.
How about killing two birds with one stone: Create a new picture file type. Call it jpeg-V (for verify). It would have to have a major feature making it worth widely adopting. As part of the spec, it would contain a 1024 byte random string. Heck, call it a unique identifier string to facilitate indexing.
But ...
There's nothing to prevent encrypted data from being buried in there. Maybe file pointers so pieces can be appended and also so you'd need both the password and the starting file.
Now just populate the web with billions of pictures, most perfectly innocent. A thousand photos, taking up around a GB or two or three, might hold a meg of encrypted data.
Now you're faced with just what the original question asks: can random and encrypted strings be indistinguishable?
The world is made by those who show up for the job.
From your description of what you think you're doing, the odds are fairly high that you're not implementing correctly. As a recursive proof, your question suggests doubt (a newbie would be done with the hack by now), your doubt suggests fear, your fear suggests potential wisdom, your potential suggests its own answer, your answer suggests you should listen to yourself.
``Tension, apprehension & dissension have begun!'' - Duffy Wyg&, in Alfred Bester's _The Demolished Man_
Valid points, given very large volumes of ciphertext. VERY large volumes for ciphers with 128-bit block sizes. Large enough that the attacks are impractical, though theoretically possible.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
If he was stashing it in the LSBs then he was most likely using uncompressed BMP. The first thing JPEG throws out is the LSBs for lossy compression.
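For anyone who hasn't seen LSB stashing, a minimal sketch follows. It treats a plain byte string as a stand-in for uncompressed pixel data (no real BMP header parsing), and the function names `embed_lsb` and `extract_lsb` are made up for this example.

```python
def embed_lsb(pixels: bytes, message: bytes) -> bytes:
    """Hide message bits in the least significant bit of each pixel byte."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover data too small for message")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # clear the LSB, then set it to the message bit
    return bytes(out)


def extract_lsb(pixels: bytes, length: int) -> bytes:
    """Read back `length` bytes from the LSBs of the first length * 8 pixel bytes."""
    out = bytearray()
    for i in range(length):
        value = 0
        for b in pixels[i * 8:(i + 1) * 8]:
            value = (value << 1) | (b & 1)
        out.append(value)
    return bytes(out)


cover = bytes(range(256)) * 4  # stand-in for uncompressed pixel data
secret = b"meet at noon"
stego = embed_lsb(cover, secret)
recovered = extract_lsb(stego, len(secret))
```

Each cover byte changes by at most one, which is why this only survives in a lossless format. Any lossy recompression that rewrites the low bits destroys the payload.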
If you have information that they want and it's encrypted, the laws no longer apply in situations like these. They'll covertly torture and drug you until you tell them some lies or tell them the truth.
Only governments have the resources to protect Alice and Bob from Gordon. Basically only governments can use encryption, you can experiment with it to see how it works but you won't be able to practically use it and it doesn't matter what algorithm you choose because all algorithms are only as strong as Alice and Bob, and usually Alice and Bob are physically defenseless.
They can torture you for the rest of your life whether they find out or not.
Yes, XORing each plaintext block with an encrypted block index/counter would be enough to render the output theoretically indistinguishable from random data. This is "CTR" mode, and I'd be surprised if most hard drive encryption systems aren't using it.
http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation#Counter_.28CTR.29
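A rough sketch of the counter-mode structure, for illustration only. Python's standard library has no AES, so this uses SHA-256 truncated to 16 bytes as a stand-in for the block cipher call that encrypts the counter. That substitution, the names `keystream_block` and `ctr_xor`, and the nonce layout are all assumptions of this example. It is not real AES-CTR and should not be used to protect anything.

```python
import hashlib
import os

BLOCK = 16  # bytes per block, matching the AES block size


def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    """Stand-in for E_k(nonce || counter): SHA-256 truncated to one block."""
    material = key + nonce + counter.to_bytes(8, "big")
    return hashlib.sha256(material).digest()[:BLOCK]


def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with the counter keystream; the same call encrypts and decrypts."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        pad = keystream_block(key, nonce, offset // BLOCK)
        chunk = data[offset:offset + BLOCK]
        out.extend(c ^ p for c, p in zip(chunk, pad))
    return bytes(out)


key = os.urandom(32)
nonce = os.urandom(8)
plaintext = b"A" * 64  # four identical plaintext blocks
ciphertext = ctr_xor(key, nonce, plaintext)
blocks = [ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)]
roundtrip = ctr_xor(key, nonce, ciphertext)
```

The four identical plaintext blocks come out as four different ciphertext blocks because each one is XORed with a pad derived from a different counter value. Reusing the same key and nonce for two different messages would leak the XOR of their plaintexts.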
Use it several times, 4 pass.
Some include encryption.
See http://webcache.googleusercontent.com/search?q=cache:mw0S9v_ew1UJ:www3.sympatico.ca/mt0000/bicom/
Does anyone here actually use steganography for more than fun or school or some side project? Other than spy vs spy stuff I can't imagine using it on a regular basis.
To use it properly you would need to have your decryption software installed on a totally different computer, otherwise when they look for the secret encrypted stuff they will see your stego encryption/decryption software and be tipped off. Also, since you can't hide much data using steganography without tipping someone off it seems to be of limited use for nearly all real world applications. I particularly can't see a business or even a government using this to protect their data.
Don't get me wrong. It seems like a cool thing to do. So, if you are using it regularly for a reasonable purpose, what exactly is that?
Ninjas don't carry tic tacs
If it's correctly written, it will be random. Any header required by the implementation should just be encrypted with the plaintext. There may be an initialization vector at the beginning (16 bytes with AES), but this will also be a pure random number.
No, they don't. That's the one flaw.
Hint. Take one of the unencrypted image formats - with a relatively simple image, encrypt everything but the image header - display.
You'll probably be able to tell what the image was.
Compressed encrypted files on the other hand are pretty close to maximum entropy.
My favorite demonstration of why ECB is almost always the wrong choice for block modes of operation, regardless of the algorithm, can be seen in the Tux series of pictures on this page. http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation It's a great picture to show people when they don't understand why "just encrypt the data" isn't secure enough, and why they need a competent cryptographer.
But for this discussion, notice that your eyes cannot detect a pattern in the encrypted data, even though the source is an uncompressed BMP file.
John
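The ECB effect in the Tux pictures can be approximated without an image viewer by counting repeated 16-byte blocks. The sketch below is a toy: HMAC-SHA256 truncated to 16 bytes stands in for a block cipher applied to each block independently. It is not invertible, so it only models what ECB output looks like. The names `toy_ecb` and `repeated_blocks` and the fake "image" are assumptions of this example.

```python
import hashlib
import hmac
import os
from collections import Counter

BLOCK = 16  # bytes per block, matching the AES block size


def toy_ecb(key: bytes, data: bytes) -> bytes:
    """Map each 16-byte block independently through a keyed function.

    Identical input blocks always give identical output blocks, which is the
    property that makes ECB leak structure.
    """
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        out += hmac.new(key, data[i:i + BLOCK], hashlib.sha256).digest()[:BLOCK]
    return bytes(out)


def repeated_blocks(data: bytes) -> int:
    """Count blocks that duplicate an earlier block."""
    counts = Counter(data[i:i + BLOCK] for i in range(0, len(data), BLOCK))
    return sum(n - 1 for n in counts.values())


# Stand-in for a simple uncompressed image: long runs of identical pixels.
image = (b"\x00" * 4096) + (b"\xff" * 4096)
ecb_like = toy_ecb(os.urandom(32), image)
random_like = os.urandom(len(image))

ecb_repeats = repeated_blocks(ecb_like)       # nearly every block repeats
random_repeats = repeated_blocks(random_like)  # repeats are astronomically unlikely
```

A file full of repeated 16-byte blocks is a strong hint of ECB-style encryption over structured data, whereas output from a chained or counter mode should show no repeats at these sizes, just like random data.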
I may agree with you; that is really very important.