Distinguishing Encrypted Data From Random Data?
gust5av writes "I'm working on a little script to provide very simple and easy to use steganography. I'm using bash together with cryptsetup (without LUKS), and the plausible deniability lies in writing to different parts of a container file. On decryption you specify the offset of the hidden data. Together with a dynamically expanding filesystem, this makes it possible to have an arbitrary number of hidden volumes in a file. It is implausible to reveal the encrypted data without the password, but is it possible to prove there is encrypted data where you claim there's not? If I give someone one file containing random data and another containing data encrypted with AES, will he be able to tell which is which?"
erjpgoijpoij erghoiehrgoiuh ernnerughoiehrgh poiuhgriuhpoihegh erherhoiuerhgih.
Was that random or encrypted?
If he works for the NSA
You're not the best person to write this kind of software if you don't know the answer. Start here:
http://www.amazon.com/Applied-Cryptography-Protocols-Algorithms-Source/dp/0471117099
If your question is: Can an encryption algorithm create patterns which can be identified upon analysis, then the answer is yes.
After a few whacks on the head with the NYC Yellow Pages (old school, print edition) I think someone could find out which file is encrypted and which is garbage.
Home of The Suki Series
Just a thought, encrypt the random data?
Encrypt the random data with AES. Then do what you are doing. Result: It shouldn't be possible to tell whether you've written more AES encrypted data over some parts of the old AES encrypted data. And it should be impossible to distinguish what the data was like before the encryption.
(I don't know if that would really work as I've never studied cryptography... But I don't see any obvious reason why that wouldn't solve the problem.)
Encrypted files have close to maximum entropy, just like truly random files. Basically, you can't tell which one is which. However, pure random noise on a disk isn't all that usual, so any encrypted file (or purely random file) will stick out like a sore thumb: it will be highly visible. But, again, you can't tell the difference.
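To put a number on the "maximum entropy" point, here is a minimal sketch that measures the byte entropy of a buffer. The function name and the sample inputs are my own picks for illustration. Note that a high score only says the bytes look uniform; it can't say whether they are ciphertext, compressed data or plain noise.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the empirical byte entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 2000
    noise = os.urandom(len(text))
    # Ordinary text scores well below 8; random bytes score very close to 8.
    print(f"text : {shannon_entropy(text):.3f} bits/byte")
    print(f"noise: {shannon_entropy(noise):.3f} bits/byte")
```

Run it over a suspect file's bytes and over a known-boring file of similar size to see how far apart they land.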
cpghost at Cordula's Web.
Properly encrypted data is indistinguishable from random data. However, just the presence of random files on the system could be incriminating. Perhaps it's better to hide the data in another type of file? Perhaps using the lsb of a bitmap file?
Sometimes I doubt your committment to SparkleMotion!
In a perfect world, encrypted data would appear to be entirely random. We don't live in a perfect world, though, and as far as I'm aware there aren't any algorithms that exist with this attribute. You can make it very difficult, though.
Does the person to whom you give these two files have a rubber hose? Is he a member of the “extraordinary rendition” team?
The point of steganography is to not get caught in the first place. If you need plausible deniability, you’ve already lost.
Cheers,
b&
All but God can prove this sentence true.
Let me guess... Random!.. No, wait, too obvious. Encrypted!
No. You cannot distinguish between the two. If you could, you would have an attack vector against the encryption. The trick, once you have a key, is to have authentication strings in your data structure, so you can see whether the key you used is actually correct and the decrypted data is actually useful. An attack based on this authentication string is one of the many, many possible attack vectors against encryption. Also, 'random' is not always very 'random'. In the world of cryptography, we need serious random. C'mon dude, how far are you with this?
Religion is what happens when nature strikes and groupthink goes wrong.
Perhaps the question is incorrect. If I have a volume with data and a volume with encrypted data, then the encrypted data can be discerned from the non-encrypted data by virtue of the fact that there will be detectable patterns in the non-encrypted volume. So technically, if you have a drive with random data on it but no discernible patterns, then either there is encrypted data on it or it is an empty drive. It is likely not even factory default, since that is likely to have some structure imposed upon it as well. What is the point of carrying around an encrypted volume with the ability for plausible deniability if that plausible deniability requires you to have random data as a volume? The existence of random data will render your plausible deniability claim useless since, by definition, your claim is no longer plausible.
If everything is perfect? No.
If you have two identical plaintext blocks and encrypt them in ECB mode with the same key, though (ECB takes no IV), then the two ciphertext blocks would turn out identical. That would make it trivial to see which was the encrypted one.
So basically the answer is; it depends.
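A rough sketch of how someone might look for that giveaway: count how many aligned 16-byte blocks in a blob occur more than once. The function name and the 16-byte default are my assumptions (16 bytes is the AES block size). Good random data, or ciphertext from a chained mode, should show essentially no repeats; ECB output over repetitive plaintext would show plenty.

```python
import os
from collections import Counter


def repeated_block_fraction(data: bytes, block_size: int = 16) -> float:
    """Fraction of aligned blocks whose value occurs more than once in data."""
    blocks = [data[i:i + block_size]
              for i in range(0, len(data) - block_size + 1, block_size)]
    if not blocks:
        return 0.0
    counts = Counter(blocks)
    # Count every block that belongs to a value seen at least twice.
    repeated = sum(n for n in counts.values() if n > 1)
    return repeated / len(blocks)


if __name__ == "__main__":
    print(repeated_block_fraction(os.urandom(1 << 20)))   # random noise
    print(repeated_block_fraction(b"\x00" * (1 << 20)))   # fully repetitive data
```

A score of zero doesn't prove anything is random, it only means this particular giveaway is absent.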
If you need to ask the question, do more research before you continue your work. This is stuff you really should understand before you embark on such a project.
Terje Elde
LUKS has a header. When I run cryptsetup on an uninitialized volume:
# cryptsetup luksOpen /dev/sdb1 foo
Device /dev/sdb1 is not a valid LUKS device.
That means it can tell if the header is valid without a password. So now every offset needs a valid LUKS header. Once you've done that, just make a bunch of perfectly valid encrypted volumes. Put real data in them. Install a working operating system that looks used.
Would be to encrypt the empty data portions of the filesystem when the file system is created (or when files are deleted) using the same algorithm and a pseudo-random key (you want to encrypt the empty data portions using a pseudo-random key so that you can deny it).
Do you think that 'I'm just sending random garbage' is going to offer any kind of plausible deniability in any situation where you could expose yourself to prosecution simply for using encryption?
I prefer the Boeing Boeing Gone guy. At least he varies the intro of his troll so that it's relevant.
Unless you're using a properly constructed one time pad, which the poster is not, encrypted data is distinguishable from random. The more of it you have, the more distinct it is. With a good encryption algorithm it's not easy, and investigators would probably use other techniques, but it is possible.
Very rarely do people just have random data sitting around or in transit. When was the last time you burned a CD of random bits? Context can easily give away the purpose of data, quite apart from the data itself. Pretty much the only time you will see 'random' data being sent across a network would be for metric, discovery or troubleshooting purposes, though that might be a clever cover.
The files on employees' computers containing seemingly random data can be assumed to be just that unless they're driving brand new Porsches and have vacation condos in Whistler.
Have gnu, will travel.
AES is designed to be a pseudo-random function (meaning it's evaluated against that criterion). What this means is that /when used properly/ AES encrypted data should be indistinguishable from random data, at least for a distinguisher running in bounded time. If anyone discovers an efficient algorithm that can distinguish this, it'll be a big nail in AES's coffin (and yes, at the very theoretical level I realize that there already are some known weaknesses in AES, but for the moment you're in good shape).
As a practical matter, if anyone finds storage media that contains a byte sequence which passes many statistical tests of randomness, then they should conclude it probably contains encrypted data. There are very few other common mechanisms for random data to get on a hard drive in the first place. Disk wiping utilities usually only write a sequence of fixed bit patterns like all 1's or all 0's, but not random sequences.
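As one example of the kind of "statistical test of randomness" meant here, this is a minimal sketch of a chi-square test on byte frequencies. The function names are mine, and the cutoff of 400 is an arbitrary illustrative value (the statistic has 255 degrees of freedom, so truly uniform bytes land around 255). Real test batteries run many different tests, not just this one.

```python
import os
from collections import Counter


def chi_square_bytes(data: bytes) -> float:
    """Pearson chi-square statistic of byte frequencies against a uniform distribution."""
    if not data:
        raise ValueError("need at least one byte")
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


def looks_uniform(data: bytes, threshold: float = 400.0) -> bool:
    """True if the byte histogram is consistent with uniform noise (illustrative cutoff)."""
    return chi_square_bytes(data) < threshold


if __name__ == "__main__":
    print(looks_uniform(os.urandom(1 << 20)))        # random bytes
    print(looks_uniform(b"hello world " * 100000))   # plain text
```

Passing this test is exactly what would make a region of a disk look suspicious under the argument above.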
That's not to say that randomness is a useless property. In a legal sense, no one can prove that random data is an encrypted volume. But if the goal is to avoid attention entirely, then even OTP is not good enough. Instead you have to hide your information in a natural, common source of entropy, like photographs or video. (Yes, your giant porn collection is a great place to hide the evidence from your secret conspiracy investigation.)
You best run it through a statistical script for finding structure in data sets.
Also, remember that the file itself will contain structure.
If the file is suddenly structured, then random, then structured, it is pretty easy to tell there is something hidden there.
This is the easiest way used to detect hidden data on your average image upload websites and the like.
So stay away from that idea.
Hiding data in messy file formats, such as JPG, audio and video files, works really well.
You can use the file's own data to hide data (such as using a certain section of the color channels for data).
A good example is hiding encoded data in the last X values of the alpha channel in a PNG (10, for example).
It limits the amount of data, but it also limits the chance of anyone, or anything, noticing it unless they are really looking hard. An alternative is hiding the data in the first X values of the alpha channel; then it makes more sense that large chunks of the image are transparent (and the original image maker just sucked at making transparent backgrounds).
As always though, best way to test any algorithms is posting examples up of your methods and getting others to crack it, maybe even with a reward.
Perhaps. But if you use cryptsetup with LUKS, there is a readable header for the encrypted file, you don't need the key to determine encryption has been used. In fact, you can set multiple passphrases that have the authority to decrypt the partition.
GPG Encrypted data is also distinguishable, regardless of whether you use ASCII armoring or binary .GPG files.
There are headers in the encrypted output that can be recognized without having the key to decrypt anything.
Now if you run 'openssl' from the command line, choose 'aes-256-cbc', supply a truly random key, and enter data bits interspersed with random 'padding bits', it will probably be impossible for anyone to determine from the output whether there are any data bits or not without knowing the key.
Maybe you should design it so the encrypted data has some patterns in it (ie. interleaved with the ciphertext)
No sig today...
http://xkcd.com/221/
Hard to say from your question, but if you haven't done already, get yourself some crypto knowledge. Crypto is hard, there is a reason that you are laughed out of the room if you say you've invented a new crypto algorithm and you don't already have strong credentials.
Randomness is one of the harder computer problems. Especially in steganography, many implementations have been defeated by creating not enough or too much randomness. If you want to hide your message in something, it doesn't matter if your output is distinguishable from randomness, it matters if it is distinguishable from what should be there. Simple approaches like LSB tricks have often fallen because those happen to be not random in many input data.
Assorted stuff I do sometimes: Lemuria.org
if one regularly wipes their deleted files with the final pass being a write of random data. That should, in theory, provide some haven in particular with encrypted drives and partitions.
So why can't the encrypted data just be added to a normal file like one created by Open Office or Word (docx format)? I know the word format better, but my understanding is that both are simply zip containers that house XML and some pointers to things like graphics, etc. that sit in different folders in the zip. So just add the encrypted data as "graphic7" to a file that only has graphic1 - graphic6. I don't believe the applications would care at all about this and would display the file correctly.
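A minimal sketch of the mechanics, using Python's zipfile module. The stand-in "document" below is NOT a valid .docx, just a ZIP with a couple of plausible-looking entries, and the part name word/media/image7.bin is made up. Whether Word or OpenOffice would tolerate or keep an unreferenced part when opening and saving is not something this checks.

```python
import io
import os
import zipfile


def make_stand_in_document() -> bytes:
    """Build a tiny ZIP that only mimics the layout of a docx (not a valid document)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w") as archive:
        archive.writestr("word/document.xml", "<document/>")
        archive.writestr("word/media/image1.png", b"not a real image")
    return buffer.getvalue()


def add_hidden_part(container: bytes, part_name: str, payload: bytes) -> bytes:
    """Return a copy of a ZIP-based container with one extra stored entry appended."""
    buffer = io.BytesIO(container)
    with zipfile.ZipFile(buffer, mode="a") as archive:
        # Store without compression: encrypted data would not compress anyway.
        archive.writestr(part_name, payload, compress_type=zipfile.ZIP_STORED)
    return buffer.getvalue()


if __name__ == "__main__":
    doc = add_hidden_part(make_stand_in_document(),
                          "word/media/image7.bin", os.urandom(64))
    with zipfile.ZipFile(io.BytesIO(doc)) as archive:
        print(archive.namelist())
```

Anyone who lists the archive will still see an entry that nothing in the document refers to, so this hides the data from the application, not from an examiner.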
If you use AES in ECB mode, then the answer is that it's usually painfully obvious that the original data was structured.
If you do use chaining (CBC, or something similar), then it will look quite random.
Excellent example here: http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation#Electronic_codebook_.28ECB.29
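For anyone who'd rather see it in code than in the penguin picture, here is a small sketch. The standard library has no AES, so toy_encrypt_block below is a made-up four-round Feistel permutation standing in for the block cipher; it is NOT secure and NOT AES. The mode logic (ECB encrypts each block on its own, CBC XORs in the previous ciphertext block first) is the part being illustrated.

```python
import hashlib
import os

BLOCK = 16  # block size in bytes, same as AES


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Four-round Feistel permutation on one 16-byte block (stand-in for AES, not secure)."""
    left, right = block[:8], block[8:]
    for i in range(4):
        f = hashlib.sha256(key + bytes([i]) + right).digest()[:8]
        left, right = right, _xor(left, f)
    return left + right


def _blocks(data: bytes):
    if len(data) % BLOCK:
        raise ValueError("data length must be a multiple of the block size")
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    """ECB: every block is encrypted independently, so equal inputs give equal outputs."""
    return b"".join(toy_encrypt_block(key, b) for b in _blocks(data))


def cbc_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """CBC: each plaintext block is XORed with the previous ciphertext block first."""
    out, prev = [], iv
    for b in _blocks(data):
        prev = toy_encrypt_block(key, _xor(b, prev))
        out.append(prev)
    return b"".join(out)


def distinct_blocks(data: bytes) -> int:
    return len(set(_blocks(data)))


if __name__ == "__main__":
    key, iv = os.urandom(16), os.urandom(16)
    plaintext = b"\x00" * (BLOCK * 64)  # highly structured input
    print("ECB distinct blocks:", distinct_blocks(ecb_encrypt(key, plaintext)))
    print("CBC distinct blocks:", distinct_blocks(cbc_encrypt(key, iv, plaintext)))
```

With 64 identical plaintext blocks, ECB collapses to a single repeated ciphertext block while CBC gives 64 different ones.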
Am I part of the core demographic for Swedish Fish?
Math.
Be sure to use some math and it'll all be good
What you want is plausible deniability, and that is not easy to achieve, as some states have started to pass laws that allow them to hold you hostage if you do not provide a decryption key to an encrypted container (which, with your method, would be corrupted). Have a look at TrueCrypt's technical details behind their plausible deniability feature: http://www.truecrypt.org/docs/?s=plausible-deniability
The NSA can.
If the right AES mode is used (CBC is a good one) and the implementation doesn't have any unencrypted headers, then it should be impossible to differentiate between that and random.
Arthur Conan Doyle wrote that the best place to hide something was right under the nose of the person looking for it - if someone spots an encrypted file, no matter how well hidden, it says you have something to hide. Instead you could simply print out your secret file in the form of a glossy magazine, something no one in the security world would really be interested in flicking through. Gardeners World or Philosophy Today. And then just leave it lying around, literally. It will be the last place they look. And the last thing they will confiscate.
You cannot distinguish between the two.
This is categorically not true, unless the key is as long or longer than the data file (and never used again). There is indeed an attack vector against any encrypted data file if the key length is small by comparison. Statistical analysis plus the slightest idea of what type of data is being encrypted is more than adequate to mount a successful attack (given sufficient computational resources) unless the key is _much_ longer than what is typical today. The lack of computational resources is the only thing that keeps typical encrypted data secure.
While the data within the encrypted volume should be indistinguishable from randomness, the metadata headers are quite distinguishable. It's pretty obvious if something is a LUKS volume, but within that you shouldn't be able to tell.
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
Steganographic attempts are considered foiled if someone can detect that there is a secret message, they don't need to be able to retrieve the message in order for the attempt to be considered a failure. I did my Master's project on hiding data in the least significant bitplane of imagery. The trick is to "randomly" scatter your secret message throughout this plane. I showed methods that would allow you to do this so that the data was indistinguishable. You should always encrypt your secret message first so that it looks random, or better yet, shape the statistics of your encoded message to match the noise characteristics that were in the original LSB plane. If you use an image created from a very noisy source, such as a digital camera, and you encrypt the embedded message and scatter it using a reversible algorithm, and iteratively ensure that the statistics of the altered LSB plane look the same as the original LSB plane, I proved that it is not possible for someone to tell that there is a secret message hidden there. However, you need to be careful to use an original image you created yourself, and to destroy the original, because if someone ever compared the original to the one with the embedded message, they could definitely tell there was something altered by comparing the LSB planes.
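A rough sketch of just the "scatter through the LSB plane" part, under several assumptions of mine: the carrier is a flat byte string standing in for pixel samples, the payload is already encrypted, and the positions come from Python's random.Random seeded with a shared key. That generator is not cryptographically secure, and the statistics-matching step described above is not attempted here at all.

```python
import random


def _positions(key: int, carrier_len: int, n_bits: int) -> list:
    """Pick n_bits distinct carrier positions in a key-dependent, repeatable order."""
    if n_bits > carrier_len:
        raise ValueError("payload does not fit in carrier")
    return random.Random(key).sample(range(carrier_len), n_bits)


def embed(carrier: bytes, payload: bytes, key: int) -> bytes:
    """Write the payload bits into the least significant bits at scattered positions."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    out = bytearray(carrier)
    for pos, bit in zip(_positions(key, len(carrier), len(bits)), bits):
        out[pos] = (out[pos] & 0xFE) | bit
    return bytes(out)


def extract(stego: bytes, n_bytes: int, key: int) -> bytes:
    """Read n_bytes back from the same key-dependent positions."""
    positions = _positions(key, len(stego), n_bytes * 8)
    bits = [stego[pos] & 1 for pos in positions]
    return bytes(
        sum(bit << (7 - j) for j, bit in enumerate(bits[i:i + 8]))
        for i in range(0, len(bits), 8)
    )


if __name__ == "__main__":
    carrier = bytes(range(256)) * 16           # stand-in for image samples
    stego = embed(carrier, b"secret", key=1234)
    print(extract(stego, 6, key=1234))
```

The receiver has to know the payload length as well as the key; a real scheme would have to carry that somewhere too.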
I thought the point was there is one encrypted file and one random file.
With steganography, one is encrypted and the other normal data. But both are usable - say a JPG image, where both load. So who is to say one has encrypted data, and they weren't just compressed using different settings? The question is, could anyone tell that (though with any given file format it seems like sophisticated enough analysis might be able to tell you that).
"There is more worth loving than we have strength to love." - Brian Jay Stanley
There is an old stego app called "texto" that uses some words (as I recall, 64) to encode bits. What comes out looks like boring and rather repetitive text, but does look like text.
Encrypt, use something like texto (pick different words at any rate), then compress. Result will look like text, can be hidden in more conventional ways (e.g., name the file "George's boring report"). This will not leave a file around with suspiciously high entropy. Few folks store random data so finding such would tend to arouse suspicion. They can and do sometimes store boring and repetitive data (especially where it comes from superiors). Once someone decompresses it, they may stop messing further.
By now intelligence services are well aware of truecrypt and could always ask you to zero the free space in the outer volume (which would destroy your unrevealed inner one), so it is not as good, I would submit, as the above.
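A sketch of the 64-word idea: with 64 words, each word carries 6 bits. The word list below is invented for illustration and is not texto's, and I am not claiming texto works exactly this way; it only follows the description above.

```python
# 64 made-up "boring report" words; index in the list = 6-bit value.
WORDS = (
    "the report shows that sales grew during last "
    "quarter while costs fell across every region and "
    "team plans more review before next meeting with "
    "staff budget remains under target after recent changes "
    "our office will need new forms for travel "
    "requests from each manager who signs these pages "
    "please send notes about open items this week "
    "so we can close them on time again"
).split()
assert len(WORDS) == 64 and len(set(WORDS)) == 64
INDEX = {word: i for i, word in enumerate(WORDS)}


def encode(data: bytes) -> str:
    """Turn bytes (ideally already encrypted) into words, 6 bits per word."""
    bits = "".join(f"{b:08b}" for b in data)
    bits += "0" * (-len(bits) % 6)  # pad the last group with zero bits
    return " ".join(WORDS[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6))


def decode(text: str) -> bytes:
    """Invert encode(); leftover padding bits at the end are dropped."""
    bits = "".join(f"{INDEX[w]:06b}" for w in text.split())
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


if __name__ == "__main__":
    cover = encode(b"\x8f\x02\xd1\x77")
    print(cover)
    print(decode(cover))
```

The output is word salad rather than grammatical text, so it would pass a quick entropy check but not a human reading it.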
What about some sort of web service where you can download tidbits of random data to overwrite unused blocks on your drive? It doesn't even have to be random all the time; it could be pieces of non-copyrighted material, for instance. So if anyone asks why you have random data, then you can say 'oh, that's just some stuff from the internet I downloaded'. That would be better than leaving your current unused blocks with pieces of your old deleted data too. If this became common, completely random data would be unsuspicious. And on top of that it could be used as part of a disk-checking software to test your disk, much like memcheck. /2cent
Can I light a sig ?
It is my understanding that if you start with a random one-time pad of 1s and 0s, it should be impossible to determine whether or not you have XORed a file of any entropy against it. If any background communication has surplus bits that are random 1s and 0s, that is your ideal steganography sub-channel. So any test for encrypted data is really a test for a channel of random bits being transmitted.
The other half of the question is whether a test for random data is sufficient to detect a steganography channel. The smallest encrypted message that can be sent is a single bit - basically just a flare going up. And from a single bit, you can grow that channel over time to any arbitrary length. So that means your algorithm would be required to detect the existence of any pattern of random (unexplained) bits in a file or communication channel that the receiver could be flagging to read as a string of bits.
That's as far as I get ... I think that it probably reduces to being able to put a statistical upper bound on the size of a file that could be hidden given the number of random/unexplained bits in a data stream, but I don't think you could "prove" steganography wasn't happening.
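The one-time pad point can be sketched in a few lines. The names are mine; the pad comes from Python's secrets module as a stand-in for a truly random pad. The second half shows why the output tells you nothing: for any other message of the same length there is a pad that yields the very same ciphertext.

```python
import secrets


def otp_xor(data: bytes, pad: bytes) -> bytes:
    """XOR data with a pad of the same length (encrypts and decrypts)."""
    if len(pad) != len(data):
        raise ValueError("pad must be exactly as long as the data")
    return bytes(d ^ p for d, p in zip(data, pad))


if __name__ == "__main__":
    message = b"meet at noon"
    pad = secrets.token_bytes(len(message))
    ciphertext = otp_xor(message, pad)
    assert otp_xor(ciphertext, pad) == message

    # Any same-length decoy "explains" the ciphertext under some other pad.
    decoy = b"buy more tea"
    decoy_pad = otp_xor(ciphertext, decoy)
    assert otp_xor(ciphertext, decoy_pad) == decoy
    print(ciphertext.hex())
```

That only holds if the pad is random, as long as the message, and never reused.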
( standard disclaimer: I am not a $profession )
Unless you're Bruce (Schneier).
It seems rubberhose is dead, but look at it and especially the fundamental ideas in it if you really wish to pursue this (I like the idea of having N encrypted volumes and the fact that you cannot prove that you have fully co-operated [and they cannot prove that you're not], of course you need some interesting data on the "bait" volumes as well).
The problem with properly used encryption being indistinguishable from random data is that you need a lot of good quality random data to hide your encrypted data in, because it will be distinguishable from the not-so-random data that you get out of /dev/urandom.
If you are in a situation where you will actually need encryption (especially the deniable sort) then don't trust your own code. As they say: A lawyer who represents himself has a fool for a client. (Don't trust someone else's code either unless it has actually been reviewed by more than two people who actually know how to do cryptanalysis.)
A lot of people have been beating about the bush. Put simply:
(1) Well-encrypted data, by itself (i.e., without any other container or header) is pretty much indistinguishable from random data or "noise".
(2) If you are actually planning to use steganography (hiding the resulting data by modifying another file, like say a photo image), you should be aware that sloppy steganography can often be detected, even if the "hidden" data is random. It depends a lot on the type of file in which you are hiding the data, and the method of encoding.
You should also be aware that effective steganography requires files that are very much larger than the actual data you are trying to hide.
The real advantage of steganography is misdirection: the resulting file appears innocuous. Don't count on it hiding your data well, all by itself.
because I always thought plausible deniability had nothing to do with hiding encrypted data, but rather, allowing two different keys to decrypt two different sets of data. The idea being that, under rubberhose conditions, you could easily reveal one key that would decrypt and reveal sensitive data whose encryption is understandable, but not reveal the legitimate data that you are trying to protect. The plausible deniability is in that they cannot say that there is any more data on that volume; you have given them a successful key and they have obtained access to data, and there is no way to guarantee that any of the other random data contains other (more sensitive) encrypted data.
All the investigators need to do is run some fake but seemingly complex program that looks at the file under inspection and says "yes, steganography in use". Then the full weight of the law comes down, because now the suspect has to prove the negative - impossible of course.
So actually what is needed is a suspect's right that investigators prove any assertion that files have been hidden if that assertion/analysis is used as evidence in court.
If you find a file on my hard drive with data you can't readily decode, is it:
A) Compressed with an unknown compressor
B) Encrypted with an unknown encryptor
C) Random bytes used for an encryption process
D) Random bytes used for something else
I can't prove that answer D is wrong... but I don't have to because I know that 99% of the time, it's one of the other answers.
If you want to hide your data, the file must ostensibly have some other purpose... something that isn't obviously a lie. That's what steganography is about. For example, you might download as much of the 1 meter-resolution Google Maps satellite image as fits on your hard disk, save it uncompressed and then store encrypted data in the low-order bit of each byte (3 bytes to a pixel). Coupled with a map application that can display the imagery, it would appear to be one thing (a map) while really being another (a container for encrypted information).
At that point, unless you capture the encryption software it becomes hard to suspect that there is encrypted data, let alone prove it.
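Back-of-the-envelope arithmetic for the map idea, as a tiny sketch. The function names are mine, and the defaults (3 bytes per pixel, one hidden bit per carrier byte) are taken from the description above. It also shows why the carrier has to be so much bigger than the payload.

```python
def lsb_capacity_bytes(width: int, height: int,
                       bytes_per_pixel: int = 3, bits_per_byte: int = 1) -> int:
    """Payload bytes that fit when bits_per_byte low-order bits of every carrier byte are used."""
    return (width * height * bytes_per_pixel * bits_per_byte) // 8


def carrier_bytes_needed(payload_bytes: int, bits_per_byte: int = 1) -> int:
    """Uncompressed carrier bytes required to hold payload_bytes (rounded up)."""
    return -(-payload_bytes * 8 // bits_per_byte)


if __name__ == "__main__":
    # A 1024 x 1024 RGB tile holds 384 KiB at one bit per byte.
    print(lsb_capacity_bytes(1024, 1024))
    # Hiding 1 GiB needs 8 GiB of uncompressed imagery.
    print(carrier_bytes_needed(1 << 30))
```

So at one bit per byte the imagery has to be eight times the size of whatever you hide in it.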
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
Imagine if investigators simply state that their analysis, not disclosable under state secrets privilege, shows hidden text saying "We'll bomb the Eiffel Tower on Thursday". The suspect is now stuffed with no defense.
I don't work for any 3-letter agency and even I could easily get the information needed.
With the right tools.
Mit der Dummheit kämpfen Götter selbst vergebens
LUKS can be easily detected.
The specifications for the on-disk format are published online.
http://code.google.com/p/cryptsetup/wiki/Specification.
What I would recommend and personally employ... First, fill the disk with a random background:
# cryptsetup --cipher=aes-xts-essiv:sha256 -s 256 --key-file=/dev/random create mapper1 /dev/sdz
# cryptsetup --cipher=twofish-xts-essiv:sha256 -s 256 --key-file=/dev/random create mapper2 /dev/mapper/mapper1
# dd if=/dev/zero of=/dev/mapper/mapper2 bs=512
Don't bother creating a partition table or anything else. Leave the entire disk full of this background data.
Then create an encrypted volume using a hash for key material and offset and skip sector counts from the hash string:
# echo "secret_password@drive_serial_number" | sha512sum
4839eeac06a2045d606dbf519ba5e9[...]e312009896441a5
# cryptsetup --cipher=twofish-xts-essiv:sha256 -s 256 -o 483906 -p 204560 create encrypted /dev/sdz
Password:
# pvcreate /dev/mapper/encrypted
# vgcreate
# lvcreate
If questioned I would respond with nothing, no words, and just chill there.
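My reading of the hash step above, as a sketch: the offset 483906 and skip 204560 look like the first twelve decimal digits of the SHA-512 hex string, six for each, skipping over the letters a-f. That rule is inferred from the example values and is not confirmed by the poster. The trailing newline is included because echo appends one.

```python
import hashlib


def digits_to_offsets(hex_digest: str, width: int = 6):
    """Take the first 2*width decimal digits of a hex digest as (offset, skip)."""
    digits = [c for c in hex_digest if c.isdigit()]
    if len(digits) < 2 * width:
        raise ValueError("not enough decimal digits in digest")
    offset = int("".join(digits[:width]))
    skip = int("".join(digits[width:2 * width]))
    return offset, skip


def derive_offsets(password: str, serial: str, width: int = 6):
    """Mimic `echo "password@serial" | sha512sum`, then pull the digits out."""
    digest = hashlib.sha512(f"{password}@{serial}\n".encode()).hexdigest()
    return digits_to_offsets(digest, width)


if __name__ == "__main__":
    # The visible start of the hash in the comment gives the quoted values.
    print(digits_to_offsets("4839eeac06a2045d606dbf519ba5e9"))
    print(derive_offsets("secret_password", "drive_serial_number"))
```

The point of deriving the numbers this way is that nothing has to be written down: the same password and drive serial always give back the same offset and skip.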
Ok, I think this guy's question has been answered (almost before he asked it, honestly). Uhhh, no, the random data and encrypted data will be indistinguishable; however, random data does not occur without human intervention (intent). Meaning, if carrying random data as a ruse to confuse and supply deniability is your plan, think of another plan.
My question is, what are you trying to hide? You don't have to be overly specific, but it helps to hide similar data within a similar structure. Steganography works well for hiding pictures within other pictures, sound within other sound, etc. Why? Because even the encrypted data looks like errors in the image file that might just be due to a corrupted file. If you're thinking of hiding entire filesystem contents within a single image then you're about as dumb as a post and will be hiding nothing from anyone who knows what they are doing, let alone what they are looking for.
So, think about what you're hiding and why. If you're hiding financial or other personal information and the like, why steganography? Just use TrueCrypt to lock it down. If you're transmitting the data, encrypt it as a file, encrypt the connection, and physically send the key with a carrier that uses a tracking number. If you're doing something legally questionable, think again. You just outed your intentions on a global public forum and no amount of random data will save you from prosecution.
If you're smart, the feds or whoever can only look at the in-between. So say your entire comms between safehouse -> HQ go over an IPsec tunnel. The feds are left with simply a frequency factor. That is to say, they can't decrypt comms between the alleged Taliban safehouse and HQ, but typically there are only 1-2 connections a day, and suddenly there are 10 in a day. Something's going to happen soon! However, if you randomly pipe /dev/urandom for no reason at all, then they don't even have that.
How do you tell the difference between encrypted random data and encrypted data? pretty unlikely.
However, no matter how well your secret files are hidden, there's one thing which may still reveal you: The very existence of the steganography program on your hard disk. Therefore your steganography executable should be hidden, too. One possible way to hide it in would be to have the same executable do something completely different, unless you give a special, secret option to turn it into the steganography program.
The Tao of math: The numbers you can count are not the real numbers.
While theoretically true, this has little practical relevance. If it is not computationally feasible to distinguish between some pseudorandom number and a true random number (which is essentially the goal of any given cryptosystem), then what does it matter? Sure, mathematical breakthroughs such as efficient factorization and such would completely change things for particular systems, but does anyone honestly care if your statistical analysis produces an algorithm that runs in 2^(n-.00000000001) as opposed to the 2^n required to brute force an n-bit key?
That is the entire basis of modern cryptography. We cannot get perfect secrecy without an OTP, and an OTP is not feasible as a real system. It can only be used as a measuring stick to see how close to ideal a system can get, not as a practical goal.
Computer-generated 'random' data is designed to be difficult to predict. But it does not resemble the noise generated by natural phenomena. Truly random (naturally occurring) noise tends to have sections that appear ordered, and may fall into a recognizable distribution. There are statistical tests to tell the difference.
So an AES file will look pretty similar to a pseudo-random file, but neither will resemble a truly random file. This means that your AES file will NOT look like the noise on an uninitialized hard drive. So it will be pretty easy to spot. And since most people don't carry around random files for no reason, the obvious assumption is that it is encrypted - so you could not plausibly deny that it is.
Steganography hides bits as errors in a carrier file - such that the file is still usable, but is distorted in a way that most people would not notice. However, it might be possible for a machine to identify it - the chances go up the more data you try to cram in.
So you might as well just carry your encrypted files in plain sight as use your above scheme. If there is danger to you in doing this, then you are already in trouble. If it's just a toy, have fun.
Rubberhose (Pronounced Marutukku) is transparently deniable encryption, developed by (among others) Julian Assange.
This seems to do exactly what you're trying to do, so even if you want to go ahead and implement it yourself from scratch, it's worth reading up on what they've done to get some ideas and avoid some potential pitfalls.
Specialist Mac support for creative pros, Melbourne
No, you ALL miss the point. How are you going to explain having a HDD or partition full of "garbage"? Nobody with half a brain will believe you there's nothing encrypted in the noise.
(Yeah, an entropy file would be easy to explain, but entropy files usually don't come in sizes big enough to hide data in, PLUS, who apart from us here understands what an entropy file is? A judge sure doesn't.)
Steganography, OTOH, would be very useful. I have around 50 GB of family photos on my machine, that would make for a nice data storage.
Who is General Failure and why is he reading my hard disk?
Please remember that encrypting data isn't only to protect it from authoritarian governments. It isn't even to protect it from governments at all. There are plenty of other reasons to encrypt your data. We encrypt things to work to keep it safe and private, but not from the government. If they show up with a subpoena, I'll decrypt whatever they need. However they aren't the ones we are worried about. We are worried about hackers and the like. I'm not very worried about a hacker, even a determined one, trying to come and hit me on the head. Crossing in to the physical exposes you to a lot of risks. While I personally might not be a threat to them, the campus police and city SWAT would be, and they'd come when called for that.
In regards to that, there are plenty of legit questions, like can they identify encrypted data, or an encrypted stream or not? Lets say we toss some HDDs. Some are DBAN'd. Others got missed, but are encrypted. Can they tell which is which? If so they can focus their efforts. If not, they will waste a lot of time on dead ends.
It is a legit question, even ignoring governments.
In principle, it should be impossible to tell the difference between encrypted data and a random oracle. In practice, many encryption algorithms leave tell-tale signs. (Suggest taking a look at the 2DEM encryption mode paper on the NIST website for an example.) One-Time Pads, if the encryption key is truly random, are guaranteed to yield data that is indistinguishable from random, but that is the only time you have a guarantee of that.
Modern cryptography is divided over the issue, but my understanding is that if you heavily compress data first and then encrypt it, you will get something much closer to the "ideal" of appearing random.
In steganography, the problem is slightly different. Image data isn't truly random. You can analyze the level of randomness and see if that level is within the bounds you would expect to find in an image. Your problem, then, is not to produce something that looks like a random oracle, but rather to produce something that looks like a natural oracle. I would imagine you could space the encrypted data out a bit and inject garbage that artificially kept the level of randomness within the bounds you would expect to find in an image. Do a bit of interpolation, come up with something that would make some sort of sense if it were all natural image data. Remember, the analysis will be done by computers and computers won't look at the aesthetics or the plausibility of the image, they'll only be looking for whether some algorithmically-defined metric falls inside or outside some given bounds. Algorithms are great for "what" questions, not so good for "why" questions.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
No, they don't. That's the one flaw.
Hint. Take one of the unencrypted image formats - with a relatively simple image, encrypt everything but the image header - display.
You'll probably be able to tell what the image was.
Compressed encrypted files on the other hand are pretty close to maximum entropy.
If you need to ask this question, you shouldn't be developing a crypto tool. Seriously, don't.
There are a million ways to get something like this wrong. Doing it right requires deep domain knowledge, which it seems you don't have.
(To answer the question, the definition of a secure encryption function E(k, m) is that, when k is random, E(k, m) is indistinguishable from random. If you believe that AES CBC mode is secure, then you believe that an attacker can't distinguish AES-encrypted text from random text.)
It's very possible that either or both of your random number generator or AES has some artifacts that make them stand out. Even if you can't find anyone with such knowledge, you shouldn't be too sure that they don't exist. So, just do this:
Run your random number generator numbers through your AES encryption to create your "random" blocks. The encrypted real data and the encrypted random data will be absolutely indistinguishable.
Merely saying "it's not an encrypted dataset, it's just random numbers" cuts no ice, simply because Occam's razor kicks in. Which is more likely: some "suspicious" guy has a 100MB encrypted file, or he's just created 100MB of random noise for the hell of it?
You will rapidly discover that all this "innocent until proven guilty" nonsense is merely the stuff of TV drama and that in the real world the burden of proof will be in your lap. The reason is that the security people (and any jury, if it came to that) would all assume that the data is encrypted. So you're left with the philosophical knot of proving that your data is not encrypted and the only way to do that is to exhaustively demonstrate that no key will decrypt it.
Good luck with that!
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
If I give someone one file containing random data and another containing data encrypted with AES, will he be able to tell which is which?
The answer is no, sort of. With a properly chosen key, and a sufficiently strong (read "good") encryption algorithm, it should not be possible to distinguish random data from encrypted data. Obviously, there are caveats to this, but in general, it should not be the case.
The reason for this is that in a true random (or near random) stream (or block) of data, the following two conditions hold (or should hold):
1. The symbols in the data have a uniform distribution, meaning they are equally likely to occur at any position in the stream.
2. The probability of a symbol appearing at a position X in the stream is independent of all symbols that have occurred prior to X.
That is, a random stream has maximal information capacity (or entropy), according to Shannon's theory of information. It is impossible to predict what value will occur at a given location in the stream by analyzing the values that have occurred before (which is how you search for patterns).
A properly secured encryption algorithm (with a suitable key and conditions) attempts, via substitution and permutation, to reduce the statistics of a plaintext down to symbols exhibiting uniform or near-uniform distribution. Once you do that, the statistical characteristics of the plaintext (which are what you use for pattern searching) are no longer there. The information is unrecoverable: maximum entropy, maximum information potential.
You would need the original key (and original conditions) to reverse the uniform distribution of the ciphertext symbols.
Without the key, it is impossible to find patterns in the ciphertext without knowing something about the plaintext, or having a sample ciphertext other than the one you are trying to attack. There are attack vectors that can be used, but then this is no longer about trying to distinguish patterns in the ciphertext, but about reversing (usually by brute force) the encryption process.
The reliability of an encryption algorithm is inversely proportional to the number of potentially recognizable patterns that remain in the ciphertext. With AES, the answer to your question should be no.
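To make condition 1 above concrete, here is a minimal Python sketch that estimates entropy from byte frequencies. The function name and the sample inputs are my own, not anything from the original post. It only measures the byte distribution; it says nothing about condition 2 (independence), so a high score is necessary for "looks random" but not sufficient.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Estimate entropy in bits per byte from observed byte frequencies."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 1000
    noise = os.urandom(len(text))
    # Structured text should score well below 8; OS randomness close to 8.
    print(f"English-like text: {shannon_entropy(text):.3f} bits/byte")
    print(f"os.urandom output: {shannon_entropy(noise):.3f} bits/byte")
```

Output from a good cipher would be expected to score close to 8 bits/byte here, just like the `os.urandom` sample, which is why this kind of measurement cannot separate the two.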
Having said that, experiment, play with it. Just don't use it in real-world products, though :)
Technically it's possible to do what you want to do, but to avoid leaking information and get good performance you have to use non-trivial cipher modes. I suggest you have a look at the documentation for TrueCrypt, which covers most of the mathematics:
http://www.truecrypt.org/docs/
The problems you are likely to run into are related to the handling of identical data in several files. A naive implementation will leak a lot of information.
The steganographic payload has to be protected from damage. If it's randomly scattered on a partition, it has to be marked as used blocks, or it will get overwritten by the OS. If it's not part of a partition at all, then it's immediately suspect. So, it has to be embedded in a file. Same issue here -- the data has to make sense in terms of the file format. Some image formats like TIFF have internal pointers that allow you to make unused areas, but it's painfully easy to read the header and find that you did this.
The only time it really works is when you have a single, unchanging payload. Stick it in a carefully crafted carrier file and go on your way. But data that regularly gets used can't easily be hidden.
I have just been through the first 50 answers to this question asked by the OP, and there is not ONE actual answer.
This is the very reason I now have absolutely zero regard for Slashdot. It used to be a decent web site; it is now so full of tossers, wankers and idiots that it is of no regard at all. Will someone with some control and authority please get rid of the bloody idiots on here and get it back to being a worthwhile site to visit?
I may just come back as a fully registered user then, but boy, it has got to seriously improve, and some real tossers have got to be got rid of.
Information entropy.
In other words, yes.
There is a program called TrueCrypt, which lets you encrypt a drive into two partitions with different passwords. If you are forced to give up your password under duress, you give them the 'dummy' password, which unlocks only one portion of the drive. There you might keep (legal) porn or something else that you have a plausible reason for wanting to hide, but nothing incriminating.
And of course if you enter the 'real' password, it unlocks the actual stuff you want to hide. It works very well, and there is no evidence of the second drive (as far as I know) until you enter the correct password.
Steganographic attempts are considered foiled if someone can detect that there is a secret message, they don't need to be able to retrieve the message in order for the attempt to be considered a failure. I did my Master's project on hiding data in the least significant bitplane of imagery. The trick is to "randomly" scatter your secret message throughout this plane. I showed methods that would allow you to do this so that the data was indistinguishable. You should always encrypt your secret message first so that it looks random, or better yet, shape the statistics of your encoded message to match the noise characteristics that were in the original LSB plane. If you use an image created from a very noisy source, such as a digital camera, and you encrypt the embedded message and scatter it using a reversible algorithm, and iteratively ensure that the statistics of the altered LSB plane look the same as the original LSB plane, I proved that it is not possible for someone to tell that there is a secret message hidden there. However, you need to be careful to use an original image you created yourself, and to destroy the original, because if someone ever compared the original to the one with the embedded message, they could definitely tell there was something altered by comparing the LSB planes.
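For readers who want to experiment, here is a minimal Python sketch of the basic scatter-into-the-LSB-plane idea. It is not the method from the thesis described above: the names `embed` and `extract` are hypothetical, the carrier is treated as a flat sequence of 8-bit samples, and the key-dependent positions come from Python's `random.Random`, which is not cryptographically secure. It also makes no attempt to match the noise statistics of the original LSB plane, which is the hard part.

```python
import random


def _positions(n_carrier: int, n_bits: int, key: int) -> list:
    """Pick n_bits distinct carrier indices in a key-dependent order."""
    if n_bits > n_carrier:
        raise ValueError("payload too large for carrier")
    # random.Random is NOT cryptographically secure; illustration only.
    return random.Random(key).sample(range(n_carrier), n_bits)


def embed(carrier: bytes, payload: bytes, key: int) -> bytes:
    """Overwrite the least significant bit of scattered carrier samples."""
    bits = [(byte >> i) & 1 for byte in payload for i in range(8)]
    out = bytearray(carrier)
    for pos, bit in zip(_positions(len(carrier), len(bits), key), bits):
        out[pos] = (out[pos] & 0xFE) | bit
    return bytes(out)


def extract(stego: bytes, n_bytes: int, key: int) -> bytes:
    """Read the payload back from the same key-dependent positions."""
    pos = _positions(len(stego), n_bytes * 8, key)
    bits = [stego[p] & 1 for p in pos]
    return bytes(
        sum(bits[8 * j + i] << i for i in range(8)) for j in range(n_bytes)
    )
```

In practice the payload would be encrypted before calling `embed`, as the comment above recommends, so that the embedded bits themselves carry no structure.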
Would you be kind enough to provide Slashdot readers with a pointer to your master's thesis? I would like to experiment with this.
Why don't you use existing tools?
TrueCrypt will provide plausible deniability through the use of hidden encrypted volumes inside of apparent encrypted volumes.
http://www.truecrypt.org/docs/?s=plausible-deniability
If you find a file on my hard drive with data you can't readily decode, is it:
A) Compressed with an unknown compressor
B) Encrypted with an unknown encryptor
C) Random bytes used for an encryption process
D) Random bytes used for something else
I can't prove that answer D is wrong... but I don't have to because I know that 99% of the time, it's one of the other answers.....
OK, let's, as a community, add an (E). Everyone create a file on your laptop, in your home directory, named random.bin, as follows:
dd if=/dev/urandom of=random.bin bs=4096 count=10000
The actual value of the count isn't important, as long as it is large enough to create lots of random bits. If lots of people do this, we have “(E) Random bytes because Slashdot told me to”, providing plausible deniability for anyone who needs to use that file to encrypt something important.
With cryptsetup, make sure you use essiv:sha256 or AES LRW. There are watermarking attacks against the earlier versions of AES-CBC with unprotected block-based IVs used by cryptsetup and TrueCrypt.
With the exception of NULLs, encryption such as AES will eliminate any and all repeated bytes just like compression.
Random data greater than a fairly low number will have repeated bytes.
Steganography though is different. It is not encryption nor compression, it is concealment via obfuscation or confusion.
OK, you've piqued my interest. I'm going off to program a proof of concept using AES-256 CBC, replacing the least significant data in an image with it. I'm curious about the percentage of the image file I can replace while maintaining a high enough image quality to avoid suspicion.
In Q2 2009, a company called Forensic Innovations, Inc. claimed to be able to distinguish random data from TrueCrypt containers. See http://www.forensicinnovations.com/blog/?p=7 However, I downloaded their tool (Version 2.23 of File Investigator TOOLS) and it identified truly random data as a TrueCrypt container... In fact it identified any container of truly random data bigger than a certain size as encrypted data. Therefore, their $249 tool (or this particular version at least), which claims to "Detects Encrypted Files, including TrueCrypt", is not performing as advertised. Try it for yourself.
It would seem that if you used TrueCrypt or some other disk encryption program, the whole disk would be encrypted, even empty areas. Then keeping other data in encrypted containers would not be detectable. Make the containers as small as possible and rename them with a .dll or .exe extension. Someone else will have to comment on the validity of my comment.
This is in fact a major design principle for modern ciphers, so forget it.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
A cipher is basically just a random sequence generator seeded with the key and an initialization vector (so the same key can generate different sequences), so the answer is no; you can't differentiate a data stream that has been XORed with a random sequence, from a random sequence. In practice though, if you see a random sequence it's a safe bet that it's either encrypted or a decoy. In some cases you can detect handshakes and key/iv exchanges (like diffie-hellman).
Encrypted files have maximum entropy, just like absolutely random files. Basically, you can't tell which one is which. However, absolute random noise on a disk isn't all that usual, so any encrypted file (or pure random file) will stand like a sore thumb: it will be highly visible. But, again, you can't tell the difference.
Absolutely correct. Any "investigator" who finds a pure random file will immediately suppose that it contains encrypted data. I mean, what else? Compressed files are not random, and there is no really good reason to store gobs of random data on a disk. OK, maybe you can come up with a good reason, such as doing research on random numbers, but that will be highly suspicious.
On the other hand, it is possible to systematically add entropy to a file. One very simple way may be to consider the random bits as codes in a variable length alphabet, much like a Huffman code. You can then "decompress" the random file using the variable length code. Voila, a larger file with the desired entropy/redundancy. It will look like binary data, not encrypted data.
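A minimal Python sketch of that "decompress the random bits" trick, under my own assumptions: the four-symbol prefix code below is hypothetical, and a real tool would use a much larger code table modelled on some plausible file type. Short codewords map to common symbols, so uniform random bits come out as a longer, skewed symbol stream that can be turned back into the original bytes.

```python
import os

# Hypothetical prefix-free code: shorter codewords decode to commoner symbols.
CODE = {"0": b"e", "10": b"t", "110": b"a", "111": b" "}
ENCODE = {sym: bits for bits, sym in CODE.items()}


def to_bits(data: bytes) -> str:
    return "".join(f"{byte:08b}" for byte in data)


def expand(data: bytes):
    """Decode random bits as codewords; returns (symbols, leftover_bits)."""
    out, word = bytearray(), ""
    for bit in to_bits(data):
        word += bit
        if word in CODE:
            out += CODE[word]
            word = ""
    return bytes(out), word  # leftover is an incomplete codeword, if any


def shrink(symbols: bytes, leftover: str = "") -> bytes:
    """Inverse of expand: re-encode symbols and pack the bits into bytes."""
    bits = "".join(ENCODE[bytes([s])] for s in symbols) + leftover
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```

With uniform input bits, this table emits `e` about half the time, so the output is far from maximum entropy even though it carries exactly the same information. Whether the result looks like any plausible real file format is a separate question this sketch does not address.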
We've discussed this on MetaOptimize.
Short answer: Download an empirical testing script like dieharder, and see if the encrypted output looks "random" under this battery of tests.
What happens if you use the old "torn sheet of paper" routine?
Each drive or device moving from A to B goes with a different courier/ISP/method and no "piece" contains enough information to be identifiable or usable.
All the pieces need to arrive at the destination to be able to be re-constructed back into usable form.
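One way to get the "no piece is usable on its own" property in software is XOR secret splitting. To be clear, this is a different technique swapped in for illustration, not what the comment above literally describes (physically cutting the data into parts), and the function names are my own. Every share except the last is pure random padding, and the last is the secret XORed with all of them, so any incomplete set of shares is indistinguishable from random bytes.

```python
import os
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split(secret: bytes, n: int) -> list:
    """Split secret into n shares; all n are needed to rebuild it."""
    if n < 2:
        raise ValueError("need at least two shares")
    pads = [os.urandom(len(secret)) for _ in range(n - 1)]
    last = reduce(xor_bytes, pads, secret)  # secret XOR pad1 XOR pad2 ...
    return pads + [last]


def combine(shares: list) -> bytes:
    """XOR all shares together to recover the secret."""
    return reduce(xor_bytes, shares)
```

Each share is as long as the secret, so this trades bandwidth for the all-or-nothing property.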
Any time you send a complete message in one burp, one hard drive or one CD or one image, there is a chance for decryption by any number of accidents or threat of death to all your family members one person at a time while you watch.
No encryption was used in the creation of this message...thus I have deniability.
As far as I know, you can calculate the randomness of a sequence of 0's and 1's.
It has to do with the distribution of the lengths of runs of the same value.
Taking the TrueCrypt model as an example (a hidden partition inside an encrypted partition): for the hidden encrypted part to stay hidden, it should at least be indistinguishable from the non-hidden encrypted part, so they should be equally random. Unused space in both partitions should also have the same amount of randomness as the used space.
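The test being half-remembered above sounds like a runs test. Here is a minimal Python sketch of the Wald-Wolfowitz version, which is my guess at what is meant; the function names are my own. It counts runs of identical bits and compares the count with what an independent sequence with the same number of 0's and 1's would give.

```python
import math
import os


def bits_of(data: bytes) -> list:
    """Unpack bytes into a list of bits, most significant bit first."""
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]


def runs_z_score(bits: list) -> float:
    """Wald-Wolfowitz runs test: z-score of the observed number of runs."""
    n = len(bits)
    n1 = sum(bits)
    n0 = n - n1
    if n0 == 0 or n1 == 0:
        raise ValueError("need both 0s and 1s")
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    mean = 2 * n0 * n1 / n + 1
    var = 2 * n0 * n1 * (2 * n0 * n1 - n) / (n * n * (n - 1))
    if var <= 0:
        raise ValueError("sequence too short")
    return (runs - mean) / math.sqrt(var)


if __name__ == "__main__":
    print(runs_z_score(bits_of(os.urandom(4096))))
```

A z-score far from zero (say beyond 3 or 4 in either direction) suggests too many or too few runs. Good ciphertext and good random data would both be expected to land near zero, so this test alone should not tell them apart.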
Privacy is terrorism.
If you can distinguish encrypted data from random noise without knowing the encryption key, you've found a weakness in the encryption algorithm! That doesn't happen too often these days. I'm sure the job offers will be lining up once you publish.
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
Advanced steganography is indistinguishable from propaganda.
Computer memory is just fancy paper, CPUs just fancy pens with fancy erasers; the 'net is just a fancy backyard fence.
This is technically a moot point. The police do not have to prove that there is encrypted data on your storage device. YOU must prove to their satisfaction that there isn't. In any criminal lawsuit the prosecutor will simply inform the jury that you are non-cooperative and deliberately withholding vital evidence (even if your file actually is just random data) and use that implication to make you look even more guilty.
Also remember when traveling overseas that in many countries it is ILLEGAL for you NOT to turn over all encryption keys to the state when they demand it. They tried that in the US once (remember the "Clipper Chip"?) but fortunately it failed.
I've used recurrence plot analysis and surrogate data testing for this. Both are more suited to time series analysis, but can be used with any data. In principle these examine compressibility (and any form of compression could be adapted to give you a yes/no), but these give you meaningful statistical analyses (if yes, then how much). Be aware that your random isn't really random, and so will give you a non-zero result. But you'll get very close to the same very small result each time, whereas with surrogate data it'll be different, and some data sets will actually improve with scrambling. But you should be able to tell yes/no and how different from random.
"I may be synthetic, but I'm not stupid." -- Bishop 341-B
In cryptography, reinventing the wheel is very bad - you usually end up with something that's broken in a way that isn't at all obvious. It's better to go with something that's been peer reviewed. In this case, the product you're looking for is TrueCrypt. TrueCrypt volumes look like random data, and you can have multiple encrypted volumes inside the same container. Each volume has either random data in its free space, or another volume that looks like random data in there. So you make one or more decoy volumes with your tax/banking information, non-incriminating diary, etc, and then put the thing you're really trying to hide in another volume. Since resizing TrueCrypt volumes is very inconvenient, you have a plausible reason for making it too big.
Also note that regardless of how well hidden your steganographic data is, the fact that you have steganography software installed (which you can't effectively hide if you want to use it) is enough to damn you if you're trying to make it look like you don't have any encrypted data.
Most people don't have ready access to 512GB of pure noise. The ONLY feasible way for most people to get that much random data is to use a pseudorandomized function.
Pseudorandom functions are still deterministic. If you store the key you used to generate the data in the first place (say by making the key a hash of the password), you can regenerate the data anywhere you need it. Instead of storing 512GB of data you could store the algorithm and recompute the data on the fly.
Of course if you DID this, your algorithm would basically just be an inefficient variation of AES.
(PS did you consider that most people barely have 512GB of HDD space, let alone RAM? You'd need to do a multitude of random accesses off of a spinning disk for each encrypt/decrypt operation. PAINFUL.)
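To illustrate the "store the key and recompute the data on the fly" point, here is a minimal Python sketch. The construction (SHA-256 over key plus counter) and the name `keystream` are my own stand-ins, not AES and not a vetted design. Hashing the password directly, as the comment suggests, is kept for simplicity; a real tool would use a proper key derivation function.

```python
import hashlib


def keystream(password: str, offset: int, length: int) -> bytes:
    """Regenerate `length` pseudorandom bytes starting at byte `offset`."""
    # Key is a plain hash of the password, as in the comment above.
    key = hashlib.sha256(password.encode("utf-8")).digest()
    out = bytearray()
    block = offset // 32  # SHA-256 yields 32 bytes per counter value
    start = offset % 32
    while len(out) < start + length:
        out += hashlib.sha256(key + block.to_bytes(8, "big")).digest()
        block += 1
    return bytes(out[start:start + length])
```

Because any slice can be regenerated from the password and an offset, nothing needs to be stored, which is exactly why the "512GB of noise" ends up being a keyed cipher in disguise.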
Sounds good!
What's the ratio of JPEG file size to encrypted payload size?
Or, the other way around: to encrypt and hide 1 MB, how many MB of JPEGs would be needed?
(I am guessing that you are saying the ratio is less than 1 to 8.)
Also did you look at audio files?
Stephan
http://stephan.sugarmotor.org
Why does OP talk of offsets? Because he wants to hide more than one partition! Give up two to police in a pinch. Is there a third? They can't know...
wouldn't it be more useful to mod your laptop so it's got two drives, the goodies being on the drive that only becomes accessible when you put the little magnet in the right place to energize the reed relay that powers up the dark drive? No seeum drive in directory, no drive there, Kemo Sabe.
is it possible to prove there is encrypted data where you claim there's not?
prove 100%, no; prove beyond reasonable doubt, sure -- unless you happen to be really good at convincing juries that carrying around a hard drive full of noise is common practice
I mod down anyone who says "I will be modded down for this", regardless of the rest of their comment
If you use AES in ECB mode, then the answer is that it's usually painfully obvious that the original data was structured.
If you do use chaining (CBC, or something similar), then it will look quite random.
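A minimal Python sketch of why ECB gives structure away while chaining does not. Big caveat: the standard library has no AES, so `toy_block` below is a stand-in built from HMAC-SHA256. It is not AES and not even invertible; it only mimics the one property that matters here, namely that the same key and input block always give the same output block. All names are my own.

```python
import hashlib
import hmac
import os

BLOCK = 16  # bytes per block, as in AES


def toy_block(key: bytes, block: bytes) -> bytes:
    """Stand-in for a block cipher: a keyed, deterministic 16-byte mapping.
    NOT AES and not invertible; it only mimics 'same input -> same output'."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def ecb_like(key: bytes, data: bytes) -> list:
    """Process each block independently (data length a multiple of BLOCK)."""
    return [toy_block(key, data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]


def cbc_like(key: bytes, iv: bytes, data: bytes) -> list:
    """XOR each block with the previous output block before processing it."""
    out, prev = [], iv
    for i in range(0, len(data), BLOCK):
        mixed = bytes(a ^ b for a, b in zip(data[i:i + BLOCK], prev))
        prev = toy_block(key, mixed)
        out.append(prev)
    return out


def repeated_blocks(blocks: list) -> int:
    """Number of output blocks that duplicate an earlier one."""
    return len(blocks) - len(set(blocks))


if __name__ == "__main__":
    key, iv = os.urandom(16), os.urandom(16)
    data = b"A" * BLOCK * 64  # highly structured input
    print("ECB-like repeats:", repeated_blocks(ecb_like(key, data)))
    print("CBC-like repeats:", repeated_blocks(cbc_like(key, iv, data)))
```

Counting duplicate 16-byte blocks like this is a cheap check an examiner could run on a suspect file: random data of any realistic size should essentially never contain one.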
Is there painfully obvious structure in data encrypted with AES in a counter mode, where the key is some function of the IV and the offset? Counter mode is easier to parallelize across multiple cores than chaining modes.
Cryptography is hard, very hard, to get right. If you don't know if you can distinguish random data vs encrypted data, then either (1) you are a crypto expert and this is an unsolved question like P=NP, or (2) you know very little about crypto.
Well, since you are asking HERE, my bet is (2) is the actual situation.
So my answer to you would be: STOP, don't write this program. You are not qualified, and anyone using your program (including yourself) will be given a false sense of security which may end up worse than knowing your data is not hidden.
If you insist on proceeding, then first plan a couple years of self education before you start. And /. is NOT the place to begin.
Oliver.
The op wants to know if the output is distinguishable from random garbage. He doesn't ask about the difficulty of decrypting.
One of the goals of most crypto systems is to generate output that approaches random garbage.
The question isn't whether the file with the AES'd data can be decoded, it's whether a third party can detect which file is which. To that end, I would say the odds are fairly low. Especially if the OP is embedding the encrypted data inside a block of noise.
The GP is dead on, AES was designed with exactly this in mind.
Your arguments about compression aren't precisely true. They're true for archives but not for a wide array of today's general-purpose compression algorithms that use high-order arithmetic encoding and are bijective.
The bijective class of compressors are ones where any arbitrary stream is a valid input for the decompressor.
If you think that you can find entropy that isn't squeezed out of text with algorithms like PAQ, then you can also eliminate it and improve PAQ and win some cash offered by the Hutter Prize. We are talking about algorithms that compress English text to about 1 bit per character for large inputs.
"His name was James Damore."
Steganographic attempts are considered foiled if someone can detect that there is a secret message, they don't need to be able to retrieve the message in order for the attempt to be considered a failure. I did my Master's project on hiding data in the least significant bitplane of imagery. The trick is to "randomly" scatter your secret message throughout this plane. I showed methods that would allow you to do this so that the data was indistinguishable. You should always encrypt your secret message first so that it looks random, or better yet, shape the statistics of your encoded message to match the noise characteristics that were in the original LSB plane. If you use an image created from a very noisy source, such as a digital camera, and you encrypt the embedded message and scatter it using a reversible algorithm, and iteratively ensure that the statistics of the altered LSB plane look the same as the original LSB plane, I proved that it is not possible for someone to tell that there is a secret message hidden there. However, you need to be careful to use an original image you created yourself, and to destroy the original, because if someone ever compared the original to the one with the embedded message, they could definitely tell there was something altered by comparing the LSB planes.
That's a fascinating idea.
Lots of 1600 ISO images from your camera. Feed them into the software so it can analyze the noise characteristics for imitation later, then it loads the image into RAM, modifies them, and overwrites the old image on the harddrive.
Gotta love slashdot for posts like yours.
> Gotta love slashdot for posts like yours.
Yeah, uh, me too. Thanks masters dude.
Do daemons dream of electric sleep()?
The key here is whether or not the "empty" space of an encrypted volume can be differentiated from a "hidden" volume stored within another container volume. In essence, you have given out one top-level key, but have not given out your lower-level key, so can they even PROVE that there is a hidden volume within the outer volume?
My suggestion to the authors of TrueCrypt, etc. that should solve the problem nicely -- when filling the empty areas of the drive, DON'T USE RANDOM DATA. Actually encrypt MEANINGLESS but totally valid data and store it instead. A nice simple means of doing this would be to have TrueCrypt start doing random fetches from Google, Yahoo, or some miscellaneous RSS feeds. It could also liberally mix in sets of NTFS file headers, JPEG headers, etc. In essence, lots of totally useless, but totally "valid" data.
To make matters even worse, make sure that every "fake" JPEG, PNG, etc. stored includes Steganographic data -- from a randomly chosen RSS feed or something similar.
Basically, if ALL data stored in the file includes hidden data by default WITHOUT the knowledge and/or control of the user of the software, then the ability to argue as to the presence of hidden data in court becomes moot -- yes, there is hidden data there, but no, I didn't put it there and no I don't know the password.
It would also be quite useful to have TrueCrypt, etc. create multiple hidden volumes containing random files downloaded from public sources with random passwords -- and never even inform the user of these passwords or the contents. These passwords could even be chosen to be semi-hard passwords based on a dictionary word plus number combination that would eventually fall to a standard dictionary attack -- yielding a trove of useless data.
In essence, if it is hard to see the forest for the trees and you like it that way -- PLANT MORE TREES.
The GP should record some LP, at 24 bits per sample, at 96 kSa/s, in stereo. It wouldn't be too unusual, especially if he picks a well-known piece of music. Classical music will be particularly good here.
How about this one? http://www.youtube.com/watch?v=Nz0b4STz1lo
In general, it is safe and legal to kill your children. -- POSIX Programmer's Guide
Useful data are not random. If it looks random, and you're devoting disk space to it, the investigator will assume that it is encrypted, or is key material for encryption (e.g. a one-time pad). Why else would you have lots of random data around?
I ran across a website years ago where you could type in your plain text, e.g. "I mow the lawn every Thursday." and it would encrypt it to some other plain-language sentence or paragraph, e.g. "The merry-go-round is painted yellow with red stripes. Neighborhood children rake cans from the park."
Anybody know what this is called or where that website can be found?
"will he be able to tell which is which?"
Maybe he will not, but she will always know when you have something to hide...
(women are smarter + and a sixth sense)
The hard drive may have been freshly bought off eBay (leave a record!) or at a garage sale, or been a gift (make sure the person who provides your cover is willing to back your story and is beyond the legal reach of the goons from the country you are dealing with). A disk whose earlier life was as part of an encrypted RAID is fairly likely to look like random garbage. Alternatively, you can claim it is YOUR disk, which contained sensitive data, and that you did a complete DoD-grade wipe with /dev/random as the last pass before shipping. (Or, better, used software that does it for you.)
The best alternative would be modding the hard drive firmware so that the drive looks empty.
If you have a clean room, you can disconnect the heads or simulate some other kind of failure, and claim the disk is faulty and is going out for data recovery. Without a clean room, remove the circuit board from the disk, cover the right contacts with insulating tape, and mount the board back; the disk will behave as if it is not spinning up or has broken heads. A technician carrying a broken disk is a good "legend".
Unless your approach is based on an accurate input-output model of the process that produced the image, a sufficiently resourceful attacker could probably break it. For starters, did you consider all higher-order statistics?
In my iTunes collection I have many GB of audio and podcasts, and I assume that the first bytes of all the blocks would be different. If only I could just get the files to fragment in such a way that, when reading the raw device, the first character of each sector had the values I wanted... I could even use RC4 or some other stream cipher to generate block offsets, giving a password and further defying analysis. As long as I don't defrag the disk, my data is safely hidden in files that just look like files.
Just hope I don't need to update the data often, as read and write time would be rather shabby...
"If you really want to transport some material across the checkpoint..."
Put your damn encrypted information on a website you control, and get it when you're safely across the border. OK?
Keep the keys on a different website.
Sheesh.
search google for denyfs ;o)
What if you had an encrypted partition with multiple passwords, where each password decrypts different files from that partition? You could go even further and set it up to generate random files and passwords, along with some parts that are just random data that do not correspond to a password. If someone asks about the encrypted partition, you type in a password and the corresponding files come up with no indication there were more passwords. If the size of the files you showed them compared to the partition makes them suspicious enough (or some other indicator), you can use more passwords, but they would have no way to know whether you've used all your passwords. Of course they would realize your security is at paranoia level, so if you explained the random passwords, you understandably would have no way to actually show them everything on the partition.
On the upside, instead of trying to convince someone that the encrypted partition is just random data, you actually decrypt stuff for them, and with the random passwords they can't prove you're lying. On the downside, you can never show them everything (unlike a 1-password system), so there is no "cut your losses" route. This leads to two questions: could this plausibly work, and how bad could it be for you should law enforcement ask to see what's on your computer?
Other encryption methods leave you dead in the water should someone figure out you've got encryption. Being out in the open may be more dangerous (it always raises suspicion), but it possibly mitigates the damage that suspicion can do.
My webcomic
Give yourself a cookie. You've just invented steganography.
"What interests me is that it recorded approximately eighteen hours of static."
http://www.imdb.com/title/tt0118884/quotes?qt0379375
Atari rules... ermm... ruled.
Create a big file, filled with random data.
First, create a cryptsetup volume with LUKS in that file. Put some files on it.
Then create another cryptsetup volume in the same file at an offset, like you said.
Now, if somebody asks what this file is, just tell them it's a cryptsetup container with some files in it; you can give them the password.
Then, always assume that the attacker knows the first password.
And you have to be really careful:
- If you continue to write/delete data on the first container, there is a risk that you overwrite the second container, so write some files, then never use the first container again.
- If the filesystem on the first container writes metadata all over the disk, there is a problem too: once the second container fills up, it will have overwritten some of the first filesystem's metadata, and a first container with damaged metadata is highly suspicious. Ext has this problem; FAT does not.
- The second container should not use LUKS. LUKS is a known format. It would be suspicious if somebody could see a LUKS header in the middle of a file.
So you have to use a first container with FAT, filled with some files. But it could be suspicious that you use a FAT filesystem on a Linux box.
Your line of defense should be:
- A long time ago I used cryptsetup with FAT in order to do some tests under Windows, with FreeOTFE.
- I forgot the password, try 1234 or password... Mmmh, it's 1234. There's nothing interesting in this file, it was only for some tests.
- I did not delete the file because I have plenty of free space on the disk. I will, one day or another.
Elettra seems to already do what you need:
http://www.phrack.org/issues.html?issue=65&id=6#article
Hasn't this whole subject already been addressed by Truecrypt?
they probably won't bother to hit you with the wrench until you tell them.
That is - it won't hurt THEM.
And besides, a little preemptive beating with a wrench never did kill anyone.
Anyone doing the beating that is.
Against stupidity the gods themselves contend in vain
OK, if I wanted to try to find out whether there was encrypted data at some offset in a chunk of random data, I'd start with Knuth's tests for randomness. I'd break the thing up into decent sized chunks (1 meg or so) and run a bunch of different randomness tests on each chunk and on the whole data set and see if any patterns emerge.
The thing is, even if the encrypted data looks pretty random, it's likely to look DIFFERENT than the surrounding random data.
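To make the chunk-by-chunk idea concrete, here is a minimal sketch of one such test: a chi-square statistic on byte frequencies, computed per chunk. It is only one of many possible randomness tests, not Knuth's full battery, and the function names (`chi_square_bytes`, `scan_chunks`) and the 1 MiB default chunk size are assumptions made for illustration.

```python
import os
from collections import Counter

def chi_square_bytes(chunk: bytes) -> float:
    """Pearson chi-square statistic of byte counts against a uniform distribution."""
    expected = len(chunk) / 256
    counts = Counter(chunk)  # missing byte values count as 0
    return sum((counts[b] - expected) ** 2 / expected for b in range(256))

def scan_chunks(data: bytes, chunk_size: int = 1 << 20):
    """Yield (offset, statistic) for every full chunk of the data."""
    for off in range(0, len(data) - chunk_size + 1, chunk_size):
        yield off, chi_square_bytes(data[off:off + chunk_size])

# Demo on random bytes standing in for a container file (hypothetical input).
container = os.urandom(4 << 20)
for off, stat in scan_chunks(container):
    # For uniformly random bytes the statistic should hover around 255
    # (the number of degrees of freedom).
    print(f"offset {off:>8}: chi2 = {stat:.1f}")
```

A chunk whose statistic sits far from the others would be the kind of "different looking" region described above, though a single test like this says nothing about well-encrypted data, which should pass it.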
The worse problem is that if you have someone who's asking you if there is encrypted data, and they find some bogus pattern in the random noise, then you've got a problem because you can't prove that there ISN'T any data there. If you are being prosecuted in a normal US court, you might get away with this (if they can't prove that you've got anything encrypted, it may be hard to hold you in contempt trying to get you to give up the keys), but if you fall under the sway of some intelligence agency that doesn't like the look of you, it's not likely that they'll just let you go because you claim there isn't any data.
A thousand pounds of wood moving at 300 feet per minute. Don't get in the way.
When you use full disk encryption, only the crypt header reveals that you're using it. Using: cryptsetup luksDump /dev/device
You can get the length of the header to make a backup:
dd if=/dev/device of=./backupfile count=length
Then you can overwrite it with:
dd if=/dev/urandom of=/dev/device count=length
Now your disk cannot be recognized as encrypted. Storing the crypt header on a micro-SD card allows you to quickly eat it when necessary.
You can also carry a very strong neodymium magnet, which allows you to wipe your hard drive in a second.
What if a few popular distros simply had, turned on by default, a low priority process that wrote random data to free blocks?
Then you've got a perfectly legitimate excuse for a drive full of random data. You really can't beat, "it's a security measure enabled by default on Debian," as excuses go.
Why bother with crypto on HDs? Simply do as many businesses do. Move your drive image to a server, wipe the drive, reinstall the base OS, and once you're at your destination, open an ssh channel to the server and rsync your data back to the drive. Of course, you might have to rent a colo for a day to get access to a pipe big enough to move that data in a reasonable length of time.
If crypto use is illegal at your destination, what are you doing there?
Tech Public Policy stuff
If the encrypted portion is randomised then it will stick out like a sore thumb against all the non-random file junk that fills the rest of your drive.
Simple randomness bias test and Bob is your oh-yes-and-what-do-we-have-here-that's-worth-hiding monkey.
What you want to learn to do is hide the data amongst the other data in the redundant spaces.
Just as with your laptop. Create a fake login.
So when someone beats you with a phone book or a tire wrench, you can say "the login is jdoe, password 123!" and they'll log in and see your not so important files, when actually your login is janedoe, password abc.
The same applies to encrypted partitions in your setup. Have a partition A at index N and a partition B at index M. A,N is the fake one, complete with recently modified files (.bashrc and cron will help with that). And B,M is your normal secure partition.
In a moderately large file, it would be very suspicious if no two blocks were the same.
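Checking for repeated blocks is easy to sketch. The snippet below counts how many fixed-size blocks duplicate an earlier block; the function name `repeated_blocks` and the 16-byte block size (the AES block size) are assumptions for illustration. Ordinary files with padding or repeated structure tend to show duplicates, while random or well-encrypted data of practical size should show essentially none.

```python
import os
from collections import Counter

def repeated_blocks(data: bytes, block_size: int = 16) -> int:
    """Count the blocks that are exact duplicates of an earlier block."""
    blocks = [data[i:i + block_size]
              for i in range(0, len(data) - block_size + 1, block_size)]
    counts = Counter(blocks)
    # Each distinct block contributes (occurrences - 1) duplicates.
    return sum(n - 1 for n in counts.values())

print("padded file :", repeated_blocks(b"header" + bytes(4096)))
print("random bytes:", repeated_blocks(os.urandom(1 << 16)))
```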
I'm a Programmer. That's one level above Software Engineer and one level below Engineer.
You might encrypt and steg away, and then use a bit of software to remove the record of said file(s) from the fat/mft and mark those sectors as "bad". Email yourself the backup fat/mft, travel, and reverse the process at your destination.
Or you might (in a rush) encrypt and steg away on a partition, then mark the partition as unallocated while you go through security. Won't fool a reasonably competent computer tech, which is not going to be a problem in most countries but might be a risk in America, seeing as how we've offshored enough IT that your airport security grunts might just be very "technical", indeed.
Orwell: "In a Time of Universal Deceit, telling the Truth is a Revolutionary Act"
Couldn't you just establish a code using the content of the images? Something using the names or hair colors or whatnot of the subjects. Then just make a slideshow of family photos in a specific order and name it TOP SECRET MESSAGE.
Don't worry about hiding the data - there are many ways of doing that. Worry about hiding the software that accesses it. The thing that gets most folks using steganography caught is the `investigator' finding steg. software on their machine. After that it's just a question of searching through each of the formats it does or threatening them with obstruction of justice / other crimes until they tell you what they used it for. Or, at least, that's what I learnt from the high tech crime squad...
By stuffing files with encrypted random data (indistinguishable from encrypted compromising data until decrypted) you dilute the efforts of the Other Side.
Even with weak systems such as DES, decryption without the key is enormously more time consuming than encryption.
If the Other Side requires a 10,000 core server farm to break 1 message a minute, and you can keep them busy by generating 1 false message a minute with a single core, and still have CPU cycles to spare, you are ahead of the game.
A second reason for encrypting random data: thwart operational traffic analysis. A darknet that spends free cycles moving encrypted random chunks around makes it more difficult (impossible?) to figure out who is talking to whom.
Third Career: Tree Farmer Second Career: Computer Geek First Career: Teacher, Outdoor Instructor, Photographer.
Ideally encrypted data is indistinguishable from ideally random data and is also indistinguishable from ideally compressed data. Our methods are not ideal, so practically encrypted data may be distinguishable from random data and from practically compressed data. In particular, as simple a thing as a header describing what kind of data follows may give it away! But even without such a header, there may be statistical differences. Other things can slip by: If the user keeps backups, it would be odd that some blocks of data that should be random and unused change from time to time, as seen in backups from different times.
One approach may be to create organized data, encrypt it with a random key, and discard the key without ever revealing it to the user. It should be very difficult to distinguish encrypted data for which the user knows the key from encrypted data for which the user does not know the key. Even then, the problem of some blocks changing (or not changing) in backups remains.
How about killing two birds with one stone: Create a new picture file type. Call it jpeg-V (for verify). It would have to have a major feature making it worth widely adopting. As part of the spec, it would contain a 1024 byte random string. Heck, call it a unique identifier string to facilitate indexing.
But ...
There's nothing to prevent encrypted data from being buried in there. Maybe add file pointers so pieces can be appended, and also so you'd need both the password and the starting file.
Now just populate the web with billions of pictures, most perfectly innocent. A thousand photos, taking up around a GB or two or three, might hold a meg of encrypted data.
Now you're faced with just what the original question asks: can random and encrypted strings be indistinguishable?
The world is made by those who show up for the job.
With a one-time pad or a stream cipher, encrypting essentially means XORing the data with a key stream, and the apparent randomness of the encrypted data depends on the randomness of that key stream. Block ciphers such as AES do more than a plain XOR, but the goal is the same. So, if you are using well generated, highly random keys, then no, you should not be able to tell the difference.
OK, I've got my big file of 1's and my even bigger file of 0's.
What is the next step before: 3. Profit!
From your description of what you think you're doing, the odds are fairly high that you're not implementing correctly. As a recursive proof, your question suggests doubt (a newbie would be done with the hack by now), your doubt suggests fear, your fear suggests potential wisdom, your potential suggests its own answer, your answer suggests you should listen to yourself.
``Tension, apprehension & dissension have begun!'' - Duffy Wyg&, in Alfred Bester's _The Demolished Man_
Go buy a PS2 game or two, and make image dumps of your DVDs. Then look up how to slim down PS2 images (some games can go from 3-4GB down to 1GB-1.5GB) by removing unnecessary stuff on the disc.
Because PS2 formats are relatively unknown compared to standards like MP3, JPG, etc., you could hide a lot in extra files named similarly to what is on the PS2 disc. If someone looked at the file size and expected it to be DVD sized, they wouldn't know any better if it was 1.5GB of game and 2GB of hidden stuff :)
If he was stashing it in the LSB's then he was using uncompressed BMP most likely. The first thing JPEG throws out is the LSB's for lossy compression.
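For anyone who hasn't seen LSB stashing spelled out, here is a minimal sketch on a raw byte buffer standing in for uncompressed pixel data. It is not tied to any real BMP layout or steganography tool; `embed_lsb` and `extract_lsb` are hypothetical names, and a real payload would be encrypted first.

```python
def embed_lsb(cover: bytes, payload: bytes) -> bytes:
    """Hide payload bits (most significant bit first) in the low bit of each cover byte."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this payload")
    out = bytearray(cover)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # clear the low bit, then set it to the payload bit
    return bytes(out)

def extract_lsb(stego: bytes, n_bytes: int) -> bytes:
    """Rebuild n_bytes of payload from the low bits of the first n_bytes * 8 samples."""
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for sample in stego[i * 8:(i + 1) * 8]:
            value = (value << 1) | (sample & 1)
        out.append(value)
    return bytes(out)

pixels = bytes(range(256))           # stand-in for uncompressed pixel samples
stego = embed_lsb(pixels, b"hi")
print(extract_lsb(stego, 2))
```

Each sample changes by at most 1, which is why this survives in an uncompressed format but not through lossy recompression.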
If you have information that they want and it's encrypted, the laws no longer apply in situations like these. They'll covertly torture and drug you until you tell them some lies or tell them the truth.
Only governments have the resources to protect Alice and Bob from Gordon. Basically only governments can use encryption. You can experiment with it to see how it works, but you won't be able to use it in practice, and it doesn't matter what algorithm you choose, because all algorithms are only as strong as Alice and Bob, and usually Alice and Bob are physically defenseless.
They can torture you for the rest of your life whether they find out or not.
The "encryption" is usually more than just an "encryption cipher". Especially with "encryption solutions", such as GPG or dm-crypt, identifiers are often automatically added to designate the file as an encrypted/signed file. For example the GPG ascii armored format.
In other words: maybe. I think the issue may come down to whether or not the signing / authentication's structure / algorithm is identifiable from the format of the encrypted device / file. Identifiable by things like the bits where the keysize would go is 4096, and another part where the filesize goes is, coincidentally, the exact size of the file, and so on and so forth. With a few other bits of information, it becomes identifiable enough for practical purposes. Like for a jury to put you to death. So again: maybe not for Slashdot, but maybe enough to put you to death.
Look into whether the file has a container or something with metadata in it. If you're using dm-crypt on a file and not telling it all the parameters, IOW it's somehow finding them out itself, then there is a likelihood the file is identifiable.
Yes, XORing each plaintext block with an encrypted block index/counter would be enough to render the output theoretically indistinguishable from random data. This is "CTR" mode, and I'd be surprised if most hard drive encryption systems aren't using it.
http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation#Counter_.28CTR.29
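The counter construction described above can be sketched in a few lines. Big caveat: Python's standard library has no AES, so this uses HMAC-SHA256 as a stand-in for the block cipher. It shows the shape of counter mode (encrypt a nonce plus counter, XOR the result with the plaintext), it is not AES-CTR, and the names `keystream` and `ctr_xor` are made up for this example. Do not use it to protect anything.

```python
import hashlib
import hmac

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Generate a key stream by applying a keyed function to nonce || counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        # HMAC-SHA256 stands in for "encrypt the counter block with AES".
        out.extend(hmac.new(key, nonce + counter.to_bytes(8, "big"),
                            hashlib.sha256).digest())
        counter += 1
    return bytes(out[:length])

def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with the key stream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))

key, nonce = b"k" * 32, b"n" * 8
plaintext = bytes(64)                      # 64 zero bytes, highly patterned input
ciphertext = ctr_xor(key, nonce, plaintext)
print(ciphertext.hex())
print(ctr_xor(key, nonce, ciphertext) == plaintext)
```

Because every block gets a different counter value, identical plaintext blocks come out as different ciphertext blocks, which is the property the ECB penguin picture lacks. Reusing a nonce with the same key breaks this.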
Who is to say that that file of random data isn't your encryption key for something else?
Use it several times, 4 pass.
Some include encryption.
See http://webcache.googleusercontent.com/search?q=cache:mw0S9v_ew1UJ:www3.sympatico.ca/mt0000/bicom/
Does anyone here actually use steganography for more than fun or school or some side project? Other than spy vs spy stuff I can't imagine using it on a regular basis.
To use it properly you would need to have your decryption software installed on a totally different computer, otherwise when they look for the secret encrypted stuff they will see your stego encryption/decryption software and be tipped off. Also, since you can't hide much data using steganography without tipping someone off, it seems to be of limited use for nearly all real world applications. I particularly can't see a business or even a government using this to protect their data.
Don't get me wrong. It seems like a cool thing to do. So, if you are using it regularly for a reasonable purpose, what exactly is that?
Ninjas don't carry tic tacs
No, they don't. That's the one flaw.
Hint. Take one of the unencrypted image formats - with a relatively simple image, encrypt everything but the image header - display.
You'll probably be able to tell what the image was.
Compressed encrypted files on the other hand are pretty close to maximum entropy.
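"Close to maximum entropy" can be put into numbers with a byte-level Shannon entropy estimate, where 8 bits per byte is the maximum. This is a rough sketch with a made-up function name (`shannon_entropy`). It only looks at single-byte frequencies, so it cannot see the block-level patterns that give the image away.

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Estimate entropy in bits per byte from single-byte frequencies."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

print("text  :", round(shannon_entropy(b"the quick brown fox " * 500), 3))
print("random:", round(shannon_entropy(os.urandom(1 << 16)), 3))
```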
My favorite demonstration of why ECB is almost always the wrong choice for block modes of operation, regardless of the algorithm, can be seen in the Tux series of pictures on this page. http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation It's a great picture to show people when they don't understand why "just encrypt the data" isn't secure enough, and why they need a competent cryptographer.
But for this discussion, notice that your eyes can not detect a pattern in the encrypted data, even though the source is an uncompressed BMP file.
John
Two problems:
#1 This only applies to common types of encryption, but your claim is not general to all types of encryption. For example, data encrypted with a one time pad the same length as your data set should be impossible to distinguish from random.
#2 Considering that Kolmogorov complexity is not computable, you haven't really identified a weakness. I suppose that an attacker could try to compress the data in question and see if it gets smaller, but they can't compare it to a known Kolmogorov complexity, so the test that you propose will still have a non-zero error rate, even if a shorter key is used.
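The "try to compress it" test is trivial to run, for what it's worth. A small sketch using zlib as the compressor (the function name `compression_ratio` is an assumption, and any general-purpose compressor would do): a ratio near or above 1.0 only means this particular compressor found no structure, which is exactly why it is a heuristic and not a proof.

```python
import os
import zlib

def compression_ratio(data: bytes, level: int = 9) -> float:
    """Return compressed size divided by original size for non-empty data."""
    return len(zlib.compress(data, level)) / len(data)

print("text  :", round(compression_ratio(b"hello world " * 1000), 3))
print("random:", round(compression_ratio(os.urandom(1 << 16)), 3))
```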
If you do get caught and want to avoid disclosing said key to the authorities, it's as simple as creating a 256 byte random file and using that file as your sole encryption key. When you know that the authorities are chasing you, the easiest way to get rid of all evidence is to write directly over said random key using urandom. If they have no evidence, they have no proof, and the case in court wouldn't go so well for the prosecution. Read up on the law and see how evidence works in your state/country. If all they have in court is a file of random bytes, what can they really do without a means of opening the file? They have no proof that there was anything there, as the encryption key just seemed to stop working. Explain that your personal diary was in there and that you have a backup of it at home in yet another encrypted container. Be sure the container at home does have your diary and be sure to write in it every few days...
This method has everything needed to avoid problems with authority: you have an alibi, the container contains your diary, and the encryption key just seemed to stop working (file corruption can happen, or a virus infection). Also be sure not to have any backups of this key file; if you're drugged you may reveal said backup location. Just before being drugged, have one single thought in your mind: your private diary at home. The less you think about what data you're actually hiding while drugged, the less likely you are to reveal this information to the authorities. If you can actually get hold of the drug they use in your country of origin or departure, have a trusted friend try the drug on you and do a test run to see if you can prevent revealing the information to him/her.
Basically, in order to protect your data, you need more than encryption, you also need confidence that you will not disclose the information in whatever situation may arise. Think towards the future and guess who may attempt to reach your data and practice protecting yourself and your data from being disclosed.
I agree with you; that is really very important.