Battling Steganography
An anonymous reader submitted a fairly thin little story about a researcher who is Battling Steganography. I can certainly see the appeal of the study, but it really seems like a needle-in-a-haystack sort of project. And once you can actually detect one technique, new and better techniques will crop up to take its place.
Last I heard, the FBI doesn't go around busting people for passing around what might be secret messages. I know there's been complaining about a general erosion of rights and privacy in the US, but I doubt it's gotten that bad.
Was it just me, or did the article make it seem like anyone that would use steganography would be a criminal? Since when in a 'free' country should the ability to hide a message be of interest to the "legal community"?
Let's say I wanted a message to be available to a wide number of people, hidden with steganography, and encoded as well. I pick an image, such as an X10 ad, that could easily be found from a "legit" source. I encode my message, then hide the encoded message in the least significant bit of the color of each pixel of the image. Net effect: the ad looks just about the same, but there is data encoded in it.
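A minimal Python sketch of the scheme just described. It assumes the image has already been decoded into a flat list of 0-255 channel values (R, G, B, R, G, B, ...) and that the message was encrypted beforehand. The helper names are hypothetical, and reading or writing real image files is left out.

```python
def bytes_to_bits(data: bytes) -> list[int]:
    """Expand bytes into bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bits_to_bytes(bits: list[int]) -> bytes:
    """Pack bits (MSB first) into bytes, ignoring any trailing partial byte."""
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def embed_lsb(channels: list[int], payload: bytes) -> list[int]:
    """Return a copy of channels with payload bits in their least significant bits."""
    bits = bytes_to_bits(payload)
    if len(bits) > len(channels):
        raise ValueError("payload does not fit in this cover image")
    stego = list(channels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to the message bit
    return stego


def extract_lsb(channels: list[int], n_bytes: int) -> bytes:
    """Read n_bytes back out of the LSBs of the first 8 * n_bytes channel values."""
    return bits_to_bytes([c & 1 for c in channels[:8 * n_bytes]])
```

Every channel value moves by at most one, which is why the ad still looks the same to the eye.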
If I knew messages were being passed this way, I might be able to get the message. First, I'd have to acquire the source image. Then, I would do my own diffs, and try to find the meaningful data. At that point, it's a decryption problem.
But how do I detect the data hiding in the first place? I would have to detect that a stream of data is very similar to another stream of data, but with minor differences.
Let's say I've solved that problem, and now have some signature, such that all identical data streams have the same signature, and very close streams have very close signatures. Then, I have to catalog data streams as they pass by, assign signatures, count instances of signatures, and call a hit when signatures are significantly close but not the same. A quick visual check can confirm the match.
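One hypothetical way to get a signature with that property is to average each fixed-size block of the stream. Identical streams give identical signatures, and flipping a few low bits nudges the block means only slightly. The block size, the distance measure and the threshold below are arbitrary assumptions. A real catalog would store digests and signatures rather than raw streams.

```python
import hashlib
import math


def block_signature(stream: bytes, block: int = 64) -> list[float]:
    """Hypothetical signature: the mean byte value of each fixed-size block."""
    return [sum(stream[i:i + block]) / len(stream[i:i + block])
            for i in range(0, len(stream), block)]


def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between signatures; infinite if lengths differ."""
    if len(a) != len(b):
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_hit(a: bytes, b: bytes, threshold: float = 1.0) -> bool:
    """Flag streams whose signatures are close but whose contents are not identical."""
    if hashlib.sha256(a).digest() == hashlib.sha256(b).digest():
        return False  # exact copies are ordinary duplicates, not suspects
    return distance(block_signature(a), block_signature(b)) < threshold
```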
Back to my original thought - instead of a data stream representing an image, what if the data stream represented the subject line of an e-mail, or the e-mail itself? A central database could manage signatures, automatically reported by e-mail clients that generate the signatures. When I get a new e-mail, I can get the signature for the header, and send it to the database. It could then report "that might be spam", and I could delete it without downloading the whole message. I could also download the message, upload the signature, and the database could say "that's probably spam", and it could be deleted or moved before it shows up in my Inbox. With many people uploading signatures, the database could quickly generate the average signature and the variance of the signatures, with people double-checking "Yes, I consider this to be spam".
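For the e-mail case, a locality-sensitive hash such as Charikar's simhash is one known way to get "close texts, close signatures" from a short fingerprint. The sketch below works over word 3-grams with a 64-bit signature. It illustrates the idea under those assumptions and is not a tested spam filter. The shingle size is arbitrary.

```python
import hashlib


def simhash(text: str) -> int:
    """64-bit Charikar-style simhash over word 3-grams ("shingles")."""
    words = text.lower().split()
    shingles = [" ".join(words[i:i + 3]) for i in range(max(1, len(words) - 2))]
    weights = [0] * 64
    for shingle in shingles:
        # Use 64 bits of a cryptographic hash as this shingle's fingerprint.
        h = int.from_bytes(hashlib.sha256(shingle.encode()).digest()[:8], "big")
        for bit in range(64):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    # Each output bit records whether most shingles voted 1 for that position.
    return sum(1 << bit for bit in range(64) if weights[bit] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits; near-duplicate texts tend to score low."""
    return bin(a ^ b).count("1")
```

Only the 64-bit value would need to leave the mail client, which keeps the signature short relative to the message.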
A couple of benefits would be that, hopefully, the signature doesn't give much info about the text, so it would be safe to upload signatures for personal email. Also, it may be fairly easy to get enough responses to be statistically certain that email with a particular signature is spam, so that many would benefit from a randomly chosen few who choose to respond that an email is spam.
Of course, it may be impossible to generate that signature, or the signature may be long enough to identify the text of messages. Still, I could see that as a benefit of this kind of research. I'd also like a way to auto-respond "You have been found guilty of forwarding hoax emails. Please stop and desist." to just about everything my family sends me...
Excuse me? Did I wander into The Onion by mistake?
Thanks for the compliment. It was a lot harder than I'd expected it to be.
Certainly there are tools out there that put together random, sensical-looking text with specific patterns in word usage, punctuation, spacing, whatever, to encode messages, but to actually tweak a message with intrinsic meaning in itself is a bit more difficult.....
A large number of people in this discussion are entirely missing the point of what Farid does.
Let's put it this way. If Farid alone can crack a variety of steganography, then the NSA, or whoever it is that really wants to invade your privacy, surely can too. If he were trying to crack RSA or DES or PDF's ROT13 encryption, he would be praised. Do you really think that steganography is somehow special?
So the article was rather uninformative. I've met Farid. He's a very cool guy. He's working against things like SDMI - which is a form of steganography. As part of a lecture he gave, he showed how to defeat various watermarking techniques for images (without getting arrested, even.)
Consider that when you say "battling steganography is battling privacy! We must hate him!" you are using the same logic that put the DMCA in place. Congratulations.
Well, Sklyarov is currently in jail for breaking ROT13
That lack of certainty really isn't that big an issue, because with a good idea of what percentage of images are false positives it would be fairly simple to look for image sources where the percentage was well outside the norm.
All of this would of course be very resource intensive and would require access to large amounts of data (Omnivore, anyone?) but it's far from outside the capabilities of most governments.
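If the false-positive rate is known, spotting an unusual source comes down to a simple binomial test. This sketch uses the normal approximation. The 10% rate in the test cases is borrowed from the article's 90% figure purely for illustration, and the source names are hypothetical.

```python
import math


def source_z_score(flagged: int, total: int, fp_rate: float) -> float:
    """Standard deviations by which a source's flag count exceeds what false
    positives alone would produce (normal approximation to the binomial)."""
    expected = total * fp_rate
    sd = math.sqrt(total * fp_rate * (1 - fp_rate))
    return (flagged - expected) / sd


def unusual_sources(counts: dict[str, tuple[int, int]], fp_rate: float,
                    cutoff: float = 4.0) -> list[str]:
    """Names of sources whose (flagged, total) counts sit above the cutoff."""
    return [name for name, (flagged, total) in counts.items()
            if total > 0 and source_z_score(flagged, total, fp_rate) > cutoff]
```

With 1,000 images and a 10% false-positive rate, about 100 flags are expected. A source producing 160 flags sits more than six standard deviations out.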
Possibly also of interest to people is Benford's Law, which relates to the distribution of numbers - turns out that in many areas it's very simple to identify real data vs random data, because real data has some definite non-random properties.
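Here is a quick way to see Benford's Law in action. Under it, the leading digit d appears with probability log10(1 + 1/d), so about 30% of values start with 1. The sketch compares a sample's leading-digit frequencies against that curve. Powers of two track it closely, while uniformly spread digits do not. Whether any particular kind of image data follows Benford's Law is a separate question this does not settle.

```python
import math
from collections import Counter


def benford_expected() -> dict[int, float]:
    """Benford's Law: P(leading digit = d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x: float) -> int:
    """First significant digit of a number (0 for zero)."""
    x = abs(x)
    if isinstance(x, int):
        return int(str(x)[0])
    return int(f"{x:.15e}"[0])  # scientific notation puts the leading digit first


def benford_deviation(values) -> float:
    """Chi-square-style distance between observed leading digits and Benford."""
    digits = [d for d in map(leading_digit, values) if d != 0]
    if not digits:
        raise ValueError("no nonzero values")
    counts = Counter(digits)
    n = len(digits)
    return sum((counts[d] / n - p) ** 2 / p for d, p in benford_expected().items())
```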
fencepost
just a little off
Good steganography is essentially the same as adding random noise to an image. You can structure the noise any way you like. There are lots of images that plausibly contain lots of noise, for example images taken in low light and images scanned from film. As long as you don't insist on a very efficient steganographic embedding, there are undetectable steganographic methods. Farid's research is pointless, and it is scary to think that courts may start relying on it.
TCP windows are probably a bad place. They tend to follow very well-defined behavior, and often change only in direct response to other packets in the stream. For example, during a one-way bulk data transfer, the sender's window will rarely change at all. The receiver's window will usually change only by the amount of data received in a packet. All very, very predictable.
Suppose one gets caught with such an image. According to him, the technique has a 90% chance of success. So what about the 10%, wherein one has no message encoded in an image but triggers the alarms anyway?
The 10% error rate in and of itself should still provide plausible deniability. Under standard legal practice, a 90% probability of a "match" is still weak enough that it would require other supporting evidence, circumstantial or otherwise, to present a reasonable case.
If you get caught by the FBI, what can you say?
Caught how? It's not illegal to embed hidden messages in images, just as it's not illegal to hide a plot in pornography - though both are equally unlikely.
I would think more evidence than this is required for a warrant to be issued.
IANAL, but a 90% probability that you're engaging in a perfectly legal activity doesn't seem, on its face, to meet the burden of probable cause necessary to perform a legal search and seizure.
No, this would actually be really cool. Make an Apache module that automatically inserts something steganographically into every JPG it serves. Some people put encrypted data into the images, and others just direct it to read from randomly encrypted gibberish. Then the government has to deal with lots of script kiddies who think they are cool by embedding Britney Spears mp3s into the images on their webpages.
So, how can the algorithms mentioned in the article (which is interesting, but rather short on facts...) distinguish between the noise added by a steganographically embedded encrypted message and the noise caused by a slightly underspecced A to D converter?
You're right, there isn't much of a difference between random noise and an encrypted communication. If you had a pure digital stream that had just been converted from analog, you could stick data in the least significant bits and no one would be the wiser. For example, a CD is just a sequence of 16-bit words repeated 44,100 times a second; you could replace the least significant bit in each word with bits from your hidden message and it would be indistinguishable from random noise.
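A sketch of the CD example, assuming signed 16-bit PCM samples held in Python's array module (the helper names are hypothetical). Each sample moves by at most one quantization step. At 44,100 samples per second per channel, with two channels assumed for stereo, the raw capacity works out to 88,200 bits, or 11,025 bytes, per second.

```python
import array


def embed_in_pcm(samples: array.array, bits: list[int]) -> array.array:
    """Return a copy of signed 16-bit samples with bits written into their LSBs."""
    if len(bits) > len(samples):
        raise ValueError("not enough samples for this message")
    out = array.array("h", samples)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # only bit 0 changes, so the value stays in int16 range
    return out


def extract_from_pcm(samples: array.array, n_bits: int) -> list[int]:
    """Read back the LSB of the first n_bits samples."""
    return [s & 1 for s in samples[:n_bits]]


def cd_capacity_bytes_per_second(rate: int = 44_100, channels: int = 2) -> int:
    """Raw LSB capacity of CD audio: one bit per sample per channel."""
    return rate * channels // 8
```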
The problem arises when you try to compress digital information. Lossy compression algorithms use the most efficient representation of the data they can find and discard the least significant data, so they would completely destroy the aforementioned hidden message. To hide data in a compressed file you need to play with how the compression mechanism stores the data, and the resulting file is most probably not going to be optimally compressed when you're done. What this guy is doing is looking at how the information was compressed, extracting the overlying data that was being stored, and checking that the compression algorithm was indeed optimal. If there are any odd quirks in the compressed data, or it doesn't look like the compression was optimal, it may be because data is hidden inside.
I hope this is a good enough explanation. I'm short on the examples but the underlying ideas are pretty basic.
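The effect of lossy compression on an LSB payload can be shown with a deliberately crude stand-in: snapping every value to a multiple of a quantization step. Real JPEG quantizes DCT coefficients rather than raw pixel values, so treat this only as an illustration of why the low bits do not survive. The helper names are hypothetical.

```python
def embed_bits(values: list[int], bits: list[int]) -> list[int]:
    """Write one bit into the LSB of each leading value; the rest are unchanged."""
    return [(v & ~1) | b for v, b in zip(values, bits)] + values[len(bits):]


def quantize(values: list[int], step: int) -> list[int]:
    """Crude lossy step: round each value to the nearest multiple of step.

    No clamping is done, so results can fall outside the original range.
    """
    return [step * round(v / step) for v in values]


def recovered_bits(values: list[int], n: int) -> list[int]:
    """Read the LSBs of the first n values."""
    return [v & 1 for v in values[:n]]
```

With an even step such as 8, every quantized value is even, so every recovered bit is 0 and the message is gone.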
You might also want to check the techreports that I published about my research.
At HAL 2001, I presented on Detecting Steganographic Content on the Internet. You might like that.
Dartmouth certainly seems to know how to do PR. I would just like to know where their publications are.
You might say that 90% is pretty significant. But considering how many images out there actually have no steganographic message, I think you'll end up persecuting more innocent people than guilty ones.
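The base-rate argument can be made concrete with Bayes' rule. Two assumptions go into this: the article's 90% is read as both the detection rate and one minus the false-positive rate, and the example prior is one stego image per thousand.

```python
def p_stego_given_flag(prior: float, detection_rate: float = 0.9,
                       false_positive_rate: float = 0.1) -> float:
    """Bayes' rule: probability an image really carries a message, given a flag."""
    true_flags = detection_rate * prior               # stego images correctly flagged
    false_flags = false_positive_rate * (1 - prior)   # clean images flagged anyway
    return true_flags / (true_flags + false_flags)
```

`p_stego_given_flag(0.001)` comes out just under 0.009, so fewer than 1 in 100 flagged images would actually carry a message.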
The same applies to steganography, IMHO. SOMEONE has to break it - it might as well be me.
When a method of steganography is discovered, it is useless.
Yes, if the _method_ itself is discovered, it's useless. However, if each instance of the method's use is quantitatively/qualitatively different enough then the method itself may still be capable of generating additional useful instances even once some are discovered. In other words, if the pattern of uses of a particular method isn't obvious then the method itself remains safe even if some of its output is discovered. Of course, this requires a very sophisticated, dynamic, chaotic, magical method. Or maybe just many methods rolled into one.
Imagine trying to decipher the hidden messages in "The 5,000 Fingers of Dr. T." It is a movie, and as such contains the symbolism, iconography, and messages of many individuals. Some of them are apparent, some covert, and some downright indecipherable.
Also, think about the Blade Runner/Ridley Scott "Is Deckard a replicant" business that lasted, well, right up until he told the world the answer. It is that sort of interpretation that someone hoping to decipher steganography would have to perfect. It's not just stuff like: Hi Everyone Likes Punch!
The only way to get messages out of such texts is intimate knowledge of the author(s) or intended recipients of the hidden meanings. By asking them, or sodium pentothal, or the NSA's computer simulation of everybody's brain.
I'm no cryptographer, but the most reliable and cost effective way to discover a secret is likely to investigate the people that know the secret, rather than try to divine meaning from a text that came into your hands.
If steganography can be made "turnkey", it'll work
for most of today's privacy requirements.
You might think that it'd be easy to detect,
or simple to prevent, but that's simply not true.
Unless someone lists all the ways in which one
can hide information, and a fantastically fast
approach to testing any given communication on the
net against those techniques. Otherwise, to
read a steganographically-encoded message,
each recipient will need to figure out which of
all the messages intercepted even includes the
data you're looking for, and what was used in
this particular instance. Hell, one might even
have two or more different techniques applied
in a single message. Like this message does.
Sort of.
....
Steganography is nothing new. People have been hiding secret messages in innocuous objects since time began. Naturally, various people want to prevent this, but the method's very nature makes it almost impossible to simply track.
While it's true that human beings can interpret images to mean something that a machine could never pick up on, that's not the thrust of the research being done here.
He is doing research into a very particular kind of steganography, whereby messages are concealed within an image via slightly altering the least significant bits of an image.
When you encode information in this way, somebody knowing how to extract it can pull out a message which is not subjective (as in the example of interpreted images given by another poster), but rather is very concrete.
There is some evidence that this form of encoding has been used to communicate information throughout terrorist cells.
What the researcher is doing is developing a method to detect when the LSBs in an image have been manipulated slightly. He is not trying to decode the message, but only to flag particular images as suspicious.
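The article does not describe Farid's actual method, so the sketch below swaps in a different, published LSB detector: the chi-square "pairs of values" attack of Westfeld and Pfitzmann. Overwriting LSBs with message bits tends to equalize the counts of each value pair (2k, 2k+1), and the test picks that up. To keep the sketch dependency-free, the Wilson-Hilferty approximation stands in for an exact chi-square distribution.

```python
import math
from collections import Counter


def chi_square_lsb_attack(values: list[int]) -> float:
    """Approximate probability that 8-bit values carry LSB embedding.

    Results near 1 mean the (2k, 2k+1) pairs look suspiciously balanced.
    """
    counts = Counter(values)
    stat, pairs = 0.0, 0
    for k in range(0, 256, 2):
        expected = (counts[k] + counts[k + 1]) / 2
        if expected == 0:
            continue  # pair absent from this image; it carries no evidence
        stat += (counts[k] - expected) ** 2 / expected
        pairs += 1
    df = pairs - 1
    if df < 1:
        return 0.0
    # Wilson-Hilferty: (X/df)^(1/3) is roughly normal for a chi-square X.
    z = ((stat / df) ** (1 / 3) - (1 - 2 / (9 * df))) / math.sqrt(2 / (9 * df))
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability P(X > stat)
```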
Decoding would be a matter for someone completely different -- like the FBI, for instance.
His method does have applications: if a message is embedded in an image by altering the LSBs, it will apparently detect it 90% of the time.
This is a vast improvement over any existing methods I know of for detecting LSB manipulation.
So he's not quite looking for a needle in a haystack. He's examining millions of haystacks, and pinpointing the ones that probably *do* have needles in them.
Quite a large difference, really.
Now we have more people looking at steganography. This can only make it more effective. Sure, the methods we have now might be broken, but what about the next ones, the ones that don't show up in the statistical analysis he appears to be using?
If you take a photo of a TV screen, it comes out black..
Second, I'm not sure how to react to this. I don't use steganography to hide information, nor do I encrypt my email normally. I guess it's good to know if the techniques used to do this are detectable or breakable, but if it was actually used on a large scale you can bet I'd be screaming, "Big Brother!!!"
Even Slashdot wants to hide some things
It's true that an image, once altered, can be detected via a mathematical function, but detecting it without having a source image to begin with? What if I take a picture of a random image and then stuff the encrypted message into it? Voilà: undetectable. Randomness makes the perfect concealment.
I can see detectability from some of the crude software packages out there, but not the better ones that make sure the applied file is expanded to the size of the image and reversed.
Niels Provos (OpenBSD and OpenSSH developer) has a stego detector based on similar principles (i.e., look for statistical anomalies in JPEG files).
in fact he is presenting a paper on the subject at the usenix security conference tomorrow.
unlike the dartmouth folks, who apparently think press reports are the proper medium for scientific interchange, provos makes his results publicly available; see
http://www.citi.umich.edu/techreports/
reports 01-1 and 01-4.
The article stated that the guy used an algorithm to detect statistical variations and predict whether an image had steganographically hidden data 90% of the time.
How about a GIMP or Photoshop plugin that randomly inserts junk data into every JPEG saved, to make this technique useless? It'd be fun to see the NSA sit and fret over an image that apparently had a list of warez traders and DMCA violators but instead contained the lyrics to 'Penny Lane'.
Better yet, how about an Apache module that does this same thing to every JPG it serves?
The point is, that as soon as it becomes common procedure to intercept images to check for steganography, those who use steganography will switch methods. I bet PGP data encoded in a JPG is a lot harder to detect, and infinitely harder to extract.
I think the legal community would be interested in anything that might help them find and interpret potential evidence. When evidence is properly confiscated, we now have the techniques to break locks on doors and safes, we now have the techniques to crack certain types of cryptography, and hopefully we'll soon have the techniques to FIND a stack of steganographically hidden evidence. It's pretty relevant to our legal system.
http://members.tripod.com/steganography/stego.html
is a great place and has a software archive.
u cn b a stngrfr!
The article sure does make it seem like steganography is the work of the devil. But watermarking documents and sound files is endorsed by such fine members of the establishment as the RIAA and SDMI. So is steganography evil or good? It's just neutral, despite what the article says.
This guy should still be afraid of violating the DMCA. If he tries to detect steganographic images in a sound file, he might run afoul of the RIAA. He shouldn't even think about publishing his research.
How does open-sourcing a steganographic technique impact its usefulness? I suppose it would depend on the nature of the technique. For example, it seems open-source public-key encryption techniques don't compromise their usefulness simply by sharing their algorithms/source. Is this equally possible with steganography or must the methods remain more secretive?
This is an interesting idea, but surely any good encryption produces an output which is indistinguishable from random noise. So, how can the algorithms mentioned in the article (which is interesting, but rather short on facts...) distinguish between the noise added by a steganographically embedded encrypted message and the noise caused by a slightly underspecced A to D converter?
I'm honestly curious... has anyone got any links to a more detailed report on this?
Here's an interesting article that mentions some steganographic pictures hidden in some eBay auctions! Bin Laden at work?
NSA, Pentagon, Police Fund Research Into Steganography
Given a certain state of network bandwidth, the quality of images transferred over the network is likely to increase as the ability to transmit that data increases. This means that anyone attempting large-scale data mining for steganographic data, for example in a Carnivore-type application, would need many times the bandwidth of ALL the senders/receivers in order to analyze that much data.
That would make it so the only real application of this method would be for people you already suspect of sending steganographic data. You could direct the search toward them. However, then it is still trial and error to find which steganographic protocol they used, etc., and you're back to square one.
Maybe if the steganographic checking system were actually *integrated* into the Carnivore system you could get somewhere. It might be a good way to search for messages that were "suspicious".
It is interesting, though, that this method is possible without knowing the individual steganographic protocols. It just seems that it would be too resource-intensive to deploy on a wide scale, and a wide scale is the only place it would be really more useful than trial and error.
What it boils down to is this:
The more the corporations and their lackeys in government restrict freedom, the more determined those who would preserve it will become, and the less effective the restrictions will be.
For one thing, it's a challenge, and nothing inspires great accomplishments from hackers like waving a red flag.
then only criminals will hide secrets in porn.
Porn's good. Er, I mean for steganography that is... "I only use porn for security reasons".
As another thought, how about using TCP window pointers? You might only get a couple bits per TCP packet, but they can add up. This might be useful for key exchange, for instance. Also, there would be no lasting image (or whatever) subject to future recovery. On the other hand, you would have to watch out for proxies.
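A hypothetical sketch of the idea that works on window values only (no sockets): hide a couple of bits in the low bits of each packet's 16-bit window field. The function names and the bits-per-packet choice are assumptions. A fixed base window with jittering low bits is exactly the kind of unnatural pattern the earlier comment about predictable TCP windows warns about.

```python
def encode_windows(base_window: int, payload: bytes, bits_per_packet: int = 2) -> list[int]:
    """Hypothetical covert channel: put payload bits in the low bits of window values."""
    mask = (1 << bits_per_packet) - 1
    bits = "".join(f"{b:08b}" for b in payload)
    bits += "0" * (-len(bits) % bits_per_packet)  # pad to a whole number of packets
    windows = []
    for i in range(0, len(bits), bits_per_packet):
        chunk = int(bits[i:i + bits_per_packet], 2)
        windows.append(((base_window & ~mask) | chunk) & 0xFFFF)
    return windows


def decode_windows(windows: list[int], n_bytes: int, bits_per_packet: int = 2) -> bytes:
    """Recover n_bytes from the low bits of successive window values."""
    mask = (1 << bits_per_packet) - 1
    bits = "".join(f"{w & mask:0{bits_per_packet}b}" for w in windows)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, 8 * n_bytes, 8))
```

At two bits per packet, a 128-bit key would need 64 packets.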
I suggest that we flood the net with documents containing hidden bogus messages. Maybe an innocuous worm or virus would do the trick. It could seek out audio and image files and insert random messages. That should keep the spying computers of the government and other freedom hating organizations busy.
But wait a minute, seeing they can enact freedom squashing laws like the DMCA with impunity, what's to keep them from making steganography illegal? Resist Big Brother. Demand freedom always!
If the government is allowed to keep secrets from its citizens, so should we, too, have the right to keep secrets from the government. Either we trust each other or we don't. A government that spies on its own people should not and will not last. A house divided and all that.
The one-minute explanation of entropy signature analysis is that it seeks to quantify, in R^(n+m) space, the statistical properties of a stream of data by applying n statistical tests to the data. How well or poorly the data passes these tests helps identify the method of generation.
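Here is a rough sketch of the idea with n = 4 simple tests: Shannon entropy, a chi-square statistic against uniform bytes, the mean, and lag-one serial correlation. The choice of tests is an assumption and does not describe any particular tool. Well-encrypted data should land near 8.0 bits per byte of entropy, a mean near 127.5 and a serial correlation near zero.

```python
import math
from collections import Counter


def entropy_signature(data: bytes) -> tuple[float, float, float, float]:
    """Run four simple statistical tests and return the results as a point in R^4."""
    if len(data) < 2:
        raise ValueError("need at least two bytes")
    n = len(data)
    counts = Counter(data)
    # 1. Shannon entropy in bits per byte (8.0 is the maximum).
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    # 2. Chi-square statistic against a uniform byte distribution.
    expected = n / 256
    chi2 = sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))
    # 3. Arithmetic mean of the bytes (127.5 for uniform data).
    mean = sum(data) / n
    # 4. Pearson correlation between each byte and the next one.
    x, y = data[:-1], data[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    serial = cov / norm if norm else 0.0
    return entropy, chi2, mean, serial
```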
I'm curious about this statement. Assuming a truly random number source, an excellent encryption system, and the removal of any identifying marks (header, etc.), a cryptographic string should be indistinguishable from random data. Any given byte should statistically appear as often as any other, and any pattern should appear about as often as any other pattern of the same length. Is there some important mathematical precept I'm missing, or are you merely talking about the idiosyncrasies of conventional algorithms?
In case anyone was wondering why I spend time working with LavaRnd, cryptographically strong PRNGs, Lava Lite ® lamps and other random oddiments
When I came across the original SGI Lava Lamp number generator so long ago, I thought it was one of the coolest things around. I have yet to come across something that could generate as random a number in as closed a space... cool stuff.
1) Take the first letter of each line.
2) Take the first word of each paragraph.
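Both recipes are easy to mechanize. Here is a minimal sketch with hypothetical helper names that pulls out the first character of each line or the first word of each paragraph.

```python
def first_letters(lines: list[str]) -> str:
    """Acrostic reading: the first visible character of each non-blank line."""
    return "".join(line.lstrip()[0] for line in lines if line.strip())


def first_words(paragraphs: list[str]) -> list[str]:
    """The first word of each non-blank paragraph."""
    return [p.split()[0] for p in paragraphs if p.strip()]
```

Applied to the lines of the verse above, `first_letters` spells IFYOUCANREADTHIS.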
Ahh but that's the point.
:-)
I take a picture of a room, and the television has only static on the screen... Pretty innocent picture, except the TV screen holds DeCSS.c or the chemical formula for Coca-Cola.
There is a large amount of randomness in the world. A photograph taken during a rainstorm, an artsy photo of sand.... etc...
I can give you many, many innocent-looking photos that have quite a bit of randomness in them (and a few nicely staged UFO photos, but that's my hobby).
There's that steganography tool, OutGuess, that claims it can hide info in a picture without changing the picture's statistical properties (entropy et al., I surmise). I wonder if it's OutGuess that makes false (or misinformed) claims, or if Prof. Farid's research on statistical analysis is already out of date...
Personally, no matter what, I wish Prof. Farid a lot of luck. His work might be what will save our collective ass from SDMI-like schemes down the road.
"Hi, mom. Went hiking in the mountains last weekend. While I was hiking, it started to rain. I heard what I thought was thunder and saw this really cool wedge-shaped plane just screaming through the clouds. Never seen anything like it before in my life. Like a stealth fighter but way more weirdly shaped, and totally faster. I guess it was a sonic boom, not thunder. I was lucky enough to get a high-res pic of it as it passed over my head. Sorry about the raindrops that fell on the lens. Here's the pic!"
(Boy, it's amazing how hard you have to work to keep the Harry Fox Agency off your ass for mailing steganographically-embedded song lyrics and guitar tablatures these days!)