Battling Steganography
An anonymous reader submitted a fairly thin little story about a researcher who is Battling Steganography. I can certainly see the appeal of the study, but it really seems like a needle-in-a-haystack sort of project. And once you can detect one technique, newer and better techniques will crop up to take its place.
That lack of certainty really isn't that big an issue, because with a good idea of what percentage of images are false positives it would be fairly simple to look for image sources where the percentage was well outside the norm.
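To make that concrete, here is a rough sketch of the "outside the norm" check. It assumes you already know the detector's false-positive rate and have per-source counts of scanned and flagged images. The function names, the example counts, and the cutoff `alpha` are all made up for illustration; nothing here comes from the article.

```python
import math

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log(1.0 - p))
        total += math.exp(log_term)
    return min(total, 1.0)

def suspicious_sources(flag_counts, false_positive_rate, alpha=1e-4):
    """Return sources whose flagged count is very unlikely under false positives alone.

    flag_counts maps a source name to (images_scanned, images_flagged).
    alpha is an arbitrary illustrative cutoff, not a recommended value.
    """
    return sorted(
        source
        for source, (scanned, flagged) in flag_counts.items()
        if binomial_tail(scanned, flagged, false_positive_rate) < alpha
    )

# Hypothetical counts: the detector is assumed to flag 5% of clean images.
counts = {"site_a": (1000, 52), "site_b": (1000, 120), "site_c": (200, 9)}
print(suspicious_sources(counts, 0.05))
```

With a lot of sources you would want a much stricter cutoff, since some clean sources will land in the tail by chance alone.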
All of this would of course be very resource intensive and would require access to large amounts of data (Omnivore, anyone?) but it's far from outside the capabilities of most governments.
Possibly also of interest to people is Benford's Law, which describes the distribution of leading digits in many real-world data sets. It turns out that in many areas it's fairly simple to tell real data from random data, because real data has some definite non-random properties.
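For the curious, Benford's Law says the leading digit d shows up with probability log10(1 + 1/d), so about 30% of values start with a 1. Below is a small sketch that scores a data set against that distribution using a chi-square statistic. The helper names are my own, and the statistic is just one reasonable way to measure the mismatch, not something taken from the article.

```python
import math
from collections import Counter

def benford_expected():
    """Expected probability of each leading digit 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(x):
    """First significant decimal digit of a nonzero number."""
    if x == 0:
        raise ValueError("zero has no leading digit")
    # 17 significant digits round-trips a double, so the first character is reliable.
    return int(f"{abs(x):.16e}"[0])

def benford_chi_square(values):
    """Chi-square distance between observed leading digits and Benford's Law."""
    digits = Counter(leading_digit(v) for v in values if v != 0)
    n = sum(digits.values())
    if n == 0:
        raise ValueError("need at least one nonzero value")
    return sum(
        (digits[d] - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )

# Powers of two are a classic Benford-like sequence; a flat range is not.
powers = [2 ** k for k in range(1, 500)]
flat = list(range(1000, 10000))
print(benford_chi_square(powers), benford_chi_square(flat))
```

A small score means the leading digits look Benford-like; a large one means they don't. Plenty of legitimate data doesn't follow Benford's Law either, so a large score is a hint, not proof.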
fencepost
just a little off
So, how can the algorithms mentioned in the article (which is interesting, but rather short on facts...) distinguish between the noise added by a steganographically embedded encrypted message and the noise caused by a slightly underspecced A to D converter?
You're right, there isn't too much of a difference between random noise and an encrypted communication. If you had a pure digital stream that had just been converted from analog, you could stick data in the least significant bits and no one would be the wiser. For example, a CD is just a sequence of 16-bit words, 44,100 of them per second for each channel; you could just replace the least significant bit in each word with bits from your (encrypted) hidden message and it would be indistinguishable from random noise.
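Here's a minimal sketch of that least-significant-bit trick, treating the audio as a plain list of signed 16-bit integers. The names `embed_lsb` and `extract_lsb` are mine, and a real tool would encrypt the message first and deal with actual file formats, which this doesn't.

```python
def embed_lsb(samples, message):
    """Return a copy of samples with the message bits written into each sample's LSB."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("message is too long for this cover signal")
    stego = list(samples)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the message bit.
        stego[i] = (stego[i] & ~1) | bit
    return stego

def extract_lsb(samples, n_bytes):
    """Read n_bytes back out of the least significant bits, most significant bit first."""
    if n_bytes * 8 > len(samples):
        raise ValueError("not enough samples for that many bytes")
    out = bytearray()
    for b in range(n_bytes):
        byte = 0
        for sample in samples[b * 8:(b + 1) * 8]:
            byte = (byte << 1) | (sample & 1)
        out.append(byte)
    return bytes(out)

# Hypothetical cover signal: 200 made-up sample values.
cover = list(range(-100, 100))
stego = embed_lsb(cover, b"hi")
print(extract_lsb(stego, 2))
```

Each sample changes by at most 1 out of 65,536 levels, which is why it hides so well in the noise floor of a fresh analog-to-digital conversion.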
The problem arises when you try to compress digital information. Lossy compression algorithms look for the most efficient way to represent the data and discard the least significant information, so they would completely destroy the aforementioned hidden message. To hide data in a compressed file you need to play with how the compression mechanism stores the data, and the resulting file is most probably not going to be optimally compressed when you're done. What this guy is doing is looking at how the information was compressed, extracting the underlying data that was being stored, and checking that the compression was indeed optimal. If there are any odd quirks in the compressed data, or it doesn't look like the compression was optimal, it may be because data is hidden inside.
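A toy illustration of why lossy compression wipes out least-significant-bit messages: the `quantize` function below is a crude stand-in I made up, not a real codec. It rounds every sample down to a multiple of `step`, which is roughly what "discarding the least significant data" means in the simplest case.

```python
def quantize(samples, step=4):
    """Toy stand-in for lossy compression: round each sample down to a multiple of step."""
    return [s - (s % step) for s in samples]

# Hypothetical stego signal: made-up samples whose LSBs carry the bits 1,0,1,1,0,1,0,0.
hidden_bits = [1, 0, 1, 1, 0, 1, 0, 0]
cover = [1200, -873, 15, 4096, -2, 77, 310, -1501]
stego = [(s & ~1) | bit for s, bit in zip(cover, hidden_bits)]

recovered_before = [s & 1 for s in stego]
recovered_after = [s & 1 for s in quantize(stego)]
print(recovered_before)
print(recovered_after)
```

After quantizing, every low bit is zero, so the message is gone. Real codecs such as JPEG or MP3 work on transformed coefficients and not on raw samples, but the effect on naively hidden bits is similar.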
I hope this is a good enough explanation. I'm short on examples, but the underlying ideas are pretty basic.
You might also want to check the techreports that I published about my research.
At HAL 2001, I presented on Detecting Steganographic Content on the Internet. You might like that.
Dartmouth certainly seems to know how to do PR. I would just like to know where their publications are.
Steganography is nothing new. People have been hiding secret messages in innocuous objects since time began. Naturally, various people want to prevent this, but the method's very nature makes it almost impossible to simply track.
Got Rhinos?
niels provos (openbsd and openssh developer) has a stego detector based on similar principles (i.e., look for statistical anomalies in jpeg files).
in fact he is presenting a paper on the subject at the usenix security conference tomorrow.
unlike the dartmouth folks, who apparently think press reports are the proper medium for scientific interchange, provos makes his results publicly available; see
http://www.citi.umich.edu/techreports/
reports 01-1 and 01-4.
nobody
parturiunt montes, nascetur ridiculus mus
Here's an interesting article that mentions some steganographic pictures hidden in some eBay auctions! Bin Laden at work?
NSA, Pentagon, Police Fund Research Into Steganography
cpeterso