USAF Wants To Find Steganographic Content
Bud Higgins writes "The U.S. Air Force has posted a Small Business Technology Transfer Program (STTR) solicitation in which they seek proposals for the automated detection of steganographic content. They seek an application that should run both unobtrusively in the background and in a manual mode, and provide the user the capability to scan all email attachments, downloaded materials and accessed files with an appropriate steganalysis algorithm, reporting any abnormal results (i.e. the presence of steganography). I personally don't think that is feasible, but maybe a good programmer can prove me wrong. A link to the solicitation AF04-T008 can be found here. For those who are not familiar with the SBIR/STTR program, it provides up to $850k for 3 years of research." This sounds very similar to what Niels Provos did over a several-year period at University of Michigan's CITI and released under a free license. I hope the USAF doesn't spend too much of my money without considering extending that research.
...reporting any abnormal results (i.e. the presence of steganography). I personally don't think that is feasible...
I think it probably depends on where you hide the data. For instance, it's probably harder to hide data in the LSBs of an image than, e.g. a file that's supposed to be white noise ("Hey, my mic doesn't work, it only records noise. See for yourself"). Of course, the less data you encode, the harder it is to detect it.
Opus: the Swiss army knife of audio codec
Those of you paranoid enough will probably chime in with something along the lines of "Yeah, but Echelon probably has something like this built-in already!". Anyway, isn't the point of steganography to hide information in such a way that you *cannot reliably* tell whether the information was there in the first place?
I'm not sure what they're looking for here; perhaps a better steganography algorithm?
I work for a company that is funded through a SBIR grant, so on behalf of the company I work for and to all tax paying Americans let me just say: Thank You!
It really is an interesting government program. All the IP we generate with the money stays with us. However in the interest of equitable return to the taxpayer, we have decided to release all of our core software components GPL. (Okay, okay this also helps when it comes time for our semi-annual review, to show that we aren't just soaking the taxpayers.) We hope to turn a profit partially by our user interface components (non-core code that we are not releasing) and also through support.
Trying to get one of these grants is highly competitive, but if you have a really good idea and don't want the vulture capitalists to "fund" you, this is a great program.
Education is a better safeguard of liberty than a standing army.
Edward Everett (1794 - 1865)
Maybe statistical analysis can determine if a given image or other medium is possibly hiding information. But if that information is encrypted, doesn't it look like random data without the key? Without knowing the key or even the cipher used to encrypt it... how can it be shown to actually be information? "That's just random noise/corruption in my images your honor... I dont know what your talking about"
As stegdetect (last time I checked) easily fails on files created with steghide
argan0n
One thing that does surprise me is that they have allowed the Air Force guys to look at this at all, it seems much more like an Army or NSA thing.
"Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
Take off the tinfoil hat, dude. Checking all pics on the net for steganographic info is virtually impossible - just too much info to sort through in a reasonable time frame.
They likley want this to scan documents leaving thier internal network in an attempt to catch people who are sending out sensitive or secret info. To me this looks like the USAF is plugging a leak, not going on the hunt.
Soko
"Depression is merely anger without enthusiasm." - Anonymous
In "Unification" (Star Trek episode 108), the cloaked Klingon ship that delivers Picard and Spock into Romulan territory sends a coded message to Enterprise that is piggybacked on surrounding Romulan transmissions. If the Romulans were not able to discover this in their time, what makes the USAF think they'll be able to do it now?
But I had a this little idea. Suppose we "pollute" normal images with random data with say 1% redundancy. What I mean is, whenever you create an image you take some random data and steganographically embed it in the image. Write a gimp plugin or something so that the process is transparent and automatic. Your file only becomes 1% bigger, so its no big deal. Not everyone needs to do this, just sufficiently many people so that the vast majority of the positives of stego detection systems are going to be false positives. As long as the message is encrypted before embedding, it is provably impossible to tell a genuine stego image from a false positive, assuming that the underlying encryption isn't broken. So you get a secure stegosystem with 1% efficiency "for free".
[dons tinfoil hat]
We'd all better soon start doing something like this, given where governments are going.
The "solution" can be implemented with the current laws and regulations, and I think the programmer is only a small part to make this system work. A lot of enforcement authorities have to come together and the current evidence suggests that they will come together. Of course, it is a moot point that by the time they figure this out, people would have learned to hide data in other creative ways - the eternal cat-and-rat game ...
Consider this
If Adobe (and others) could be forced to include in their code methods to detect currencies Slashdot | Photoshop CS Adds Banknote Image Detection, Blocking? and not disclose it till they were caught by some vigilant users, what makes us so smug that other major companies with "closed" software are not already in-bed-with-the-feds ? So, it is conceivable that the automatic detection may be going on and we wouldn't be any wiser.
See the Adobe example of how such "spyware" can be forced to run "unobtrusively."
Major Email providers like Yahoo and Hotmail already provide automatic scanning for virus, AOL is including automatic scanning for spyware, MicroTrend (?) already has Online Virus Scanning of your Hard Drive (!), and so under the threat of the Patriot Act (and it's ilk) many of these companies can be forced to scan everything that goes in and out of their systems.
This is the key. Now the threshold for "abnormal" has been reduced so much (almanac carriers as potential terrorists, CAPPS passenger detection based on names and 15 flights were cancelled last month based on this, anti-war protestors as possible terrorists and hence being tailed by the Feds etc.) that the problem of false alarms no longer dogs the current administration and law enforcement agencies.
This is the crux. When the error threshold is reduced so much that the high rates of error are no longer problematic, then any solution (whether efficient or not) can be implemented. Who cares whether it works well or not. Till now the false alarms were the things that stopped such 1984-ish like scenarios from unfolding. Once you accept high errors, and accept even high collatoral damage as the price of doing "business," you can have a solution to almost anything implemented - whether it deserves to be implemented or not is a whole different issue. But who cares? You got nothing to hide - Right?
To see a world in a grain of sand, and then to step back and see the beach where the sand lies
A use for the code I wrote to sort porn based on image content. I can see it now. Project JISM: Joint Image Statistical Modeling. Any my mom said my chronic masterbation wouldn't get me anywhere.
People who bite the hand that feeds them usually lick the boot that kicks them
If the steg'd data has obvious headers and block formatting, a weak algorithm could leave enough of a pattern in the output file to be detectable. And of course some applications of stego are used to embed cleartext data...
Proponents of stego sometimes suggest it's use in environments where even the suspicion of crypto is enough to risk persecution and/or prosecution.The other "trick" to detecting stego is that "normal" JPG/BMP/WAV/MP3/AVI/MPEG files tend to not actually show a high degree of random noise -- the seemingly random data in the LSB tends to have a pattern imposed by the encoder used and the input device.
I'd guess that this problem is more of an issue on highly-processed information from clean sources. You wouldn't expect random noise on an MP3 file ripped off the latest pop album release, but it wouldn't be out of place on a .SHN "bootleg" recording of a TMBG live concert from a handheld DAT recorder...
I do not deploy Linux. Ever.
... from stenographic content. Either he knows it's there (so he won't report it, surely) or he doesn't know (so he does not extract the potentially dangerous content). A scan for steganographic content should be performed by ISPs or by something like carnivore.
Anyway the USAF initiative is more clever than it seems, because vital steganographic content (terrorist plans and so) must be hidden in "popular" files, to make it hard for the good guys to find out the intended audience of the message. So a user level scan might be somewhat helpful.
It will also give a good excuse to people caught surfing for porn ("I am just helping out the USAF, dear!").
---- MISSING MISCELLANEOUS DATA SEGMENT --- [sigdash] trolololol
It is easy to 'steganohide' content in uncompressed noisy files like tiff or wav. But that content gets destroyed by lossfull compression which is mainly used by multimedia formats (jpeg, mpeg, divx, mpg3, ...). If not, it's called a watermark, but (un)fortunately nobody found a watermark algorithm yet which is robust against lossfull codecs and adding some more noise.
So You have to steganohide Your content after compressing. But compressed files have much less noise, and that noise is not random noise but has statistical quirks. If You just hide Your content as white noise and add it to the file - thats detectable, because it changes the statistical behaviour of the file!
Instead You have to write an specific steganografic algorithm for each lossfull compression format You want to hide content in! It has to respect the 'format noise character'. That's what Niels Provos did for pnm and jpeg with outguess.
paper (pdf) on detection of steganographic messages based on simple statistical analisys of the image. It seems to work well against 2 of the 3 major steganographic endodings they tried.
For any such system to work, it would have to basically be the greatest code-cracking machine on the face of the planet. More than that, though, would be the implications of false-positives. Let's say I send a photoshopped picture of, oh, I don't know, Natalie Portman to a buddy who works for the Air Force. The system, working under the operating parameters it's set to work with, picks up on a specific pattern of bits in the picture and determines that it's a coded message. The coded message is decoded to, inexplicably, reveal GPS coordinates, a date/timestamp, and the phrase "Free XXXXXX" (or some equally suspect verbiage). What would YOU think the "message" meant?
/dev/rand can produce terrorist messages. It's the million-monkey problem, except with thermonuclear weapons.
Given enough processing power, even
"Why Subscribe?" Good question...
Imagine if seganographic checking software was to be mandatory on all computers containing DRM. And, removing it would be a felon. Remember boys and girls, owning a computer is a privilege, not a "right".
Think it can't happen? Think again, we have the Patriot Act as the front runner for this kinda shit. Seriously, I'm voting Libertarian this election. I'm tired of the same old Demo/Repub bull shit!! Arrtrrggghhhhhhaaaa
Life is not for the lazy.
It is trivial to write a program to discover content that has been stegged. A jpeg with hidden content would be quite easy to find if the areas with content where significantly different from those without. The problem comes when the data is similar to the carrier.
.....damn, fractal-based stenography I wonder if anybody's using it?
It's only trivial if they we using the most basic method possible and you had some idea what the data you were looking for was like.
If just I straight-up encode a bunch of dictionary words into the LSB's in a black and white bitmap, then you could easily find them.
If distort the image using a fractal pattern as my method of encoding and the original data source is compressed and encrypted as part of the operation, it's not trivial anymore, is it?
Life is too short to proofread.
The point of steganography is to hide information so that its presence cannot be detected. This means hiding information below the noise floor of the media. Information hidden in this way cannot be practically detected, assuming the stego is halfway decent, and the message to be hidden appears random (easily accomplished by encrypting it first).
Sure, *if* you had access to the unaltered original, then you could detect that it had been altered, but any competent steganographer would encrypt the hidden information first.
This sentence demonstrates that you don't understand either /dev/urandom or steganography.
More mis-informed rubbish - kernel implementation and processor type have little to do with the algorithms underlying the /dev/urandom implementation. Furthermore, /dev/urandom is based on "natural type" entropy (i.e randomness derived from unpredicable physical processes).
So if you have to hide something from the feds then become a scientist and collect lots of data from nature. It should have an element of randomness that allows you to steg your secrets in the data.or, you could go and take a regular photo. Plenty of real, nature-derived randomess there.
In audio that is. SAy you decide to start hiding stuff in live performance music, as in fan recorded data. Much of that is distributed in 24-bit format since we are talking about hardcore people here. Well, this is good already, seeing as you aren't going to find 24-bit converters that really get 24-bits of SNR. So you have plenty of inherant noise to begin with. Add to that the noise of a concert and you've plenty to mask the signal with.
> I personally don't think that is feasible
Of course this is feasable! At least with todays steganography software.
What the software does, is to overwrite appearently insignificant portions of the "container" data (the audio/picture/text/whatever file that transports the smaller hidden file). The steganographers say (rightfully) that, by encrypting the hidden data with a strong-enough algorithm, it is indistinguishable from random data. Ie, no one (without the key used for encryption) would be able to tell if it's encrypted data, or perfectly random data.
However, the programmers of steganographic software now go one step further and say (wrongly!) that images and audio files carry random noise in their least significant bits (LSB). Certainly, the lowest of those 16 bits of CD quality audio does not carry much data. And granted, 16 bits give 96dB of dynamic range while analog master tapes (studio quality) only have about 80dB, and microphone technology hardly touches 96dB. The LSB of an audio wave file definately is noisy, no doubt about that.
But (big "BUT"), it is far from being perfectly random. In the LSB you might find 50Hz/60Hz hiss from the buildings electric cabeling. You might find characteristic noise that's typical for your brand of microphone, or even a kind of "noise fingerprint" that could be used to distinguish your microphone from others of the same brand (much like crime investigators can distinguish typewriters by analyzing the blackmail letter). Actually, an experiment showed that when cutting all but the LSB of a music wave file, the tune remains still recognizable!
What the stego programmers do is to replace that LSB (or even 4 least significant bits) with perfectly (pseudo) random data. That's a difference! I can just cut all but the LSB and check if it statistically matches perfect random data (whitenoise) or if "some of" the music tune is "somehow" in there (eg by correlation, a DSP technique).
The same applies for pictures. If the pictures were scanned, the lower bits will contain artefacts characteristic to the particular scanner used. Digital photos exhibit "signatures" of the CCD/CMOS chip used in the digicam. Etc.
The steganographers know this, while the programmers of stegano software deliberately ignores it. It's a solvable problem, but infinitely difficult. If you know what the stegano-detection software is looking for, you can easily avoid it. Just encrypt your hidden data to "perfect random" and then transform it (by adding data, thus loosing efficiency) to exhibit almost the same "fingerprint" signature as the data you are going to overwrite. In case of an audio wave file, impress a bit of the tune on your data.
But obviously, you can't reach perfection, because a 100% match means that you overwrite the original data with a 100% copy of it (-> you have stored 0 bytes of hidden data). Or you know how the detector works, what tresholds it uses to bin the file as "steganographic", and stay a little below the treshold. But that puts you on the risky side.. Will they change the tresholds? Will they check for other characteristics as well, something that you didn't address in your steganographic software?
That's why the steganographic programmers (not researchers!) ignore this problem. It has no practical solution. It's so much easier to just ignore it, and offer you the choice between 4 and 8 bits of hidden data per 16 bits of wave data (like eg "Scramdisk" does, a recommendable harddisk encryption software). This is better than nothing, but it is far from "not feasable" to detect!
Marc
As others stated, (as always in cryptography) if the stegging user isn't stupid (means he would encode before steg), the data to be stegged would be as random as the data that you steg it in. There is no possibility to tell one set of random data from another set of random data. I think they do it for discovering stupid spys.
They likley want this to scan documents leaving thier internal network in an attempt to catch people who are sending out sensitive or secret info. To me this looks like the USAF is plugging a leak, not going on the hunt.
That's exactly one of the reasons for the technology. The DoD has an obligation to protect sensitive information. There are a crazy number of hoops that need to be gone through to get unclassified info off of a classified system. They can't have people encoding stuff in pictures of Barney then walking away with it.
I know the usual paranoids are up in arms about the AF doing this, but the same people would flood "The DoD is so stupid" if it were found out that people were abusing the technology to transport classified info.
In these days when the FBI thinks possession of an almanac makes you suspicious...what happens to you if some half-baked experimental steganography-detection program looks at billions of .jpgs, gets to an image you've included in an eBay auction descriptions, and detects some not-quite-decodable signal just above the noise that it interprets "there's definitely something hidden in that image, even though we can't tell what?"
How do you prove that you're innocent?
How do you prove that your image does NOT contain steganography?
Worse yet, suppose you are using steganography--say, a watermark to prevent people from stealing your image. Will the FBI believe what you tell them is the decoded content?
I mean, a few decades ago some nutcase analyzed Shakespeare's First Folio and decided that it was printed in a mixture of two slightly different fonts that constituted a binary code with a message proving that it had been written by Sir Francis Bacon. (No kidding). That proves that it's easy for someone who's looking for steganography to find it, whether it's there or not.
"How to Do Nothing," kids activities, back in print!
The problem with the LSBs of an image is that they aren't quite random. Unless the image is raytraced or otherwise artificially produced, there's a fair amount of order there. Even a raytraced image might not be quite random.
The same holds with audio. For instance, crypted data is white noise, but concert noise is "pink noise" which has a characteristic spectrum. The noise produced by converters is closer to white, but it isn't quite either. People like Neils Provos have been studying this for a while, trying to find out which bits they can change without altering the statistics of the image or audio, but with limited success. As of last year (don't know how it is this year), all published steganography schemes at least a few months old had been broken.
I hereby place the above post in the public domain.