USAF Wants To Find Steganographic Content

Posted by timothy on Saturday January 10, 2004 @09:29PM from the sir-yes-sir-we-must-examine-porn-sir dept.

Bud Higgins writes "The U.S. Air Force has posted a Small Business Technology Transfer Program (STTR) solicitation in which they seek proposals for the automated detection of steganographic content. They seek an application that should run both unobtrusively in the background and in a manual mode, and provide the user the capability to scan all email attachments, downloaded materials and accessed files with an appropriate steganalysis algorithm, reporting any abnormal results (i.e. the presence of steganography). I personally don't think that is feasible, but maybe a good programmer can prove me wrong. A link to the solicitation AF04-T008 can be found here. For those who are not familiar with the SBIR/STTR program, it provides up to $850k for 3 years of research." This sounds very similar to what Niels Provos did over a several-year period at the University of Michigan's CITI and released under a free license. I hope the USAF doesn't spend too much of my money without considering extending that research.

27 of 267 comments

Oh yeah? by Mynkami · 2004-01-10 21:40 · Score: 2, Interesting

"They seek an application that should run both unobtrusively in the background and in a manual mode, and provide the user the capability to scan all email attachments, downloaded materials and accessed files with an appropriate steganalysis algorithm, reporting any abnormal results (i.e. the presence of steganography)."
Suuuuure, Carnivore anyone?
how stegged is stegged? by sparkes · 2004-01-10 21:44 · Score: 1, Interesting

It is trivial to write a program to discover content that has been stegged. A jpeg with hidden content would be quite easy to find if the areas with content were significantly different from those without. The problem comes when the data is similar to the carrier.

If you had hidden your message in bogus scientific data taken from a near-random source, then it would be very difficult to see the areas that contained stegged data.

It would be possible, with time and processing power, to discover which bits were stegged if you used /dev/urandom to get the data. Knowing your processor type and kernel implementation, the powers that be could find patterns in the data and look for those (or the absence of those) in your message. But if the randomness is of a natural type, then the difficulty increases by a massive amount.

So if you have to hide something from the feds then become a scientist and collect lots of data from nature. It should have an element of randomness that allows you to steg your secrets in the data.

--
blog and junk
1. Re:how stegged is stegged? by theLOUDroom · 2004-01-10 23:23 · Score: 4, Interesting
  
  It is trivial to write a program to discover content that has been stegged. A jpeg with hidden content would be quite easy to find if the areas with content were significantly different from those without. The problem comes when the data is similar to the carrier.
  
  It's only trivial if they were using the most basic method possible and you had some idea what the data you were looking for was like.
  
  If I just straight-up encode a bunch of dictionary words into the LSBs of a black and white bitmap, then you could easily find them.
  If I distort the image using a fractal pattern as my method of encoding, and the original data source is compressed and encrypted as part of the operation, it's not trivial anymore, is it?
  
  .....damn, fractal-based steganography. I wonder if anybody's using it?
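The "easily found" case above can be made concrete with a minimal Python sketch of naive LSB embedding, assuming an 8-bit grayscale image flattened into a list of pixel values. The function names and the flat gray cover are hypothetical illustrations, not any particular tool. Because the payload is plain ASCII, anyone who simply reads the LSBs back gets readable text.

```python
def embed_lsb(pixels, message):
    """Hide message bytes in the LSB of each pixel, most significant bit of each byte first."""
    bits = [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for message")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to the payload bit
    return stego


def extract_lsb(pixels, n_bytes):
    """Read n_bytes back out of the pixel LSBs."""
    out = bytearray()
    for j in range(n_bytes):
        byte = 0
        for i in range(8):
            byte = (byte << 1) | (pixels[8 * j + i] & 1)
        out.append(byte)
    return bytes(out)


def looks_like_ascii_text(data):
    """Crude detector: every byte is printable ASCII."""
    return all(32 <= b < 127 for b in data)


cover = [200] * 1000  # hypothetical flat gray 8-bit image, flattened
stego = embed_lsb(cover, b"attack at dawn")
recovered = extract_lsb(stego, 14)
print(recovered, looks_like_ascii_text(recovered))
```

No pixel changes by more than 1, so the image looks the same, yet the hidden words are trivially recoverable. Encrypting the payload first would make the recovered bytes look like noise, which is the harder case the comment goes on to describe.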
  
  --
  Life is too short to proofread.
Wonder why Air Force by Saeed+al-Sahaf · 2004-01-10 21:53 · Score: 4, Interesting

The Air Force has always been at the forefront of technological thought within the military. I've been Air Force since 1984 and currently work in Information Management; my first career field was Fire Fighting, but I cross-trained into IT in 1998. I work with many first-class programmers and network guys, most of them classic "hackers". It does not surprise me they are looking at this.
One thing that does surprise me is that they have allowed the Air Force guys to look at this at all, it seems much more like an Army or NSA thing.

--
"Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
The end user doesn't need protection... by marcello_dl · 2004-01-10 22:28 · Score: 3, Interesting

... from steganographic content. Either he knows it's there (so he won't report it, surely) or he doesn't know (so he does not extract the potentially dangerous content). A scan for steganographic content should be performed by ISPs or by something like Carnivore.

Anyway, the USAF initiative is more clever than it seems, because vital steganographic content (terrorist plans and so on) must be hidden in "popular" files, to make it hard for the good guys to find out the intended audience of the message. So a user-level scan might be somewhat helpful.

It will also give a good excuse to people caught surfing for porn ("I am just helping out the USAF, dear!").

--
---- MISSING MISCELLANEOUS DATA SEGMENT --- [sigdash] trolololol
Re:stego wrapped pgp by Ronald+Dumsfeld · 2004-01-10 22:33 · Score: 5, Interesting

Maybe statistical analysis can determine if a given image or other medium is possibly hiding information. But if that information is encrypted, doesn't it look like random data without the key? Without knowing the key or even the cipher used to encrypt it... how can it be shown to actually be information? "That's just random noise/corruption in my images, your honor... I don't know what you're talking about"

Statistical analysis can indeed detect where hidden information is placed into an image, usually by noticing that the balance of the image is off. In fact, using encrypted data is more likely to stand out because images are not usually populated with statistically random data.
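One well-known statistical test of this kind is the "pairs of values" chi-square attack described by Westfeld and Pfitzmann. Overwriting LSBs with encrypted (random-looking) bits tends to equalize the counts of each value pair (2k, 2k + 1), so a statistic that is unusually small relative to its degrees of freedom is the suspicious case. The Python sketch below computes only the raw statistic (the published attack converts it to a probability via the chi-square distribution, omitted here), and its all-even-values cover is a deliberately extreme, hypothetical example rather than a realistic image.

```python
import random
from collections import Counter


def pov_chi_square(values):
    """Pairs-of-values chi-square statistic for 8-bit samples (after Westfeld & Pfitzmann).

    Returns (chi2, degrees_of_freedom). A SMALL chi2 relative to the degrees of
    freedom means the pairs are suspiciously balanced, as LSB embedding tends to make them.
    """
    hist = Counter(values)
    chi2 = 0.0
    pairs = 0
    for k in range(0, 256, 2):
        expected = (hist[k] + hist[k + 1]) / 2
        if expected > 0:
            chi2 += (hist[k] - expected) ** 2 / expected
            pairs += 1
    return chi2, max(pairs - 1, 1)


def overwrite_lsbs(values, rng):
    """Replace every LSB with a random bit, standing in for an encrypted payload."""
    return [(v & ~1) | rng.getrandbits(1) for v in values]


rng = random.Random(2004)
cover = [2 * (i % 100) for i in range(20000)]  # hypothetical cover: only even values occur
stego = overwrite_lsbs(cover, rng)
for name, data in (("cover", cover), ("stego", stego)):
    chi2, dof = pov_chi_square(data)
    print(name, round(chi2 / dof, 2))
```

The cover scores far above 1 because every pair is lopsided, while the stego version scores well below 1 because random LSBs split each pair almost evenly. This is the sense in which encrypted payloads "stand out": they are more uniform than natural image statistics.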

Here's a piece on scanning Usenet for hidden images. As a broadcast medium, Usenet is where you'd expect steganography to be used most, since you can post material anonymously and it is well-nigh impossible to locate the intended recipient.

--
Where's the Kaboom?
There's supposed to be an Earth-shattering Kaboom.
Re:Interesting by Anonymous Coward · 2004-01-10 22:34 · Score: 2, Interesting

My guess is that they aren't so interested in decoding it. Well, they would like to be able to do that, but their main intent is probably to know when someone is sending an encoded image out of their network. That person would then get investigated for possible espionage. In fact, in a case like that, decoding it would be a hindrance to the Air Force. Here's an example:

Suppose you work inside the Air Force and want to blow the whistle on them for some illegal acts. So you gather the incriminating documents, encode them into images of your kids, cats, whatever, and e-mail them to a reporter friend. As soon as you do, the Air Force's spiffy new software sounds the alarm, and you're busted. The top brass knows you aren't a spy, but they want to nail you to the wall for ratting them out. So they haul you into court on an espionage charge and use the results the software generated as evidence. They'll say that you must have been passing secret information: they can't decode it to see exactly what you sent, but you must be a spy.

At this point, you're caught in a bind. You can keep your mouth shut about what is in the images and profess your innocence, hoping that the charges don't stick but risking long jail time if they do, or you can decode the data for the Air Force, possibly getting you off the hook on the espionage charges but still getting you in hot water as a whistleblower, while at the same time possibly exposing other whistleblowers in the process (those who may have passed documents to you). But wouldn't the Air Force be able to do all this if they could decode the data themselves? Not really, since, if the documents weren't classified, they'd have a harder time getting you charged with espionage. Those charges alone are incredibly serious and will put intense pressure on you to roll over and cooperate.

Sorry, this message wasn't supposed to be a paranoid rant, but it turned into one along the way.
Patterns In The Static by shadowcabbit · 2004-01-10 22:52 · Score: 4, Interesting

For any such system to work, it would have to basically be the greatest code-cracking machine on the face of the planet. More than that, though, would be the implications of false-positives. Let's say I send a photoshopped picture of, oh, I don't know, Natalie Portman to a buddy who works for the Air Force. The system, working under the operating parameters it's set to work with, picks up on a specific pattern of bits in the picture and determines that it's a coded message. The coded message is decoded to, inexplicably, reveal GPS coordinates, a date/timestamp, and the phrase "Free XXXXXX" (or some equally suspect verbiage). What would YOU think the "message" meant?

Given enough processing power, even /dev/random can produce terrorist messages. It's the million-monkey problem, except with thermonuclear weapons.

--
"Why Subscribe?" Good question...
1. Re:Patterns In The Static by Alioth · 2004-01-11 00:45 · Score: 2, Interesting
  
  Given enough processing power, even /dev/random can produce terrorist messages.
  
  It would have to be an enormous amount of power. Consider we limit the possibilities merely to the alphabet.
  
  To come up with the word 'the' would be reasonably commonplace. The odds are 1 in 27*27*27 (26 letters plus space), or 1 in 19683, that any three outputs from a purely alphabetical /dev/urandom would give you that.
  
  But the word 'the' is hardly a meaningful message. Let's consider 'The quick brown fox jumps over the lazy dog', a fairly short message at 43 characters. The odds of that coming from an alpha/space /dev/urandom are 1 in 35370553733215749514562618584237555997034634776827523327290883, astronomically unlikely. Even if every single atom in the Solar System was working on generating the string at random, it's still very unlikely to show up!
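Alioth's odds are just the alphabet size raised to the message length. A short Python check, assuming (as the comment does) a uniform source of 27 symbols (26 letters plus space); the helper names are made up for this sketch. The expected-count helper uses linearity of expectation: each starting position matches with probability 1/27^L.

```python
def odds_one_in(alphabet_size, length):
    """Number of equally likely strings of this length; one specific string has odds 1 in this."""
    return alphabet_size ** length


def expected_occurrences(stream_length, alphabet_size, length):
    """Expected number of times one specific string appears in a uniform random stream."""
    positions = max(stream_length - length + 1, 0)
    return positions / alphabet_size ** length


print(odds_one_in(27, 3))                    # 'the': 19683
print(len(str(odds_one_in(27, 43))))         # number of digits in the pangram's odds: 62
print(expected_occurrences(10**12, 27, 43))  # a trillion random letters still gives ~0 hits
```

Even a trillion-character random stream is expected to contain the 43-character pangram far less than once, which supports the comment's conclusion for messages of any real length.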
  
  With a stegged message, where the entire ASCII character set may be used, the message such as what you speculate (some GPS coordinates and suspect verbiage) is even less likely.
  
  The example of Shakespeare with an infinite number of monkeys is cute, but there *isn't* an infinite number of monkeys, or infinite bytes in images for that matter. The odds are so infinitesimally small that it's barely worth worrying about.
  
  --
  Oolite: Elite-like game. For Mac, Linux and Windows
US Gov sponsored DRM by DigiShaman · 2004-01-10 22:55 · Score: 3, Interesting

Imagine if steganographic checking software were mandatory on all computers containing DRM, and removing it were a felony. Remember, boys and girls: owning a computer is a privilege, not a "right".

Think it can't happen? Think again: we have the Patriot Act as the front-runner for this kinda shit. Seriously, I'm voting Libertarian this election. I'm tired of the same old Demo/Repub bullshit!! Arrtrrggghhhhhhaaaa

--
Life is not for the lazy.
In general it's feasible though by Sycraft-fu · 2004-01-10 23:55 · Score: 4, Interesting

In audio, that is. Say you decide to start hiding stuff in live performance music, as in fan-recorded data. Much of that is distributed in 24-bit format since we are talking about hardcore people here. Well, this is good already, seeing as you aren't going to find 24-bit converters that really get 24 bits of SNR. So you have plenty of inherent noise to begin with. Add to that the noise of a concert and you've plenty to mask the signal with.
Re:Interesting by saforrest · 2004-01-11 00:18 · Score: 3, Interesting

But embedding a message introduces redundancy, by an amount proportional to the capacity of the stego system.

I don't think you mean 'redundancy' here, since the added data is obviously not redundant. It can't be, since it has to encode the steganographic message.

I think you mean 'apparent redundancy', i.e. the container file would appear to be redundant to someone who doesn't know there's a secret message since it's larger than it needs to be.

However, this problem can be avoided if the encoder simply chooses a steganographic method which does not increase container size. As a trivial example of this idea, consider this steganographic tool I wrote, which is based on permuting HTML tag attributes.

Clearly, tag attributes must have some fixed order when written into a file. My program simply permutes them in a specific way within the file, thus encoding content without increasing container size.

The general idea is to make use of the existing redundancy of the container to encode data. The one caveat here is that the amount of container redundancy is bounded above by the size of the container, so there is a fixed maximum amount of data that can be encoded.
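saforrest's tool itself isn't shown here, but the underlying idea can be sketched hypothetically in Python: treat the order of a tag's attributes as a number in the factorial number system (a Lehmer code). A tag with n attributes has n! orderings, so it can carry up to log2(n!) bits without growing the file, which is exactly the fixed upper bound on capacity mentioned above. All names and the example tag are illustrative assumptions.

```python
from math import factorial


def permutation_from_index(items, index):
    """Return the index-th ordering of items, in lexicographic order (factorial number system)."""
    pool = sorted(items)
    if not 0 <= index < factorial(len(pool)):
        raise ValueError("index does not fit in this many items")
    order = []
    for remaining in range(len(pool), 0, -1):
        position, index = divmod(index, factorial(remaining - 1))
        order.append(pool.pop(position))
    return order


def index_from_permutation(order):
    """Inverse of permutation_from_index: recover the hidden integer from an ordering."""
    pool = sorted(order)
    index = 0
    for item in order:
        position = pool.index(item)
        index += position * factorial(len(pool) - 1)
        pool.pop(position)
    return index


def render_tag(name, attrs, order):
    """Write an HTML tag with its attributes in the given order."""
    parts = ['%s="%s"' % (key, attrs[key]) for key in order]
    return "<%s %s>" % (name, " ".join(parts))


attrs = {"src": "cat.jpg", "alt": "my cat", "width": "64", "height": "48"}
secret = 13  # 4 attributes give 4! = 24 orderings, i.e. about 4.58 bits per tag
tag = render_tag("img", attrs, permutation_from_index(attrs, secret))
print(tag)
```

The file size is identical for every secret value; only the attribute order changes. Decoding needs nothing more than reading the order back and calling `index_from_permutation`.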
Of course this is feasible! by jetmarc · 2004-01-11 00:25 · Score: 5, Interesting

> I personally don't think that is feasible

Of course this is feasible! At least with today's steganography software.

What the software does is overwrite apparently insignificant portions of the "container" data (the audio/picture/text/whatever file that transports the smaller hidden file). The steganographers say (rightfully) that, by encrypting the hidden data with a strong-enough algorithm, it is indistinguishable from random data. I.e., no one (without the key used for encryption) would be able to tell if it's encrypted data or perfectly random data.

However, the programmers of steganographic software now go one step further and say (wrongly!) that images and audio files carry random noise in their least significant bits (LSB). Certainly, the lowest of the 16 bits of CD-quality audio does not carry much data. And granted, 16 bits give 96dB of dynamic range while analog master tapes (studio quality) only have about 80dB, and microphone technology hardly reaches 96dB. The LSB of an audio wave file definitely is noisy, no doubt about that.

But (big "BUT"), it is far from being perfectly random. In the LSB you might find 50Hz/60Hz hum from the building's electrical cabling. You might find characteristic noise that's typical for your brand of microphone, or even a kind of "noise fingerprint" that could be used to distinguish your microphone from others of the same brand (much like crime investigators can distinguish typewriters by analyzing a blackmail letter). Actually, an experiment showed that when cutting all but the LSB of a music wave file, the tune is still recognizable!

What the stego programmers do is replace that LSB (or even the 4 least significant bits) with perfectly (pseudo) random data. That's a difference! I can just cut all but the LSB and check if it statistically matches perfect random data (white noise) or if "some of" the music tune is "somehow" in there (e.g. by correlation, a DSP technique).
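One simple way to make the "cut all but the LSB and compare with white noise" idea concrete is the lag-1 autocorrelation of the LSB plane: independent random bits give a value near 0, while a plane that still carries part of the waveform changes slowly and scores high. The statistic, the helper names, and the quiet sine only a few LSBs tall used as a cover are choices made for this Python sketch, not the commenter's exact method; real recordings would need a calibrated threshold.

```python
import math
import random


def lsb_plane(samples):
    """Keep only the least significant bit of each (possibly negative) integer sample."""
    return [s & 1 for s in samples]


def lag1_autocorrelation(bits):
    """Lag-1 autocorrelation of a 0/1 sequence; about 0 for white noise."""
    n = len(bits)
    mean = sum(bits) / n
    var = sum((b - mean) ** 2 for b in bits) / n
    if var == 0:
        return 1.0  # a constant plane is perfectly predictable
    cov = sum((bits[i] - mean) * (bits[i + 1] - mean) for i in range(n - 1)) / (n - 1)
    return cov / var


rng = random.Random(7)
# Hypothetical quiet passage: a sine only a few LSBs tall, so its LSB plane changes slowly.
quiet = [round(3 * math.sin(2 * math.pi * t / 200)) for t in range(20000)]
stego = [(s & ~1) | rng.getrandbits(1) for s in quiet]  # LSBs replaced by random bits
print(round(lag1_autocorrelation(lsb_plane(quiet)), 2))
print(round(lag1_autocorrelation(lsb_plane(stego)), 2))
```

The untouched plane is strongly autocorrelated, while the stego plane is statistically indistinguishable from white noise, and that difference is itself the giveaway the comment describes.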

The same applies for pictures. If the pictures were scanned, the lower bits will contain artefacts characteristic to the particular scanner used. Digital photos exhibit "signatures" of the CCD/CMOS chip used in the digicam. Etc.

The steganographers know this, while the programmers of stegano software deliberately ignore it. It's a solvable problem, but infinitely difficult. If you know what the stegano-detection software is looking for, you can easily avoid it. Just encrypt your hidden data to "perfect random" and then transform it (by adding data, thus losing efficiency) to exhibit almost the same "fingerprint" signature as the data you are going to overwrite. In the case of an audio wave file, impress a bit of the tune on your data.

But obviously, you can't reach perfection, because a 100% match means that you overwrite the original data with a 100% copy of it (-> you have stored 0 bytes of hidden data). Or you know how the detector works and what thresholds it uses to bin the file as "steganographic", and stay a little below the threshold. But that puts you on the risky side. Will they change the thresholds? Will they check for other characteristics as well, something that you didn't address in your steganographic software?

That's why the steganographic programmers (not researchers!) ignore this problem. It has no practical solution. It's so much easier to just ignore it, and offer you the choice between 4 and 8 bits of hidden data per 16 bits of wave data (as e.g. "Scramdisk" does, a recommendable hard-disk encryption package). This is better than nothing, but it is far from "not feasible" to detect!

Marc
1. Re:Of course this is feasible! by Shanep · 2004-01-11 03:28 · Score: 2, Interesting
  
  Actually, an experiment showed that when cutting all but the LSB of a music wave file, the tune is still recognizable!
  
  Many years ago (10+), just out of interest in crypto, I XOR'ed a raw audio file (my own speech) with pseudo-random data (all bits, from LSB to MSB). The result was one very noisy audio file with the speech still audible! I thought "WTF!?"
  
  I figured that since, on average, 50% of bits would be toggled, some of the audio information would still be present in a form a human could recognise. I have been meaning to do this again and pass it through a low pass filter to see if I could make the audio come more to the foreground.
  
  perfectly (pseudo) random data
  
  This is a contradiction in terms. Pseudo random data cannot be perfect, that is why it is pseudo (fake). Although, based on reading your interesting message, I'm sure you know this.
  
  It has no practical solution.
  
  How about stego software that detects how many LSBs span the noise floor, replaces those with real white noise and then replaces the lower LSBs with the stego? I wonder if one could make the noise-floor LSB replacement gradual near the bits that form the border between noise and information, so as to prevent detection of the sudden (obvious) change, which would be a "stego fingerprint" in itself!
  
  --
  War crimes, torture, lies, illegal spying... Would someone give Bush a blowjob, already, so he can be impeached?
2. Re:Of course this is feasible! by Travis+Fisher · 2004-01-11 05:11 · Score: 2, Interesting
  
  Thus spake Shanep: Many years ago (10+), just out of interest in crypto, I XOR'ed a raw audio file (my own speech) with pseudo-random data (all bits, from LSB to MSB). The result was one very noisy audio file with the speech still audible! I thought "WTF!?"
  Your thought ("WTF!?") was right on target. I don't know what you actually did, but it clearly wasn't XORing the audio file with anything resembling random bits. If you XOR a message with truly random bits, the result will consist of truly random bits. This is because for each bit of message there is a 50-50 chance that you will flip that bit, and these chances are all independent. So the output bit has a 50-50 chance of being 0 or 1, independent of other output bits.
  The same general principle applies when you XOR a message with pseudo-random bits. Provided the original message had no built-in correlation to the pseudo-random bit stream, the output will have as good random characteristics as the pseudo-random bit stream. In particular, it will sound like white noise when you feed it to a speaker.
  Contrast this with what would happen if you AND or OR the message with a (pseudo-)random bit stream. In this case each bit has a 50-50 chance of being left unchanged and a 50-50 chance of being set to zero (AND) or one (OR). This would produce an output like you describe; it would sound like a noisy version of the original file. If I had to guess, this is what you actually did.
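This argument is easy to check numerically. The Python sketch below uses a hypothetical 8-bit sine wave as a stand-in for the speech recording and a seeded pseudo-random byte per sample as the key, then measures how strongly each output still correlates with the original. XOR with a uniform random byte should leave essentially no correlation, while AND should leave a clearly positive one (each output's expected value is half the original sample), consistent with the guess about what actually happened.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
# Hypothetical 8-bit unsigned "audio": a sine wave around mid-scale.
signal = [int(128 + 100 * math.sin(2 * math.pi * t / 50)) for t in range(20000)]
key = [rng.getrandbits(8) for _ in signal]  # a fresh pseudo-random byte per sample

xored = [s ^ k for s, k in zip(signal, key)]  # every bit flipped with probability 1/2
anded = [s & k for s, k in zip(signal, key)]  # every set bit cleared with probability 1/2

print(round(pearson(signal, xored), 3))
print(round(pearson(signal, anded), 3))
```

The XOR output correlates with the signal only at the level of sampling noise, so it would sound like pure hiss; the AND output keeps a substantial correlation and would sound like a noisy copy of the original.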
3. Re:Of course this is feasible! by BigBadBri · 2004-01-11 08:40 · Score: 2, Interesting
  
  Nice logic, but I think there's a flaw.
  Because the audio has a fixed word size, and truly random data will contain a significant number of short runs (I'm thinking [...]). For example, the four most significant bits would be preserved in 1/2^4, or 1/16, of the file; the three MSBs in 1/8 of the file, and so on.
  I reckon the human brain, looking as it does for patterns in the world outside, would be able to find what remained of the original pattern in the data.
  I'm not the parent, but it seems to me that an XORed file would sound like a noisy copy of the original.
  I may even try it myself and see.
  
  --
  oh brave new world, that has such people in it!
4. Re:Of course this is feasible! by epsalon · 2004-01-11 09:51 · Score: 2, Interesting
  
  Maybe he XOR'd the entire 8 bits of each byte with the same bit, effectively XORing each byte with either FF or 00. In that case, a lot of the original audio is still there.
  
  --
  
  Make even shorter URLs - 8LN.org
I don't think this can possibly work. by dirt_puppy · 2004-01-11 00:28 · Score: 4, Interesting

As others stated, (as always in cryptography) if the stegging user isn't stupid (meaning he would encrypt before stegging), the data to be stegged would be as random as the data that you steg it in. There is no way to tell one set of random data from another set of random data. I think they do it to catch stupid spies.
Re:Feasible? by Anonymous Coward · 2004-01-11 00:39 · Score: 1, Interesting

The real question is why the Air Force is doing this at all. Who asked them to perform domestic investigations except on their own traffic? And if it is on their own traffic, why aren't they providing their workers with the tools to defeat this sort of detection during file transmission (like reasonably secure encryption tools, which would obscure the heck out of the files) in the first place? Encrypted file transfer ought to be the standard for government agencies. Otherwise you just make it all the more obvious which files are important by only encrypting those files. And while those files may be inaccessible, other information about them could be valuable to attackers.
Probably feasible because of STTR by JohnQPublic · 2004-01-11 04:39 · Score: 2, Interesting

The original poster doesn't believe that it's possible to detect steganographic content. There have been lots of technical follow-ups that suggest it might be possible, but almost nobody has mentioned the funding issue. The task is most likely possible simply because there's been an STTR solicitation published. Many of the STTR and SBIR solicitations are designed by their authors to fund existing projects known to the authors. These "solicitations" provoke very few proposal submissions, occasionally even just the one from the expected recipient of the funds.

Don't get me wrong - this isn't a scam. The funding groups are usually genuinely interested in having what they specify developed, sometimes wind up buying lots of it once the development is complete, and in most cases all qualified bidders are truly considered. It's just that the solicitations are often written so narrowly that only a select few bidders can qualify.

But hey, at least the bidders are required to be small businesses, not like those Halliburton contracts for Iraq!
Re:Hrm by starm_ · 2004-01-11 04:52 · Score: 3, Interesting

Actually this is not a good method. The least significant bit of text is not less random than images. It is often even more random.

I have read a paper on this, and they used the opposite method from what you propose. They assumed images have sections which are not very random (most images contain some areas with uniform color). If the least significant byte of an image is very random compared to the other bytes, it can indicate steganography.

Of course, you have to adjust the thresholds to account for the difference in randomness due to the different image compression algorithms.

Also, you get a lot of false positives if the image has been taken with an inexpensive digital camera. These cameras put some noise in the whole image, which makes it look like there might be a message in there.

Anyway, this technique can filter out a bunch of images (something like 50%) that you can be pretty sure contain no steganography. But for the other 50%, I don't know how you would find out.

The task is very hard when the hidden text has been encrypted prior to encoding in the image, so you can't look for patterns inherent in text.
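The idea of comparing how random the low bits are against the higher bits in a region of uniform color can be sketched per bit plane. This Python sketch uses the Shannon entropy of each plane's 0/1 balance, which is a deliberately crude measure (it ignores spatial patterns); the helper name and the flat 64x64 block are hypothetical. As noted above, camera sensor noise can raise LSB entropy on its own, so a real detector would need thresholds tuned per source.

```python
import math
import random


def bit_plane_entropy(pixels, plane):
    """Shannon entropy, in bits, of the 0/1 balance of one bit plane of 8-bit pixels.

    Only counts how often the bit is set, not where, so it is a crude randomness measure.
    """
    ones = sum((p >> plane) & 1 for p in pixels)
    p1 = ones / len(pixels)
    if p1 in (0.0, 1.0):
        return 0.0
    return -(p1 * math.log2(p1) + (1 - p1) * math.log2(1 - p1))


rng = random.Random(42)
flat_block = [137] * 4096  # hypothetical 64x64 block of uniform color
stego_block = [(p & ~1) | rng.getrandbits(1) for p in flat_block]  # random LSBs

for name, block in (("flat", flat_block), ("stego", stego_block)):
    print(name, [round(bit_plane_entropy(block, k), 2) for k in range(8)])
```

In the flat block every plane has zero entropy; after embedding, plane 0 jumps to roughly 1 bit while planes 1 to 7 stay at zero, which is the "LSB much more random than the rest" signal the comment describes.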
Re:Finally... by WindowlessView · 2004-01-11 05:09 · Score: 2, Interesting

I wonder if anyone has done a statistical analysis of spelling errors in emails by American youth. Talk about undetectable ways to hide a message in plain text!

--
Leave the gun, take the cannolis.
Not quite that easy by wirelessbuzzers · 2004-01-11 05:38 · Score: 4, Interesting

The problem with the LSBs of an image is that they aren't quite random. Unless the image is raytraced or otherwise artificially produced, there's a fair amount of order there. Even a raytraced image might not be quite random.

The same holds with audio. For instance, encrypted data is white noise, but concert noise is "pink noise", which has a characteristic spectrum. The noise produced by converters is closer to white, but it isn't quite either. People like Niels Provos have been studying this for a while, trying to find out which bits they can change without altering the statistics of the image or audio, but with limited success. As of last year (don't know how it is this year), all published steganography schemes at least a few months old had been broken.

--
I hereby place the above post in the public domain.
1. Re:Not quite that easy by Sycraft-fu · 2004-01-11 14:05 · Score: 2, Interesting
  
  Ahh, but the noise of converters is white noise. So all you need are some cheap 24-bit converters, and there's no shortage of those, and you are good to go. You get some cheap portable that has an SNR of something like 102-105dB. OK, well, that needs at most 18 bits to actually encode that resolution. Now, since there can be some signal below the noise floor, and since you want to be careful, take two more bits on that. That still leaves you 4 bits per sample to use that are going to be essentially pure white noise.
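The bit budget above can be checked with the usual rule of thumb that each bit of linear PCM is worth about 6.02 dB (20·log10 2), the same rule behind the 96dB figure quoted for 16-bit audio. This Python sketch is only arithmetic under the commenter's assumptions (a 24-bit word, a converter SNR of about 105 dB, and two safety bits); whether the spare bits really behave like white noise is exactly what is disputed in the parent comment. The function names are illustrative.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB of dynamic range per bit


def dynamic_range_db(bits):
    """Ideal dynamic range of a linear PCM word (ignoring the ~1.76 dB sine-wave term)."""
    return bits * DB_PER_BIT


def spare_noise_bits(word_bits, snr_db, safety_bits=2):
    """Bits left below the converter's noise floor after a safety margin."""
    signal_bits = math.ceil(snr_db / DB_PER_BIT)
    return max(word_bits - signal_bits - safety_bits, 0)


print(round(dynamic_range_db(16), 1))  # CD audio: about 96 dB
print(spare_noise_bits(24, 105))       # 18 signal bits + 2 safety bits leave 4
print(spare_noise_bits(24, 102))       # the lower SNR frees one more bit
```

At 105 dB the numbers reproduce the comment's 18 bits of signal and 4 spare bits; at the low end of the quoted range (102 dB) one more bit would be free.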
Re:Hrm by drooling-dog · 2004-01-11 06:14 · Score: 2, Interesting

For example, you would expect the least significant bits in a jpeg to be more or less random - any degree of organisation there could be a hidden text or something else.
Actually, I would expect relatively little randomness in a compressed image, because removal of randomness (along with redundancy) is what compression is all about. And since well-encrypted data should appear random, you'd get further by testing for bits that are too random, rather than for hidden structure.
Re:Hrm by Doomdark · 2004-01-11 08:35 · Score: 2, Interesting

But you can't detect steg with encrypted messages, because the encrypted messages seem as random as the normal data, so there's nothing to clue you into the fact that it means anything.
I'm not a steg expert, but saying "as random as normal data" isn't of much help: normal data is NOT random, statistically speaking. One of the clues is that random data has the highest theoretical amount of information, that is, it cannot be compressed (as there's no redundancy to compress); thus, anything that compresses using some algorithm is somewhat non-random (random meaning uniformly distributed bit values no matter how one looks at it, e.g. roughly the same number of 0s and 1s in any given subset of the data).
Thing is, it'd be neat if some encryption (or compression) algorithm (or a combination of the two) could indeed hide the (statistical) non-randomness of real data well enough to prevent steg analysis from working. I think encryption/compression in general does improve "white noiseness", but probably not enough to prevent analysis of whether something is "as random as it should be".
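The compressibility clue can be tried with Python's standard `zlib` module, used here as a rough, general-purpose stand-in for "some algorithm": structured data shrinks dramatically, while random bytes do not shrink at all. This is a sketch, not a steganalysis tool; already-compressed formats such as JPEG typically look nearly incompressible to zlib too, so in practice one would apply a test like this to something narrower, such as an extracted bit plane, rather than to a whole file.

```python
import os
import zlib


def compression_ratio(data):
    """Compressed size / original size at zlib's maximum level; near 1.0 means no exploitable redundancy."""
    return len(zlib.compress(data, 9)) / len(data)


structured = b"the quick brown fox jumps over the lazy dog. " * 200
random_bytes = os.urandom(len(structured))  # stands in for well-encrypted payload
print(round(compression_ratio(structured), 3))
print(round(compression_ratio(random_bytes), 3))
```

The repeated text compresses to a tiny fraction of its size, while the random bytes come out at about their original size or slightly larger, because deflate cannot find any redundancy to remove.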

--
I like paying taxes. With them I buy civilization -- Oliver Wendell Holmes
Like airport security, but worse by Brett+Glass · 2004-01-11 15:26 · Score: 1, Interesting

Searching for steganography is like airport security, and equally futile. Both assume that it's possible to recognize anything that can possibly be used to do ill, even when you don't know what it is, how it works, or what it's for. 99% of the time, you'll have a false alarm; the other 1% of the time, you'll find a really dumb crook who wasn't competent enough to do any real harm anyway. (If he was, you wouldn't have caught him.)