Hydan: Steganography in Executables
An anonymous reader says "Ever wanted to hide a message into an executable? Now you can with Hydan. Presented recently by Rakan El-Khalil at Defcon and Blackhat, this tool lets you embed data into an application without changing its functionality or filesize! Check it out. Use includes steganography as well as embedding a program's signature into itself to verify it's not been tampered with."
If steganography is now in the hands of joe user, how useful is it really? It's not exactly a secret anymore, is it? ;P
Un-news
Discovered a copyright string that was also executeable 68k code .... and included it in my main initialization routines
Many executable formats include unused space for alignment purposes. For example, I've been working on a Mach-O equivalent of the super-tiny ELF executable mentioned a few days back. The executable produced by GCC includes 300 bytes of code and headers, and 8000 bytes of padding.
"They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
Unless you do it like this (an example is always easy to understand).
Say you have an executable:
1337PROGRAM
Your signature checking routine then does this:
1_3_3_7_P_R_O_G_R_A_M
and computes the hash
deadbabeca
And then sends:
1d3e3a7dPbRaObGeRcAaM
To reverse, we extract the hash (deadbabeca) and the "original" executable.
Then we compute the hash (of 1_3_3_7...) and check if it matches...
In summary, we embedded a checksum, but we removed it before we checked it. Simple, really.
My other car is first.
To decode.
I would still recommend publishing a separate public key, however, and include an encrypted signature in the program. As you say, it can always be changed and re-encoded.
On the other hand, this might be useful on a server, by encoding a public key and checker on a CD-R and checking all your programs periodically against the CD-R key. You could encode signatures in each program and be able to upgrade programs from a central encoding server without having to write a new cd each time.
It is typically assumed for these kinds of things that the signature itself is not part of the data being signed. However if the file can be modified, so can it's signiature.
You could easily solve that by using X.509 certificates, issues by a trusted CA, similar to the Microsoft "signcode.exe" program for signing CAB files and EXEs. However, that would only prove the integrity of the binary. It's still impossible to write a program which refuses to run when modified, because you can always just remove the code that checks the validity of the signature. That's the real paradox.
Can you say college algebra?
The only thing they have to solve is f(X+S) = S, where f is the algorithm for calculating the signature, X is the exe code, and S is the signature. Depending on f, it can either be completely trivial to calculate S or impossible.
The site is slashdotted so I don't know if this is how it works, but...
Some 8086 opcodes contain a bit that reverses the operation. For examble, with the bit set in the instruction "mov bx,cx", bx would be copied to cx instead of cx to bx. By switching the registers AND setting the bit, you effectively reverse the operation twice, creating different machine code that does exactly the same thing.
The A86 assembler used this bit to create a fingerprint that would make it easier to detect non-paying users.
Comment removed based on user account deletion
But then I started thinking about how effectively viruses are distributed by non-techies who do click on the attachments in their EMAILs. Perhaps viruses or spyware could be used to "broadcast" a message this way to different cells in a covert organization (terrorists, organized crime, chess club members, whatever). All you'd need is an unprotected PC to act as a tethered goat and catch all those infections for later reading.
For that matter, a sender could "neuter" a virus by disabling its reproductive code and then embed a message in it and send it through some anonymizer (either a formal anonymizer or using a shell account). When the recipient stores it in a quarantine directory, it would look just like an infected EMAIL that had been cleaned up by your antivirus program, not a covert message. Some variation of this using spyware infection would be even more effective as they tend (in my limited experience) to have even more variants than viruses - the obfuscated message would be more readilly confused with normal variation. Instead of posting your tampered executables to some usenet forum, you would simply have the reciepient visit a site running the spyware. New messages would be sent and old ones sterilized when the spyware reinstalls itself.
Just my 20 mills.
"Prepare for the worst - hope for the best."
Note that as far as I remember, stenography by definition is supposed to make it impossible to prove that there is data hidden there - one step further than normal encryption. It's not so much as about hiding the data as being able to deny its existance.
One reason for this is if you have encrypted data on your disk, then courts can demand the password for it. Stenography allows you to insist there is no hidden data.
Any sort of steg is vulnerable to "you just have to look for [insert technique that you already know about]". The purpose of steganography is not to make data unfindable, it's just to obfuscate the fact that it's there in the first place. If you know it's there, and you know how to look, finding the data is easy. That's how the extraction programs work, after all.
Mass detection of the presence of steg with unknown techniques usually relies on on statistics over the "normal" types of files. When the LSB of those image files is suddenly nearly random, instead of correlated with nearby pixels, you know something's up, like a zipped, encrypted file jammed into those LSBs. Same with your example; when most program instructions are only 0.01% WIERDX and 2% OTHERA, and this one is oddly 1% WIERDX and 1% OTHERA, you start to wonder.
Steganography is neither encryption nor invisibility.
- determining whether a program is a virus or not is an undecidable problem (there exists no algorithm that can solve it), and
- determining whether a program is a variation of a known bounded-length virus is NP-complete (algorithms that solve the problem would take impracticably long time).
#include "/dev/tty"The documentation for the shareware DOS assembler, A86, claimed that the set of opcodes it chose to emit for various instructions was unique (i.e. those exact choices weren't made by any other assembler). Therefore if you released software assembled with A86 without registering it, if the A86 author ever got hold of that software, he'd know you had used his assembler to produce it. So the steganography in this case encoded a one bit value: "I used A86".
Pretend that something especially witty is here. Thanks.
Any sort of steg is vulnerable to "you just have to look for [insert technique that you already know about]". The purpose of steganography is not to make data unfindable, it's just to obfuscate the fact that it's there in the first place. If you know it's there, and you know how to look, finding the data is easy. That's how the extraction programs work, after all.
Mass detection of the presence of steg with unknown techniques usually relies on on statistics over the "normal" types of files. When the LSB of those image files is suddenly nearly random, instead of correlated with nearby pixels, you know something's up, like a zipped, encrypted file jammed into those LSBs. Same with your example; when most program instructions are only 0.01% WIERDX and 2% OTHERA, and this one is oddly 1% WIERDX and 1% OTHERA, you start to wonder.
Steganography is neither encryption nor invisibility.
In theory, this doesn't have to be true. Many types of files do contain real entropy. For instance, for any naturally occurring image, there exist a multitude of other images which could be generated by the same method (eg, taking a photograph of a given lake from a given angle, and then JPEG-compressing it) with some probability distribution. In theory, you could determine that set (or some subset of it, so long as taking any element of that subset gives the same answer) and that probability distribution, apply a reverse arithmetic code the set of possible (encrypted) messages to get the probabilities in line, and then send the corresponding image from the set.
Even someone who suspected the method and was observing your communications for an arbitrary amount of time couldn't prove that those messages were there without breaking the encryption (where "break" in this case means "distinguish from noise").
Thus, if the low order bits of an image were in fact uniformly and independently distributed, then setting them to your message would suffice for steg. (Un?)fortunately, this isn't the case unless you've just applied a noise filter. Now, if you could find all possible statistics on the correlations of these last bitowers, and had a clever algorithm or a lot of compute, then you could generate such a set, but nobody knows how to do something like this. This is the principle of steg programs such as Outguess (which was broken two years ago).
I hereby place the above post in the public domain.