Slashdot Mirror


Hydan: Steganography in Executables

An anonymous reader says "Ever wanted to hide a message into an executable? Now you can with Hydan. Presented recently by Rakan El-Khalil at Defcon and Blackhat, this tool lets you embed data into an application without changing its functionality or filesize! Check it out. Use includes steganography as well as embedding a program's signature into itself to verify it's not been tampered with."

18 of 235 comments (clear)

  1. Re:without changing its functionality or filesize! by jdray · · Score: 5, Informative
    From the article:

    Hydan steganographically conceals a message into an application. It exploits redundancy in the i386 instruction set by defining sets of functionally equivalent instructions. It then encodes information in machine code by using the appropriate instructions from each set.

    --
    The Spoon
    Updated 6/28/2011
  2. Soon to be published PDF text. by Anonymous Coward · · Score: 4, Informative

    Hydan: Hiding Information in Program Binaries
    Rakan El-Khalil and Angelos D. Keromytis
    Department of Computer Science, Columbia University in the City of New York
    {rfe3,angelos}@cs.columbia.edu
    Abstract. We present a scheme to steganographically embed information in x86
    program binaries. We define sets of functionally-equivalent instructions, and use
    a key-derived selection process to encode information in machine code by using
    the appropriate instructions from each set. Such a scheme can be used to watermark
    (or fingerprint) code, sign executables, or simply create a covert communication
    channel. We experimentally measure the capacity of the covert channel by
    determining the distribution of equivalent instructions in several popular operating
    system distributions. Our analysis shows that we can embed only a limited
    amount of information in each executable (approximately 1
    110 bit encoding rate),
    although this amount is sufficient for some of the potential applications mentioned.
    We conclude by discussing potential improvements to the capacity of the
    channel and other future work.
    1 Introduction
    Traditional information-hiding techniques encode ancillary information inside data such
    as still images, video, or audio. They typically do so in a way that an observer does not
    notice them, by using redundant bits in the medium. The definition of "redundancy"
    depends on the medium under consideration (cover medium). Because of their invasive
    nature, information-hiding systems are often easy to detect, although considerable work
    has gone into hiding any patterns [1]. In modern steganography, a secret key is used to
    both encrypt the information-to-be-encoded and select a subset of the redundant bits
    to be used for the encoding process. The goal is to make it difficult for an attacker to
    detect the presence of secret information. This is practical only if the cover medium has
    a large enough capacity that, even ignoring a significant number of redundant bits, we
    can still encode enough useful information.
    Aside from its use in secret communications, an information-hiding process [2] can
    be used for watermarking and fingerprinting, whereby information describing properties
    of the data (e.g., its source, the user that purchased it, access control information,
    etc.) is encoded in the data itself. The "secret" information is encoded in such a manner
    that removing it is intended to damage the data and render it unusable (e.g., introduce
    noise to an audio track), with various degrees of success.
    In this paper, we describe the application of information-hiding techniques to arbitrary
    program binaries. Using our system, named Hydan, we can embed information
    using functionally-equivalent instructions (i.e., i386 machine code instructions). To determine
    the available capacity, we analyze the binaries of several operating system distributions
    (OpenBSD 3.4, FreeBSD 4.4, NetBSD 1.6.1, Red Hat Linux 9, andWindows
    XP Professional). Our tests show that the available capacity, given the sets of equivalent
    instructions we currently use, is approximately 1
    110 bits (i.e., we can encode 1 bit
    of information for every 110 bits of program code). Note that we make a distinction
    between the overall program size and the code size. The overall program size includes
    various data, relocation, and BSS sections, in addition to the code sections. Experimentally,
    we have found that the code sections take up 75% of the total size of executables,
    on average. For example, a 210KB statically linked executable contains about 158KB
    of code, in which we can embed 1.44KB (11, 766 bits) of data.
    In comparison, other tools such as Outguess [1] are able to achieve a 1
    17 bit encoding
    rate in images, and are thus better suited for covert communications, where data-rate
    is an important consideration. The 1
    110 encoding rate achieved by the currently implemented
    version of Hydan is obtained when we only use instruction

  3. Re:without changing its functionality or filesize! by Anonymous Coward · · Score: 1, Informative

    Think! Many EXE programs are so filled with NUL (00) that it becomes trivial to drop code in the holes.

    Remember "cavity virii"? Vesselin Bontchev discussed them in one of his papers. They take this approach, infecting EXE files by removing some of the ASCII NUL characters and putting themselves in. As long as they don't remove or add too many characters, it all works the same.

    And, as we all know, there is no storage difference between ASCII NUL and any other character in an EXE file.

  4. slashdotted allready... by deedude · · Score: 4, Informative

    Intresting. Allthough I didn't get a chance to RTFA, hiding encrypted data in an executable doees not seem all that practical to me. It may not change the filesize or functionality, but would it not also change other signature methods (like md5sums?). From my understanding, the main strength of steganography is the file with the encrypted data being indistinguishable from regular files. Since the diffrence can be detected with CRC or MD5, wouldn't that defeat the main purpose?

  5. Re:bologna by garcia · · Score: 2, Informative

    without changing file sizes... let me stick my pirated version of War and Piece in my Hello world application.

    According to the article that you didn't read it seems that the amount of text that you can imbed without affecting the filesize is determined by the original file's contents.

    You wouldn't be able to fit War and Peace into most files but you could fit about 1.44KB of text into a 500k file or so (according to their examples).

  6. Re:without changing its functionality or filesize! by Hi_2k · · Score: 4, Informative

    I was at a SANS conference a while back, and the instructor, Ed Skoudis, explained it as replacing certain operations with equivalents to represent bits. For example, "add 0002h" would be 0, "sub FFFEh", technically equivalent, would be 1. The more replaceable operations a program has, the more it can store. Hydan also encrypts the data with blowfish before storing it.

    --
    When life gives you crap, Make Crapade.
    Sluggy Freelance.
  7. How it's done.. by wfberg · · Score: 4, Informative

    The gist of it is that there are many instructions in x86 that have the same result. You can replace these, and based on which instructions you encounter you can find a hidden message.

    So much for theory. Here's an example; let's say we have a couple of synonyms, like so
    car, automobile; Robert, Bob; crashed, trashed; beer, whisky.
    Let's say we have a little story like so;
    "Bob got in his car. He crashed it, because he had been drinking too much beer. His car is now a total loss."

    Let's say we want to send a secret binary message "0110". Cunningly, we substitute the first of each pair of synonyms if we want to encode a zero, and the second for a one. So the story is now

    "Robert got in his automobile. He trashed it, because he had been drinking too much whisky. His car is now a total loss." (notice how not all key words changed).

    This is a bit harder with natural language, as many words aren't quite right to use in place of the other ("got in his automobile" just doesn't sound right), so it's actually easier to do for machine code.

    --
    SCO employee? Check out the bounty
  8. Re:bologna by Anonymous Coward · · Score: 1, Informative

    Maybe you should have RTFA. They explicitly discuss the fact that their upper bound is less than 1% of the executable size.

  9. Re:Information Theory by Carnildo · · Score: 5, Informative

    inc ax
    add ax, 1
    add al, 1
    inc eax
    add eax, 1

    All of these i386 instructions do the same thing, but they've got different binary representations. If you encode your information by which instruction you use, you can hide the message without changing filesize or functionality.

    --
    "They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
  10. Re:embedding signiature?? by Anonymous Coward · · Score: 1, Informative

    actually, all you need to do is specify that the signature included is for the file without the signature. As far as changing the signature, your friendly PGP may make it rather difficult. Lets imagine the following scenario:

    1. Create a program
    2. Sign using your private key
    3. Embed the signature inside the program
    4. Send to the user
    5. User's OS goes to a predetermined place to get your public key (or has it embedded already)
    6. Checks the program against the key

  11. Re:embedding signiature?? by Anders · · Score: 4, Informative

    [...] am dying to hear more about the technology/mathematics behind it! Can you say Nobel Prize nomination?

    There is no Noble Prize for mathematics.

  12. Re:Information Theory by pclminion · · Score: 4, Informative
    These don't all do the same thing.

    Suppose eax = 0xFFFFFFFF.

    Result of "inc ax": eax = 0xFFFF0000
    Result of "inc al": eax = 0xFFFFFF00
    Result of "inc eax": eax = 0x00000000

    They don't do anything near the same thing. The carry bits get lost.

    However, you can substitute "add ax, 1" for "inc ax", and "add al, 1" for "inc al", and "add eax, 1" for "inc eax".

  13. How it works by EduardoFonseca · · Score: 3, Informative

    Since a lot of people is asking, here it goes:

    - How it works
    --------------

    Overview: Hydan finds sets of equivalent instructions in the binary,
    and uses that redundancy to embed data. The larger the set of
    equivalent instructions, the more bits can be embedded. For example,
    if instructions {a, b, c, d} are all equivalent, then we can embed two
    bits of information when any of those instructions are encountered.

    Embedding: Hydan goes through the application sequentially, and
    whenever it finds an instruction that it has equivalents to, it
    substitutes in the instruction that represents the bit(s) of data
    hydan is currently embedding. A simple example: "add %eax, 50" is
    equivalent to "sub %eax, -50". So this set is {"add %reg, $imm", "sub
    %reg, $imm"}. Whenever an instruction of the form "add %reg, $imm" is
    encountered, hydan can embed one bit of the message. If the bit is 0,
    it leaves it as an add instruction. Else it substitutes it to "sub
    %reg, -$imm". (and vice versa)

    Decoding: When it is time to extract the embedded message, every
    "add %reg, $imm" is taken to mean bit 0, and every sub instruction
    encodes the bit 1, and the embedded message is reconstructed that way.

    Encryption: Hydan first prompts the user for a passphrase before
    embedding or decoding the contents of the application. In the case of
    embedding, hydan prepends the length of the message to the message,
    encrypts that with blowfish in cbc mode, and embeds the result into
    the application. When decoding, hydan extracts all the possible bits
    from the application (since it does not know how long the message is
    a-priori; that information is encrypted). Hydan then decrypts the
    message properly since it is in CBC mode and need not know the total
    length first. The lenght is then used to truncate the message to
    size.

    Instructions: For a complete list of the sets of equivalent
    instructions, please refer to hdn_insns.c.

    - Attacks
    ---------

    There are three classes of attacks that are applicable to hydan:
    overwriting, detection, and extraction. The overwriting attack refers
    to the ability to overwrite the message embedded in the application,
    whether its presence was detected or not. An attacker should also not
    be able to detect the presence of a message in the application, nor
    decrypt it.

    The overwriting attack: hydan currently has no means to protect
    against this type of attack. Since hydan embeds the message
    sequentially, starting from the top of the application, an attacker
    could re-run hydan with a bogus text and embed that on top of the
    original message. The intended recipient of the application would
    thus be unable to retrieve the original message. One way this could
    be solved is to add an error correcting code to the encoding of the
    message, and distribute the message throughout the application in a
    passphrase specific manner. This way only parts of the original
    message would be overwritten, and the original may still be
    reconstructed. Of course, there is nothing that can be done if the
    attacker insists on overwriting with a message size that is the
    maximum embeddable in the given application. However, the computation
    required to overwrite each application on a large scale might be
    prohibitive enough to discourage this as a routine behaviour, at an
    ISP for example.

    Detection: Binaries produced by hydan should not exhibit obvious
    patterns. At the most superficial level, this is accomplished by not
    embedding any marker or other easily recognizable token. At best, the
    embedded data looks random (which is why it is bf encrypted). At the
    assembly level however, the current version of hydan makes no attempt
    at mimicing the original distribution of instructions in the
    application, and is thus vulnerable to statistical analysis. Indeed,
    although all the instructions are equivalent, some may appear more
    frequent

  14. Re:A shareware assembler used this trick. by The+FooMiester · · Score: 2, Informative

    I believe the program you speak of is a86. It was $50 to register it, which is why it was unpopular.

    It used the same method hydan uses. It used equivelant instructions that were "different" from the way the code was written. I'd used it a bit for myself, and noticed that was what it did when I opened the files later with debug.

    The documentation never really said how, it just said it "fingerprinted" the code.

    --
    The previous has been a secret message to my comrades.
  15. Steganography by SiliconEntity · · Score: 4, Informative

    In cryptography, steganography has a particular meaning. In the same way that the goal of encryption is to prevent the message from being read, the goal of steganography is to prevent the message from being detected. A successful steganographic embedding is one in which a third party would not be able to find out if it is there. If you gave him two files, one with an embedded message and the other unprocessed, he should not be able to tell them apart.

    For a method to truly be steganography, it's not enough just to embed some data into another. That's possible any time there's redundancy. The requirement is to make it so clever and/or subtle that there is no way to distinguish a processed file from an unprocessed one.

    I doubt that this new method passes the test. Generally, while there are many synonyms possible in code, both in single instructions and in short sequences of instructions, the statistics of how these are distributed in unprocessed files are probably not random. Chances are that one synonym is used more than another. If you embed random data in a straightforward way, you will then have equal usages of both alternatives. This is a highly unusual condition, and to someone in the know, files like these will be easily distinguished.

    Only if they have found a kind of synonym which already has purely random statistics, or where they are careful to precisely mimic the statistics of the original file as they add their data, can this truly be considered a form of steganography.

  16. From a steganographer- unimpressive by Anonymous Coward · · Score: 1, Informative

    I do steganography research and this technique is entirely unimpressive.

    The whole point of steganography is to make the hidden message undetectable. Making a message unsuspicious is being clever, but its not steganography. This is like renaming your pirated divx movies to "shareware.exe". Sure it makes it less suspicious, but its easy to check.

    Like the article mentions. The technique is completely vulnerable to statistical attacks. But these might not even be necessary, simply overwrite the embedded information with the default compiler values (surely the compiler would pick the same equivalent expression every time). Its only a matter of effort to create a detector.

    Real steganography is undetectable (at least below a certain probability of detection).

  17. Re:First Post and On Topic by Dun+Malg · · Score: 3, Informative
    Here's my question: How can you hide a message in an executable containing a single NOOP, or in the Perl program "" (without the quotes). You can't hide much in there.

    The answer there is "you can't". You need a compiled executable large enough to have multiple instances of "alterable sequences". The way I understand it, they fiddle with reversable/interchangeable opcodes to create "bits". Say a program has 500 mixed instances of: (this is all made-up assembly)

    JNZ $(foo) ; jump if not zero to address (foo)
    JMP $(bar) ; otherwise jump to address (bar)

    and
    JZ $(bar) ; jump if zero to address (bar)
    JMP $(foo) ; otherwise jump to (foo)

    As you can see, a sequence of a JNZ followed by a JMP can easily be re-written as a JZ followed by a JMP. The program only needs to go through and change each instance to match bitwise value of the "message", treating JNZ-JMP as a bitwise 0 and JZ-JMP as a 1. There are probably more instances of "two ways to do it" one can exploit in a given executable to yield even bigger "message spaces".
    --
    If a job's not worth doing, it's not worth doing right.
  18. Re:First Post and On Topic by WhiteDragon · · Score: 2, Informative

    One of the most obvious examples of a jump -> jump is in the interrupt vector table, back in the DOS days. Also, some processors (such as intel 386, which I learned assembly programming on) have short and long jump instructions, so you might need something like a conditional short jump to a location that contains an unconditional long jump. IIRC, i386 didn't have conditional long jump instructions.

    --
    Did you mount a military-grade, variable-focus MASER on an unlicensed artificial intelligence?