Slashdot Mirror


Hydan: Steganography in Executables

An anonymous reader says "Ever wanted to hide a message in an executable? Now you can with Hydan. Presented recently by Rakan El-Khalil at Defcon and Blackhat, this tool lets you embed data into an application without changing its functionality or filesize! Check it out. Uses include steganography as well as embedding a program's signature into itself to verify it's not been tampered with."

23 of 235 comments

  1. What ... by Anonymous Coward · · Score: 4, Funny

    "What are you doing?"

    "Oh, hydan out."

  2. Right now by Tsiangkun · · Score: 4, Funny

    it looks like the information is being hidden by a slashdotted executable.

  3. Signed binaries... nice idea by advocate_one · · Score: 4, Insightful

    especially if the OS goes off and double checks the executable is legit before executing it...

    --
    Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
  4. embedding signature?? by guanno · · Score: 4, Insightful

    If you embed a signature of the file into the file, this by definition changes the file's signature. At best you can append the signature. However, if the file can be modified, so can its signature.

    If these folks have figured out a way of circumventing this innate paradox, I'm impressed and am dying to hear more about the technology/mathematics behind it! Can you say Nobel Prize nomination?

    1. Re:embedding signature?? by jrockway · · Score: 5, Interesting

      Unless you do it like this (an example is always easy to understand).

      Say you have an executable:

      1337PROGRAM

      Your signature checking routine then does this:

      1_3_3_7_P_R_O_G_R_A_M

      and computes the hash

      deadbabeca

      And then sends:

      1d3e3a7dPbRaObGeRcAaM

      To reverse, we extract the hash (deadbabeca) and the "original" executable.

      Then we compute the hash (of 1_3_3_7...) and check if it matches...

      In summary, we embedded a checksum, but we removed it before we checked it. Simple, really.
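
      The interleaving trick above can be sketched in a few lines of Python. This is only an illustration: SHA-256 truncated to the available slots stands in for the unspecified hash, and the names `embed` and `verify` are made up for this sketch.

```python
import hashlib

PLACEHOLDER = "_"

def _digest(program: str) -> str:
    """Hash the placeholder-padded form ("1_3_3_7_...") and truncate it
    to the number of gaps between characters (assumes at most 64 gaps)."""
    padded = PLACEHOLDER.join(program)
    slots = len(program) - 1
    return hashlib.sha256(padded.encode()).hexdigest()[:slots]

def embed(program: str) -> str:
    """Interleave the digest characters into the gaps of the program."""
    digest = _digest(program)
    out = []
    for i, ch in enumerate(program):
        out.append(ch)
        if i < len(digest):
            out.append(digest[i])
    return "".join(out)

def verify(signed: str) -> bool:
    """Pull the digest back out, then recompute it over the padded program."""
    program = signed[0::2]   # even positions: original characters
    digest = signed[1::2]    # odd positions: embedded digest
    return digest == _digest(program)

signed = embed("1337PROGRAM")
```

      Changing any program character after signing makes `verify` fail, because the digest is always recomputed over the form with the placeholders put back.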

      --
      My other car is first.
    2. Re:embedding signature?? by Ioldanach · · Score: 4, Interesting
      If you embed a signature of the file into the file, this by definition changes the file's signature. At best you can append the signature.
      1. Set the swappable instructions in the program to their bitwise equivalent of 0.
      2. Calculate a signature based on that number.
      3. Swap the instructions to encode that number.

      To decode:

      1. Find swappable instructions.
      2. Determine what bit setting they're at.
      3. Set their bit setting to 0.
      4. Recalc signature based on the new bit setting.
      5. Compare to the bit setting you just retrieved.
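
      A toy version of the two procedures above, assuming a program is just a list of instruction strings. The equivalence pairs are hypothetical and assumed interchangeable for illustration, SHA-256 stands in for "a signature", and none of this is Hydan's actual code.

```python
import hashlib

# Hypothetical equivalence pairs: (form that encodes 0, form that encodes 1).
PAIRS = [("add eax, 1", "sub eax, -1"), ("add ecx, 2", "sub ecx, -2")]
TO_ZERO = {one: zero for zero, one in PAIRS}   # 1-form -> 0-form
TO_ONE = {zero: one for zero, one in PAIRS}    # 0-form -> 1-form

def canonical(prog):
    """Set every swappable instruction to its 0-form."""
    return [TO_ZERO.get(ins, ins) for ins in prog]

def read_bits(prog):
    """Read the bit carried by each swappable instruction, in order."""
    return [int(ins in TO_ZERO) for ins in prog if ins in TO_ZERO or ins in TO_ONE]

def signature_bits(prog, n):
    """First n bits of SHA-256 over the zeroed program (n <= 256)."""
    digest = hashlib.sha256("\n".join(canonical(prog)).encode()).digest()
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(n)]

def sign(prog):
    """Swap instructions so that they spell out the signature bits."""
    base = canonical(prog)
    n = sum(ins in TO_ONE for ins in base)
    bits = iter(signature_bits(base, n))
    return [TO_ONE[ins] if ins in TO_ONE and next(bits) else ins for ins in base]

def verify(prog):
    """Read the stored bits, zero them, recompute the signature, compare."""
    stored = read_bits(prog)
    return stored == signature_bits(prog, len(stored))

prog = ["add eax, 1", "mov ebx, eax", "sub ecx, -2"] * 16   # 32 swappable slots
signed = sign(prog)
```

      The signature length is limited by the number of swappable instructions, so a short program only gets a short (weak) check.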

      I would still recommend publishing a separate public key, however, and include an encrypted signature in the program. As you say, it can always be changed and re-encoded.

      On the other hand, this might be useful on a server, by encoding a public key and checker on a CD-R and checking all your programs periodically against the CD-R key. You could encode signatures in each program and be able to upgrade programs from a central encoding server without having to write a new cd each time.

    3. Re:embedding signature?? by Anders · · Score: 4, Informative

      [...] am dying to hear more about the technology/mathematics behind it! Can you say Nobel Prize nomination?

      There is no Nobel Prize for mathematics.

  5. Re:without changing its functionality or filesize! by Carnildo · · Score: 5, Interesting

    Many executable formats include unused space for alignment purposes. For example, I've been working on a Mach-O equivalent of the super-tiny ELF executable mentioned a few days back. The executable produced by GCC includes 300 bytes of code and headers, and 8000 bytes of padding.

    --
    "They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
  6. The question, answered. by caluml · · Score: 4, Funny
    Ever wanted to hide a message into an executable?

    Not really :)
    But I'd like to make that dog downstairs stop barking.

  7. Re:without changing its functionality or filesize! by jdray · · Score: 5, Informative
    From the article:

    Hydan steganographically conceals a message into an application. It exploits redundancy in the i386 instruction set by defining sets of functionally equivalent instructions. It then encodes information in machine code by using the appropriate instructions from each set.

    --
    The Spoon
    Updated 6/28/2011
  8. A new low... by ivan256 · · Score: 4, Funny

    Not only a dupe, but a link to the original story is listed on the referenced page.

    Wow.

    1. Re:A new low... by callipygian-showsyst · · Score: 4, Funny
      You're just not playing the game! I'll let you in on it:

      A bunch of folks who got pissed off that their stories never got approved on /. got together on alt.syntax.tactical and devised a plan. What they're doing is finding OLD slashdot stories and resubmitting them.

      So far, it's been moderately successful with 4-5 dupes getting through each week. This story was particularly amusing because the article has a link to their /. mention! Good work to the folks at a.s.t!

      I suggest you start playing along too! It's fun to show how worthless the /. editors are.

  9. Hydan... by Anonymous Coward · · Score: 5, Funny

    The message retrieval method should be called "Hydan Seek"

  10. Soon to be published PDF text. by Anonymous Coward · · Score: 4, Informative

    Hydan: Hiding Information in Program Binaries
    Rakan El-Khalil and Angelos D. Keromytis
    Department of Computer Science, Columbia University in the City of New York
    {rfe3,angelos}@cs.columbia.edu
    Abstract. We present a scheme to steganographically embed information in x86
    program binaries. We define sets of functionally-equivalent instructions, and use
    a key-derived selection process to encode information in machine code by using
    the appropriate instructions from each set. Such a scheme can be used to watermark
    (or fingerprint) code, sign executables, or simply create a covert communication
    channel. We experimentally measure the capacity of the covert channel by
    determining the distribution of equivalent instructions in several popular operating
    system distributions. Our analysis shows that we can embed only a limited
    amount of information in each executable (approximately a 1/110 bit encoding rate),
    although this amount is sufficient for some of the potential applications mentioned.
    We conclude by discussing potential improvements to the capacity of the
    channel and other future work.
    1 Introduction
    Traditional information-hiding techniques encode ancillary information inside data such
    as still images, video, or audio. They typically do so in a way that an observer does not
    notice them, by using redundant bits in the medium. The definition of "redundancy"
    depends on the medium under consideration (cover medium). Because of their invasive
    nature, information-hiding systems are often easy to detect, although considerable work
    has gone into hiding any patterns [1]. In modern steganography, a secret key is used to
    both encrypt the information-to-be-encoded and select a subset of the redundant bits
    to be used for the encoding process. The goal is to make it difficult for an attacker to
    detect the presence of secret information. This is practical only if the cover medium has
    a large enough capacity that, even ignoring a significant number of redundant bits, we
    can still encode enough useful information.
    Aside from its use in secret communications, an information-hiding process [2] can
    be used for watermarking and fingerprinting, whereby information describing properties
    of the data (e.g., its source, the user that purchased it, access control information,
    etc.) is encoded in the data itself. The "secret" information is encoded in such a manner
    that removing it is intended to damage the data and render it unusable (e.g., introduce
    noise to an audio track), with various degrees of success.
    In this paper, we describe the application of information-hiding techniques to arbitrary
    program binaries. Using our system, named Hydan, we can embed information
    using functionally-equivalent instructions (i.e., i386 machine code instructions). To determine
    the available capacity, we analyze the binaries of several operating system distributions
    (OpenBSD 3.4, FreeBSD 4.4, NetBSD 1.6.1, Red Hat Linux 9, and Windows
    XP Professional). Our tests show that the available capacity, given the sets of equivalent
    instructions we currently use, is approximately 1/110 bits (i.e., we can encode 1 bit
    of information for every 110 bits of program code). Note that we make a distinction
    between the overall program size and the code size. The overall program size includes
    various data, relocation, and BSS sections, in addition to the code sections. Experimentally,
    we have found that the code sections take up 75% of the total size of executables,
    on average. For example, a 210KB statically linked executable contains about 158KB
    of code, in which we can embed 1.44KB (11,766 bits) of data.
    In comparison, other tools such as Outguess [1] are able to achieve a 1/17 bit encoding
    rate in images, and are thus better suited for covert communications, where data-rate
    is an important consideration. The 1/110 encoding rate achieved by the currently implemented
    version of Hydan is obtained when we only use instruction
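
    The capacity figures quoted in the introduction can be checked with a little arithmetic. The sketch below assumes 1KB = 1024 bytes, which is the reading that reproduces the quoted numbers; the function name is made up for illustration.

```python
KB = 1024  # assumption: the excerpt's "KB" means 1024 bytes

def capacity_bits(code_bytes: int, code_bits_per_hidden_bit: int = 110) -> int:
    """Whole hidden bits that fit at one hidden bit per N bits of code."""
    return code_bytes * 8 // code_bits_per_hidden_bit

code_bytes = 158 * KB                    # code size quoted for the 210KB executable
hidden_bits = capacity_bits(code_bytes)  # at the 1/110 rate
hidden_kb = hidden_bits / 8 / KB
```

    With these inputs `hidden_bits` comes out at 11766 and `hidden_kb` rounds to 1.44, matching the excerpt. The 158KB code size is itself roughly 75% of 210KB (157.5KB).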

  11. Re:First Post and On Topic by Ioldanach · · Score: 4, Insightful
    If steganography is now in the hands of joe user, how useful is it really? It's not exactly a secret anymore, is it? ;P

    If I transmit files out to my friends that include encrypted data using steganography, then the extra data should be indistinguishable, effectively hiding within the noise of random crap on the web/usenet/email. Thus, without the key, an intercepted message is difficult to detect, and even if detected, I have sufficient plausible deniability to say "nothing there".

    In order to detect a message encrypted and included inside another file, you either need to know it's there and be looking for it, compare it to an existing file which should be identical, or statistically detect some aspect of the file. If you know it should be there, you just need to grab any file that looks like the file you're seeking, grab the relevant bits, and attempt decryption. If you have a file that should be identical (say, an image that looks the same that was posted to usenet a couple days earlier), you can take the bits that are different and try to make some sense of them. If you are just doing statistical analysis, you might be able to find files which have a set of bits whose randomness is just shy of where it should be, and maybe those bits mean something.

    In short, unencrypted steganography isn't particularly useful, but encrypted, you can really hide things.

  12. slashdotted already... by deedude · · Score: 4, Informative

    Interesting. Although I didn't get a chance to RTFA, hiding encrypted data in an executable does not seem all that practical to me. It may not change the filesize or functionality, but would it not also change other signature methods (like md5sums)? From my understanding, the main strength of steganography is the file with the encrypted data being indistinguishable from regular files. Since the difference can be detected with CRC or MD5, wouldn't that defeat the main purpose?
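
    A small illustration of the checksum point, using two x86 encodings that leave the same value in eax (`add eax, 1` and `sub eax, -1`). The three-instruction "program" around them is made up for the example.

```python
import hashlib

ADD_EAX_1 = bytes([0x83, 0xC0, 0x01])     # add eax, 1
SUB_EAX_NEG1 = bytes([0x83, 0xE8, 0xFF])  # sub eax, -1

# Hypothetical tiny code sequences: nop; <increment eax>; ret
original = b"\x90" + ADD_EAX_1 + b"\xC3"
modified = b"\x90" + SUB_EAX_NEG1 + b"\xC3"

same_size = len(original) == len(modified)
md5_original = hashlib.md5(original).hexdigest()
md5_modified = hashlib.md5(modified).hexdigest()
```

    The sizes match but the two digests do not, so the change is visible to anyone who holds a checksum of the unmodified file.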

  13. Re:without changing its functionality or filesize! by Hi_2k · · Score: 4, Informative

    I was at a SANS conference a while back, and the instructor, Ed Skoudis, explained it as replacing certain operations with equivalents to represent bits. For example, "add 0002h" would be 0, "sub FFFEh", technically equivalent, would be 1. The more replaceable operations a program has, the more it can store. Hydan also encrypts the data with blowfish before storing it.
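
    The add/sub equivalence can be checked with 16-bit modular arithmetic. This sketch models only the resulting register value, not the CPU flags, and the helper names are made up for illustration.

```python
MASK16 = 0xFFFF

def add16(x, imm):
    return (x + imm) & MASK16

def sub16(x, imm):
    return (x - imm) & MASK16

def encode(bits):
    """Pick "add 0002h" for a 0 bit and "sub FFFEh" for a 1 bit."""
    return [("sub", 0xFFFE) if b == "1" else ("add", 0x0002) for b in bits]

def decode(ops):
    """Recover the bit string from which form each instruction uses."""
    return "".join("1" if name == "sub" else "0" for name, _ in ops)

def run(ops, x=0):
    """Execute the toy instruction list on a 16-bit register value."""
    for name, imm in ops:
        x = add16(x, imm) if name == "add" else sub16(x, imm)
    return x

ops = encode("1011")
```

    Whatever bits are encoded, `run` returns the same value as four plain additions of 2, while `decode(ops)` recovers "1011".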

    --
    When life gives you crap, Make Crapade.
    Sluggy Freelance.
  14. How it's done.. by wfberg · · Score: 4, Informative

    The gist of it is that there are many instructions in x86 that have the same result. You can replace these, and based on which instructions you encounter you can find a hidden message.

    So much for theory. Here's an example; let's say we have a couple of synonyms, like so
    car, automobile; Robert, Bob; crashed, trashed; beer, whisky.
    Let's say we have a little story like so;
    "Bob got in his car. He crashed it, because he had been drinking too much beer. His car is now a total loss."

    Let's say we want to send a secret binary message "0110". Cunningly, we substitute the first of each pair of synonyms if we want to encode a zero, and the second for a one. So the story is now

    "Robert got in his automobile. He trashed it, because he had been drinking too much beer. His car is now a total loss." (notice how not all key words changed).

    This is a bit harder with natural language, as many words aren't quite right to use in place of the other ("got in his automobile" just doesn't sound right), so it's actually easier to do for machine code.
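
    The synonym scheme is short enough to write out. A minimal sketch, following the rule as stated (first word of a pair for 0, second for 1); the function names are made up, and the decoder has to be told how many bits to read because keywords past the end of the message are left as they were.

```python
import re

PAIRS = [("car", "automobile"), ("Robert", "Bob"),
         ("crashed", "trashed"), ("beer", "whisky")]
CHOICES = {word: pair for pair in PAIRS for word in pair}

def encode(text, bits):
    """Replace each keyword, in order, with the synonym selected by the next bit."""
    remaining = list(bits)
    def swap(match):
        word = match.group(0)
        if word in CHOICES and remaining:
            return CHOICES[word][int(remaining.pop(0))]
        return word
    return re.sub(r"[A-Za-z]+", swap, text)

def decode(text, n_bits):
    """Read one bit per keyword: its position (0 or 1) within its pair."""
    found = [str(CHOICES[w].index(w))
             for w in re.findall(r"[A-Za-z]+", text) if w in CHOICES]
    return "".join(found[:n_bits])

story = ("Bob got in his car. He crashed it, because he had been "
         "drinking too much beer. His car is now a total loss.")
hidden = encode(story, "0110")
```

    Under this rule the bits 0, 1, 1, 0 select "Robert", "automobile", "trashed" and "beer", and the final "car" is untouched because the message has run out.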

    --
    SCO employee? Check out the bounty
  15. Re:Information Theory by Carnildo · · Score: 5, Informative

    inc ax
    add ax, 1
    add al, 1
    inc eax
    add eax, 1

    All of these i386 instructions do the same thing, but they've got different binary representations. If you encode your information by which instruction you use, you can hide the message without changing filesize or functionality.

    --
    "They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
  16. Re:Information Theory by pclminion · · Score: 4, Informative
    These don't all do the same thing.

    Suppose eax = 0xFFFFFFFF.

    Result of "inc ax": eax = 0xFFFF0000
    Result of "inc al": eax = 0xFFFFFF00
    Result of "inc eax": eax = 0x00000000

    They don't do anything near the same thing. The carry bits get lost.

    However, you can substitute "add ax, 1" for "inc ax", and "add al, 1" for "inc al", and "add eax, 1" for "inc eax".
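
    The three results can be reproduced by modelling eax as a 32-bit integer and incrementing only its low 8, 16 or 32 bits. This is a sketch of the register value only; flags are not modelled (on real x86, `inc` leaves the carry flag untouched while `add` updates it), and the function name is made up.

```python
def increment(eax: int, width: int) -> int:
    """Add 1 to the low `width` bits of eax, discarding any carry out of them."""
    mask = (1 << width) - 1
    low = ((eax & mask) + 1) & mask          # wraps within the sub-register
    high = eax & ~mask & 0xFFFFFFFF          # upper bits are left untouched
    return high | low

eax = 0xFFFFFFFF
after_ax = increment(eax, 16)    # inc ax  / add ax, 1
after_al = increment(eax, 8)     # inc al  / add al, 1
after_eax = increment(eax, 32)   # inc eax / add eax, 1
```

    Instructions of the same width leave the same value, which is why only the same-width pairs are candidates for substitution.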

  17. Re: For the slightly less knowledgable by Black+Parrot · · Score: 5, Funny


    > steganography: the hiding of a secret message within an ordinary message and the extraction of it at its destination.

    I thought steganography meant pictures of stegasaurs making little stegasarus.

    --
    Sheesh, evil *and* a jerk. -- Jade
  18. Been done for ages by A86 by iamacat · · Score: 4, Insightful

    The author of A86 wrote his assembler to generate unusual forms of MOV instructions at least 10 years ago. In this way, he can find out if a program is generated using an unregistered version of A86.

    Any CPU that has an instruction to exchange two registers will have some redundancy, but for X86 even basic mov (as well as add, sub, cmp and so on) specifies both two operands and a flag that specifies which one is source and which one is destination. The significance is that both operands can be registers, but only one can be a memory reference.
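
    The direction flag can be seen in the two standard encodings of a 32-bit register-to-register mov: opcode 89 (MOV r/m32, r32) and opcode 8B (MOV r32, r/m32). The sketch below builds both and uses the choice to carry one bit; it is an illustration, not a claim about what A86 actually emits.

```python
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the ModRM byte: 2 bits mod, 3 bits reg, 3 bits r/m."""
    return (mod << 6) | (reg << 3) | rm

def mov_reg_reg(dst: str, src: str, bit: int) -> bytes:
    """Encode `mov dst, src`, choosing the opcode form according to `bit`."""
    d, s = REG32[dst], REG32[src]
    if bit == 0:
        return bytes([0x89, modrm(0b11, s, d)])  # destination in the r/m field
    return bytes([0x8B, modrm(0b11, d, s)])      # destination in the reg field

def read_bit(code: bytes) -> int:
    """Recover the hidden bit from the opcode that was used."""
    return 0 if code[0] == 0x89 else 1

form0 = mov_reg_reg("eax", "ebx", 0)
form1 = mov_reg_reg("eax", "ebx", 1)
```

    Both byte pairs (89 D8 and 8B C3) mean `mov eax, ebx`, so each register-to-register mov can hold one bit without changing size or behavior.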

    A much more impressive use would be a program that reads its own code as data to save the last few bytes, especially if it has a real purpose, like fitting a game into a fixed-size ROM.

  19. Steganography by SiliconEntity · · Score: 4, Informative

    In cryptography, steganography has a particular meaning. In the same way that the goal of encryption is to prevent the message from being read, the goal of steganography is to prevent the message from being detected. A successful steganographic embedding is one in which a third party would not be able to find out if it is there. If you gave him two files, one with an embedded message and the other unprocessed, he should not be able to tell them apart.

    For a method to truly be steganography, it's not enough just to embed some data into another. That's possible any time there's redundancy. The requirement is to make it so clever and/or subtle that there is no way to distinguish a processed file from an unprocessed one.

    I doubt that this new method passes the test. Generally, while there are many synonyms possible in code, both in single instructions and in short sequences of instructions, the statistics of how these are distributed in unprocessed files are probably not random. Chances are that one synonym is used more than another. If you embed random data in a straightforward way, you will then have equal usages of both alternatives. This is a highly unusual condition, and to someone in the know, files like these will be easily distinguished.

    Only if they have found a kind of synonym which already has purely random statistics, or where they are careful to precisely mimic the statistics of the original file as they add their data, can this truly be considered a form of steganography.
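
    The statistical argument can be illustrated with a small simulation. The 90/10 split for the unprocessed file is an invented figure, not a measurement of any real compiler output, and the function names are made up for this sketch.

```python
import random

def cover_slots(n, p_first=0.9, seed=0):
    """Synonym choices in an unprocessed file: form 0 is assumed to dominate."""
    rng = random.Random(seed)
    return [0 if rng.random() < p_first else 1 for _ in range(n)]

def embed_random(slots, seed=1):
    """Naive embedding: every slot is overwritten with a random message bit."""
    rng = random.Random(seed)
    return [rng.getrandbits(1) for _ in slots]

def fraction_first(slots):
    """Share of slots using the first (0) form of the synonym."""
    return slots.count(0) / len(slots)

plain = cover_slots(10_000)
stego = embed_random(plain)
```

    Here `fraction_first(plain)` stays near 0.9 while `fraction_first(stego)` lands near 0.5, which is the kind of shift a detector could look for without knowing the key.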