Slashdot Mirror


Avast Launches Open-Source Decompiler For Machine Code (techspot.com)

Greg Synek reports via TechSpot: To help with the reverse engineering of malware, Avast has released an open-source version of its machine-code decompiler, RetDec, which has been under development for over seven years. RetDec supports a variety of architectures beyond those used on traditional desktops, including ARM, PIC32, PowerPC and MIPS. As Internet of Things devices proliferate throughout our homes and inside private businesses, being able to effectively analyze the code running on all of these new devices becomes a necessity to ensure security. In addition to the open-source version found on GitHub, RetDec is also being provided as a web service.

Simply upload a supported executable or machine code and get a reasonably rebuilt version of the source code. It is not possible to retrieve the exact original code of any executable compiled to machine code, but obtaining a working or almost-working copy of equivalent code can greatly expedite the reverse engineering of software. For any curious developers out there, a REST API is also provided to allow third-party applications to use the decompilation service. A plugin for the IDA disassembler is also available for those experienced with decompiling software.

48 of 113 comments

  1. Re: Wow! So many architectures! by Anonymous Coward · · Score: 5, Insightful

    Get over yourself and stop complaining about things being given away to you for free. It's a shame that people complain about open source software when it's being given to them for free. The decompiler could have been kept from the public entirely, or released as a closed-source program. Your complaint about the architectures it does or doesn't support totally rings hollow.

  2. Re:Wow! So many architectures! by J053 · · Score: 4, Informative

    ...but no x86_64.

  3. Re:Wow! So many architectures! by bws111 · · Score: 2

    Or any other 64 bit arch.

  4. A debugger does this by Snotnose · · Score: 2, Interesting

    Back in the late 70's I loaded TRS-80 games into my debugger, it also let me dump the results into a text file. Finding things like "jump to label_foo" helped, but was not the be-all end-all.

    The killer was when I debugged my TRS-80 BASIC interpreter in ROM. You'd have some 3-byte instruction, "jump here", then somewhere else you'd have a 3-byte instruction "jump into the middle of this 3-byte instruction to do something completely different". My understanding is Bill did those, but for all the evil he did, I have major respect for his coding abilities.

    I beat a lot of games running my debugger on them. 90% sure it was called TRS-MON, but wouldn't bet my retirement on it.

    1. Re:A debugger does this by Snotnose · · Score: 1

      What part of "TRS-80" and "late 70's" would lead you to think anything else?

      I'm guessing the reverse engineered C++ code is gonna cost a hella amount of time to reverse engineer the reverse engineered code the tool generates.

      I've reverse engineered C. C++? Not seeing how a tool is gonna be a lot of help. I'm basing this on the fact that going from C to ASM is pretty straightforward. Going from C++ to C is problematic, especially as you are going C++ -> ASM as opposed to C++ -> C.

    2. Re:A debugger does this by suburbanmediocrity · · Score: 1

      Wow, I just came to the comment section to talk about using a disassembler on the trs-80 to beat games. Is that you capn K?

    3. Re:A debugger does this by Tony+Isaac · · Score: 3, Insightful

      One problem with a lot of those old debuggers and disassemblers was that they weren't that smart about what they were looking at. You often had to tell them a range of memory to disassemble, and they would blindly treat everything they saw as code, even if it was actually data. This was partly a problem because in those days, code and data weren't so neatly divided from one another, everything could live anywhere in memory. It was actually common for software to "poke" data into memory and then execute it. Ah, the good old days.

    4. Re:A debugger does this by AmiMoJo · · Score: 4, Interesting

      Indeed, poking code is often the fastest way to do stuff on those older systems where memory bandwidth and CPU clocks are very limited.

      We called it speedcode back in the day. Say you wanted to calculate and plot a load of points on the screen. Normally you would calculate the coordinates, store them and then later pass a reference to some plotting function. To do it faster you could turn calls to the plot function into an unrolled series of instructions, and instead of reading the coordinates every time, just poke them directly into the immediate operands of those instructions.
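Here is a minimal Python sketch of the speedcode idea that generates 6502 machine code as bytes. It assumes a 40-column character screen at $0400 (as on the C64) and plots one character cell per point. The function names are hypothetical. Each point becomes an `STA absolute` instruction. Moving a point means poking new address bytes straight into that instruction's operand, rather than storing the coordinates anywhere else.

```python
# Hypothetical 6502 speedcode generator (a sketch, not any specific demo's code).
# Assumes a 40-column screen at $0400 (as on the C64), one character cell per point.
LDA_IMM = 0xA9  # LDA #imm
STA_ABS = 0x8D  # STA absolute, followed by a little-endian 16-bit address
RTS = 0x60

SCREEN_BASE = 0x0400
COLUMNS = 40


def cell_address(x: int, y: int) -> int:
    return SCREEN_BASE + y * COLUMNS + x


def generate_speedcode(points, value):
    """Unrolled code: LDA #value, then one STA $addr per point, then RTS."""
    code = bytearray([LDA_IMM, value])
    for x, y in points:
        addr = cell_address(x, y)
        code += bytes([STA_ABS, addr & 0xFF, addr >> 8])
    code.append(RTS)
    return code


def move_point(code, index, x, y):
    """Poke new coordinates straight into the operand of the index-th STA."""
    addr = cell_address(x, y)
    operand = 2 + 3 * index + 1  # skip the 2-byte LDA, then this STA's opcode byte
    code[operand] = addr & 0xFF
    code[operand + 1] = addr >> 8
```

The plotting loop disappears entirely. The CPU just runs straight through the generated stores.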

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    5. Re:A debugger does this by bobintetley · · Score: 2

      We called it self-modifying code. It was really useful for handling interrupts on low-end chips like the 6502. In the same sort of way you described, you could STA/STX/STY the register values into the operand bytes right after the LDA/LDX/LDY opcodes at the end of the interrupt handler, to save intermediate storage.
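To make the trick concrete, here is a sketch of a tiny interpreter for just the six 6502 opcodes involved (`LDA/LDX/LDY #imm`, `STA/STX/STY abs`) plus `RTI`. The handler address $1000 and its layout are made up for illustration. The interpreter also ignores what real hardware pushes on interrupt entry. The handler's first three stores patch the immediate operands of the restoring loads at its end, so no separate save area is needed.

```python
# Sketch: interpreter for only the opcodes used below; handler layout is hypothetical.
LDA_IMM, LDX_IMM, LDY_IMM = 0xA9, 0xA2, 0xA0
STA_ABS, STX_ABS, STY_ABS = 0x8D, 0x8E, 0x8C
RTI = 0x40

ISR = 0x1000
handler = [
    STA_ABS, 0x10, 0x10,  # $1000: STA $1010  (operand of the restoring LDA)
    STX_ABS, 0x12, 0x10,  # $1003: STX $1012  (operand of the restoring LDX)
    STY_ABS, 0x14, 0x10,  # $1006: STY $1014  (operand of the restoring LDY)
    LDA_IMM, 0xFF,        # $1009: the handler body clobbers A, X and Y
    LDX_IMM, 0xFF,        # $100B
    LDY_IMM, 0xFF,        # $100D
    LDA_IMM, 0x00,        # $100F: restore A; operand at $1010 was patched above
    LDX_IMM, 0x00,        # $1011: restore X; operand at $1012
    LDY_IMM, 0x00,        # $1013: restore Y; operand at $1014
    RTI,                  # $1015
]


def run(memory, pc, a, x, y):
    """Execute from pc until RTI; return the A, X, Y registers at that point."""
    while True:
        op = memory[pc]
        if op == RTI:
            return a, x, y
        if op in (LDA_IMM, LDX_IMM, LDY_IMM):
            value = memory[pc + 1]
            if op == LDA_IMM:
                a = value
            elif op == LDX_IMM:
                x = value
            else:
                y = value
            pc += 2
        elif op in (STA_ABS, STX_ABS, STY_ABS):
            addr = memory[pc + 1] | (memory[pc + 2] << 8)
            memory[addr] = {STA_ABS: a, STX_ABS: x, STY_ABS: y}[op]
            pc += 3
        else:
            raise ValueError(f"unsupported opcode {op:#04x} at {pc:#06x}")


memory = bytearray(0x10000)
memory[ISR:ISR + len(handler)] = bytes(handler)
```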

  5. Should crossref with github. by shess · · Score: 4, Interesting

    Perhaps if you built a fingerprint based on the structure of calls across functions, you could map it back to source code from GitHub. Not that malware is generally posted to GitHub, but I'd be surprised if they didn't use a TON of third-party libraries, and factoring all of those out would make what's left easier to understand and also let you focus better.
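Here is one way the idea could look, as a sketch. Hash the shape of each function's call tree (how many callees it has, and recursively their shapes) while ignoring names. Then look that hash up in an index built from known library code. The call graphs and function names below are hypothetical toy data.

```python
import hashlib


def fingerprint(func, calls, depth=3):
    """Name-independent hash of the call tree below `func`, `depth` levels deep."""
    if depth == 0:
        return "*"  # stop here; this also keeps recursive call graphs finite
    children = sorted(fingerprint(c, calls, depth - 1) for c in calls.get(func, []))
    shape = f"{len(children)}(" + ",".join(children) + ")"
    return hashlib.sha256(shape.encode()).hexdigest()[:16]


def match_library(binary_calls, library_calls, depth=3):
    """Map each binary function to the library functions sharing its fingerprint."""
    index = {}
    for name in library_calls:
        index.setdefault(fingerprint(name, library_calls, depth), []).append(name)
    return {f: index.get(fingerprint(f, binary_calls, depth), []) for f in binary_calls}


# Hypothetical toy call graphs: caller -> list of callees.
library = {
    "lib_decompress": ["lib_copy", "lib_checksum", "lib_window"],
    "lib_window": ["lib_copy"],
    "lib_copy": [],
    "lib_checksum": [],
}
binary = {
    "sub_401000": ["sub_401200", "sub_401300", "sub_401400"],
    "sub_401300": ["sub_401200"],
    "sub_401200": [],
    "sub_401400": [],
}
matches = match_library(binary, library)
```

Shape alone collides often (every leaf function looks the same), so a practical tool would likely mix in instruction-level features as well.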

  6. Re:Wow! So many architectures! by viperidaenz · · Score: 2

    It's accurate. According to retdec.com, RetDec only supports 32bit architectures.

  7. Re:doesnt work by hcs_$reboot · · Score: 1

    Try
    gcc --reverse prog -o prog.c

    --
    Slashdot, fix the reply notifications... You won't get away with it...
  8. Re:Wow! So many architectures! by hcs_$reboot · · Score: 1

    ...but no x86_64.

    yet.

    --
    Slashdot, fix the reply notifications... You won't get away with it...
  9. Re:This is why I use ExePacking... apk by 110010001000 · · Score: 1

    Why wouldn't you just release it as open source? Is it because APK Hosts File Engine 10++ 32/64-bit contains MALWARE and VIRUSES?

  10. combine with neural network by Gravis+Zero · · Score: 1

    One of the big issues with decompilers is that different compilers do not generate the same output for the same input. In addition, different versions of a compiler and different flags yield different results as well. After some thought, I've come to the conclusion that the only viable solution is to build a neural network that can detect and compensate for all the idiosyncrasies, using many different test cases (and their binaries) as training data. Ultimately, it would be able to return not only the most likely version of the source code but also the compiler name, version and flags used to compile it.

    We have the technology to solve this seemingly impossible problem.

    --
    Anons need not reply. Questions end with a question mark.
    1. Re:combine with neural network by arth1 · · Score: 1

      How the FUCK are you going to recover the variable names, the preprocessor directives, and the comments?

      You don't need them. Really.

      And what if the original program had inline assembler? What are you going to do with that?

      It will generate code that does the same thing - it does not have to look the same, as long as what it does is the same.

  11. double-edged? by 4wdloop · · Score: 1

    Probably also helpful when searching for vulnerabilities?

    --
    4wdloop
  12. other CPUs/archs are missing by 4wdloop · · Score: 1

    AVR, MSP and L106 (Tensilica/ESP8266) missing...

    Especially for MSP, there seem to be a lot of products using it (Honeywell thermostats, Ikea lighting)...

    --
    4wdloop
  13. Re:I've said why a million times here... apk by 110010001000 · · Score: 1

    What would stop someone from creating a malicious software and naming it APK Hosts File Engine 10++ 32/64-bit? I mean, different malicious software, because I am assuming your version of APK Hosts File Engine 10++ 32/64-bit is MALWARE. So why not just open source it, so we can see what it does?

  14. Re:You need to learn to read... apk by 110010001000 · · Score: 1

    I have no idea who "virustotal.com" is. Why not just show us the source code? Then we can find out if APK Hosts File Engine 10++ 32/64-bit is MALWARE or isn't MALWARE. I am assuming it is MALWARE since you are keeping the code hidden and obfuscating the exe just like a virus does.

  15. Re:You need to learn to read... apk by 110010001000 · · Score: 1

    According to virustotal.com it says:

    ClamAV Possibly Unwanted Application. While not necessarily malicious, the scanned file presents certain characteristics which depending on the user policies and environment may or may not represent a threat. For full details see: https://www.clamav.net/documen... . Symantec reputation Suspicious.Insight

    Sounds like malware to me.

  16. eFast by tepples · · Score: 1

    I am assuming your version of APK Hosts File Engine 10++ 32/64-bit is MALWARE.

    I'm guessing others have tested it in a sandbox for malicious behavior. Do you assume Intel and AMD CPUs contain malware? And if you do, do you use them despite said assumption?

    So why not just open source it

    If this post is to be believed, APK doesn't want people adding malware, building it, and distributing it, like eFast did with Chromium.

    The other option is for some Slashdot user to make a free replacement. Does the functionality described in this specification appear useful?

    1. Re:eFast by fisted · · Score: 1

      If this post is to be believed, APK doesn't want people adding malware, building it, and distributing it

      Since you seem to have a little reading comprehension issue, let me copypaste the question again:

      What would stop someone from creating a malicious software and naming it APK Hosts File Engine 10++ 32/64-bit?

  17. Re:Addendum (couldn't fit all this in too) by tepples · · Score: 1

    Not every program packed with UPX is a virus.

  18. encryption by bugs2squash · · Score: 4, Funny

    unfortunately it de-compiles the machine code to perl.

    --
    Nullius in verba
  19. Interesting to watch this develop by tzanger · · Score: 1

    I ran some of my own ARM code through this. While I did build with -Os, I did not strip the .elf. The source it produced was a reasonable approximation of what I wrote, but it was far from legible. Little things like using hexadecimal for memory addresses are a minor nitpick, but I found it had trouble even with basic interrupt handlers. I would have expected something aimed at embedded systems to do a better job of this, but still... very interesting (and very fast)!

  20. Sub-architectures have value by OrangeTide · · Score: 1

    If it does PIC32-specific functionality like decoding that chip's MMIO registers, that's a nice feature over simply decoding MIPS object files.
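Here is a sketch of what decoding MMIO registers could mean in practice. After decompilation, known peripheral addresses in the C output get rewritten into register names. The register map below uses placeholder addresses and names invented for illustration. They are not values from a PIC32 datasheet.

```python
import re

# Placeholder register map: addresses and names are invented for illustration
# and are NOT taken from any PIC32 datasheet.
MMIO_NAMES = {
    0xBF806010: "UART_TX",
    0xBF806020: "UART_STATUS",
}

# 32-bit hex constants such as 0xbf806010 in the decompiled C text.
ADDRESS = re.compile(r"\b0x[0-9A-Fa-f]{8}\b")


def symbolize(c_line: str) -> str:
    """Replace known MMIO addresses with &REGISTER_NAME; leave others unchanged."""
    def repl(match):
        name = MMIO_NAMES.get(int(match.group(0), 16))
        return f"&{name}" if name else match.group(0)
    return ADDRESS.sub(repl, c_line)
```

For example, `*(volatile uint32_t *)0xbf806010 = 0x41;` becomes `*(volatile uint32_t *)&UART_TX = 0x41;`, while unknown addresses pass through untouched.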

    --
    “Common sense is not so common.” — Voltaire
  21. decompiles INTO WHAT ? by thygate · · Score: 1

    no mention in the article of what the decompiler actually decompiles to ..

    1. Re:decompiles INTO WHAT ? by Anonymous Coward · · Score: 3, Informative

      "no mention in the article of what the decompiler actually decompiles to .."

      According to https://github.com/avast-tl/retdec:
      Output in two high-level languages: C and a Python-like language.

  22. Re:Wow! So many architectures! by jonwil · · Score: 3, Insightful

    The thing is open source, if you really want x86-64, grab the code and write something :)

  23. UltraEdit is my file "detector" by Trax3001BBS · · Score: 1

    UltraEdit (a text editor) will show all the text in a file; from that alone, one can fairly often work out a file's function.

    Long ago there was a program called "Peek" that showed all the text in a file without the hex/high-ASCII that UltraEdit also shows; W2K broke it, and I've missed it ever since.
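Something like Peek's behavior is easy to approximate, and the Unix `strings` utility does much the same. Here is a minimal Python sketch. It assumes "text" means runs of printable ASCII at least four bytes long.

```python
import re


def printable_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII (space to '~') at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group(0).decode("ascii") for m in pattern.finditer(data)]
```

Raising `min_len` cuts down on noise from short accidental runs in machine code. Lowering it catches short identifiers.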

    I'll be giving this program a try.

  24. Re:Wow! So many architectures! by ShanghaiBill · · Score: 2

    The thing is open source, if you really want x86-64, grab the code and write something :)

    x86 is hard to decompile. It doesn't have fixed-length instructions, so it is difficult to figure out where opcodes begin and end. It is even possible to write code that can execute two different sequences of instructions by offsetting the instruction pointer by a byte. I don't think any decompiler could deobfuscate that.
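Here is a sketch of the overlapping-instruction problem. It uses a toy decoder that knows only a few 32-bit x86 opcodes (`B8` mov eax,imm32; `40` inc eax; `90` nop; `C3` ret). Anything else becomes a raw byte. The same six bytes decode to two different, valid instruction streams, depending on whether execution starts at offset 0 or offset 2.

```python
# Toy decoder for a handful of 32-bit x86 opcodes (a sketch, not a real
# disassembler): just enough to show two instruction streams sharing bytes.
def decode(code: bytes, start: int):
    """Linear decode from `start` until RET or the end of the buffer."""
    out, pc = [], start
    while pc < len(code):
        op = code[pc]
        if op == 0xB8 and pc + 5 <= len(code):   # mov eax, imm32
            imm = int.from_bytes(code[pc + 1:pc + 5], "little")
            out.append(f"mov eax, {imm:#010x}")
            pc += 5
        elif op == 0x40:                         # inc eax (a REX prefix in 64-bit mode)
            out.append("inc eax")
            pc += 1
        elif op == 0x90:                         # nop
            out.append("nop")
            pc += 1
        elif op == 0xC3:                         # ret
            out.append("ret")
            break
        else:                                    # unknown or truncated
            out.append(f"db {op:#04x}")
            pc += 1
    return out


code = bytes([0xB8, 0x05, 0x40, 0x90, 0x90, 0xC3])
```

Starting at offset 0 gives `mov eax, 0x90904005; ret`. A jump to offset 2 instead runs `inc eax; nop; nop; ret` out of the middle of that `mov`.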

  25. Re: Wow! So many architectures! by sad_ · · Score: 1

    Get over yourself and stop complaining about things being given away to you for free. It's a shame that people complain about open source software when it's being given to them for free.

    Isn't that true? What I find even more amazing is that those same people mostly never complain about the shortcomings of commercial software.

    --
    On a long enough timeline, the survival rate for everyone drops to zero.
  26. Re:I juat let him look like a lying SCHMUCK by sa666_666 · · Score: 1

    This is probably a waste of time, but ... When you're typing a message and say "see the p.s. below", it means you _know_ at that point that you will be having a p.s. But in that case you could just place the text where you are, and not _need_ a p.s.

  27. Re:Wow! So many architectures! by Hal_Porter · · Score: 1

    PIC32 binaries are pronounced with more of a guttural accent than MIPS ones.

    --
    echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
  28. Does care by DrYak · · Score: 1

    a decompiler won't care whether you compiled a C++, assembler, C or whatever language the program being reversed was compiled on.

    It will care, because some languages (e.g. C++) have specific data structures (vtables) and ways to handle some language-specific features (virtual member function inheritance), which could be detected by a specific plugin (i.e. instead of spewing a weird mess of nested "struct"s and pointers-to-pointers, it can recognize that this is just a call to a virtual method).
    (For the few hipsters out there: think of the difference between Vala and the corresponding pure-C GObject code.)
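Here is a sketch of the kind of rewrite such a plugin could do. It matches the pointer-to-pointer pattern of a call through a vtable slot and prints it as a method call. The raw output shape `(*(*(fnptr **)obj + N))(obj)` and the `vfunc_N` naming are assumptions for illustration. Real decompilers print these calls differently, and would also use class or RTTI information to recover real method names.

```python
import re

# Hypothetical raw decompiler output for a call through slot N of the vtable
# whose pointer sits at offset 0 of the object, with the object passed as `this`.
VCALL = re.compile(r"\(\*\(\*\(fnptr \*\*\)(\w+) \+ (\d+)\)\)\(\1\)")


def recognize_virtual_calls(c_code: str) -> str:
    """Rewrite vtable-slot calls (same object as receiver) into method-call notation."""
    return VCALL.sub(r"\1->vfunc_\2()", c_code)
```

The backreference `\1` requires the object being dereferenced to be the same one passed as `this`. Calls that don't fit that shape are left alone.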

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
  29. Decompiler are not simple debugger/dumper by DrYak · · Score: 3, Informative

    x86 is hard to decompile. It doesn't have fixed-length instructions, so it is difficult to figure out where opcodes begin and end. It is even possible to write code that can execute two different sequences of instructions by offsetting the instruction pointer by a byte. I don't think any decompiler could deobfuscate that.

    The simple code dumper that comes with a garden-variety debugger won't easily deobfuscate that. (You need to manually ask the debugger to start dumping from each of the two overlapping points.)

    That's why the best decompilers available in the 90s used some sort of virtual machine to follow the execution flow, to be able to distinguish this kind of "frame shift" (that's actually a biology term; I've forgotten what the proper CS term is), and also to be able to understand a bit of self-modifying code.
    (Basically, the decompiler will notice that various parts of the code make calls into the same region but at an odd offset, and will automatically try dumping from each overlapping point.)
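Here is a static relative of that idea, sketched with a toy opcode subset (`EB` jmp short, `B8` mov eax,imm32, `40` inc eax, `90` nop, `C3` ret). A linear sweep decodes bytes back to back. Recursive traversal follows jumps instead, so a jump into what the sweep thought was the middle of an instruction still gets decoded correctly. This is recursive-traversal disassembly, not the emulation approach described above.

```python
# Toy ISA subset (a sketch): instruction lengths; anything else is treated as 1 byte.
LENGTHS = {0xEB: 2, 0xB8: 5, 0x40: 1, 0x90: 1, 0xC3: 1}


def linear_sweep(code):
    """Decode back to back from offset 0, like a simple code dumper."""
    starts, pc = [], 0
    while pc < len(code):
        starts.append(pc)
        pc += LENGTHS.get(code[pc], 1)
    return starts


def recursive_traversal(code, entry=0):
    """Follow control flow, so jump targets become instruction starts."""
    starts, work = set(), [entry]
    while work:
        pc = work.pop()
        while 0 <= pc < len(code) and pc not in starts:
            starts.add(pc)
            op = code[pc]
            if op == 0xC3:   # ret: this path ends
                break
            if op == 0xEB:   # jmp short rel8: continue at the target instead
                rel = int.from_bytes(code[pc + 1:pc + 2], "little", signed=True)
                work.append(pc + 2 + rel)
                break
            pc += LENGTHS.get(op, 1)
    return sorted(starts)


# "jmp +1" skips a junk B8 byte; a linear sweep swallows the real code into a "mov".
code = bytes([0xEB, 0x01, 0xB8, 0x40, 0x90, 0x90, 0xC3])
```

The sweep finds instructions only at offsets 0 and 2. The traversal finds 0, 3, 4, 5 and 6, and those real starts at 3 to 6 sit inside the sweep's bogus instruction.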

    It also makes it possible to give variables actually useful labels/names (call something "sound_frequency" instead of "var184", because by following the data flow, the decompiler realizes that this is the parameter that is output to the PC speaker tone generator).
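Here is a sketch of naming by data flow. It assumes a hypothetical three-address IR, where `("assign", dst, src)` copies a value and `("out", port, var)` writes it to an I/O port. Port 0x42 is PIT channel 2, which drives the PC speaker. Strictly, the value written there is a frequency divisor (about 1193182 / Hz, sent as two bytes), so the sketch names it `speaker_divisor`.

```python
# Hypothetical IR: ("assign", dst, src) copies src into dst; ("out", port, var)
# writes var to an I/O port. Port 0x42 = PIT channel 2 (PC speaker divisor).
KNOWN_PORTS = {0x42: "speaker_divisor"}


def name_from_port_writes(ir):
    """Name variables after the hardware port their value eventually reaches."""
    names = {}
    for op, a, b in ir:
        if op == "out" and a in KNOWN_PORTS:
            names[b] = KNOWN_PORTS[a]
    # Propagate backwards through plain copies: if dst is named, src carries the same value.
    changed = True
    while changed:
        changed = False
        for op, a, b in ir:
            if op == "assign" and a in names and isinstance(b, str) and b not in names:
                names[b] = names[a]
                changed = True
    return names


ir = [("assign", "var12", 1140), ("assign", "var184", "var12"), ("out", 0x42, "var184")]
```

A real decompiler would also have to disambiguate when several variables end up with the same name. It would also need to handle the two separate byte writes.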

    Sourcer by V-Com was one such good decompiler.
    (I managed to learn quite a ton of tricks, like PCM playback on the PC speaker, tweaked graphics modes, etc., simply by using sr to inspect interesting executables.
    I even managed to disinfect a cracked game that was sadly being distributed infected with some virus.)

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
  30. Old technique, actually by DrYak · · Score: 1

    the structure of calls across functions

    Recognizing some code flow was a staple of the best decompilers back in the 90s:
    e.g. being able to recognize a certain code pattern (a particular sequence of port writes) as a high-level abstraction (initializing sound hardware).

    Your idea would certainly be the 2010s-era equivalent (e.g. "this portion looks like code reused from the Zstd decompressor").
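Here is a minimal sketch of that kind of pattern recognition: byte signatures with `??` wildcards, matched against code. The example signature is `mov al, imm8; out 43h, al` in 16-bit x86 (`B0 ?? E6 43`). It writes a mode byte to the PC timer's control port, a typical first step when driving the PC speaker. A real tool would ship a large signature table, which is left out here.

```python
def parse_signature(text):
    """'B0 ?? E6 43' -> [0xB0, None, 0xE6, 0x43]; '??' matches any byte."""
    return [None if tok == "??" else int(tok, 16) for tok in text.split()]


def find_signature(code: bytes, signature):
    """Return every offset where the signature matches."""
    n = len(signature)
    return [
        i for i in range(len(code) - n + 1)
        if all(s is None or code[i + k] == s for k, s in enumerate(signature))
    ]


pit_mode_write = parse_signature("B0 ?? E6 43")  # mov al, ??; out 43h, al
```

The wildcard lets one signature cover every mode value a program might pick.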

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
  31. ASM? These kids don't need no stinking ASM! by Seven+Spirals · · Score: 1

    Of course, this only helps the 5 of us left who still code in ASM. "Kids these days" seem to think that ASM "sucks" because "it's old". If the language doesn't have trait-based generics, zero-cost abstractions, and a partridge in a pear tree, then again it's "old" and it "sucks". It's entertaining to watch your average 20-something Java/Python/PHP coder try to take on ASM. Their efforts generally don't last more than about five minutes when they find out they have to build their own control structures, and mama's not gonna wipe their butts with Visual Studio tooltip hints. If this whizbang tool decompiled code into Rust, then maybe the cool kids would want it. As it is, they will do what they always do with ASM-based tools: hand-wave like they know exactly how it works and then promptly ignore it. Anyhow, back to my ASM-One environment on my 68k Amiga. If anyone needs me I'll be here squatting on this temporal nexus to the 1990s. :-)

  32. Re:I'll let /.ers speak for me... apk by fisted · · Score: 1

    "I personally use a HOSTS file blocker produced from a genius called APK." by 110010001000 on Friday October 27, 2017

    The irony in this is brilliant -- you're actually too stupid to realize that 110010001000 is the guy you're "arguing" with.

    You probably even think that he genuinely thinks you're a genius rather than openly mocking you. Oh boy.

    Now, why don't you stop with your obnoxious ads? Wasn't one of your marketing points that your shitware removes ads? Does it remove your spammed ads on /.?

  33. Code signing with trust on first use by tepples · · Score: 1

    What would stop someone from creating a malicious software and naming it APK Hosts File Engine 10++ 32/64-bit?

    The fact that its hash wouldn't match that of the existing APK Hosts File Engine 10++ 32/64-bit posted all over forums.

    Now if you replace "10" with "11" in your question, you have a more interesting problem: how to distinguish subsequent versions of the same publisher's application from an impostor's malware. The publisher of the authentic application could generate a self-signed code signing certificate and sign each version of all of its programs. Then each user would configure his devices to "Trust other programs from publisher APK". In my opinion, Microsoft screwed up Authenticode for hobbyist programmers by requiring paid organizational validation of all certificates from a commercial certificate authority rather than allowing reputation to accumulate on self-signed publisher certificates.
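The trust-on-first-use part can be sketched in a few lines. Remember the fingerprint of the first certificate seen for a publisher name, then accept later programs only if they are signed with that same certificate. This assumes the signature itself has already been verified elsewhere. The class and method names are hypothetical.

```python
import hashlib


class PublisherPins:
    """Trust-on-first-use store: publisher name -> pinned certificate fingerprint.

    Assumes the program's signature was already verified against `certificate`
    elsewhere; this only decides whether it is the same certificate as last time.
    """

    def __init__(self):
        self._pins = {}

    def check(self, publisher: str, certificate: bytes) -> bool:
        fp = hashlib.sha256(certificate).hexdigest()
        pinned = self._pins.setdefault(publisher, fp)  # first use: pin this certificate
        return pinned == fp
```

A real implementation would prompt the user at the first-use step instead of pinning silently. It would also need a way to rotate keys.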

  34. Other machine code decompilers by abacus1 · · Score: 2

    How does this project compare to the existing machine code decompilers, namely Valgrind's VEX library and Qemu's tiny code generator (https://wiki.qemu.org/Documentation/TCG)?

  35. Re:No I'm not (it helps hide how I detect it) by o by arth1 · · Score: 1

    No I'm not (it helps hide how I detect it) by obfuscation hiding functions/methods where I summon an .exe size for even 1 BYTE in sizecheck (no virus is that small)

    You don't know much about writing viruses; that much is clear, because checking the size is a waste.
    One popular approach for viruses is to put the original file elsewhere (where elsewhere can either be elsewhere on a file system, or for file systems that support it, in a resource fork or attribute list of the same file), and then pad the virus to the desired file length.
    For weak CRCs, even change the padding to return the same CRC.

    But worse, you also then prevent the binary from running on systems where binaries are always modified before being run, for example by rebase/prelink, or by adding library paths to the executable, or on systems that depend on setting contexts on files, or that require the NX bit to be set, or ...
    Making assumptions about the runtime environment is so 1990s.

    Security through obscurity is what you're attempting here, and you require people to lower their security if it's too high.
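To show how weak a CRC check is, here is a sketch of the padding trick for CRC-32 as computed by `zlib.crc32`. It chooses four final bytes so that the whole file has any desired CRC. The trick relies on the top bytes of the 256 CRC-32 table entries all being distinct, which lets the table be run backwards. A cryptographic hash or signature does not have this weakness.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial, as used by zlib.crc32
TABLE = []
for i in range(256):
    c = i
    for _ in range(8):
        c = (c >> 1) ^ POLY if c & 1 else c >> 1
    TABLE.append(c)
# Each table entry has a unique top byte, so the byte-update step can be inverted.
TOP_BYTE_TO_INDEX = {entry >> 24: i for i, entry in enumerate(TABLE)}


def forge_crc32_suffix(data: bytes, target: int) -> bytes:
    """Return 4 bytes such that zlib.crc32(data + suffix) == target."""
    # Walk backwards from the desired final register to the 4 table indices.
    reg = target ^ 0xFFFFFFFF
    indices = []
    for _ in range(4):
        idx = TOP_BYTE_TO_INDEX[reg >> 24]
        indices.append(idx)
        reg = ((reg ^ TABLE[idx]) << 8) & 0xFFFFFFFF
    indices.reverse()
    # Walk forwards from the real register state, picking bytes that hit each index.
    reg = zlib.crc32(data) ^ 0xFFFFFFFF
    suffix = bytearray()
    for idx in indices:
        suffix.append((reg ^ idx) & 0xFF)
        reg = (reg >> 8) ^ TABLE[idx]
    return bytes(suffix)
```

Padding a shorter payload to the original length and appending the forged suffix reproduces both the size and the CRC-32 of the original file.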

  36. Re:You definitely need to learn to read... apk by arth1 · · Score: 1

    Good programs use exe packers too as I said

    Name one that's from this decade.

  37. Re:Wow! So many architectures! by ShanghaiBill · · Score: 1

    Even if you ignore the few 32-bit instructions in Thumb, it is still common to interleave data with the code.

    The difference is that with x86 you can interleave code with code.
    You can't do that with RISC.

  38. Re:Does it matter? Others prove you wrong by arth1 · · Score: 1

    I don't have to prove a negative. That's like saying "prove that god doesn't exist". The onus is on those who make claims to back them up, not on others to disprove them.

    All it would take was one example to prove your claim. How hard would that be, if what you claimed were true?

    Hint: Instead of posting URLs to posts that nobody will bother to follow, try to actually back up your wild claims with some actual meaningful text. Without bolding random words, without changing the subject and referring to it, and without a P.S. at the end.
    If you looked less like a kook, perhaps some would take you more seriously. As it is, your postings suggest that you probably have some mental problems, and are incapable of engaging in normal discussions like others here do. I can't help you with the former, but am willing to help you with the latter. But you have to be willing to learn posting etiquette.

  39. Re:PROOF you're wrong by arth1 · · Score: 1

    It's known protection vs. reverse engineering: PROOF: "Packing an executable file is a way of compressing executable code firstly to minimize filesizes, but often it is also used to complicate the reverse engineering process"

    Also known as "security through obscurity", as I said in my post.

  40. Re:Others prove you wrong easily for me by arth1 · · Score: 1

    that proof's SO RIGHT you had to try "downmod hide it"

    Um, no. I only have one account, and don't post as an anonymous coward, so I don't get to downmod anyone in this thread. It's others that downmodded your post, likely because of your incoherent ramblings being, well, "so wrong".

    YOU DEFINITELY CAN'T, troll!

    Think about it: Is everybody who disagrees with you a troll, or could it be that you are a smidgeon paranoid?