Avast Launches Open-Source Decompiler For Machine Code (techspot.com)
Greg Synek reports via TechSpot: To help with the reverse engineering of malware, Avast has released an open-source version of its machine-code decompiler, RetDec, that has been under development for over seven years. RetDec supports a variety of architectures aside from those used on traditional desktops including ARM, PIC32, PowerPC and MIPS. As Internet of Things devices proliferate throughout our homes and inside private businesses, being able to effectively analyze the code running on all of these new devices becomes a necessity to ensure security. In addition to the open-source version found on GitHub, RetDec is also being provided as a web service.
Simply upload a supported executable or machine code and get a reasonably rebuilt version of the source code. It is not possible to retrieve the exact original code of any executable compiled to machine code but obtaining a working or almost working copy of equivalent code can greatly expedite the reverse engineering of software. For any curious developers out there, a REST API is also provided to allow third-party applications to use the decompilation service. A plugin for IDA disassembler is also available for those experienced with decompiling software.
Get over yourself and stop complaining about things being given away to you for free. It's a shame that people complain about open source software when it's being given to them for free. The decompiler could have never been released to the public or released as a closed source program. Your complaint about the architectures it supports or doesn't support totally rings hollow.
...but no x86_64.
Or any other 64-bit arch.
Back in the late 70's I loaded TRS-80 games into my debugger, it also let me dump the results into a text file. Finding things like "jump to label_foo" helped, but was not the be-all end-all.
The killer was when I debugged my TRS-80 BASIC interpreter in ROM. You'd have some 3 byte instruction, "jump here", then somewhere else you'd have a 3 byte instruction "jump into the middle of this 3 byte instruction to do something completely different". My understanding is Bill did those, but for all the evil he did I have major respect for his coding abilities.
I beat a lot of games running my debugger on them. 90% sure it was called TRS-MON, but wouldn't bet my retirement on it.
Perhaps if you built a fingerprint based on the structure of calls across functions, you could map it back to source code from github. Not that malware is generally posted to github, but I'd be surprised if they didn't use a TON of third_party libraries, and factoring all of those out would make what's left easier to understand and also let you focus better.
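A rough sketch of what such a call-structure fingerprint could look like. Everything here is an assumption for illustration: the function names, the two call graphs, and the hashing scheme (a few rounds of relabeling each function by its callee count and its callees' labels) are made up, and this is not how RetDec or any particular tool does it.

```python
import hashlib

def fingerprints(call_graph, rounds=3):
    """Name-independent label per function, built only from call structure."""
    # Start from the number of callees; names are never used.
    labels = {f: str(len(callees)) for f, callees in call_graph.items()}
    for _ in range(rounds):
        new = {}
        for f, callees in call_graph.items():
            parts = sorted(labels.get(c, "ext") for c in callees)
            data = labels[f] + "|" + ",".join(parts)
            new[f] = hashlib.sha256(data.encode()).hexdigest()[:16]
        labels = new
    return labels

def match(binary_graph, library_graph):
    """Map each function in the binary to library functions with the same fingerprint."""
    index = {}
    for name, fp in fingerprints(library_graph).items():
        index.setdefault(fp, []).append(name)
    result = {}
    for func, fp in fingerprints(binary_graph).items():
        if fp in index:
            result[func] = sorted(index[fp])
    return result

# Hypothetical call graph of a known open-source library (built from source).
LIBRARY = {
    "lib_parse": ["lib_decode", "lib_checksum"],
    "lib_decode": ["lib_table"],
    "lib_checksum": [],
    "lib_table": [],
}

# Hypothetical call graph recovered from a stripped binary.
BINARY = {
    "sub_401000": ["sub_401200", "sub_401100"],
    "sub_401100": ["sub_401300"],
    "sub_401200": [],
    "sub_401300": [],
    "sub_402000": ["sub_401000", "sub_401200", "sub_401300"],
}

matches = match(BINARY, LIBRARY)
for func in sorted(matches):
    print(func, "->", matches[func])
```

In this toy input, sub_401000 and sub_401100 each map to exactly one library function, the two leaf functions stay ambiguous (every leaf looks alike), and sub_402000 has no match, so it is the part left to read by hand. Real binaries would be messier, since inlining and compiler options change the call graph.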
It's accurate. According to retdec.com, RetDec only supports 32-bit architectures.
Unfortunately it decompiles the machine code to Perl.
Nullius in verba
"no mention in the article of what the decompiler actually decompiles to .."
According to https://github.com/avast-tl/retdec:
Output in two high-level languages: C and a Python-like language.
The thing is open source, if you really want x86-64, grab the code and write something :)
x86 is hard to decompile. It doesn't have fixed-length instructions, so it is difficult to figure out where opcodes begin and end. It is even possible to write code that can execute two different sequences of instructions by offsetting the instruction pointer by a byte. I don't think any decompiler could deobfuscate that.
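To make the one-byte-offset point concrete, here is a small sketch. The decoder is a toy that only knows a handful of 32-bit x86 encodings (B8 = mov eax, imm32; 31 C0 = xor eax, eax; 40 = inc eax; C3 = ret), and the byte string is a made-up example, not taken from any real program.

```python
# Toy decoder for a handful of 32-bit x86 encodings (illustration only).
CODE = bytes([0xB8, 0x31, 0xC0, 0x40, 0xC3, 0xC3])

def decode(code, offset):
    """Linearly decode from `offset` until a ret; returns [(address, text)]."""
    out = []
    pc = offset
    while pc < len(code):
        op = code[pc]
        if op == 0xB8 and pc + 5 <= len(code):
            # mov eax, imm32: opcode plus a 4-byte little-endian immediate
            imm = int.from_bytes(code[pc + 1:pc + 5], "little")
            out.append((pc, f"mov eax, 0x{imm:08x}"))
            pc += 5
        elif op == 0x31 and code[pc + 1:pc + 2] == b"\xc0":
            out.append((pc, "xor eax, eax"))
            pc += 2
        elif op == 0x40:
            out.append((pc, "inc eax"))
            pc += 1
        elif op == 0xC3:
            out.append((pc, "ret"))
            break
        else:
            # Unknown to this toy decoder: emit as a raw data byte
            out.append((pc, f"db 0x{op:02x}"))
            pc += 1
    return out

for start in (0, 1):
    print(f"decoding from offset {start}:")
    for addr, text in decode(CODE, start):
        print(f"  {addr:02x}: {text}")
```

Starting at offset 0, the bytes read as a single mov followed by ret. Starting at offset 1, the same bytes read as xor eax, eax; inc eax; ret, because the "immediate" of the mov is itself valid code.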
The simple code dumper that comes with a garden-variety debugger won't easily deobfuscate that. (You need to manually ask the debugger to start dumping from each of the two overlapping points.)
That's why the best decompilers available in the 90s used some sort of virtual machine to follow the execution flow, so they could detect this kind of "frame shift" (that's actually a biology term; I've forgotten what the proper CS term is) and also understand a bit of self-modifying code.
(Basically, the decompiler will notice that various parts of the code make calls into the same region but at an odd offset, and will automatically try dumping from each overlapping point.)
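A minimal sketch of that idea, assuming a toy decoder that knows only a few 32-bit x86 encodings (B8 = mov eax, imm32; EB = jmp rel8; 31 C0 = xor eax, eax; 40 = inc eax; C3 = ret). It follows control flow from an entry point and flags jump targets that land inside an instruction it has already decoded. This is not Sourcer's actual algorithm, just an illustration; the byte string is made up.

```python
# mov eax, imm32 at 0, then a short jump back to offset 1, i.e. into the mov.
CODE = bytes([0xB8, 0x31, 0xC0, 0x40, 0xC3, 0xEB, 0xFA])

def decode_one(code, pc):
    """Return (length, text, jump_target_or_None, falls_through)."""
    op = code[pc]
    if op == 0xB8 and pc + 5 <= len(code):
        imm = int.from_bytes(code[pc + 1:pc + 5], "little")
        return 5, f"mov eax, 0x{imm:08x}", None, True
    if op == 0xEB and pc + 2 <= len(code):
        # Short jump: signed 8-bit displacement relative to the next instruction
        rel = int.from_bytes(code[pc + 1:pc + 2], "little", signed=True)
        target = pc + 2 + rel
        return 2, f"jmp 0x{target:x}", target, False
    if op == 0x31 and code[pc + 1:pc + 2] == b"\xc0":
        return 2, "xor eax, eax", None, True
    if op == 0x40:
        return 1, "inc eax", None, True
    if op == 0xC3:
        return 1, "ret", None, False
    return 1, f"db 0x{op:02x}", None, True

def trace(code, entry=0):
    """Follow control flow; return decoded instructions and overlapping entry points."""
    starts = {}          # address -> (length, text)
    targets = set()      # every jump target seen
    worklist = [entry]
    while worklist:
        pc = worklist.pop()
        while 0 <= pc < len(code) and pc not in starts:
            length, text, target, falls_through = decode_one(code, pc)
            starts[pc] = (length, text)
            if target is not None:
                targets.add(target)
                worklist.append(target)
            if not falls_through:
                break
            pc += length
    # A target overlaps if it lies strictly inside another decoded instruction.
    overlapping = sorted(
        t for t in targets
        if any(s < t < s + length for s, (length, _) in starts.items())
    )
    return starts, overlapping

starts, overlapping = trace(CODE)
for addr in sorted(starts):
    print(f"{addr:02x}: {starts[addr][1]}")
print("entry points inside another instruction:", overlapping)
```

Here the jump at offset 5 targets offset 1, which sits inside the 5-byte mov at offset 0, so the tracer decodes both views of the same bytes and reports offset 1 as an overlapping entry point.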
It also makes it possible to put actually useful labels/names on variables (calling something "sound_frequency" instead of "var184" because, by following the data flow, the decompiler realizes that this is the parameter that is output to the PC speaker tone generator).
Sourcer by V-Com was one such good decompiler.
(I managed to learn quite a ton of tricks, like PCM playback on the PC speaker, tweaked graphics modes, etc., simply by using sr to inspect interesting executables.
I even managed to disinfect a cracked game that was sadly being distributed infected with some virus.)
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
How does this project compare to the existing machine code compilers, namely Valgrind's VEX library and QEMU's Tiny Code Generator (https://wiki.qemu.org/Documentation/TCG)?