Mystery of Duqu Programming Language Solved
wiredmikey writes "Earlier this month, researchers from Kaspersky Lab reached out to the security and programming community in an effort to help solve a mystery related to 'Duqu,' the Trojan often referred to as 'Son of Stuxnet,' which surfaced in October 2010. The mystery rested in a section of code written an unknown programming language and used in the Duqu Framework, a portion of the Payload DLL used by the Trojan to interact with Command & Control (C&C) servers after the malware infected system. Less than two weeks later, Kaspersky Lab experts now say with a high degree of certainty that the Duqu framework was written using a custom object-oriented extension to C, generally called 'OO C' and compiled with Microsoft Visual Studio Compiler 2008 (MSVC 2008) with special options for optimizing code size and inline expansion."
To tag along - it's hard to tell data from code, and it helps the decompiling app to detect what is code vs. data if it knows which compiler created it.
It looks like the original blog used IDA Pro, which has library signatures for different compilers. It can identify functions and auto-comment the code, making disassembly easier. Auto-identify stack variables and keep track of them through lots of PUSH and POP and RETURN X statements, it's quite powerful.
In this case, IDA probably gave a lot of erroneous warnings or disassembled data or refused to disassemble code, requiring lots of manual work. The classes apparently were done inconsistently, making it hard to even write a plug-in to automatically detect them (scripts exist to identify MSVC objects through their RTTI properties, and do a decent job identifying non-RTTI classes, but this would not work with this code).
http://www.hex-rays.com/products/ida/index.shtml
When reverse engineering, and your tool basically says "WTF do I do with this?" it's one of those moments where you want to know how the attacker made it.
Is it hand-rolled? Or a new attack creation kit that script kiddies can cobble something together using?
And "unknown language" was not a really good way to describe it. "Unrecognized output" would have been better. The assumption is that a language like C would compile to a C-like syntax, C++ would do things differently. But it could have been just C++ with an unknown compiler.