Mystery of Duqu Programming Language Solved
wiredmikey writes "Earlier this month, researchers from Kaspersky Lab reached out to the security and programming community in an effort to help solve a mystery related to 'Duqu,' the Trojan often referred to as 'Son of Stuxnet,' which surfaced in October 2011. The mystery rested in a section of code written in an unknown programming language and used in the Duqu Framework, a portion of the Payload DLL used by the Trojan to interact with Command & Control (C&C) servers after the malware infects a system. Less than two weeks later, Kaspersky Lab experts now say with a high degree of certainty that the Duqu framework was written using a custom object-oriented extension to C, generally called 'OO C' and compiled with Microsoft Visual Studio Compiler 2008 (MSVC 2008) with special options for optimizing code size and inline expansion."
I guess aliens don't exist.
A link to the actual code snippet would've been nice; I'd love to see the structure and logic behind it.
They may have learned MASM to avoid detection.
How did they deduce it was an unknown programming language? By looking at the compiled machine code? How could they tell this wasn't just regular C?
A well publicized article featuring Microsoft Development products of all things, I think they should use that PR in their Microsoft Visual Studio Ads...
"Enjoy what you're doing! If it becomes drudgery, you're doing it wrong!" - Jim Butterfield
If you can disassemble it then who cares whether it was written in OO C, C++ or Logo? I don't see why it mattered so much. Just follow the assembler.
Objective C but then for the MS platform?
Here is an older post about it: http://lambda-the-ultimate.org/node/4476
FTFA:
Why did the authors of Duqu use OO C? While there is no easy explanation why OO C was used instead of C++ for the Duqu Framework, Kaspersky experts say there are two reasonable causes that support its use [More control over the code & Extreme portability]. These two reasons indicate that the code was written by a team of experienced ‘old-school’ developers
Why OO C? Because it worked, because they knew how to use it, because they knew it would throw Kaspersky for a loop, because they thought it was cool. There are many, many reasons and they do not all have to be logical.
Kaspersky experts might want to consider that the programming wheel of life may have turned and that what was once old-school is now new-school. Who's to say that the underestimated script kiddies cannot grow up to be formidable adults with a whole new bag of tricks?
More like a lightweight open source framework.
Just means the Aliens made MSVC 2008.
Then what country were they from?
For O'Reilly's "Mastering Duqu"?
Why does this matter? If it is a compiled program it is just a bunch of instructions. If the OS lets the instructions run, it doesn't much matter what compiler/language was used, other than how efficiently it will do the crap it is told to.
The bizarre claims by Kaspersky about how Duqu's authors had invented their own language were patently idiotic
It could well be that the Duqu authors wrote a macro language framework on top of C, for tighter code generation, greater control, the ability to easily add trace statements and/or experimental code during development runs, make it more difficult to trace to a specific commercial compiler, etc.
Agreed, but the article does seem to indicate ooc is an existing, lightweight object oriented extension to C that the programmers compiled themselves. I didn't get the impression they think the programmers threw something together on their own.
Who keeps Atlantis off the maps?
Who keeps the Martians under wraps?
We Do, We Do...
"Happy families are all alike; every unhappy family is unhappy in its own way." -- Anna Karenina by Leo Tolstoy
It was too consistent to be compiler intrinsics, but not consistent enough to be straight assembly. That's the impression I got from the original blog post.
No question it would have been possible, but given the rest of the code was compiled in MSVC it made sense that some sort of macro, framework, toolkit, or something was in between the source and the output.
Anyone who uses C for a significant project writes their own framework on top of C. Sometimes those frameworks grow into gargantuan monstrosities, like glib, gtk, or libevent 2.0.
Smart programmers write minimalist frameworks that do simple, straight-forward things: simple object containers, simple event loops, etc, that can be reused without the cost of extra baggage. Extra baggage just gets in the way, because for highly specialized applications you will inevitably need to hack and refactor your libraries. A good library is one that can be hacked on, not one that tries to keep you from hacking on it by trying to do everything.
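To make "simple event loops" concrete, here is a minimal sketch in C of the kind of small, hackable helper this comment describes. Every name in it (EventLoop, loop_post, add_payload, and so on) is made up for illustration; it is not taken from Duqu or from any existing library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical minimalist event loop: a fixed-size ring buffer of events
 * plus one handler slot per event type. No allocation, no dependencies. */
enum { EV_QUEUE_CAP = 8, EV_MAX_TYPES = 4 };

typedef struct Event { int type; int payload; } Event;
typedef void (*EventHandler)(const Event *ev, void *ctx);

typedef struct EventLoop {
    Event queue[EV_QUEUE_CAP];
    size_t head;   /* index of the oldest queued event */
    size_t count;  /* number of queued events */
    EventHandler handlers[EV_MAX_TYPES];
    void *contexts[EV_MAX_TYPES];
} EventLoop;

void loop_init(EventLoop *l) { memset(l, 0, sizeof *l); }

/* Register a handler for one event type; returns -1 for an invalid type. */
int loop_on(EventLoop *l, int type, EventHandler h, void *ctx) {
    if (type < 0 || type >= EV_MAX_TYPES) return -1;
    l->handlers[type] = h;
    l->contexts[type] = ctx;
    return 0;
}

/* Queue an event; returns -1 if the queue is full. */
int loop_post(EventLoop *l, Event ev) {
    if (l->count == EV_QUEUE_CAP) return -1;
    l->queue[(l->head + l->count) % EV_QUEUE_CAP] = ev;
    l->count++;
    return 0;
}

/* Dispatch until the queue is empty; returns how many handlers ran.
 * Events with no registered handler are silently dropped. */
size_t loop_run(EventLoop *l) {
    size_t dispatched = 0;
    while (l->count > 0) {
        Event ev = l->queue[l->head];          /* copy, so handlers may post */
        l->head = (l->head + 1) % EV_QUEUE_CAP;
        l->count--;
        if (ev.type >= 0 && ev.type < EV_MAX_TYPES && l->handlers[ev.type]) {
            l->handlers[ev.type](&ev, l->contexts[ev.type]);
            dispatched++;
        }
    }
    return dispatched;
}

/* Example handler: adds each payload into an int passed as context. */
void add_payload(const Event *ev, void *ctx) { *(int *)ctx += ev->payload; }
```

The whole thing fits on one screen, which is the point being made: when a specialized application needs different behavior, code this small can simply be edited.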
Smarter than you think. I remember reading somewhere that US radio operators in WWII used a Native American language to communicate with each other. No amount of analysis will give you any insight if the other party is careful not to leave any trails. Translating one language into another mechanically requires deep knowledge of both languages.
If you rolled your own language with its own grammar, you can be secure in the fact that *even* deep analysis will not yield any clues, at least not with current technology. I am not sure such a thing can even be done by a Turing machine. People with better knowledge of it are welcome to correct me if I am wrong. All the current technology is concentrated on modifying bits for security, but if you do it at a sufficiently high level (i.e., another language) there is no way to crack it.
This case, however, has an Achilles heel: you can still modify the binary and see what the results would be by running it. After a sufficient number of trials, you should be able to decode it.
You will never have experience until after you needed it.
This is +1, Normal? Come on. This is a worthless comment, a waste of space.
I don't think the article implied that this framework was pre-existing or not. They don't know. It could have been a custom written framework to help the author(s) specifically build this virus. Rudimentary OO C frameworks are easy enough to write from scratch.
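For anyone wondering what "rudimentary OO C" can look like, below is a minimal sketch: structs that carry a pointer to a table of function pointers, which gives virtual-method-style dispatch in plain C. All names (Shape, Rect, Square, total_area) are invented for the example, and nothing here is claimed to match what the Duqu authors actually wrote.

```c
#include <stddef.h>

/* Hypothetical example of hand-rolled OO in C: every "object" starts
 * with a pointer to a table of function pointers (its "class"). */
typedef struct Shape Shape;

typedef struct ShapeVTable {
    int (*area)(const Shape *self);
    const char *(*name)(const Shape *self);
} ShapeVTable;

struct Shape { const ShapeVTable *vt; };

/* "Subclasses" embed Shape as their first member, so a pointer to the
 * subclass can be safely converted to a pointer to Shape and back. */
typedef struct Rect   { Shape base; int w, h; } Rect;
typedef struct Square { Shape base; int side; } Square;

static int rect_area(const Shape *self) {
    const Rect *r = (const Rect *)self;
    return r->w * r->h;
}
static const char *rect_name(const Shape *self) { (void)self; return "rect"; }
static const ShapeVTable RECT_VT = { rect_area, rect_name };

void rect_init(Rect *r, int w, int h) {
    r->base.vt = &RECT_VT;
    r->w = w;
    r->h = h;
}

static int square_area(const Shape *self) {
    const Square *s = (const Square *)self;
    return s->side * s->side;
}
static const char *square_name(const Shape *self) { (void)self; return "square"; }
static const ShapeVTable SQUARE_VT = { square_area, square_name };

void square_init(Square *s, int side) {
    s->base.vt = &SQUARE_VT;
    s->side = side;
}

/* "Virtual calls": the caller never needs to know the concrete type. */
int shape_area(const Shape *s) { return s->vt->area(s); }
const char *shape_name(const Shape *s) { return s->vt->name(s); }

int total_area(const Shape *const *shapes, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; i++) sum += shape_area(shapes[i]);
    return sum;
}
```

In the compiled output, each method call through such a table becomes an indirect call via a pointer loaded from the object, whichever language produced it, so on its own this pattern says little about the source language.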
What they saw was a regular pattern common across many different functions in the binary, which suggested that the same source was used to create it throughout the code. And if the source code is that regular, either the author is a serial copy-paster (which implies bad coder, which the rest of the code provides evidence that he isn't), or the more likely answer is that macros were used to produce it.
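A small sketch of how macros produce that kind of regularity: one macro stamps out an identically shaped create/destroy pair for every type it is applied to. The macro name DEFINE_CLASS and the Counter and Buffer types are hypothetical and only illustrate the idea; this is not a reconstruction of the Duqu source.

```c
#include <stdlib.h>

/* Hypothetical macro: for any struct T with T_init and T_fini functions,
 * generate T_new and T_delete with exactly the same shape every time. */
#define DEFINE_CLASS(T)                                   \
    T *T##_new(void) {                                    \
        T *obj = (T *)calloc(1, sizeof(T));               \
        if (obj != NULL) T##_init(obj);                   \
        return obj;                                       \
    }                                                     \
    void T##_delete(T *obj) {                             \
        if (obj != NULL) { T##_fini(obj); free(obj); }    \
    }

typedef struct Counter { int value; } Counter;
static void Counter_init(Counter *c) { c->value = 1; }
static void Counter_fini(Counter *c) { c->value = 0; }
DEFINE_CLASS(Counter)

typedef struct Buffer { char *data; size_t len; } Buffer;
static void Buffer_init(Buffer *b) {
    b->data = (char *)calloc(16, 1);
    b->len = (b->data != NULL) ? 16 : 0;   /* len stays 0 if allocation failed */
}
static void Buffer_fini(Buffer *b) {
    free(b->data);
    b->data = NULL;
    b->len = 0;
}
DEFINE_CLASS(Buffer)
```

Counter_new and Buffer_new come out structurally identical (allocate, null check, call init, return), differing only in the size constant and the function called. Repeated across dozens of types, that produces the regular pattern described above without anyone copy-pasting code by hand.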
They are probably hoping it's a particular OO C framework that's been used elsewhere, but not far and wide. They might be able to fingerprint the libraries that are in use around the world, and further narrow down the list of suspects. They might be able to compare this to different published software package fingerprints to see if any are a potential match. Maybe they'll find out it was downloaded from sourceforge, which leaves a short list of only a few thousand people or IP addresses who've ever downloaded it before (depending on what kinds of logs the servers and ISPs kept.)
The chances are really good that it won't lead straight to a specific Joe Ocaml of 123 Main Street, Fairview, Connecticut. The chances are high that it won't give them anything useful at all. But if there's even a 1% chance it might tell them exactly who wrote it? That's totally worth it to pursue the leads, especially since it's so little work to run that small bit of investigation.
John