Researchers Seek Help In Solving DuQu Mystery Language
An anonymous reader writes "DuQu, the malicious code that followed in the wake of the infamous Stuxnet code, has been analyzed nearly as much as its predecessor. But one part of the code remains a mystery, and researchers are asking programmers for help in solving it. The mystery concerns an essential component of the malware that communicates with command-and-control servers and has the ability to download additional payload modules and execute them on infected machines."
NSA Property, Keep Out.
The mystery code isn't really much of a mystery- it's just how Duqu communicates with the sith lord.
"That's the way to do it" - Punch
hmmm yes, your average script kiddie can totally create a custom language and totally stump the entire computing universe. my daughter did it last week while looking for proxies to get around my facebook ban. no government needed!
Learned INTERCAL from Guy Steele in the Comparative Languages course at CMU.
An imperfect plan executed violently is far superior to a perfect plan. -- George Patton
It's in ROT-13 Pig Latin.
I'll take my paycheck in gum, Trident Layers to be specific.
I kid, I kid...
Why? It's entirely possible that this snippet of code is a piece of in-line assembly. It may have started out in some higher-level language but been tweaked or completely rewritten in assembly, so its origin is no longer recognizable.
Have gnu, will travel.
...and here's me thinking that compiled code has already been reduced to machine code.
Or even self modifying assembly....
That would be a real pisser to figure out.
Check your premises.
Who would be insane enough to write OO code in assembly?
If I have been able to see further than others, it is because I bought a pair of binoculars.
that's just a guess
but the level these guys are working at here, something well above script kiddie and slightly below elder neckbeard, it seems entirely plausible to me
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
My dad did. Maybe he's behind this. But he was a first generation programmer. Trying to get him to move on from assembly was a pointless endeavor.
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
Objective-Brainfuck or Brainfuck with Classes
If you do what you always did, you get what you always got.
I'm sure he did write assembly. But Object Oriented assembly? Come on now...
> from assembly was a pointerless endeavor.
ftfy.
help me i've cloned myself and can't remember which one I am
By the sounds of the article they haven't ruled that out. They're just checking to see if it could be a higher level language that would help identify the writer(s).
Any sucker can tell it was written in Linda.
Those who can make you believe absurdities can make you commit atrocities. - Voltaire
No seriously. When OO became a fad he figured out how to build up macros to support an OO model.
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
Right. That limits the suspects to... um... just about anyone who took a second-year computer science course above the level of "See Spot Run".
Could it be possible that the authors came up with their own language, and/or compiler?
It's out there. I remember reading about an engine control algorithm running on a 68HC16 microcontroller in Circuit Cellar Ink back in the mid-90s, and it was written in object oriented assembler. It caught me so off-guard that I still remember it almost 20 years later.
Program Intellivision!
That's beside the point. Who the fuck cares what is the imaginary high level language this stuff was written in? They are analyzing the somewhat annotated disassembly anyway. To me it looks like it may be the output from some PLC environment. Perhaps it's CoDeSys output. It doesn't matter anyway, there are no tools that will take this and restore the source. It's not like you need something uber-fancy anyway to help with what's the key here: figuring out what the code does.
A successful API design takes a mixture of software design and pedagogy.
Actually, I'll reverse the joke and gun for +1 Insightful.
Ready?
Literally why does this story even exist? This code takes out nuclear reactors and "researchers ask programmers for help"? Really?! (Does "Ask" imply they want the answer FREE?!)
So the Dept of Homeland Security is busy helping yank down file share sites and they have no time for this?
Ladies and Gentlemen and AIs, this is your answer to why we're spiralling into a mess.
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
Why's that so hard to believe? I've been programming almost exclusively in object-oriented languages for 15 years now, give or take. Chances are no matter what language I write in, whatever I write is going to include many object-oriented features. If I was working with a complex assembly project, a type system would be one of the first things I came up with. From there, it's not much of a stretch to imagine you'd want to associate data with instances of that type, and functions that can operate on them. Bam! Object-oriented assembly. Private members and inheritance are just small steps from there.
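To make that concrete, here is a minimal sketch of the same progression (records, functions that operate on them, then a per-type method table with inheritance). It is written in Python only for readability; every name in it is invented for illustration, and it is not a claim about how the DuQu code or any particular assembler macro package works.

```python
# Hypothetical sketch: objects built from plain records and method tables,
# roughly what a macro layer over an assembler would give you.

def make_vtable(parent=None, **methods):
    """Build a method table; a child starts as a copy of its parent's table."""
    table = dict(parent) if parent else {}
    table.update(methods)
    return table

def new(vtable, **fields):
    """An 'instance' is just data plus a reference to its type's method table."""
    obj = dict(fields)
    obj["vtable"] = vtable
    return obj

def call(obj, name, *args):
    """Indirect call through the table, analogous to calling via a vtable slot."""
    return obj["vtable"][name](obj, *args)

def shape_area(self):
    return 0

def shape_describe(self):
    # Dispatches through the table, so subtypes get their own area().
    return "shape with area %s" % call(self, "area")

def rect_area(self):
    return self["w"] * self["h"]

SHAPE = make_vtable(area=shape_area, describe=shape_describe)
RECT = make_vtable(SHAPE, area=rect_area)  # inherits describe, overrides area

r = new(RECT, w=3, h=4)
print(call(r, "describe"))
```

Nothing here needs language support beyond records and indirect calls, which is the point being made above.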
<xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
DHS, conspiracy theories aside, is likely conducting their own investigation into DuQu, the details of which are unlikely to be shared with the general public. TFA is about Kaspersky Labs, an independently owned security firm, asking for help from the general public.
My powers have doubled since the last time we met, DuQu!
I did when I was in college.
I *wish* my day-to-day job was OO asm development. I've done a fair amount of x86 OO programming, and it is quite easy (if you're fluent in asm).
Or just consider that the Borland Turbo Assembler did have "native" OO support, and that there are a ton of MASM OO macros that can be somewhat easily adapted to any modern assembler.
My guess is that it's probably Erlang. It fits all the descriptions of how Erlang works. Erlang is used in all sorts of realtime systems, so it wouldn't be a stretch to see it used in a virus library. Someone in the telecom or network infrastructure industry might be familiar with Erlang, and that type of person might also know enough about networks and network vulnerabilities to architect a framework for virus distribution.
Literally why does this story even exist? This code takes out nuclear reactors and "researchers ask programmers for help"? Really?! (Does "Ask" imply they want the answer FREE?!) So the Dept of Homeland Security is busy helping yank down file share sites and they have no time for this?
Why would DHS have anything to do with this? DuQu so far hasn't done anything to American interests (in fact, so far as I can tell, it has helped them). The people in TFA looking at the code are Kaspersky: a Russian anti-virus company. They don't even recognize the language the code is written in, much less how it works, and they are wondering if anyone of the billions of people on the Internet knows (specifically, if it is a specialized language used in some niche industry or something). If no one does, they can be pretty sure it was a custom created language, and proceed accordingly. They aren't asking for someone to do their work for them: they are saying "hey, this look like anything anyone knows?" DHS might be looking at it too, if they didn't create it: but the story has absolutely nothing whatsoever to do with them, in any way. Not even the same continent.
Also, I don't know where you got "takes out nuclear reactors." Stuxnet did damage to nuclear centrifuges. AFAICT all DuQu seems to be doing is stealing data (private keys, actually). Bad for people who get infected, yes. Not like it is causing nuclear meltdowns or something.
"None can love freedom heartily, but good men; the rest love not freedom, but license." --John Milton
That clearly looks like perl to me.
They don't know the language? Why are they concerned with the language it was written in? What if it was written in C++ or C on ARM and cross compiled for x86, would it look funky like that? Or is it possible it's compiled in TASM and they are actually looking at a 16-bit code segment where most of them have never seen less than 32-bit code?
I am Bennett Haselton! I am Bennett Haselton!
"Be sure to drink your Ovaltine."
Like, who cares? This code is like Sputnik in orbit above your country - it sets the new legal framework that it's OK to mess with industrial computers worldwide and get to hide behind state support.
Domestic spying is now "Benign Information Gathering"
I used to do that for fun 15 years ago; it is not that hard. There are still some old tutorials on this floating around the net:
http://webcache.googleusercontent.com/search?q=cache:TIHCSoP4378J:yanaware.com/com4me/createcom.php-author%3DErnie%2520MURPHY%26mail%3Dernie%40surfree.com%26url%3Dhttp---here.is-ComInAsm%26idTute%3D39.htm+masm+COM+component&cd=2&hl=en&ct=clnk&gl=us&client=firefox-a
Jehovah be praised, Oracle was not selected
The researchers care what high level language was used. It could help identify who wrote it since it's likely that the language has a small user base.
Good point, although compared to mainstream tools like MSVC, almost everything has a "small" user base.
A successful API design takes a mixture of software design and pedagogy.
I have also. TASM (Borland's Turbo Assembler) had support for it. The assembler would manage a vtable for you among other things. I've also programmed OO in Korn shell. Why OO in assembler or ksh? Because it was the right tool for the job and OO principles can be used anywhere they make sense and help the effort. It's not as far out there as you make it seem.
"Happy families are all alike; every unhappy family is unhappy in its own way." -- Anna Karenina by Leo Tolstoy
I only took a glance so don't blame me if I am wrong, but it looks like the SCADA variant
More info available at http://en.wikipedia.org/wiki/SCADA
Muchas Gracias, Señor Edward Snowden !
DHS, conspiracy theories aside, is likely conducting their own investigation into DuQu
No need for that unless they snuffed the original developer before securing the relevant docs.~
Upward mobility is a slippery slope - the higher you climb the more you show your ass.
Because it annoys the PhDs, that's why they care.
Think about it. We use high-level languages because they express an idea in fewer words. If I call a TextBox control in C#, that's simpler than the equivalent in Assembly. These people, of course, are annoyed, because without knowing what the higher language was (assuming there was one used), it will take them years to analyze what exactly the code is doing; whereas if they knew what the higher language was, they could create a decompiler, and have something approaching the original source code in a few weeks / months.
I am John Hurt.
This looks a lot like "Spin" from a company called Parallax. It's a proprietary programming language used to control their pic and hyperpic processors.
No, no DuQu does not, and has never attempted to, 'take out nuclear reactors.' That was a different piece of malware.
It would benefit us all - as well as yourself - if before you commented you educated yourself on the subject of the submitted story.
I don't understand why they are avoiding this option like the plague. C'mon... practically every compiler compiles its language into assembly and runs that through an assembler for final object code creation. (tho some will then run THAT through an optimizer etc) There's absolutely no reason for them to insist it can't be written in native assembler. I wrote many things for the 6502 that way - if you want it fast and small, that's the way to go.
And sorry, if they have to reverse it back into C++ or some other higher level language to figure out what it does, they're idiots, no better than script kiddies. I don't care if they have ten CS master's degrees. Assembly just takes a little more time to work out, it's not like it's encrypted and they don't have the key.
None of this should come as a surprise to anyone. The authors are black-hats. They make their living on buffer overflows and bug exploitation, they damn well know how to code in assembly, and specifically how to tear it apart and analyze it in fine detail. Why can't these "experts" do that?
I work for the Department of Redundancy Department.
DHS, conspiracy theories aside, is likely conducting their own investigation into DuQu
No need for that unless they snuffed the original developer before securing the relevant docs.~
Hey, everyone makes mistakes. That drone was supposed to have been loaded with tranquilizer darts, not Hellfires. Boy, there were some red faces in the office when we found out what happened.
You are not a brain: http://books.google.com/books?id=2oV61CeDx-YC
I'm sure he did write assembly. But Object Oriented assembly?
I'm incredulous that you are incredulous. I thought I saw a book about that somewhere. So I walked over to my tall stack of random language books and there it is:
Object-Oriented Assembly Language, Len Dorfman, McGraw-Hill, 1990
I hereby thwack you upside the head.
My other car is a 1984 Nark Avenger.
Not the whole app. According to TFA, it was written in C++. They even know which implementation. But this particular function (subroutine, method, whatever) appears not to be written in that.
It's something different. Or someone banged out some ASM by hand. If they can figure out what language this routine was written in, they can narrow down the list of possible authors*.
* Come on now. You didn't think the NSA wasn't scraping all the developers' resumes from LinkedIn to build a skill set database to figure out who wrote what, did you?
Have gnu, will travel.
Creating a "decompiler" isn't exactly trivial. The types of analyses you have to do on machine code compiled with today's optimizing compilers are fairly generic, they will give you some higher-level representation of the code no matter what was the underlying language. Those tools recognize certain patterns to provide even higher level information, but at a basic level they pretty much repeat what a compiler would do: there's data flow and control flow analysis, and a whole lot of inference based on those. I'm sure there's a whole lot of techniques and tricks published...
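As a rough illustration of the control flow analysis mentioned above, the first step is usually splitting the instruction stream into basic blocks. The sketch below does that for a made-up toy instruction format (tuples of an opcode name and an optional jump target index); the opcode names and the `TOY` program are assumptions for the example, not output from any real disassembler.

```python
# Hypothetical sketch: find basic-block boundaries in a toy instruction list.
# Each instruction is (opcode, target), where target is an index or None.

JUMPS = ("jmp", "jcc")               # unconditional / conditional jump
BLOCK_ENDERS = JUMPS + ("ret",)      # anything that ends a block

def find_leaders(program):
    """A leader is the first instruction, any jump target,
    or any instruction that follows a jump or return."""
    leaders = {0} if program else set()
    for i, (op, target) in enumerate(program):
        if op in JUMPS:
            leaders.add(target)
        if op in BLOCK_ENDERS and i + 1 < len(program):
            leaders.add(i + 1)
    return sorted(leaders)

def basic_blocks(program):
    """Return half-open (start, end) index ranges, one per basic block."""
    leaders = find_leaders(program)
    bounds = leaders + [len(program)]
    return [(bounds[k], bounds[k + 1]) for k in range(len(leaders))]

TOY = [
    ("mov", None),  # 0
    ("cmp", None),  # 1
    ("jcc", 5),     # 2: conditional jump to 5
    ("add", None),  # 3
    ("jmp", 6),     # 4: unconditional jump to 6
    ("sub", None),  # 5
    ("ret", None),  # 6
]

print(basic_blocks(TOY))
```

None of this depends on what source language produced the code, which matches the point that these analyses are generic.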
A successful API design takes a mixture of software design and pedagogy.
I think most are missing the point. They probably already know what it does (if they don't, given the effort they have expended, then they are boobs). What they want to do is find what the language was *in order to track down the authors* on the premise that it was some strange language only used in a few places and if they find it, they can narrow the range of likely candidates .
Ok, you and someone on the article both said the same thing, with absolutely nothing to back it up. Care to elaborate? I'm particularly curious how a .NET bytecode executable ends up as baroque machine code as opposed to CLI bytecode like most .NET languages.
Program Intellivision!
One of the comments on the page already said that.
I remember I disassembled Forth a lot of years ago.
It comes in 2 flavours: interpreted and compiled.
It relies on RPN heavily.
It's a very compact language, both in source and in compiled form.
You extend the language by using "words", and it's like OOP.
It's one of the weirdest languages I ever used.
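For anyone who hasn't seen RPN or Forth-style "words", here is a tiny stack evaluator as a sketch. It only imitates the flavour of Forth (the `: name ... ;` definition syntax and a handful of built-ins); the `run` function and the `BUILTINS` table are invented for this example and it is nowhere near a real Forth.

```python
# Hypothetical sketch: a minimal RPN evaluator with user-defined words.

def _binary(fn):
    """Wrap a two-argument function as a stack operation."""
    def op(stack):
        b = stack.pop()
        a = stack.pop()
        stack.append(fn(a, b))
    return op

BUILTINS = {
    "+": _binary(lambda a, b: a + b),
    "-": _binary(lambda a, b: a - b),
    "*": _binary(lambda a, b: a * b),
    "dup": lambda stack: stack.append(stack[-1]),
}

def run(source, words=None, stack=None):
    """Evaluate whitespace-separated tokens against a data stack."""
    words = {} if words is None else words
    stack = [] if stack is None else stack
    tokens = source.split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == ":":
            # ": name body ;" defines a new word as a stored token string.
            end = tokens.index(";", i)
            words[tokens[i + 1]] = " ".join(tokens[i + 2:end])
            i = end
        elif tok in words:
            run(words[tok], words, stack)   # user-defined word
        elif tok in BUILTINS:
            BUILTINS[tok](stack)
        else:
            stack.append(int(tok))          # anything else is a number
        i += 1
    return stack

print(run("2 3 + 4 *"))
print(run(": square dup * ; 7 square"))
```

Operands go on the stack first and the operator comes last, and once `square` is defined it is used exactly like a built-in.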
Yeah, it looks like the output from a PLC development kit; the original code might be written in Structured Text: http://en.wikipedia.org/wiki/Structured_text
Actually looks like the result of a macro assembler module. The MOV instructions give it away. The only reason for doing that is to make it faster or to reduce the code size, not necessarily to obfuscate. The programmer is old school.
Don't be apathetic. Procrastinate!
... it's Java!
Defining Statistics and Social Research
Two points:
1.) No one said it was trivial, but for a capable researcher who has spent a fair portion of their life dealing with decompilers (and writing a few of their own), they probably have an idea how to do it fairly quickly.
2.) While it's possible to walk through reams of Assembly code, it's painful. Extremely painful. A 100-line 'function' in Assembly code will cause most programmers to pause, and 100,000 lines of Assembly code ('functions' and all) will break even the most vigilant of programmers. The human mind just does not like holding 100,000 variables actively in working memory (whether it can actually do that is open to some debate). 100,000 lines in Assembly is 2,000 lines in a higher-level language, segmented properly so the human brain doesn't crash.
I am John Hurt.
Just for reference, I have a PhD, I work on compilers and runtime systems. People like me program in assembly, we program in HLLs, whatever works. We will (for actual example) pore over 88 pages of assembly-language output from a compiler in order to find the register allocation bug. Other people I've worked with on compilers (some with PhDs, some not) do things like diagnosing a C optimizer bug based only on the C++ input to cfront (later run through a C compiler) and the busted output, or, after pondering the full set of symptoms, (correctly) diagnose a "compiler bug" as a lack of thermal compound between CPU and heat sink (over the phone!)
You get good at this stuff when you do it full-time, and a compiler/debugger person who had come across this language in the past would probably ID it confidently. It's probably NOT something that passed through a C compiler, because those still have calling conventions, structure layout conventions, and other funny idioms that would still be obvious to someone familiar with C compilers on that platform (which presumably the guys at Kaspersky are).
teach me obi wan that stuff sounds cool.