Morphing Code to Prevent Reverse Engineering?
ptolemu writes "Cringely's latest article discusses a new obfuscation technique currently being researched called PSCP (Program State Code Protection). An informative read that concludes with some interesting insight on the software giants that heavily depend on this kind of technology."
delete all the white space, and comment in Hungarian
"If you think you have things under control, you're not going fast enough." --Mario Andretti
I've done mostly server-side work where:
- the jar files were secure because they were on the server and
- bytecode optimization and jar size was the least of our problems
Obfuscation seems to be useful only for client-side Java applications that contains super-secret valuable algorithms. I mean, who cares if somebody decompiles your code to see how you did sortable JTables or whatever?
The Army reading list
This technique might be interesting for stopping people from stealing your closed source code, but as far as security goes it's pretty much worthless. 99% of the vulnerabilities in MS's code were found before their code was leaked, and if you believe them, even the major exploit found after it was leaked had more to do with bad code than someone finding the existing problem by reading the code.
Don't blame me; I'm never given mod points.
The code I write is obfuscated enough as it is. I'm my own anti-piracy mechanism.
The Blaster Master Fighting for Truth, Justice, and Evil Pie since 1979
Form of, illegible code.
Shape of, encrypted executables.
Not sure where the monkey fits into all of this.
"Have you ever thought about just turning off the TV, sitting down with your kids, and hitting them?"
It just won't work. Any code that can be run can be reverse engineered. So-called sophisticated coding techniques only lead to unreadable code..
The problem with Microsoft's code being readable is that there are only Microsoft people reading it. Half the time they wouldn't see the forest for the trees (since they are so involved with it all the time anyway), and the other half they would miss things that other people might pick up.
With Open Source, *everyone* gets to look at the code, so there any many eyes, and the bugs get shallower.
libertarianswag.com
The medical profession deals with viruses by identifying our weaknesses, and exposing them to the viruses (the ultimate "reverse engineering"?). If there were a biological DMCA, developing vaccines would certainly violate it on the illegality of "hacking into the body".
With software, though, people still insist on trying hide and pretend as if there were no viruses out there and that we would be impervious to them.
Can we finally just open all of our code so we can vaccinate it against all these exploits?
This looks vaguely like self-modifying code, like back in the old days of copy protection.
.net runtime engine (or maybe it's loaded and spews bytecode to the runtime), then it can be removed...or the output intercepted. .
The thing I don't understand about the article (and how it describes the PSCP process) is this: how will this make reverse engineering more difficult?
When you're starting to crack something, you work backwards from system calls, library calls, and known behaviors. "Known behaviors" are, well, patterns of code that people (or compilers) use to do things. Anyone good at low-level stuff can probably identify the compiler used to build the code. Likewise, if you think about something enough, you can probably figure out three or four ways to do something, and look for that pattern in the code.
PSCP prevents this...how? By making this process happens as the program runs? How else do you reverse engineer something?
Anyway, it sounds like this thing sits right before the
What am I not getting here?
Just like all the hubbub over proprietary signal encryption to "protect" digital audio streams, all you need here would be the CPU-equivalent of the old Analog Out jack.
Break it down to the Universal Turing Machine and tape analogy. The program code is the tape, and the state of the machine is in the tape-executing device. If the tape were to somehow morph itself dynamically, and yet execute properly by morphing to a well-designed program at the moment it is read for execution, all you have to do is to watch the read/write head of the UTM itself.
If they find ways to monkey around with bytecodes so that they're shifted around between disk and executor, just run it with a special version of the executor. Shouldn't be hard... the standard for what the unencrypted bytecodes are capable of accomplishing are standardized. Execute the code once, and take "notes" of what is being accomplished. Run through a code coverage test suite, even a crude black-box analysis, and you should get an unscrambled bytecode equivalent.
It just doesn't make sense. If obfuscation, i.e. obscurity, is your only security, it is no security at all.
[
Cringely has really outdone himself that time. I can't even follow this poorly thought out mess. He seems to totally misunderstand every single concept he touches on.
Compilation to bytecode and an "interpreted language" are NOT THE SAME THING. Both the CLR and a compiled java class are effectively machine code for a machine that doesn't exist. These abstract machines have machine code that reveal *MORE* information to a disassembler/reverse engineer than, say, x86 or PPC assembly, but it is still far, far from being code. This is reaction one that I have. The rest of the article is so confused I don't even know how to respond to it.
Reverse engineering is good, and each coder should try it. This is the way to learn how someone else code is working, when that code is closed source. I don't think you can fool experienced assembler code with messing code around.
Think about R.E. like about game. It's like cracking, but it's good. And it's about creating, not about destroying.
how come for every new technology that comes out that is suppose to "secure" us i can think of a way it can be used "malicously"
.NET is so "young" can't you see this used for creating a perpetual virus that can't be removed? you wouldn't even be able to ID the bug that caused this to virus to run itself.
ex. I write YourDoom.A and i write it using this new code morphing obfuscator. how exactly are Anti-virus programs 1. suppose to remove this? 2. identify this?
Given the numberous amount of VB/Outlook bugs and considering that
When a computer program runs, the computer can follow millions of paths to get the job done. We leverage those millions of paths and transform them into billions of paths instead
Millions of paths implies some sort of jump instruction, whether or not that translates to millions of function calls, i don't know. assume it does. then instead of making millions of function calls, your making billions of function calls. Going from millions to billions is a large step, bigger than just swapping an "m" for a "b" in marketingspeak. So are they planning on passing this performance hit to the legitimate consumer? No thanks, I'll take my Free source code and like it.
Would code that was changing itself while running (polymorphic) be nailed by a heuristically-scanning anti-virus program? I would hate to de3velop something, and then all of a sudden get seriously bad press for releasing what seems to act like a virus. Just food for thought.
So legitimate software is going to take on the functionality that virus software has been using for years? And companies are patenting these techniques as if they are somehow new? Virus writers are the true innovators here. They pioneered the infamous Mutation Engine. I would consider off the shelf software that used those techniques innovative, in fact I find it creepy. Honestly, if the time wasted trying to protect so-called intellectual property was used instead to invent things to simplify our lives, we (as in humanity) would be better off.
Yet, I have no doubt that if someone came up to them and warned them about the dangers of IP theft and showed them this solution, they would bite.
If they really wanted to do maximum damage to their competition they should have just released the source code and hoped their competitors tried to used that as guidance.
There are probably some rare instances when a specialized software technique is developed and you want to keep its implementation specifics secret. I have yet to run into a single instance of this after many years in the industry.
I wonder if they've seen the proof of the impossibility of obfuscating programs?
There is nothing new under the sun. These Java and .NET obfuscators are just the same old anti-SoftICE sections, which were just the same old Amiga/Atari copylocks, which were just the same Spectrum/C64 turboloaders, and so on.
Every single one of these is broken. Almost all good programmers are capable of deciphering the standardised, retail-boxed algorithm used for the obfuscation, and can easily un-obfuscate it. Are all the Java variables named "a"? Diddums! You don't have a Java decompiler with the option to ignore that simple tweak.
All that matters is:
1) How important is the code behind the obfuscation?
2) How much time and effort is the reverse engineer willing to spend?
If you use a company's retail-box obfuscator, anyone with the "'Brand X obfuscator' deobfuscator v1.0" can get straight at your code. It's a technological arms race, nothing more.
Does my bum look big in this?
I don't love microsoft, but I think this article makes several claims without backing them up or offering any explanation as to their merits. Such as:
And "You can write a program in C# or Visual Basic.NET." while factually accurate, ignores Delphi.NET, C++ managed code using the CRL, and other implementations of the CRL (COBOL, etc).
I think the basic premise of the article, where if someone is using your objects it is obviously a bad thing/security breach, is flawed. If you need to secure your objects, SECURE them! Seal them, see who is calling you, etc.
Lastly, As shown by previous posts, Obfuscation is not the end-all panacea to security. In my opinion, it's barely a detour. Otherwise, Open Source literally could not be secure.
If you blog it...
Seems to me that stuff like this would make it quite difficult to debug once an application has been released - also, how would things like a memory dump on application crash help to debug anything here?
Find a job you like and you will never work a day in your life.
It sounds to me like the author of the article is talking about two completely different issues. The first is code decompilation and static obfuscation. The second is about runtime obfuscation.
In theory, if you don't run the binary you have, you don't need to worry about it modifying itself. The same techniques that work on obfuscated byte code now should work on the the binary. Now if you were trying to reverse engineer a program by running it and tracing it, that's where PSCP seems like it would help.
If it changes how it executes every time, it sounds like it would be a fantastic way to introduce unreproducable bugs.
I'm sure this would make QA testing a nightmare.
"He's lost in a 'floyd hole"
Once the virus writers get a hold of this viruses will be much harder to catch, unless anti-virus writers start looking more for virus-like activity.
Of course, virus writers have been using this since the early 1990s. One particular virus called Ontario III (there might be others before it) used this trick. An interesting part from the virus writeup: "The Ontario III virus uses a very complex form of encryption with no more than two bytes remaining constant in replicated samples."
Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
Okay: for argument's sake we'll say that PSCP works like a charm to prevent nasty evil crackers from debugging my program effectively.
.NET program's IL instructions are JITted into machine code at runtime, thereby making it pointless to modify the IL -- unless these people are invalidating the JITted code every time the IL code changes (is that even allowed?) or providing a translator between IL and machine code that inserts code-morphing instructions into the OUTPUT machine code (which I seriously doubt).
.NET assembly which is bursting with code, how does PSCP prevent him from taking the assembly apart and dissecting? Answer: PSCP *doesn't* provide any such protection.
.NET assembly. I may need to hunt back and forth for a little while to find the specific place I'm interested in, but I can't imagine that PSCP is able to change offsets or locations of instructions by very much.
We'll ignore the obvious problem presented by the fact that your
We'll ignore the fact that instructions which modify other code are generally very easy to spot -- because they must refer to regions of a program's address space where code resides -- and it should be easy to find these code morphing instructions and turn them into no-ops.
We'll set both of those tricky issues aside and focus on the crux of the matter. How does this PSCP protect the program *before* it starts running? When the cracker gets his hands on my juicy
So, the school of crackers that likes to use a debugger to deduce program behavior may find themselves having trouble. But in the worst case, all I need to do is run the morphing code in a debugger, record the location of the program counter at the point in the program's execution in which I'm interested, and then consult the corresponding section of the program code that resides in the original
Think about it: if PSCP wanted to change the location of a jump target, for example, it would need to track down every other instruction in the program that jumped to that instruction, and modify the jump to point to the new location of the jump target.
Cloakware also has some nice obfuscation technologies
Obfuscation does also provide a speed bump to those attempting to disassemble your code. Without obfuscation, anybody with a casual interest could just glance at your code using javap, etc. Retroguard fits saemlessly enough into the build process that adding a simple level of protection to the code is usually simple and transparent.
-----
Free P2P Backup, Windows & Linux
Examples of Stupid Obfuscator Tricks include:
- Scrambling exception ranges so they don't nest.
- Inserting non-structured GOTOs
- Inserting never-executed exits from synchronized blocks
There are others, these are just the ones that I recall. A compiler (static or JIT, it does not matter) must deal with all of these.There are two outs that I know of. One is to only use interpreted code and morph it on the fly (still seems vulnerable to an observant interpreter, but perhaps the amount of necessary observations can be made extravagantly large), the other is to require use of a "trusted" compiler (which, in turn, requires use of a "trusted" OS to prevent substitution of an untrusted compiler, which in turn requires "trusted" hardware to prevent substitution of an untrusted OS).
Like Ultra-Wide-Band networking and enterprise XML integration, this column fits a Cringely mold of writing an entire article about the business plan of one small company most people haven't heard of, and passing it off as an important insight about the IT industry as a whole. It works for the most part because there are a lot of neat-sounding business plans out there. Every start-up company in the world has a story about how their vision, fully realized, would shake up the entire industry. It makes for great column-fodder, but provides poor analyses.
If you read the whole column here twice, you immediately become aware of the fact that Cringely's entire "argument" turns on the idea that security rests on keeping source code secret. Because "interpeted" code "always" discloses code secrets, "interpreted" platforms like .NET will require the intellectual property wrapped up in schemes like PSCP. Therefore, the "inventors" or PSCP hold an important position on the chess-board of the entire IT industry. Microsoft and Sun will launch bidding wars to ensure they control the PSCP IP.
Of course this is just crazy-talk. Just for a moment, leave aside the argument that something like PSCP can really prevent reverse engineering. In the post-PSCP world, all security rests in a distributed repository of millions of lines of source code "locked up" in an organization that spans 45 buildings and untold tens of thousands of people in Redmond. You can't keep source code secret. Closed source is a speed-bump to dedicated attackers, who will break into networks, find corrupt insiders, or even get janitor temp positions in order to get the code.
Nobody working in security seriously believes that the source code for Windows 2000 wasn't floating around the computer underground years before the most recent disclosure. 'Twas ever thus: most of the SunOS and Solaris exploits that powered attackers in the mid-90's were derived from stolen Sun source code. Stolen source trees have always been the most stable currency in the computer underground for exactly that reason. What you do with the compiled product of that code makes no difference if the blueprints are already in enemy hands.
I'm not sure it's even worth confronting Cringely's argument (that PSCP is a strategic technology that is crucial to .NET security) head-on, but I think I can make a decent response simply by evoking video game copy protection. Companies went through all sorts of contortion to devise copy-protection schemes. Kids with the Microsoft Macro Assembler bible thwarted them, because, just like in the DRM/Media battle, when you control the entire player architecture, it is impossible to completely secure the content. Regardless of whether PSCP makes it harder to grep out the cookie cutter exploit from the .NET IR, the payoff in the "battle" between code-obfuscation and exploit generation is much higher than the payoff to defeat copy protection, and nobody has ever won the copy protection battle.
Cringley is right every once in awhile (business plans occasionally do pan out!), like with Eolas and Burst. I normally wouldn't care enough to comment, but this time he's inadvertantly promoting a damaging and popular misconception in his article.
-----
Free P2P Backup, Windows & Linux
Writing code and/or comments in finnish or hungarian would be the ultimate obfuscation technique. People who know english have a bigger chance to guess what words could mean if they were written in persian.
Programming can be fun again. Film at 11.
...that obfuscator had better be completely bug-free.
Just suppose that every once in a while the obfuscated version of the code just isn't exactly 100% functionally equivalent to all the others.
How are you ever going to debug that?
It's far worse than a bug in a compiler optimizer.
Worse yet, this could even be used to attack competitors. Let's say the obfuscator has the ability to distinguish code from different vendors in some way... (well, for example, let's supposed the code is signed). It could subtly sabotage the products of certain vendors so that they seemed to be buggy or unreliable... and the victim would never know what had happened or have any way of knowing what had happened (assuming the victim could not reverse the obfuscation).
"How to Do Nothing," kids activities, back in print!
The GPL says "The source code for a work means the preferred form of the work for making modifications to it." Some obfuscated derivative of the source code doesn't count.
I don't believe it. This stuff can't cost "almost nothing" if it works with threads. If you have multiple paths of execution running through the same code, and the code is being dynamically morphed as the threads run, then either:
The morpher is fully thread-aware, to keep morph operations for thread A from pulling the rug out from under thread B (or C, D, ...). This implies extra sempahores, locking, unlocking, and the overhead of handling them.
The morhper is not fully thread-aware, and every so often the morpher for one thread will clobber another thread.
Am I missing something here?
To a Lisp hacker, XML is S-expressions in drag.
Umm no. You're way off, sorry.
Finnish and Hungarian are related, but not very closely. They're both Finno-Ugric languages, but the relation is roughly as distant as that between, say, German and Greek for instance. And probably less apparent, since German has quite a few Greek loan-words, particularly in scientific fields, but Hungarian and Finnish don't borrow from each other noticeably.
Other Finno-Ugric languages include Mansi, Khanty, Udmurts, and Mordvin, the balto-finnic languages (or dialects, depending on who you ask) which includes Finnish, Estonian, Karelian, Izhora, Veps, Vod, and Liv; and the closely related Saami languages spoken in the far north of Sweden, Norway, Finland, and northwest Russia.
This group is in turn more distantly related to the Samoyedic languages spoken in parts of Siberia.
Basque isn't closely enough related to any of these for linguists to have established any relationship, although many have suspected there was one and put a lot of time and energy into trying to find evidence of one.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Friends don't let friends enable ecmascript.