Morphing Code to Prevent Reverse Engineering?
ptolemu writes "Cringely's latest article discusses a new obfuscation technique currently being researched called PSCP (Program State Code Protection). An informative read that concludes with some interesting insight on the software giants that heavily depend on this kind of technology."
There are reasons beyond "theft" for wanting to obfuscate your code.
For instance, consider Quake. Quake is a great deal of fun, so long as everybody is playing fair. However, when somebody cracks the game and develops an aimbot (they're real), it's not fun anymore. Even if Quake were open source, some kind of run-time obfuscation would be great just to help prevent cheaters.
I recall reading about an exploit for Age of Empires (or was it Age of Kings...) where in a networked game, you could run a monitor program that would let you see what resources your opponent had. Then, by watching changes in their resource supply, you could guess what units they were building. That was automated for you, of course. "Ah, they keep spending 45 wood and 25 gold, they must be building archers! I should build cavalry."
Anyway, even when we're not talking about greedy corporations protecting their intellectual property rights, there are still good reasons for keeping what's going on in your program hidden from prying eyes.
We don't have a state-run media we have a media-run state.
how come for every new technology that comes out that is suppose to "secure" us i can think of a way it can be used "malicously"
.NET is so "young" can't you see this used for creating a perpetual virus that can't be removed? you wouldn't even be able to ID the bug that caused this to virus to run itself.
ex. I write YourDoom.A and i write it using this new code morphing obfuscator. how exactly are Anti-virus programs 1. suppose to remove this? 2. identify this?
Given the numberous amount of VB/Outlook bugs and considering that
If you need a tamper-resistant client-side binary, don't use Java. It's that simple. A good engineer understands many different tools and selects the best one for the job.
Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?
Would code that was changing itself while running (polymorphic) be nailed by a heuristically-scanning anti-virus program? I would hate to de3velop something, and then all of a sudden get seriously bad press for releasing what seems to act like a virus. Just food for thought.
I remember a primitive attempt at this in the copy protection routine for dBase III way back when the DMCA was but an industry wet dream. This was the ProLok Disk, the one with the laser burn.
There was a section of code hidden by about forty layers of byte-by-byte XORing against bytes looked up in a table. At each level, it would intercept the Debug and Single Step interrupts, XOR the next layer, and jump into it. In those floppy-only days, it had to be reverse engineered a layer at a time, each step producing a disk with one less layer. Approximately the 40th disk had the actual copy-test code...which turned out to be pirated code!
This was also before BIOS shadowing in RAM, and the BIOS executed straight from ROM. The test for the laser burn required hooking into it, which of course they couldn't do in ROM. Instead of working out their own shadowing routine they copied some 700 bytes of the IBM Fixed Disk BIOS, inserted their hooks, and then made a weaselly attempt to cover their tracks by interchanging logical-shift with arithmetic-shift instructions wherever it was guaranteed that nothing would go through the carry bit.
And all that meshugass was there only to hide the publisher's own piracy...the copycrack consisted of a two-byte change elsewhere on the disk.
rj
Well, that seems a bit simplistic. However, when I take a look at running code, there are several things that don't jive with the article:
.NET framework itself in a sandbox. You watch as the OS is fed an EXE, it identifies the type and starts to run it. The CLR (potentially) starts up and checks permissions, loads all the JIT stuff, etc. Then, bytecode is churned. You are stepping one instruction at a time, interrupting at each of the CLR's instructions. One notices the buffer used for the JIT and the "feed" going into it. Tools are written that do this watching and drop items (portions of files, by instruction) into logical "components" and each pass becomes a little clearer. By running the application and matching behavior to component, you begin to learn how the application is designed.
.NET can be both "open" and yet "secure". Unless the bytecode SPEC is proprietary and unable to be reverse engineered (it is neither, hence the "open" nature of it), one can form a CLR that processes valid-yet-obsfucated code and rebuild a logical image of how the program is designed. I can take the MONO bytecode runtime have it start to partition the code into blocks and examine, for the calls each block makes to layer beneath it, what it is allegedly doing.
One forms logical boxes around things. For instance, a good cracker knows to identify the boundary between the JIT and the bytecode, know where the security check call is made, and what threads are monitoring the heap and garbage collector.
When cracking, you initially "freeze" the code, the machine, the stack, and the registers. You're working at such a low level, it begins with a step-by-step of understanding how everything fits together.
For example: Imagine the
Also, you look at the program file itself. THIS is what the article seems to be saying: the bytecode is obsfucated...without context clues you're not going to discern how it works. But you can snap up context many times with a cracking tool. In this article, they seem to imply that each snapshot will be different by scrambling the variable names, or program locations. By seeing how all the names have been crammed, a pattern develops.
Also, I take issue that
Lastly, what makes these tools immune from reverse-engineering themselves? If I know the patterns this DASH-stuff uses, I can begin to reverse them. Unless there's one-way hashing or hardward/networked keys flying around, everything to solve the puzzle is right there, for me and my friends to examine at our leisure. This is done today by virus writers to try to avoid detection by checkers; they know how they work.
If this tool becomes actually as valuable as he claims, then I expect it's own design (stolen or RE'd) to appear in the cracker circles like any other.
But perhaps I'm missing something?
Okay: for argument's sake we'll say that PSCP works like a charm to prevent nasty evil crackers from debugging my program effectively.
.NET program's IL instructions are JITted into machine code at runtime, thereby making it pointless to modify the IL -- unless these people are invalidating the JITted code every time the IL code changes (is that even allowed?) or providing a translator between IL and machine code that inserts code-morphing instructions into the OUTPUT machine code (which I seriously doubt).
.NET assembly which is bursting with code, how does PSCP prevent him from taking the assembly apart and dissecting? Answer: PSCP *doesn't* provide any such protection.
.NET assembly. I may need to hunt back and forth for a little while to find the specific place I'm interested in, but I can't imagine that PSCP is able to change offsets or locations of instructions by very much.
We'll ignore the obvious problem presented by the fact that your
We'll ignore the fact that instructions which modify other code are generally very easy to spot -- because they must refer to regions of a program's address space where code resides -- and it should be easy to find these code morphing instructions and turn them into no-ops.
We'll set both of those tricky issues aside and focus on the crux of the matter. How does this PSCP protect the program *before* it starts running? When the cracker gets his hands on my juicy
So, the school of crackers that likes to use a debugger to deduce program behavior may find themselves having trouble. But in the worst case, all I need to do is run the morphing code in a debugger, record the location of the program counter at the point in the program's execution in which I'm interested, and then consult the corresponding section of the program code that resides in the original
Think about it: if PSCP wanted to change the location of a jump target, for example, it would need to track down every other instruction in the program that jumped to that instruction, and modify the jump to point to the new location of the jump target.
Examples of Stupid Obfuscator Tricks include:
- Scrambling exception ranges so they don't nest.
- Inserting non-structured GOTOs
- Inserting never-executed exits from synchronized blocks
There are others, these are just the ones that I recall. A compiler (static or JIT, it does not matter) must deal with all of these.There are two outs that I know of. One is to only use interpreted code and morph it on the fly (still seems vulnerable to an observant interpreter, but perhaps the amount of necessary observations can be made extravagantly large), the other is to require use of a "trusted" compiler (which, in turn, requires use of a "trusted" OS to prevent substitution of an untrusted compiler, which in turn requires "trusted" hardware to prevent substitution of an untrusted OS).
Like Ultra-Wide-Band networking and enterprise XML integration, this column fits a Cringely mold of writing an entire article about the business plan of one small company most people haven't heard of, and passing it off as an important insight about the IT industry as a whole. It works for the most part because there are a lot of neat-sounding business plans out there. Every start-up company in the world has a story about how their vision, fully realized, would shake up the entire industry. It makes for great column-fodder, but provides poor analyses.
If you read the whole column here twice, you immediately become aware of the fact that Cringely's entire "argument" turns on the idea that security rests on keeping source code secret. Because "interpeted" code "always" discloses code secrets, "interpreted" platforms like .NET will require the intellectual property wrapped up in schemes like PSCP. Therefore, the "inventors" or PSCP hold an important position on the chess-board of the entire IT industry. Microsoft and Sun will launch bidding wars to ensure they control the PSCP IP.
Of course this is just crazy-talk. Just for a moment, leave aside the argument that something like PSCP can really prevent reverse engineering. In the post-PSCP world, all security rests in a distributed repository of millions of lines of source code "locked up" in an organization that spans 45 buildings and untold tens of thousands of people in Redmond. You can't keep source code secret. Closed source is a speed-bump to dedicated attackers, who will break into networks, find corrupt insiders, or even get janitor temp positions in order to get the code.
Nobody working in security seriously believes that the source code for Windows 2000 wasn't floating around the computer underground years before the most recent disclosure. 'Twas ever thus: most of the SunOS and Solaris exploits that powered attackers in the mid-90's were derived from stolen Sun source code. Stolen source trees have always been the most stable currency in the computer underground for exactly that reason. What you do with the compiled product of that code makes no difference if the blueprints are already in enemy hands.
I'm not sure it's even worth confronting Cringely's argument (that PSCP is a strategic technology that is crucial to .NET security) head-on, but I think I can make a decent response simply by evoking video game copy protection. Companies went through all sorts of contortion to devise copy-protection schemes. Kids with the Microsoft Macro Assembler bible thwarted them, because, just like in the DRM/Media battle, when you control the entire player architecture, it is impossible to completely secure the content. Regardless of whether PSCP makes it harder to grep out the cookie cutter exploit from the .NET IR, the payoff in the "battle" between code-obfuscation and exploit generation is much higher than the payoff to defeat copy protection, and nobody has ever won the copy protection battle.
Cringley is right every once in awhile (business plans occasionally do pan out!), like with Eolas and Burst. I normally wouldn't care enough to comment, but this time he's inadvertantly promoting a damaging and popular misconception in his article.