Morphing Code to Prevent Reverse Engineering?
ptolemu writes "Cringely's latest article discusses a new obfuscation technique currently being researched called PSCP (Program State Code Protection). An informative read that concludes with some interesting insight on the software giants that heavily depend on this kind of technology."
Java (and subsequently .Net) bytecode made a reverse engineer's life a bit easier on the whole, because it could be decompiled into source that was extremely similar to the original.
All this seems likely to do is remove that benefit and force the reverse engineer to approach it the same old way one would approach a compiled C program (as you described, with a debugger and hooks on syscalls), or bust out a new type of disassembler to emulate traces and dump them to an assembly listing.
But you're right, it's not really that mind blowing if the reverse engineer has worked on non-java/non-.net binaries before.
I wonder if they've seen the proof of the impossibility of obfuscating programs?
There's an already excellent virtual machine debugger used for exactly this purpose by a few crackers.
Self-modifying code is ENTIRELY obsolete. Has been for ten years. Sorry.
I have found that most code generation tools (the kind you program bubbles and arrows in, like this one) will give you C code that looks like it's been obfuscated on purpose.
E.g. all states and variables are in an array called n[][] and the program is basically a big loop.
Quite impossible to know what's going on.
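For anyone who hasn't seen that style of output, here's a rough Python sketch of the shape it tends to take. This is not the output of any real tool: the table `n`, the slot layout, and the little counting program it encodes are all made up for illustration.

```python
# Hypothetical illustration: all state lives in one table n[][] and the
# program is a single dispatch loop, in the style described above.
def run(limit):
    n = [
        [0, 0, limit],   # n[0][0] = current state, n[0][1] = counter, n[0][2] = limit
        [1, 2, 1, 3],    # n[1][s] = default next state after state s
    ]
    while n[0][0] != 3:              # state 3 means "halt"
        if n[0][0] == 0:             # state 0: initialise the counter
            n[0][1] = 0
        elif n[0][0] == 1:           # state 1: compare counter against limit
            if n[0][1] >= n[0][2]:
                n[0][0] = 3          # jump straight to halt
                continue
        elif n[0][0] == 2:           # state 2: increment the counter
            n[0][1] += 1
        n[0][0] = n[1][n[0][0]]      # follow the transition table
    return n[0][1]
```

All of that is just a plain counting loop from 0 up to `limit`, which is the point: the structure a human would have written is gone, even though nobody set out to hide it.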
These abstract machines have machine code that reveals *MORE* information to a disassembler/reverse engineer than, say, x86 or PPC assembly, but it is still far, far from being code.
Have you ever seen decompiled Java bytecode? It's almost indistinguishable from the original source code. The problem with x86 assembly is that each instruction doesn't map 1:1 to the source. With Java bytecode, it *is* close enough to a 1:1 mapping that perfectly legible code can be produced from a regular class file.
Um... yeah
Once the virus writers get hold of this, viruses will be much harder to catch, unless anti-virus writers start looking more for virus-like activity.
Of course, virus writers have been using this since the early 1990s. One particular virus called Ontario III (there might be others before it) used this trick. An interesting part from the virus writeup: "The Ontario III virus uses a very complex form of encryption with no more than two bytes remaining constant in replicated samples."
Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
Deobfuscation is in NP. That is, for any type of obfuscation, there is a method to reduce it to a deobfuscated copy. It may take polynomial time, but it can be done.
Read the paper if ya don't believe me.
Time is an illusion. Lunchtime doubly so. --Ford Prefect
Check out:
Retrologic awarded Java byte code obfuscator (Open Source! and free!)
not free but you can try before you buy
ZelixKlassMaster
Yet Another Java Byte Code Obfuscator (YAJBCO)
But I'm not sure they really work - they just provide a level of security similar to classical machine code. Btw. the MyDoom virus was BurnEye encrypted - so what?
You can defy gravity... for a short time
Having worked with Java bytecodes when I took compilers, I will say that you can get really close to the original program by looking at the bytecodes. You can't tell if someone used a while loop or a for loop, but you can still reconstruct the loop from the code.
The Java Virtual Machine is a stack machine - there are no CPU registers. There's a separate memory store for local variables. That tends to make it easy to tell exactly what data is being operated on at any given time.
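To make the stack-plus-locals point concrete, here's a toy interpreter in Python. It's only a sketch: the opcode names are borrowed from JVM mnemonics, but the tuple encoding, the `execute` function, and the sample program are invented for illustration and are not a real class-file format.

```python
# Toy stack machine with a separate local-variable array, loosely modelled
# on the design described above. Illustrative only, not real JVM bytecode.
def execute(program, num_locals=4):
    stack = []
    local_vars = [0] * num_locals
    for op, *args in program:
        if op == "iconst":        # push a constant onto the operand stack
            stack.append(args[0])
        elif op == "iload":       # push the value of a local-variable slot
            stack.append(local_vars[args[0]])
        elif op == "istore":      # pop the stack into a local-variable slot
            local_vars[args[0]] = stack.pop()
        elif op == "iadd":        # pop two values, push their sum
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        else:
            raise ValueError("unknown opcode: " + op)
    return local_vars

# Roughly: x = 2; y = 3; z = x + y  (slots 0, 1, 2)
PROGRAM = [
    ("iconst", 2), ("istore", 0),
    ("iconst", 3), ("istore", 1),
    ("iload", 0), ("iload", 1), ("iadd",), ("istore", 2),
]
```

Every instruction names the slot it reads or writes, so working back to "z = x + y" is mostly bookkeeping. With a register machine you'd first have to work out which register is standing in for which variable at that point in the code.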
I've seen Java decompilers that return very clear, readable code.
the question is, why was such data allowed to be controlled by the client in the first place?
Because Quake is old, and when it was released, most players were still on 28.8 modem connections. In other words: a lot of stuff was left on the client for the purpose of saving bandwidth.
The Quake 1 source wasn't released until a couple years later. So there was no reason to obfuscate it, since it wasn't out there* (yeah don't get me going on the history of id Software).
* wasn't _supposed_ to be out there. There was a copy of at least part of the Quake 1 source floating around the warez channels for a while, courtesy of crack dot com.
Cloakware also has some nice obfuscation technologies
I actually did this to write code to plug in to a commercial Java application. The documentation for writing plugin modules was so poorly written it would have been impossible without decompiling some of their existing modules.
Although I'm no expert at Java bytecodes, I didn't have any problems, and the only tools I used came with the Sun JDK.
On the other hand, some other code we got from them was put through a C obfuscator and it was almost impossible to reverse engineer. I gave up. Of course now that they provide unobfuscated code I'm able to make improvements to it for our project.
Depends. Some versions of javac vary the position of the test (start or end of the loop) according to the loop construct.
The GPL says "The source code for a work means the preferred form of the work for making modifications to it." Some obfuscated derivative of the source code doesn't count.
Minor nitpick: that is not completely correct.
An object can still have references to it and still be eligible for gc'ing. That is what Weak- and SoftReferences are used for.
From the API docs:
If only I could come up with a good sig
Why would you need to search for "i" in code set up in this way?
Empirical study shows that such protection mechanisms are very weak.
Most game exploits could be stopped outright if, every so often, the well-known memory maps of the active data sets were MD5(ed) and transmitted to the server. As the hit-points and unit-statuses (like the unkillable peon hack for Starcraft) are well understood by the server, the faults can be easily detected and removed.
Remember that most game hacks involve an exterior program that twiddles the in-game parameters after the session is up and running. If the changes were treated as a proper database update journal then things are easy. As the server and the client "play their journal" out at one another a "checksum" operation can be requested and the two memory maps had better match. The errors don't have to be "corrected" after all, they just need to be punished.
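A minimal sketch of that journal-plus-checksum idea in Python, under some loud assumptions: the state is a dict of units, journal entries are `(unit_id, field, delta)` tuples, and the digest is taken over a sorted-key JSON dump. None of those names or formats come from any actual game; they're just the simplest thing that shows the mechanism.

```python
import hashlib
import json

def apply_journal(state, journal):
    """Replay (unit_id, field, delta) updates onto a copy of the state."""
    new_state = {unit: dict(fields) for unit, fields in state.items()}
    for unit_id, field, delta in journal:
        new_state[unit_id][field] += delta
    return new_state

def state_digest(state):
    """MD5 over a canonical (sorted-key) JSON serialisation of the state."""
    canonical = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.md5(canonical).hexdigest()

def client_is_consistent(server_state, client_digest):
    """True if the digest reported by the client matches the server's view."""
    return state_digest(server_state) == client_digest
```

Both sides replay the same journal from the same starting state; when the server asks for a checksum, the client sends `state_digest(...)` of its live data. A memory patcher that bumps a peon's hit points without a matching journal entry produces a different digest, and that's all the server needs in order to punish it.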
This isn't un-crackable, but it is un-crackable in pseudo-realtime. The theoretical cracker would have to have, essentially, a second game engine running to maintain "the image that ought to be there" along with the engine of the real game. Then there would have to be a reconciler of some sort. At a minimum the machine doing the hack would have to be at least three times (yes, oversimplified math 8-) as powerful as the gamer's gaming experience requires. (That is, if the hacker wants to watch untextured wireframes "kill each other" at 4 frames a second... he could probably devise a cheat. 8-)
Even so, as the server side applies the remote journal, some very simple integer checks can be run (e.g. if ((StartingHP + RepairHP) Turns) then EjectCheater(); if ((Pedometer / Turns) > MaxSpeed) then EjectCheater();).
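Here's roughly what such server-side checks could look like in Python. Only the speed rule (Pedometer / Turns against MaxSpeed) is taken from the post. The hit-point rule is an assumption of mine about what a sane bound would be, and the `UnitReport` fields and `should_eject` name are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class UnitReport:
    starting_hp: int
    repair_hp: int      # total HP restored so far
    current_hp: int
    pedometer: float    # total distance moved so far
    turns: int

def should_eject(report, max_speed):
    """Return True if the reported numbers are impossible for an honest client."""
    # Assumed rule: a unit can't hold more HP than it started with plus repairs.
    if report.current_hp > report.starting_hp + report.repair_hp:
        return True
    # Rule from the post: average distance per turn must not exceed the cap.
    if report.turns > 0 and report.pedometer / report.turns > max_speed:
        return True
    return False
```

In a real distributed setup you'd want some slack on both comparisons for lag and lost packets, but the checks stay this cheap.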
Online game hacks almost invariably exploit the kinds of design errors that come from hiring programmers who have only ever programmed games. Simple distributed data integrity checks (and a suspicious mind, and an understanding of why Windows programs are never secure) could pretty much cut them down to nil.
(And before anybody starts narfing, I fully understand that, what with the distributed processing model, the above math would need "fudge factors" and some adaptability. These, too, are techniques that are well understood by people who work with distributed processing and data collection and synchronization tasks. Lossy environment and everything. This also wouldn't involve any real CPU hardship if designed correctly. Compared to the time to compute and render a frame, doing an MD5 over the domain of core data every few seconds isn't that hard to schedule. And it wouldn't necessarily have to be even as strong as an MD5. But gawd, people, these games aren't even doing a data domain XOR... They don't get to cry over it when people do an exterior memory image patch hack. It's like leaving your car running with the doors open in Flatbush and then whining when it gets stolen. 8-)
Innocent people shouldn't be forced to pay for inferior software development.
--"Code Complete" Microsoft Press
Give credit where credit is due. Granted, for all I know the one I linked is ripped too, but still...
Time to filter out the new redundant / trolls, relevant or not.
/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i