Reverse Engineering?
codec7 asks: "Ever since I read the article about Australia legalizing reverse engineering, I've been curious -- How DO you reverse engineer software? I'm an average programmer really interested in computer graphics, and would love to get into some software packages to see how they work. Nothing underhanded, strictly educational. I get off on algorithms. Anyway, am I in over my head even contemplating it? I have a feeling that by the time I could really reverse engineer anything (even with help) the information would be grossly obsolete and I could pick up better tips and tricks from some gaming mags. I would appreciate any direction I could get from readers who know a little about this kind of stuff." I figure it's probably best to discuss this now while it is still legal someplace in the world.
Actually, they would've been better off re-implementing them from scratch--instead of coming up with that lame AGA. I don't see how they'd be "reverse-engineering" the chips considering they'd have all of the docs laying around "somewhere" in a dead tree version.
Any time you circumvent "normal" program control and run under an interpreter or some alternate state-driven scheme, there are worlds of headaches...
Do compilers embed a string inside the binary identifying the compiler used? How would you find this?
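One low-tech way to look for such a string is to scan the binary for runs of printable characters, which is roughly what the Unix strings utility does. The sketch below is only an illustration: printable_strings is a made-up helper name, and the sample bytes (including the "GCC: (GNU) 9.4.0" marker) are invented for the example. GCC-built ELF files commonly carry a marker of that shape, but other compilers may leave nothing at all.

```python
import re

def printable_strings(data: bytes, min_len: int = 4):
    """Return runs of at least min_len printable ASCII bytes, like `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Invented sample: a few header bytes, a compiler marker, and a symbol name.
sample = b"\x7fELF\x02\x01\x00\x00GCC: (GNU) 9.4.0\x00\x01\x02main\x00"

for s in printable_strings(sample):
    print(s)
```

To try it on a real file, read the bytes with open(path, "rb").read() and look through the output for anything that names a compiler or runtime library.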
A common technique for reverse engineering 3D games is to write wrappers around the rendering DLLs (Direct3D, OpenGL, Glide, ...) which vary the way that the scene is rendered. E.g. you might write a wrapper which ignores textures so that you can see the lighting, or one which ignores the lighting, or one which renders everything in wireframe so that you can see the geometry more clearly. And so on.
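The wrapper idea can be sketched in a few lines of Python. Everything here is hypothetical: Renderer stands in for the real rendering library, and its three methods are invented for the example. A real wrapper would be a DLL exporting the same entry points as the library it replaces. The point is only the shape: forward every call except the ones you want to suppress.

```python
class Renderer:
    """Hypothetical stand-in for the real rendering library."""
    def __init__(self):
        self.log = []

    def bind_texture(self, texture_id):
        self.log.append(("bind_texture", texture_id))

    def set_lighting(self, enabled):
        self.log.append(("set_lighting", enabled))

    def draw_triangles(self, vertices):
        self.log.append(("draw_triangles", len(vertices)))


class NoTextureWrapper:
    """Forwards every call to the real renderer except texture binding."""
    def __init__(self, real):
        self._real = real

    def bind_texture(self, texture_id):
        pass  # swallow the call so the scene is drawn untextured

    def __getattr__(self, name):
        # Only reached for names not defined on the wrapper itself.
        return getattr(self._real, name)


def game_frame(renderer):
    """Stand-in for the game's own rendering code, which we cannot change."""
    renderer.set_lighting(True)
    renderer.bind_texture(7)
    renderer.draw_triangles([(0, 0, 0), (1, 0, 0), (0, 1, 0)])


real = Renderer()
game_frame(NoTextureWrapper(real))
print(real.log)
```

The game code is untouched; it simply gets handed the wrapper instead of the real thing. A lighting-free or wireframe variant would override a different method in the same way.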
Does anyone know the exact legal status of reverse engineering? Is disassembly legal? Is "black-box" legal? How about in other countries?
i even went a step further and wrapped the code into vb for a really great OCX~!!!! its free if you write me at cybersurfer976@hotmail.com
Hello?
If they had the docs, as you claim, then they would not have had to reverse engineer.
Fact is they lost the docs/design info, so they DID have to reverse engineer (and despite your protestations to the contrary, reverse engineer is exactly what they did).
REVERSE ASSEMBLERS ARE OLD HAT ! WE NEED A GOOD **REVERSE COMPILER** FOR MICROSOFT "C". AS I SEE IT, IT WOULD HAVE SEVERAL PASSES AND EMIT SUGGESTED "C" AND ASSEMBLER. IT WOULD BE GREAT FOR LEARNING JUST HOW THAT OBSCURE DEVICE OR PROGRAM WORKS. WHO KNOWS MAYBE IT'S ALREADY THERE AND I JUST DON'T KNOW WHERE IT IS OR WHAT IT'S CALLED :-)
http://www.suddendischarge.com
There are plenty of sources of knowledge about graphics programming available. If it is the algorithms you are looking for, then most of the mathematics can be found in any good bookstore (I recommend university bookstores) in either the computer book section or the mathematics section. Gaming magazines are also decent, though they only provide tips for certain things, whereas a good book will give you a solid mathematical foundation.
no offense folks, but most of the answers here do not come close to 'real' "reverse engineering".
logic analysis and substituting data in real time work much better.
a device can be built from surplus equipment (it requires an oscilloscope and a number of logic analyzers) and the process is too lengthy to explain.
a small board is created that sits between the processor and motherboard. the event lines from the analyzer are connected to the processor pins via this 'board'. a breakout box with switches is inserted between the event lines and the analyzer.
with everything connected and the target software being run, one can now observe the address map and note what is active and what is not active. at this point one would begin looking through filters for various 'states' in ram. an extra device can be built to change instructions 'on the fly'.
now is when you start using the software tools mentioned in the other responses.
What *linux* tools exist? I'd like to find a tool for watching/logging/grepping serial port transactions, and something similar for the parallel port. (Preferably with a display in binary, ascii, hex, etc.) Does anything like this exist? ...or is it so simple under linux that special tools are pointless? For instance, if I was running an app under wine or dosemu, how could I spy on what it did with the modem, what files it reads/writes, etc.
One of my latest projects was disassembling the PROM boot code from an Apollo DN3500 workstation, which I spent probably two man-months on and still don't have fully commented/documented... but I do have a far better understanding of what is going on. Figuring out what certain things were doing, though, like the video registers, would've been next to impossible in any detail. I did find some documentation on the video cards and the SCSI controller which was very helpful.
If you've never worked in assembly, and never tried to understand someone else's assembly code, then I would say you are in for a real experience. I've done this with programs on my old Z-80 machine, several programs on an old minicomputer, the PC, and the Apollo, and it's not for the faint-of-heart. But, when you lack documentation on something and you need to know what is going on, it can be a very useful way of getting the info you need.
If you understand how compilers work and the code a compiler will generate in assembler from, say, your C code, it is possible to reverse the process farther and generate a C routine that will do the same work. Of course, if you are patching an existing binary program this probably isn't very useful.
Sigh... I remember patching my old "TimeTrek" program so you could start out with 10 torpedoes instead of 3, and the base would recharge you back to 10 instead of 3. Took me a month to figure out where the stuff was that I needed to change...
i remember deu! sheeeiittt. those were the days. you are my hero and shit. fuckin a'.
English is not Fravia's native language, which might be obvious if you READ some of the pages where he actually mentions it! sheesh..
English is not Fravia's native language, which explains some strange wordings he uses. This might be obvious if you READ some of the pages where he actually mentions it! The guy definitely "knows his ass from a MOV instruction".. Don't be so judgemental next time.
I have slowly learned over 3 years various graphics algorithms, and never with the help of reverse engineering. It's been stated quite often that RE won't help a person learn new graphics algorithms. My suggestion is to use OpenGL in conjunction with books such as the Graphics Gems series to learn the more advanced algorithms. The reason I say this is because OpenGL is a relatively full-featured and advanced graphics API that can easily help you see the results of the various algorithms quickly and have a model to work against. When you write the code by hand, it is often very hard to know if your results are correct, or sometimes you might get an algorithm/article with no accompaniment. Thus, use the books and OpenGL together as a learning tool.
Wow, you wrote DEU??? You are so cool!!! ;-) [flashback]... Wow, all this talk of DOOM is bringing back the memories... I can almost smell that rusty shotgun, hear those demons in the distance... DOOM had such an atmosphere... it was so frightening. It got to you. I remember playing late at night, and being SCARED... No game since has had the same effect. I remember for several years DOOM/DOOM editing occupied the central position in my life. It was just the coolest thing imaginable... Each new level I downloaded was a whole new world to explore, filled with traps, secrets, and hidden terror. And the multiplayer! Each new level was a totally different experience... I still maintain that no game can equal a DOOM ][ deathmatch. DOOM will always have a special place in my heart... I would continue but I have to go to class.
I started RE 25 years ago on an Apple ][, getting down into the heart of assembly & DOS. Now sometimes I do it professionally when proprietary data gets in the way. At my last job, our GPS system for our plane was too painful to set up via the dials for our flight assignments, so I dumped the GPS memory to a PC & spent two weeks fiddling with dials and DIFFing the dumps to figure out what was going on. Now a flight plan is easily generated in the PC & sent off to the GPS.
At my current job, I RE'd some data that a competitor had, and allowed our company to read their proprietary format. These are not simple undertakings, and you really need to decide if you (or your company) can justify the time & cost.
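The dump-and-DIFF approach from the GPS story is easy to sketch. This is a guess at the general shape, not the poster's actual tooling: the helper names and the four-byte sample dumps are invented. The idea is to change one setting on the device, dump again, and see which offsets moved; repeating the experiment and intersecting the results filters out bytes that change for unrelated reasons (clocks, counters).

```python
def diff_dumps(before: bytes, after: bytes):
    """Return (offset, old_byte, new_byte) for every byte that changed."""
    if len(before) != len(after):
        raise ValueError("dumps must be the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(before, after)) if a != b]


def stable_changes(dump_pairs):
    """Offsets that changed in every (before, after) pair of dumps."""
    offset_sets = [{off for off, _, _ in diff_dumps(b, a)} for b, a in dump_pairs]
    return set.intersection(*offset_sets) if offset_sets else set()


# Invented sample dumps: the same setting was changed in both experiments.
base = bytes([0x10, 0x20, 0x30, 0x40])
run1 = bytes([0x10, 0x21, 0x30, 0x44])   # offset 3 is unrelated noise
run2 = bytes([0x10, 0x22, 0x30, 0x40])

print(diff_dumps(base, run1))
print(stable_changes([(base, run1), (base, run2)]))
```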
hi.. i install linux but it say 'root' all time i cannot use it / log in! i need crack around this st00pid cop0 p0rntection... i need crack for it i been trying i put linux on floppy but you know hexed for dos keep say 'disk not ready' when i try load file plz hlp!
> And now all these talents go wasted. I'm working with MS crap all the time, where back engineering is almost impossible, or too time consuming.
Speak for yourself, but I found that disassembling bits of the Windows Entertainment Pack is a relaxing occupation. For example, you can go to any level in Pipe Dream without typing in the password.
Firstly, let's get some terminology straight. People who reverse engineer software or remove protection from software call themselves 'crackers' and refer to people who hack into computers as 'hackers' - which adds more confusion to the hacking vs. cracking war. You guys at /. don't help much either ;) (just go into #cracking on #efnet and watch ppl come in asking how to hack a box and then getting told to go to #hacking instead ;) AFAIK, there is not a single program that hasn't been 'cracked' (i.e. deprotected). Cracking a program is not the same thing as hacking into a computer (though there is some overlap in skills, e.g. programming buffer overflows requires some ASM knowledge). It's not all talk, we reverse programs all the time. We even understand some programs better than the original programmers themselves! ~~ Ghiribizzo ghiribizzo.tsx.org
This is complete bull, plenty of people do it every day. For a good link check out www.fravia.org if it's up. Failing that try: http://www.faizal.com/singapura-mirrors/fravia/
In contrast, the Amiga's boot floppy was laughably simple. Just two or three instructions IIRC and it jumped back to the ROM. Since the boot sector was actually two sectors, 1024 bytes, there was ample room for writing your own. I wrote one, for instance, that said "Good Morning, Sir" when I put it into the drive (regardless of the time of day :-)
Also, there was an *awesome* book for the Sinclair Speccy called "the complete Spectrum ROM disassembly". This was really great for learning what made the Speccy tick. For example, one could easily see that the Spec executed a HALT instruction (essentially halt-until-interrupt) after executing *each* BASIC statement, so that it would be impossible to execute more than 50 BASIC instructions per second!
Not only that, but hackers and crackers usually work together, crackers need hackers to protect them, hackers like crackers because of the software... I've heard that cracking does not require programming skills and that _most_ crackers can't program or wouldn't know how to begin to program, but they can read assembly and use disassemblers and debuggers and follow the logic of a program and isolate where some protection/encryption algorithm is and disarm it, like no programmer could.
Used book stores seem to have a lot of assembler books for various processors. People are probably getting hired to do Java, or C++ programming and getting rid of the Assembler books.
If you can get it, get IDA Pro, it's the best disassembler I've ever had a chance to play around with. It automagically resolves external functions, so you can actually see a program call Win32 functions from assembly.
:)
Reverse engineering under normal conditions is not hard to do, just very time consuming. I've dabbled in it, and was converting assembly into C code, but it was taking forever to do. It's best left to smaller things like DLLs, or if you can hunt down a particular function you want to reverse engineer. I remember one time I was having difficulty creating a plugin, due to bad documentation from an SDK, so I reverse engineered one of their example plugins and found out I was doing something in the wrong order. This was not in the documentation, and it would have taken me much longer to figure out if I had not been able to reverse engineer it.
Again it's not that hard, and there are patterns, like loops that compare strings or get string lengths; stuff like that is easily spotted and replaced. The other thing is that most functions use stack based calls: some functions automatically pop the parameters off the stack themselves, while for others you'll see that the parameters are popped off the stack by the caller after the function returns.
Anyway there are a lot of interesting things to see..
Excuse my ignorance, but what does NOP mean? Is that an assembler opcode? I've seen that used in programs that create assembler code but I never understood what is meant.
Thanks and I apologize for asking you what, in the context of your experience, must be an excruciatingly stupid question.
I've seen several comments about people taking a month to find the specific location in the program that they wanted to modify. If it's taking you this long, you're doing something wrong.
As an example, I wanted to figure out an audio codec that was buried in a big program distributed by a company that wasn't about to tell me what codec they were using. Here's how I did it:
First I set up my debugger to single step the program and record each instruction executed. I then ran the program without access to the audio data. This produced a ~100MB list of instructions.
Then I did the same thing except I let the program get the audio data. Once it output some sound, I killed it and got my second list of instructions.
Next I extracted from each file a list of unique instructions only, and finally diffed the files to get a list of only the instructions in the second file that were not in the first file. This produced a disassembly of the codec, and only the codec.
After that I just had to fix up a few data structures and I was able to recompile a stand-alone version of the codec, and figure out how it worked.
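The unique-then-diff step above boils down to a set difference. A minimal sketch, assuming each line of the debugger's trace holds an address followed by the instruction text (the poster doesn't say what his trace format was); the function names and the sample traces are invented for illustration.

```python
def unique_instructions(trace_lines):
    """Collapse a single-step trace to the set of distinct instruction lines."""
    return {line.strip() for line in trace_lines if line.strip()}


def only_in_second(trace_without, trace_with):
    """Instructions executed in the second run but never in the first."""
    return sorted(unique_instructions(trace_with) - unique_instructions(trace_without))


# Invented traces: the second run reaches code the first run never touched.
run_without_audio = [
    "00401000 push ebp",
    "00401001 mov ebp, esp",
    "00401000 push ebp",
]
run_with_audio = [
    "00401000 push ebp",
    "00402000 imul eax, ecx",
    "00402003 sar eax, 15",
    "00402000 imul eax, ecx",
]

for line in only_in_second(run_without_audio, run_with_audio):
    print(line)
```

Both functions accept any iterable of lines, so for a ~100MB trace you can pass open file objects directly instead of loading lists. Memory use then depends on the number of distinct instructions, not the length of the trace.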
Crackers are not really programmers. They are very good at what they do, but I've heard they can't really program, or are not interested in programming. Maybe they would make great debuggers, isolating problems in free software, but right now most commercial ware that they crack is better than a GNU/free equivalent. Then again, without cracking they would probably do something else, and it probably would be coding, maybe debugging.
I don't think this is what you're talking about, but there is a variant of chess (endorsed by Fischer in fact) where you start with no pieces on the board and spend the first 16 moves placing your pieces in any arrangement you like (or maybe the pawns are fixed but you can place the other 8?). The theory is that if you use the regular starting positions, there are only so many really good openings and they have been analyzed to death, but if you can create your own starting position you have to think on your feet and make up the opening as you go.
There's an excellent article describing how a group of programmers reverse-engineered NetNanny, the "censorware" package, to find its hidden list of blocking keywords. They also found a backdoor the makers had left in place, allowing anyone to subvert the program with a master password. I recommend the article both for people interested in reverse engineering and censorware.
- Mskala
The group, the site, and the scene.
IDA
It has always puzzled me why rev-e is 'illegal'.
No one is threatened with arrest when they disassemble other complex 'things' like auto transmissions.
It would seem that the only thing that may be even remotely contestable is the application of what is learned from such an endeavor, but please note that I say remotely and I mean it.
Could it have anything to do with the new technology religion that tells us that things are somehow 'different?' Of course, the lawyers don't mind...
---
Q: "what's the difference between a dead skunk in the middle of the road and a dead lawyer?"
A: "the skunk has skidmarks in front of it."
In most software licenses, there seem to be clauses prohibiting it (I said *most* licenses, not most GNU licenses, so don't go on a tirade about open source).
I remember a Triumph of the Nerds documentary in which they interviewed engrs. describing how Compaq reverse engineered IBM's PC, and how they had to be legally ignorant of the product when they started off, and go through various loopholes to land with the PC design thru reverse engr.
Also - what steps does a company normally take when a competitor reverse engineers their software? Has this happened in the past? (Retired engineers - give us your juicy stories).
BTW - re-engineering doesn't mean anything, it's just a buzz word used before drawing up a new reorg chart. Don't fall for that crap.
http://www.userfriendly.org/cartoons/archives/99sep/uf001080.gif
I agree with the last part of this posting:
>>If you want to learn, check Fravia's Pages of Reverse Engineering. While there's lot's of crap there, there's also some nuggets of good information.
Fravia's as eddy says. Quite the contrary: I have been mining that site for valuable information for more than a year and I still seem to be only scratching the surface... Giglio
Rational Rose will reverse engineer all type of code. www.rational.com.
search for fravia and mammon's mirrors, read a lot from over there to get used to cracking/reverse engineering techniques/tricks - i saw url upper, so i wont repeat them. Cracking and reverse engineering is like anything else, it needs practice. On the other hand, ORC Tutorials are definitely the best papers you can read around. they cover theory and are not so much about practice as they look. Programs you need: ** soft-ice - THE debugger around for dos/windows platforms. and gdb for un*x systems. you can do nearly all you want with these. ** IDA (Interactive DisAssembler) - THE disassembler. dont trust its name, it can reverse anything via an interesting technique called flirt. you have the compiler, you can decompile. (java, c, pascal, whatever...). works as well for un*x systems and supports several processors. you can get both from http://protools.cjb.net. anything else needs appropriate tools, so look at your favourite search engine.
it may sound much less exciting, but why not pick up a graphics book? once the material found in those are old hat, you can move on to recent SIGGRAPH articles, etc...
Errrm, no. Rational Rose is just an Object Team clone that allows you to go from code to design diagrams. Yes, that's called reverse engineering too, but it's not the same thing that is being talked about in this thread (going from binary to source code).
The snow blind alliance reverse engineered the software needed to upload/download mp3s to the Diamond Rio. They basically wrote a VxD to monitor traffic thru the parallel port while running the windows version of the software. Interestingly they found any files can be uploaded or downloaded from the thing, essentially making it a small hard drive.
Check out these links for more info
Rio support under linux (GPLed)
http://www.world.co.uk/sba/rio.htm
How they did it
http://www.world.co.uk/sba/rio.txt
This is a perfect example of how reverse engineering benefited everyone, including Diamond. It only increases Diamond's customer base if Linux, NT, and DOS users can use the Rio. Diamond only supported win9x.
Zenor is a Dork
Reverse Assembling a program is a tedious process where you must single step through machine instructions, analyzing each register of the machine at each step, and analyzing the data sent to each address accessed by the program. The unfortunate truth is that, even though you may have the patience of Job, and painstakingly record every piece of data and instruction in the program, you will have achieved only a small portion of the result you need. If a program is anything more than some trivial routine that you probably already understand, its behavior will be dependent on the data sent to the routine, and upon the state of the system at the time the routine is run. It could also be dependent on asynchronous data that could arrive via interrupts caused by some unrelated event. In those cases, program control would move to sections of code that might have nothing to do with the target routine. However, it would be extremely difficult for you to know this in advance, and it is very difficult to see real time interrupts during controlled execution. The act of placing breakpoints in the code could actually alter the result. You could spend quite some time in a rathole, analyzing program instructions that have no relevance to your problem.
Reverse Engineering is a process where you first understand the output of a "black box" system for each possible input and state condition, and then you independently create your own black box that produces the same output for each possible input. This is a mathematically very difficult process whose difficulty increases exponentially as more variables are introduced into the system. How could you know how a complex thing, say a person, will react to every possible set of circumstances? You may be able to observe a person in many situations and get a pretty good understanding of their behavior patterns, but there is no way you could test all possible combinations of input/output.
Graphics systems are very complex. OpenGL, for example, is a huge state machine, where the value of each state variable controls the way each function behaves during execution. It is not even possible to test every combination of input under every possible state condition, never mind be able to reverse engineer the system that created the output.
If you are like the rest of us, and only have one life, I'd suggest you spend it studying the relevant literature until you become an expert in the field. At that point, maybe other people can waste their time reverse engineering your work.
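To put rough numbers on the exponential growth described above, here is a back-of-the-envelope sketch. The model is a deliberate simplification (every state variable is a single on/off bit, which is not how OpenGL state actually looks), and the function names and the million-tests-per-second rate are made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def black_box_cases(state_bits: int, input_values: int) -> int:
    """Input/state combinations for state_bits on/off flags and one input."""
    return (2 ** state_bits) * input_values


def years_to_test(cases: int, tests_per_second: float) -> float:
    """How long exhaustive testing would take at a fixed test rate."""
    return cases / tests_per_second / SECONDS_PER_YEAR


for bits in (8, 16, 32, 64):
    cases = black_box_cases(bits, 256)
    print(bits, "state bits:", cases, "cases,",
          years_to_test(cases, 1_000_000), "years at 1M tests/sec")
```

Each extra on/off flag doubles the work, which is why exhaustive black-box testing stops being an option long before a system gets anywhere near the size of a graphics API.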
It's called back-solving. There are one or two, possibly more, chess programs that use a similar method when given a game to analyze/annotate.
The basic premise is somewhat simple. Start at a won position, and work backwards to a known opening/middlegame position.
That's commonly known as random, or Fischer, chess.
Wow, I haven't heard the name Raphael Quinet for a long long time. I used DEU for a ton of DOOM maps back in the day. I'm pretty sure I still have the zip file I downloaded off a BBS in San Diego, before I knew what the hell FTP was and the only exposure I had to the net was a Usenet feed that I could access through the UCSD library. Jeez, it's a big trip down geek memory lane today.
..trying to work out what is on a billboard by using a microscope. At that level everything looks the same...
Anyone who has spent time with a disassembler knows what I mean.
--
Simon
I couldn't live without this app. Cracking is a crapshoot without it.
Also.. being that it's a cracker's tool.. it's readily available as an "evaluation" version.. heh.
"If the original IBM PC bios had been patented, we would probably still be forced to use it to this day."
I think that it would be more likely that other competitors of that day would have been in a better position to compete and to expand their platforms. The Mac, which was always expensive, would have gained more acceptance. If the IBM were the only x86 game in town, the price would have been kept high, allowing for more sales of Macs. The Apple II and Commodore would have continued to capture the home market. Clones meant more "action" for everybody and kept the price lower, meaning dad could buy the PC for the home that he uses at the office.
Just my opinion and a darn good one at that!
Internet.com defines Reverse Engineering as: "The process of recreating a design by analyzing a final product", which is what I always thought it was. It does not mean disassembling, decompiling, or tracing. I don't understand how reverse engineering can possibly be illegal anywhere that even pretends to maintain freedom of thought.
well, it depends what you're doing. If you want to reverse engineer an executable, it's basically going to be in assembler without any comments. So, if you're good with assembler, you might be able to derive something from it. If you are going to reverse engineer some sort of protocol, where you can sniff it, it's much more useful.
DEU was THE best 2d FPS editor EVER. Good work.
--
-- Chris Dunham -- chameleo@xcelco.on.ca -- Chameleon --
This tale is not true. It is now an exaggeration of what was initially just a rumor.
Yea right: They lost all 3 custom chip "designs".
Believe what you may, but the sheer mega-bytage of CAD data involved in so many separate manufacturing and design processes makes it practically un-destroyable.
Every PET/C-64/Amiga chip that CBM's (east coast!) wafer fab ever made, (The fab makes high power/mixed signal HDD chips presently) is backed up on 9track&8mm&CD&MO&etc. and well professionally archived & indexed thanx to Amiga zealots who made it safe & secure, thru thick or thin. Thanx again to Petro T.
I personally assisted in the above, so I know.
Joe Torre
Sr. HW Engineer
Amiga Inc. 1998
Joe Torre - X - HardwareEngineer @ Amiga Inc & ZapMedia Amiga, AmigaDE, BeOS, Linuxz, QNX, Rebol, Windoze, ZME: So
Reverse engineering software is NOT quite the same as in other industries. It is MAINLY for just figuring out interfaces, formats and the like for interoperability purposes.
E.g., the Samba team does this in order to figure out what a Windows client expects from an NT PDC/SDC Server, etc...
You don't want to, ethically at least, start disassembling software and reusing code in your own. That is a GOOD WAY TO INVITE A LAWSUIT. Besides, it is pretty damn hard to learn an entire broad concept (like 3D graphics) from disassembling. At most, you would use a disassembler in such cases just to see how OpenGL, or DirectX handles a specific function or object, but not the entire subsystem.
If anything, only disassemble to see how things work. Otherwise, get yourself a good book, or, if you are a professional with a budget, license a toolkit from an established vendor with a proven product.
Good luck ...
-- Bryan "TheBS" Smith
Independent Author, Consultant and Trainer
It's also one of the most common practices in silicon chip development.
While I was attending university lessons, my Adv. electronics professor candidly stated that something like 20% of all the R&D costs of a chip manufacturer goes into reverse engineering competitors' products....
He didn't supply any proof. But since he is one of the most respected teachers at Italy's biggest engineering university, he's got to have some connections, hasn't he?
I'd consider him a trusted source on this one.
For example, is sniffing packets a la the ICQ clones RE?
Or figuring out how a database or file format works?
I highly recommend the book by Michael Abrash, The Black Book of Graphics Programming. It has 'The Zen of Code Optimization' and 'The Zen of Graphics Programming' in it as well as most of his other work for DDJ and some unpublished stuff in the 2nd edition. It's a very good, easily readable reference that covers most of the basics of 2d and 3d graphics. You will need to learn assembler regardless b/c you'll need to hand optimize a few loops that eat cpu cycles and so forth. If you just want source, look at some oss stuff; it will at least give you ideas if you're working on win32, or a good code base for *nix.
>[snip] back in the days of 8bit cpu's and 16k ram :P
:)
>one could easily disassemble code to see how for
>example a parallax scrolling routine was
>implemented in a game...not anymore I'm afraid
>
Well, yes and no.
You just need to be reasonable about what you look for.
Trying to look at an ASM dump of Quake.exe for instance, and figure out that it uses BSP trees to store 3D surfaces would be next to impossible. Especially if you didn't know what a BSP tree was, or how it could be used like this.
But, looking at the same ASM dump of Quake.exe, looking at the texture-mapping routines, trying to see how they got such good speed, wouldn't be a waste.
In the second case, you already know how the task works, and exactly where to look, so you're just looking at their refinements.
In this case, perhaps, trying to see how Carmack got his 'free floating-point divides'.
This is all assuming that you don't want to just go and buy M. Abrash's book _The Graphics Programming Black Book_ where he tells you all the secrets he and Carmack used.
For the curious... The texture mapping used fixed-point math, which is just integer numbers that you pretend are real numbers (ie, the CPU treats them like integers), which was a tradeoff for speed, but sacrificed accuracy. This was nothing new, people had been doing this for a long time. The innovation was using some floating-point math, which was slow but accurate, at the same time to correct the results.
For instance, imagine doing a long string of calculations and rounding off to one decimal place after each one. Your answer will be fairly imprecise: only a little off early on, and very far off later.
Now, what if you rounded off your answers for speed, but had a friend give you a value every sixteen numbers which corrected your answer. This way you could do the problems really quickly, but you would only drift a little, and the answer would be corrected, drift a little, be corrected, etc.
In this case, you would be doing the texture mapping, doing 'fuzzy' calculations to texture those sixteen pixels very quickly, and your friend would be doing one very accurate calculation in the background (the floating-point pipeline) to put you back on track after those sixteen pixels.
(This whole correction thing is needed because when you look at a wall from the side it doesn't look rectangular anymore. The farther from rectangular, the more your line is likely to drift if you use fuzzy calculations. But fuzzy calculations are sometimes hundreds of times faster than accurate ones... You do the math.)
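The "friend who corrects you every sixteen numbers" idea can be tried out in a few lines. This is a sketch of the general technique as described above, not Quake's actual code: the 16.16 fixed-point format, the function names, and the sample wall (depth going from 1 to 4 across a 320-pixel scanline, texture coordinate from 0 to 64) are all assumptions made for the example. Python also can't show the real trick of overlapping the floating-point divide with the integer work; it only shows the arithmetic.

```python
FIXED_SHIFT = 16   # assumed 16.16 fixed-point format
SPAN = 16          # pixels between accurate floating-point corrections


def exact_u(x, uoz0, duoz, ooz0, dooz):
    """Perspective-correct texture coordinate: one float divide per pixel."""
    return (uoz0 + x * duoz) / (ooz0 + x * dooz)


def to_fixed(value):
    """Convert a real number to a 16.16 fixed-point integer."""
    return int(round(value * (1 << FIXED_SHIFT)))


def scanline_with_correction(width, uoz0, duoz, ooz0, dooz, span=SPAN):
    """One accurate divide per span; cheap integer adds for pixels between."""
    result = []
    x = 0
    u_fixed = to_fixed(exact_u(0, uoz0, duoz, ooz0, dooz))
    while x < width:
        span_end = min(x + span, width)
        # The 'friend': an accurate value for the end of this span.
        u_end = to_fixed(exact_u(span_end, uoz0, duoz, ooz0, dooz))
        step = (u_end - u_fixed) // (span_end - x)  # integer step per pixel
        u = u_fixed
        for _ in range(x, span_end):
            result.append(u / (1 << FIXED_SHIFT))
            u += step
        u_fixed = u_end  # snap back to the accurate value
        x = span_end
    return result


def max_error(width, approx, uoz0, duoz, ooz0, dooz):
    """Largest difference from the exact per-pixel answer, in texels."""
    return max(abs(approx[x] - exact_u(x, uoz0, duoz, ooz0, dooz))
               for x in range(width))


# Assumed wall: u/z runs 0 -> 16 and 1/z runs 1 -> 0.25 over 320 pixels.
wall = (0.0, 0.05, 1.0, -0.75 / 320)
corrected = scanline_with_correction(320, *wall)
uncorrected = scanline_with_correction(320, *wall, span=320)
print("corrected every 16 pixels:", max_error(320, corrected, *wall))
print("never corrected:", max_error(320, uncorrected, *wall))
```

With the correction, the drift stays well under one texel on this sample wall; with a single span across the whole line it is off by many texels in the middle, which is the warping you see in engines that skip perspective correction.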
One of my favorite books is Windows 95 System Programming Secrets by Matt Pietrek. It is a good example of reverse engineering put to constructive use (rather than copy protection removal). Matt provides pseudocode for many of the system calls and a good understanding of what is going on under the hood. There are mountains of useful information in this book about reverse engineering, check it out...
-- Virtual Windows Project
Since nearly all compilers that produce executable code from source first generate assembly, it's a really good idea to understand assembly in the first place.
The next step (which really applies to C programs but includes some other languages as well) would be to run 'gcc -S yourprog.c' on some C code that contains common constructs such as do {} while, for loops, function calls and library calls.
You can then have a look at the output and see how a compiler produces assembly from C source.
Compiler methods differ of course, but many are similar. After enough study, you'll find yourself almost "seeing" the C code that an assembly dump was made from.
This is a very simplistic method but works well if you're prepared to put in the time and effort.
--- Hot Shot City is particularly good.
Check out the Decompilation Page at http://www.csee.uq.edu.au/~csmweb/decompilation/. These guys have published papers on decompiling programs back into semi-readable C code. I'm not sure how well it worked on "real-world" programs. Also, do a search for "binary editors" or "executable editors" (e.g., EEL, ATOM/Alto, Etch for SPARC, Alpha and x86 respectively). These tools edit binaries and have to do some form of decompilation to figure out control flow and so on. But they were not designed for reverse engineering. You could use them for making small tweaks to a binary for which you have no source (or optimizing it).
Yes, good point - the guy did say exactly that. He was curious about reverse engineering, but if you want info on just graphics algorithms, try the gimp code!!
Juln
I think you will really need to use Windows (not NT) for this. As much as I like Linux and that, Windows actually lets you have control (mostly).
DOS Debug is a very handy tool for this, and there are many, many tutorials out on its use.
Firstly, it would be best to learn assembler (a hint: don't touch AT&T syntax, keep with Intel).
BTW, anyone here into low-level stuff also? I've been trying recently to learn how to write a microkernel (i.e. an OS), and it's very hard. Once I'd gotten the hang of real mode, I had to learn pmode (currently), and I'm really getting stuck on this paging vs. segmentation stuff.
Any assembler gurus here got any pointers to webpages/stuff for me?
Penguin aka Spatula
If it's graphics you're into, most of the algorithm stuff (other than some 3D games rendering) is very much in the public domain. Get one of the many excellent books and implement the algorithms therein, you'll learn a lot more.
Foley & van Dam's "Computer Graphics: Principles and Practice" is still widely regarded as a definitive text.
Greetings to each and everyone,
I prefer working on the Palms themselves (Palm IIIx and Palm V). I do not use PC tools. I work with:
Quartus Forth (a Forth IDE)
RsrcEdit (a resource editor)
LispMe (a Lisp shell on the Palm; occasionally, but more to test routines...)
Insider (the best disassembler/hex editor on the Palm)
and of course the complete Palm OS SDKs on the Palm in iSilo (HTML reader/converter)...
I do not patch as much as I used to... when I do, it usually takes me less than a few hours to get the bugger(s).. Have you had any probs with self-modifying code? I had...
Has anyone found a way of having a debugger on board? Debuffer is a good one (PC/Mac app): http://www.pagesz.net/~sessoms/debuffer/
Has anyone here found a way of implementing systraps in LispMe http://www.geocities.com/SiliconValley/Lab/9981/ (as in creating a macro that would simulate the systrap function)?
Kind regards to all...
a really kewl site ---> http://palmwarez.backroom.net/index2.html (I have not checked it for some days.. I dunno if it's still up.. Darken if you read this, I'm still expecting an answer.. 2 patches for Vrubix not 8 !!! *smiling, teasing)
"No matter how idiot proof you make the program, there will always be a smarter idiot."
I have done a little RE in DOS, and I have found Borland's Turbo Debugger to be helpful. I was trying to modify an old game so it would run off the hard drive. It had a strange copy protection scheme that relied on a bad track on the 5.25" disk. Turbo Debugger isn't the best debugger, but it lets you step through the code while it's actually running, and change opcodes and memory contents on the fly. Once you figure out what you want to do (like insert a jump instruction to bypass a test), you can make the changes in a hex editor.
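The last step, making the change in a hex editor, can also be scripted. The Python sketch below is only an illustration: the function name, offset and byte values are made up, and it assumes an x86 short conditional jump (opcode 0x74, JE) being turned into an unconditional short jump (0xEB, JMP). It checks that the bytes at the offset are what you expect before touching them, which saves you from patching the wrong file or version.

```python
def patch_bytes(image, offset, expected, replacement):
    """Return a patched copy of `image`.

    Refuses to patch if the bytes at `offset` are not `expected`, or if the
    patch would change the length of the file.
    """
    if len(expected) != len(replacement):
        raise ValueError("patch must not change the file length")
    if image[offset:offset + len(expected)] != expected:
        raise ValueError("unexpected bytes at offset; wrong file or version?")
    patched = bytearray(image)
    patched[offset:offset + len(replacement)] = replacement
    return bytes(patched)


# Hypothetical fragment: CMP AL,1 / JE +5 / NOP
original = bytes([0x3C, 0x01, 0x74, 0x05, 0x90])
# Turn the conditional JE (0x74) into an unconditional JMP (0xEB).
patched = patch_bytes(original, 2, b"\x74", b"\xEB")
```

Keeping the original file untouched and writing the patched bytes to a copy makes it easy to back out when the guess was wrong.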
-- 2 + 2 = 5, for very large values of 2
Check out DCC:
http://www.csee.uq.edu.au/~csmweb/dcc.html
...developed by Cristina Cifuentes, who was instrumental in making this kind of thing legal in Australia.
This man's site taught me it all. I haven't been here in soooo long.
-Kancer
loser
hey man lets be k-rad and speak like a p00f73R!!
Yay, im a HACKER now, goodie!
Another Java decompiler that I've found useful is wingdis
If graphics programs were written as they were 10 years ago, disassembling the programs might prove useful. However, graphics applications nowadays and any application that runs on a windowed system is a big bundle of API code. Reversing from the assembly code to API code is, basically, impossible.
Esperandi
I've always wondered about alien reverse engineering. comic here
On the other hand, I'd say it is like chess moves, only rather than normal chess, it's a game of Kriegspiel, in which you don't get to see your opponent's pieces, only your own...
--
--
Wait a minute, this sounds like rock and/or roll. - Rev. Lovejoy
Maybe I misread the original, but i got the impression you were talking about graphics manipulation algorithms. If that's the case then check out the gimp (http://www.gimp.org/). Under the GPL you have full access to every aspect of the code, and I can't think of a better way to get a look at graphics manipulation algorithms. No need to RE from assembly--you've got the source.
Sounds interesting. But where did he come up with a figure like 20%. Did he supply references for his "candid" claim?
You forgot one:
4) View the system's behavior under a hardware emulator, or tap into the data/address bus with a logic analyzer.
To do this these days, you're best off using an older platform for your testbed. Say a slow 386 machine. I don't even want to think about how expensive a Pentium-class emulator is, and you can't just tap into the PCI bus with the logic analyzer *I* can afford (mine is an HP1630G). Obviously the "big boys" can afford some of this more than some kid.
For the sake of completeness, though, real-time hardware monitoring schemes need to be mentioned.
I got some serious flashbacks when you mentioned deu. I can still remember making my first DOOM map. :) Interesting how I just happened to notice a post on slashdot from someone like yourself. Oh well, have a nice day.
- Wiglaf [IoStream Productions]
I've rarely seen crackers and hackers working together. Although it is true that you can crack without being able to program, it is almost the equivalent of being a 'script kiddy'. Many crackers are programmers by profession, and all the good ones understand how to program and understand assembly. Take a look at http://win32asm.ownz.com, it's probably the best site for win32asm programming.
On RISC machines which have delay slots due to pipelining, the NOP is needed. e.g.
JSR r1 <-- branch instruction with delay slot
NOP <-- delay slot, this is executed before the branch is taken so if we don't want to execute anything here, we place a nop
Some of the best Reverse Engineering Tools
IDA Pro From Data Rescue
www.datarescue.com
Soft-Ice From Numega
www.numega.com
See also Fravia's site!
I would have to say the best place to start is with Fravia's Pages of Reverse Engineering
Fravia's Site of Reverse Engineering
trust me i know
Signed, 53 68 61 72 70
This is quite helpful in reverse engineering networking stuff, and other fun stuff.
---------------
---------------
Do not discount the fact that you have free will.
One Friday night a friend and I sat down with a Mac boot floppy and reverse engineered it. (If you have never done something like this you are not a true geek and should get off of /. IMHO)
We sat down with a hex editor and Motorola's 68000 book. After a while I could recognise a move command and where it was moving to/from just by the hex value.
I remember clearly that it set up a few registers, and then did a JSR to something in ROM, then a few more registers and another JSR, and then a few more things. We decided to test our work out by modifying things just a little. (I think we put a yellow square on the screen.) IT DIDN'T WORK! After much further analysis we discovered that the first JSR never returned on that machine. It seems that on some other Macs there were ROM bugs that the boot disk would fix, and the registers set up before the first JSR were just enough to tell the ROMs whether they needed to be patched or not.
Overall this was fun, but it took us 6 hours to deal with a 512 byte sector. We didn't work with the ROMs at all (we are just guessing what happened in the non-returning JSR case above, because analysis revealed that the rest of the sector set things up to disable ROM and write a couple of fixes to what were obviously ROM locations, now copied to RAM so they were writeable).
We all remember Commodore - makers of that much adored Amiga (of which 3 still reside in my bedroom - 1 of which I still use). Well, I remember hearing that Commodore lost the design of the Custom Chips and had to reverse engineer them. Not quite a competitor reverse engineering someone else's software/hardware, but I thought it was interesting.
And some people wonder why Commodore went down the tubes...
As a Tech Support guy, I'm often forced to "reverse-engineer" my own company's products, when the engineers don't provide documentation on how the stuff works, and when everybody's too busy working on the next release to talk to a lowly support rep.
It's the FUN part of troubleshooting.
It was MUCH more fun on Novell, because it had a built-in debugger. It's a PAIN IN THE ASS with NT, because they not only don't have a built in debugger, but so far I haven't found a decent one I can give to a customer free of charge to have them do something over the phone to gather info when a process has gone into the weeds.
I'll check out some of those links that are provided in some of the other messages on this topic, maybe my prayers have been answered. . .
(funny, I can understand assembler, but not C++)
"The number of suckers born each minute doubles every 18 months."
These are my friends, See how they glisten. See this one shine, how he smiles in the light.
-Malachi
"Life is all about strategy, mathematics and psychological perceptiveness."
I really didn't expect to see this question on
And the answer is simple, you have to master the underlying technology, no matter what you are going to reverse engineer/hack.
There is no tool that will do this for you, there is no magic bullet. There are tools of course, to help, but you will have to apply the brainpower yourself.
So, in case of reverse engineering, get yourself a book on assembler and programming. Everybody has his/her motives to learn certain things, and curiosity is a wonderful excuse to learn a new thing.
Happy hacking.
--
Why pay for drugs when you can get Linux for free ?
echo '[q]sa[ln0=aln80~Psnlbx]16isb572CCB9AE9DB03273snlbxq' |dc
> How DO you reverse engineer software?
:-)
Using disassembly/decompilation, debugging and/or probing (as in "black-box"). Have a look at this essay I've written. It's about my analysis of the program "Net Nanny", but the techniques used are fairly typical.
>Anyway, am I in over my head even contemplating it?
No. But it depends on why you want to do it. This is not a good way to pick up on new graphics algorithms, unless there is something very specific that you are after. However, you should give it a try if you think you might enjoy this kind of low-level puzzle (for me, it's a puzzle).
> I have a feeling that by the time [...]
Possibly, but I really think you would have the sense to give up before spending that much time.
I might as well tell you what tools I use:
The most powerful tool is NuMega's SoftICE. It's a system-level debugger for the Win32 platform (would love a Linux version).
After SoftIce comes IDA. IDA is a very competent disassembler. It runs under Win32, but it supports many different processors and file-formats (MZ/NE/PE/ELF/DLL/etc).
Of course, you also need a good hex-editor. I use HIEW.
I primarily use reverse-engineering techniques to discover backdoors and extract encryption algorithms in commercial software (a friend and I reversed the censorware CyberSitter earlier, which led to the downfall of the Scientologists' "ScienoSitter").
I also use the techniques to explore unknown file formats; see for example the project to reverse the file format used in the game Baldur's Gate. When doing this it is much less "debugging/disassembling" than it is hanging around the hex editor.
If you want to learn, check Fravia's Pages of Reverse Engineering. While there's lots of crap there, there are also some nuggets of good information. You can also use his messageboard to interact with competent reversers, but beware, you will have to show that you are working on your side too. Don't ask for ready solutions.
Hope this was of help, be in touch if you have any questions.
Belief is the currency of delusion.
Two things.
First, there are several disassemblers out there - things like Mocha for Java, which produces .java from .class files, rather than the other way round, etc. (There are ways of circumventing this, too.) Other stuff exists for DOS / Windoze binaries, etc., where you can get your hands on the assembler underneath. (Been there, done that, had the shareware "you've used this thing 10, 11, 12.. times" counting backwards by flipping one bit in the executable :) I don't know of anything that'll take winword.exe and give you the source though (thank heavens ;) - it's only possible to get it into assembler, and from there you have to have the linker's .map file.
Second, though: how much of this is just talk? If you consider the various "hack this machine & keep the box" sites around, how long have they been up for? You'd think someone with their finger on the pulse of the underground "cracking" world would actually have managed to *do* something about them by now.
So you might well be better off with gaming magazines, if that's what you want, unless you've got a /lot/ of time to spare...
~Tim
--
Rushing on down to the circle of the turn
Soft-ICE from NuMega technologies is the only tool you need for reverse engineering under Windows.
Reverse engineering is inspecting how existing software works, typically so that you can change it in some manner, usually by integrating your software.
I've had to do some reverse engineering. A modem was having problems with a third party communications package. A large customer used this package exclusively, and we needed to temporarily patch it for a demo for them. Since the modem used a rommed microcontroller, it had to be done in the software package to meet the demo date.
So first I used a protocol analyzer to figure out where the modem and package were going wrong. This was pretty easy, turned out to be a timing problem with one of the AT commands.
Then came the task of disassembling the package, which was written in Pascal. This part wasn't so easy, since there was no better way to find the spot that needed patching than to figure out what the code was doing until we happened upon the part in question. I used DOS's DEBUG.COM for the disassembly. It took 3 days of about 18 hours each to find the spot, and I ended up fully disassembling about a quarter of the program. The patch (1 byte change) worked, as did the demo, and all were happy, especially after the modem's rom got fixed.
In terms of skills, it helps a great deal if you have looked at a lot of assembly generated by high level compilers before. Then you can more easily see the ifs, fors, and cases instead of strange assembly sequences, and you're familiar with how parameters pass into and out of routines thru the stack. You also have to be pretty familiar with what the code is supposed to be doing to have much hope of recognizing the function of blocks once you've disassembled them.
It was a challenge, and kinda fun for that reason, but it's most certainly not something I'd like to do for extended periods of time. Most of it is very boring grunt work, with a rare "aha!" to lighten the mood. Some parts are *very* opaque when you only have numeric addresses and field offsets. Lots of things remain guesses for a long time, and you can easily go down blind alleys by assuming wrong things.
Another piece of reverse engineering that was much more fun was discovering the protocol and CRC generator polynomial used in another PC communication package, so I could write something that would file transfer with it from a VAX/VMS system. That was mostly a mathematical problem, and much more interesting. No code disassembly there, just probing with test blocks and watching the CRCs returned with a protocol analyzer.
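The probing approach described above can be sketched in a few lines of Python. This is not the poster's actual method or the real package's CRC: it assumes a 16-bit, MSB-first CRC with a zero initial value, no bit reflection and no final XOR, and the function names are invented for the example. Given a few (test block, observed CRC) pairs, it simply tries every 16-bit polynomial and keeps those that reproduce all of them.

```python
def crc16(data, poly, init=0x0000):
    """Bitwise MSB-first CRC-16 with no reflection and no final XOR."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def candidate_polys(samples, init=0x0000):
    """Return every 16-bit polynomial consistent with all (block, crc) samples."""
    return [
        poly
        for poly in range(0x10000)
        if all(crc16(block, poly, init) == crc for block, crc in samples)
    ]
```

Under these assumptions a well-chosen test block does most of the work: the CRC of the single byte 0x01 is the polynomial itself, so one probe narrows the search to a single candidate. If the real scheme uses a non-zero initial value, reflection, or a final XOR, the search space has to be widened accordingly.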
There was a recent show on NPR about chess playing which involved the players starting off with a blank board, then placing the pieces on it one by one, effectively playing backwards, until finally both ended up with the "normal" initial placement. It struck me as quirky and intriguing. I believe this was on a Harry Shearer show, so it was most likely a parody, but it seems remarkably similar to reverse engineering on a logical level. Not to mention interesting - have to try it out sometime and see what it's like.
L.
There are two methods being talked about here:
reverse assembly: This takes executable code and produces source code
reverse engineering: This is where a programmer works to replicate the functions of a program without referencing the original.
The latter is by far the harder to do. The original BIOS clones were done this way. They knew that an interrupt call produced certain end results, so they wrote new code to reproduce this effect.
This is where software patents come in. Reverse Engineering doesn't affect copyright, unless you have been very unlucky and written the code exactly as the original programmer did.
Patents protect methods. This means that if you have a patent that protects "a method of using x to produce y " even if you produce a system that contains no code from the original program, as long as x produces y you would still have to pay the patent holder a fee (or even be blocked from selling your code) and face a legal battle.
This is why laws to legalise reverse engineering are useful. It means that people can produce systems that are functionally compatible with existing systems and usually are better or less buggy.
If the original IBM PC bios had been patented, we would probably still be forced to use it to this day. Things like this are why I consider software patents A Bad Thing.
If you are familiar with assembly language, reverse engineering is "merely" very difficult :-)
In the early days of microcomputers, it was relatively easy (with sufficient knowledge of the relevant assembly language) since all the games (which were the only thing one wanted to hack) were "monolithic" blocks of code - no shared libraries, everything in a single self-contained block of code (aside from the calls to what was humourously referred to as the OS!)
Things are somewhat different now. Often one can find clues through mistakes (NT Service Pack 5, for example), such as forgetting to remove (strip) details about variable names and other identifiers. (This was where the infamous "NSAKEY" identifier came from.) Programmers are (usually) human and tend to use logical names for variables; once compiled and stripped, these names are lost.
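Hunting for leftover identifiers like that is easy to automate. The Python sketch below does roughly what the Unix strings utility does: it scans a binary blob for runs of printable ASCII. The function name and the minimum length of four characters are just choices made for this example.

```python
import re


def printable_strings(data, min_len=4):
    """Yield (offset, text) for each run of printable ASCII at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


# Typical use on a file you have on disk (the path is a placeholder):
# with open("some_binary", "rb") as f:
#     for offset, text in printable_strings(f.read()):
#         print(f"{offset:08x}  {text}")
```

Symbol names that were not stripped tend to show up as clusters of identifier-like strings, often close together in the file.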
Basically, reverse engineering takes a LOT of effort (=time=money)
S.
You need to be a genuine super-hairy assembly wizard to even contemplate RE anyway, so if you don't understand what they're saying, it's best to go away and read up on the whole assembly thang.
http://www.phrack.com/main-index.html
They do recommend a few really groovy tools, if you hunt around the site a bit.
I'm an old RE... first the C64, then the PC (DOS), then Palm... never cared much for Win. Anyway...
:-)
:-)
Reverse Engineering to discover an algorithm is MUCH more difficult than reverse engineering to get around a timebomb or serial number check or dongle. If you want that, go to Fravia's site... It's been mentioned a thousand times here already and is very very good for teaching you how to think like an RE. Too bad it wasn't around when I was in the heyday of RE, I would probably be a lot smarter.
RE for algorithms starts out the same as RE for cracking -- you need to identify the code that is performing what you want to discover. Is there a button you click to invoke the function? Perhaps when it goes to save, you want to see how it encrypts... Find out what triggers the algorithm you're interested in or your job will become much harder.
After that's been done you fire up the debugger (many of them have been mentioned, use whatever you feel is best) and trap for that action. When the debugger comes back, you'll be looking at raw assembly or, if you're lucky, pseudo-commented assembly. Since you're interested in the algorithm, start dumping this info out somewhere where you can play with it later.
Now comes the fun part. Here's where you start using your brain. Identify the inputs and outputs. Try to identify what the registers and memory locations are being used for. Since you've dumped out code regarding the algorithm to a file, try to assemble it with some stubs at the start and finish to feed it your data and deal with the output. This process is iterative. You'll make many many passes, with the code becoming more and more obvious as you go about this. Printouts and pencils are your friends. Don't be afraid to scribble and question mark and feed it data, try forcing loop updates, etc. Remember you're trying to understand what the memory locations represent.
After some amount of time, you will have a chunk of code with scribblings all over the place and comments and hopefully only a few question marks left. Try to understand what the "small steps" (the assembly instructions) come together to form, and you can then rewrite the algorithm in a higher level language and see if you understood it. That was the goal, wasn't it?
Obviously, a good solid working knowledge of assembly is required to understand the code. For mathematical functions which use the MMX and 3DNOW! instructions you will need to get the books from the chip guys to figure out what they do, since they're not simple instructions. The single most useful thing I ever used in my RE days was the knowledge of how C, Pascal, etc. created stack frames and how they manipulated the stack, both from a called function and on a calling function standpoint. Not many programs were written in assembly, and without that knowledge you might still be able to deal with the code but it will be much more confusing when you see things referencing [EBP+8] and the like.
That, and when your debugger throws you in the middle of a function and you wanted to be just before that, you can analyze the stack frame and see where you should have placed the break statement.
Back when I hacked/cracked/whatever you want to call it, I did deal with a lot of assembly programs. I probably knew the int21h/25h/26h/27h calls better than Microsoft. I disassembled many BIOSes and learned low-level hardware control. It's an innate knowledge now that I still possess, although I guess it dates me now. Nobody much cares how to access the keyboard controller to toggle A20 or program the PIT to change DRAM refresh rates or look at the actual data stream coming off an RLL hard drive.
There's an old, old database out there called HelpPC. I've used it since it came out and have added to its database extensively for all the Mode-X VGA graphics, hardware controls, etc. ftpsearch should find a version for you.
Since most people these days take the easy way out with trying to thwart reverse engineering you generally only have one or two layers to get through right at the beginning. However if the original author was wise, the anti-cracking code will be sprinkled throughout, possibly including the code containing the algorithm you want to learn about! It's unlikely, but you may have more of a task than you first thought.
Could be useful in other areas too, like embedded hidden compressed code.
Read this article for more.
Reverse engineering is not a problem, in an open-source community. But sometimes you just want to make sure your backers get a return for their investment. (We'll get back to the open-source model in a minute.)
I work for a company which invested hundreds of millions of dollars, that's nine digits' worth of dough, in a row, folks, and fifteen years in the doing, developing a complex monster of a financial data model. This thing is REALLY complete.
Now you may just think that anything less than a billion is pocket change for Bill G. But we're not Bill G. Neither were our investors and some of us sweated blood to evolve this beast.
How would you feel if YOU and a couple of hundred of your friends had worked for years on something only to see your potential for break-even vanish to null, zip, nada, nothing, by somebody swiping a copy of your database, publishing the data dictionary and reverse-engineering the software you worked fifteen years on to build interfaces to all the data tables.
I'd venture to offer: "Very broke and broken hearted." Not to mention angry enough to sic a law firm full of angry paperwork at the perpetrators to get them to "cease and desist."
No, Reverse engineering is not a problem in an open-source community. Because it shouldn't happen. The development should have been collaborative from the get go.
Open-source is a great concept, if a project was started as open-source and everybody chips in to improve the product and its place in the market and doesn't rip-off the concept or the source code depriving the originators of revenue by contributing nothing and reaping the rewards.
Also the project has to come first and be acknowledged as THE project. It's no good if we have another Apache project competing for web services or another Samba competing for intersystem operability. You have to contribute to Apache and Samba and not just grab the code and, uh, fork off.
That's what the corporate world, the backers and users of the fruits of our labors are really worried about. The technical issues don't bother them. Like everybody else, they don't understand them.
I'm still a little leery of all these Linux distributions. I'm not the only one. Luckily, GNU/Linux, Apache, Samba, (Mozilla some day, I hope,) and a host of other products were well controlled and evolved in a collaborative yet well-controlled atmosphere.
That's rare and I'm going to OpenSource / OpenScience 99 at Brookhaven labs tomorrow to see what is being done to spread the faith.
Because it's the competitive aspects of the development process for all of the other 'stuff' that's a real worry.
That's why there's a hundred lousy accounting packages out there rather than just ONE great one... That's why there's a hundred lousy payroll packages out there rather than ONE great one. We haven't yet learned to share and play nice with the other children.
Say we learn to spread the wealth, that begs the question "How do you spread the cost?"
So far, let's face it, GNU/Linux, Samba, Apache and a whole lot of other software out there is at the beginning of the cost curve. We're not talking millions of dollars here. The development has to date been very Mom-and-Pop and devoted hobbyist.
Will the development slow to a crawl when it's not something that's universally needed, like an OS or a Web server, but gets into niches, like financial models, or if something gets really expensive to build?
Can an open-source approach work in the alleys off of the Bazaar? That's THE question and we have to come up with a right answer, a complete answer.
Because if we're to reach farther by standing on the shoulders of giants, let's make sure the giants are not heading in separate directions and leaving us, the development community (not just the hackers) hovering precariously over a growing chasm.
MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
Despite the rather black-box connotations of reverse engineering, it is a legitimate R&D exercise. Car manufacturers reverse engineer their competitors, chefs try to dissect recipes, etc. In the computer context, according to the Centre for Software Maintenance,
Reverse Engineering is the process of analysis of an existing software system to create representations of a different form or higher level of abstraction.
and
Reengineering is the process of analysis and modification of an existing software system to reconstitute it in a new improved form.
Given the perennial occupation of engineers is to make things better, faster, or cheaper, tinkering with electronic toys or source code is a natural pastime. It is only the marketeers and financial managers that want things to be "hidden" so that the cost (and thus in their mind = value) is higher (basic economics, remember scarcity == higher price). Obfuscation of code is an obvious mechanism to exclude competitors; however, it significantly adds to the long-term cost of maintenance and also reduces the potential market. How many times have you been given a piece of code with the design specs/architecture residing in the head of someone who has just left? Wouldn't it be nice if some intelligent bit of software did the analysis and gave you the answer (yeah, wishful thinking but still ....).
Perhaps people don't realise it but there are 2 information monopolies, one when one party controls everything and there is no alternative, the other when everything is freely available so that there is no competition (and thus no alternative).
LL
Reverse engineering is just like science. You pose hypotheses about the system you are reverse engineering, then you find ways to test those hypotheses.
Like Raphael, I have been involved in reverse engineering a number of fun systems -- Quake network protocols, Quake map formats, OpenGL programs, LEGO Mindstorms. All of these systems required the same general strategy but different tools and background knowledge.
My experience has been that the hardest step of reverse engineering a system is getting started. You typically find yourself needing some tool to analyze a system that you just don't have.
For the Quake network protocol, that tool was a UDP proxy that dumped data in a format I could understand. For OpenGL programs, getting a tracing infrastructure set up was required before meaningful analysis of how programs use OpenGL could proceed.
For LEGO Mindstorms, the hardest part would have been figuring out the baud and bit encoding of a serial stream, since I didn't have easy access to an oscilloscope at the time, and I do not like trial and error when something unrelated -- like my serial port setup -- could go wrong; however, somebody had figured out the serial encoding already, and the starting hump ended up being obtaining a serial line data analyzer. (I ended up using a SGI Indy as a serial proxy.) Later Mindstorms reverse engineering required a disassembler/assembler/compiler tool suite.
Quake map files were easy; the tools were a hexdump program, a program to factor numbers to find strides, an HP calculator, and some programs to convert number formats.
The second part of reverse engineering something is finding useful ways to sort through the data that gets collected or generated. A lot of times I found that this boiled down to writing a program to analyze and print out the data, which I could then look over and study.
For example, the Quake 2 network protocol included some compressed information whose presence or absence was indicated by a bit vector; to figure out which bits mapped to which data, I used a program that tabularized and printed out the data in a really wide format; I then looked for patterns in the compressed data across many, many packets. By lining up columns of numbers that were clearly the same data, it was possible to infer which bits mapped to that data. Kind of like playing a really long game of Mastermind where somebody else gets to choose most of the guesses.
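A table like that is simple to generate once you have a guess about the layout. The Python sketch below is not the Quake 2 format: it assumes a made-up layout in which the first byte of each packet is the presence mask and each set bit, counting from bit 0 upward, is followed by exactly one data byte. Every name in it is hypothetical. Each packet becomes one row with a fixed column per mask bit, so values that belong to the same field line up vertically.

```python
def tabulate(packets, nbits=8):
    """Lay out each packet's fields in fixed columns, one column per mask bit.

    Assumed (hypothetical) layout: byte 0 is the presence mask; each set bit,
    from bit 0 upward, is followed by one data byte. Absent fields print "--".
    """
    rows = []
    for packet in packets:
        mask, payload = packet[0], packet[1:]
        row, pos = [], 0
        for bit in range(nbits):
            if mask & (1 << bit):
                row.append(f"{payload[pos]:02x}")
                pos += 1
            else:
                row.append("--")
        rows.append(" ".join(row))
    return rows
```

If the guess is right, each column should contain values that vary smoothly or repeat from packet to packet; a column that looks like noise suggests the bit-to-field mapping or the field width is wrong.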
For Quake map files, after figuring out the basic layout of the records in the file (which hasn't changed much from version to version), the important part was figuring out the meaning of all the data. Early on, a useful tool was one that started at a given offset and printed out the range of numbers located at a particular stride from the starting point; this helped associate records of different types to one another. Later, and by far the most useful tool for analyzing Quake map files, was a level renderer used to verify the meaning of the map data. Related tools verified not only the meaning of certain data structures, but also high-level aspects of the algorithms that used these data structures, e.g. collision detection.
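The stride tool mentioned above might look something like the following Python sketch. It is a reconstruction from the description only, not the original program; the function names and the default little-endian unsigned 16-bit format are assumptions. It decodes one value every stride bytes from a starting offset and reports the range of what it found.

```python
import struct


def values_at_stride(data, offset, stride, fmt="<H"):
    """Decode one value every `stride` bytes, starting at `offset`."""
    size = struct.calcsize(fmt)
    values = []
    pos = offset
    while pos + size <= len(data):
        values.append(struct.unpack_from(fmt, data, pos)[0])
        pos += stride
    return values


def value_range(data, offset, stride, fmt="<H"):
    """Return (min, max) of the values found at the given offset and stride."""
    values = values_at_stride(data, offset, stride, fmt)
    return min(values), max(values)
```

If a field turns out to range from 0 up to just below the number of records in some other table, that is a strong hint that it is an index into that table, which is how records of different types get tied together.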
A single-stepping, single-buffered OpenGL trace player helps enormously when trying to figure out what algorithms an OpenGL program uses.
In any event, along with these common aspects of reverse engineering (getting started, developing the right tools), the general strategy of posing hypotheses and testing them holds throughout. Once you think you have figured out something new, you need to come up with a way of testing and verifying (or rejecting) the new idea. Unverified knowledge is just a guess; it's not really valid until you have confirmed it with at least one test. The more independent tests the better, as this leads to more confidence in both the new and the established knowledge. Hacking is of the essence here; the faster you can test an idea, the faster you can move on to testing new ones. Not only that, but the results of testing one new idea often open up more questions and lead to further progress, at least early on.
This is just like science. The only difference is that when you are reverse engineering something, presumably the underlying mechanisms are already known by others -- the original engineers.
Since the original poster was interested in graphics, I will add that for OpenGL programs, I use a "DLL proxy" replacement for SGI's OpenGL Stream Codec based on ideas from a program called gltrace. The proxy dumps a trace of OpenGL/GLX/WGL calls that can later be replayed, single-stepped, run through a simulator, etc.
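To show what "replayed, single-stepped" means in practice, here is a toy Python sketch. The trace format (a list of call names and argument tuples) and the `FakeRenderer` class are invented for illustration; this is not how gltrace or the OpenGL Stream Codec store their data.

```python
def replay(trace, target, stop_after=None):
    """Re-issue recorded calls against `target`; stop after
    `stop_after` calls to emulate single-stepping."""
    done = 0
    for name, args in trace:
        if stop_after is not None and done >= stop_after:
            break
        getattr(target, name)(*args)
        done += 1
    return done

class FakeRenderer:
    """Stand-in for a graphics API; it just remembers what it was told."""
    def __init__(self):
        self.log = []
    def begin(self, mode):
        self.log.append(("begin", mode))
    def vertex(self, x, y):
        self.log.append(("vertex", x, y))
    def end(self):
        self.log.append(("end",))

trace = [("begin", ("TRIANGLES",)), ("vertex", (0, 0)),
         ("vertex", (1, 0)), ("vertex", (0, 1)), ("end", ())]
r = FakeRenderer()
print(replay(trace, r, stop_after=3), r.log)
```

Stopping partway through a frame and looking at what has been drawn so far is what makes the rendering order, and often the algorithm behind it, visible.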
-Kekoa
Quite frankly, if you're interested in graphics algorithms, you'll learn a LOT more by reading a book such as "computer graphics: principles and practice" by foley et al:
http://www.amazon.com/exec/obidos/ASIN/02018484
Beware though...don't even bother reading this without a knowledge of matrix algebra in the very least...and it won't hurt to know some multivariate and vector calc. The book gives the algs. in C so you can use them in any way you want. Honestly, I think you can probably learn all the math and algorithms you need to be a CG whiz quicker than trying to even partially RE any graphics package out there by looking at asm code.
Trying to reverse engineer graphics packages will be a pretty big waste of your time...you'll definitely be a PRO at gdb though by the time you're through
The reason is, compilers can do some pretty crazy optimizations of the code, and trying to understand what's going on can be nearly impossible (given you have other things to do besides trace through jump tables and stack ptrs all day). Disassembling code is mainly done for easy things such as cracking software that requires reg. keys and such...where in the simple case, you're just making sure some conditional (ie: if (keycheck) blahblah ) always evals to true so that the system thinks you have a valid key (it can get much more complicated than this, but this is the easiest and most common case).
So, if you get off on gfx algorithms, buy a good book such as the one I mentioned and don't bother trying to disassemble anything...the complexity of today's software systems has led to a big decrease in the use of disassemblers....back in the days of 8bit cpu's and 16k ram one could easily disassemble code to see how for example a parallax scrolling routine was implemented in a game...not anymore I'm afraid
-dr0ne
Here's a *gasp* GPL'ed decompiler for Java, of all things: Homebrew Decompiler. I came across it while searching for GPL'ed software on Freshmeat. The annoying thing is that when you decompile something, you just get the straight source.. no comments, because comments and what have you are stripped out at compile time, for hopefully obvious reasons.
Of course, if you're bothering to decompile something, chances are likely that you're doing so because you know code inside and out. If not, the added benefits comments give to code readability are /really/ going to hit home.. and how.
~ Kish
I used to work at Chipworks, which reverse-engineers integrated circuits. Here's how *hardware* RE works; I'll get to software later.
You remove the chip from the package by popping it open or (if it's a plastic package) dropping it in boiling sulphuric acid. You prepare several samples, etching each one to a different level of interconnect. The last sample is etched down to the transistor level.
You then create large photomosaics of the chip. If you do it with conventional film cameras, you end up (for simple memory chips) with huge "carpets" of images about 8 metres long and 1.5 metres high.
You get a team of engineers to crawl around on the photos for a few months, marking interconnect, labeling signals (first with tentative names; then with real names) and extracting circuitry.
You get a team of engineers to eyeball the schematics for a few months and organize them. Gradually, the picture of how the chip works emerges. Note that this is for simple chips like DRAM or Flash memory chips. It's totally impractical for a complicated chip like a microprocessor.
If you're dealing with flash memory, you have to worry about programming algorithms. These chips usually have on-chip ROMs or PLAs which control programming signals. You spend another few months decoding the PLA's and coming up with the algorithms.
For software, you have to know the background. It really helps if you know which language, compiler, OS, etc. was targeted. Most compilers produce standard assembly blocks for common constructs, so this helps you recognize things.
In my youth, I partially reverse-engineered the ROM of the TRS-80 Color Computer. Since this was written in assembler, the reverse engineering was not too hard. It's basically a lot of staring.
Fravia's Pages of Reverse Engineering are perhaps the most comprehensive pages on the net that cover all sorts of different aspects of the subject. It's great - everything is written in a friendly way and there's absolutely loads of information there.
--
Everything I know in life I learnt from
There is no simple answer to the question -- it depends on the platform and on what your specific goals are. To completely reverse engineer a program, i.e. generate source code that can be compiled to exactly the same executable files, is extremely difficult. Commercial disassemblers are generally expensive. So it's usually better to define a smaller section that you want to analyze, instead of trying to recreate equivalent source code. Since I only have experience disassembling/analyzing programs running on x86 DOS/Win/Win9x, I'll have to limit practical tips to those platforms -- and in most cases, you can use a combination of the methods. You'll also need to at least be able to follow assembly code, and will need to understand the function calling conventions that the particular program and operating system use.
1) Watch the program under a debugger. This is probably the most time-consuming method, as you've got to single step until you find the section of code you're interested in (and this assumes that you can recognize what you're looking for). Most modern Windows debuggers allow you to break when a DLL is loaded, and you can then set breakpoints in the loaded module.
2) Use an API Spy program (ala Matt Pietrek, which is unfortunately out-of-print). Windows programs make heavy use of calls to functions in DLL's -- it is often possible to intercept these calls. You can find out what DLLs a program is linked to by either disassembling it, or by looking at the executable under a hex editor. To get the source code to the original API Spy from Pietrek's MSJ article, look for MSDN Knowledge Base article Q122274.
3) Rename the DLL that you want to intercept calls to, and write a "wrapper" DLL with the original name of the target DLL. The "wrapper" should have stubs for all functions in the target, which simply log information about the function call, and then call the intended function in the target DLL. But if you don't have header files for the functions you want to intercept, you'll need to watch at least one call to the function under the debugger to determine the number and type of arguments, as well as the calling convention.
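The wrapper idea in (3) is easier to see in a high-level language. The sketch below is only an analogy in Python, not a Windows DLL: it wraps an existing module (here the standard `math` module, standing in for the target DLL) so that every call is logged and then forwarded to the real function. `LoggingProxy` is a made-up name.

```python
import math

class LoggingProxy:
    """Forward every attribute lookup to `target`, logging calls on the way."""
    def __init__(self, target, log):
        self._target = target
        self._log = log

    def __getattr__(self, name):
        real = getattr(self._target, name)
        if not callable(real):
            return real  # plain data such as math.pi passes straight through
        def stub(*args, **kwargs):
            # Call the intended function, then record what happened.
            result = real(*args, **kwargs)
            self._log.append((name, args, result))
            return result
        return stub

calls = []
m = LoggingProxy(math, calls)
m.sqrt(16.0)
m.floor(2.7)
print(calls)
```

A real wrapper DLL has to do the same thing one exported function at a time, which is why you need to know each function's arguments and calling convention first.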
---
"Go Metallica. Die RIAA." -- Linus Torvalds
Reverse engineering is very important, particularly in cases where the owner of a piece of software refuses to tell you how a certain feature works. When you reverse engineer, you work at a low level; you cannot work with high-level languages like C. You will need to understand the assembly language of the platform you want to reverse. When you obtain the software, you will have to load it into a debugger or disassembler, dump it into assembly code, and figure it out.
The best way to build this skill is quite easy, though it takes time and dedication: write lots of small programs in C. Compile them, but generate assembly output instead of executable output (gcc -S). Take a look at the assembly output and study it. With time you will easily be able to recognize how compilers generate their code; you can look at an assembly listing and easily tell whether a loop is a while, for, or do-while loop, and such.
Reversing this back into C is a whole other story. If you don't have access to include files and the binary has been stripped, it gets harder. Here is an example of such C source:
_0x8024639c()
{
_0x8032d584 = 1;
_0x8032d588 = 0;
_0x8032171c();
_0x8032174c(2, 0x37a);
_0x8024922c(90);
}
This tells us that the function at address 0x8024639c does five things. We store 1 to address 0x8032d584, and 0 to the next word at 0x8032d588. We call a function at 0x8032171c, then another function at 0x8032174c with two arguments, 2 and 0x37a, and then another function at 0x8024922c with the argument 90. Not pretty, but with time we will be able to understand the structure of the program and assign the functions meaningful names.
Reverse engineering is a task which requires the utmost patience! It takes pain and practice and time. If you want to figure out algorithms, it is better to read papers and come up with your own. If you want to figure out closed algorithms, or learn how a piece of closed software works, then reverse engineering is for you. By the way, the sample snippet of source was done by a friend, and is part of a reverse-engineered mario64.
In the sample snippet below, he was able to make out some of the variables, and the code is much more readable.
_0x8024b13c()
{
if (!_0x8032ddd0) {
if (mario->Power > 0) {
block = mario->Power / 256;
} else {
block = 0;
}
if (Level > 0) {
DisplayStats |= 2;
} else {
DisplayStats &= 0xfffffffd;
}
if (CoinCount < mario->Coin && _0x8032d5d4 & 1) {
if (mario->_0x0c & 0x00006000) {
a = 0x38128081;
} else {
a = 0x38118081;
}
CoinCount++;
SetSound(a, mario->0x54);
}
if (mario->Life > 100) {
mario->Life = 100;
}
if (mario->Coin > 999) {
mario->Coin = 999;
}
if (CoinCount > 999) {
CoinCount = 999;
}
StarCount = mario->Star;
LifeCount = mario->Life;
_0x8033b268 = mario->_0xac;
if (PowerBar _0xb2 > 0) {
DisplayStats |= 0x8000;
} else {
DisplayStats &= 0xffff7fff;
}
}
}
If you think you can handle such stuff, then jump aboard and have fun.
------ Curiosity killed the cat. {satisfaction brought it back | it didn't die ignorant | lack of it is killing mankind
Unless you're really after ultra-secret proprietary algorithms, your time would probably be much better spent looking at some of the recent research literature down yer local Uni library. You won't get full implementations, but you will get explanations, and it'll be easier to understand than pages upon pages of disassembly.
There are at least two things that you can do when attempting to reverse engineer a piece of software. The first one (not legal in several countries) is to decompile the code: take a debugger or decompiler and check what instructions are executed. The second one (legal in most countries) is the "blackbox" approach: consider the software as something that produces some output(s) depending on its input(s), and try to guess what is inside.
This second approach is the "real" reverse engineering. By carefully crafting some inputs and observing the outputs, you can often draw some conclusions about how the software behaves. With some patience and a lot of trial and error on simple inputs, you can find some patterns in the software: stuff that does not change, stuff that changes depending only on one of the inputs, and so on.
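As a rough Python illustration of the black-box method: change one input at a time and record which parts of the output move. The `mystery` function is a toy stand-in for the software under study, and all names here are hypothetical.

```python
def changed_positions(blackbox, base, variants):
    """Run `blackbox` on `base` and on each variant; return, per variant,
    the output indices whose bytes differ from the base output."""
    ref = blackbox(*base)
    report = {}
    for label, inputs in variants.items():
        out = blackbox(*inputs)
        report[label] = [i for i, (a, b) in enumerate(zip(ref, out)) if a != b]
    return report

def mystery(width, height):
    """Toy 'unknown' encoder: pretend we cannot see inside it."""
    return bytes([0x42, width & 0xFF, height & 0xFF, (width + height) & 0xFF])

report = changed_positions(mystery, (10, 20),
                           {"width+1": (11, 20), "height+1": (10, 21)})
print(report)
```

Here byte 0 never changes (a constant or magic number), bytes 1 and 2 each follow a single input, and byte 3 depends on both, which is the kind of pattern the text describes.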
In the good old days (well, five years ago), I was the author of DEU (Doom Editing Utilities), the first program that was able to create new levels for Doom. I also contributed to Matt Fell's Unofficial Doom Specs and Olivier Montanuy's Unofficial Quake Specs, the documents that describe the WAD and PAK file formats and other internal details about Doom and Quake. Almost everything in the Unofficial Doom Specs was gathered by reverse-engineering. It was only later (with the release of Doom II) that id Software released some information to the community, presumably after they saw that editing Doom levels was a very popular activity. I am grateful for id Software's support of the editing community in their later games, but the first information about Doom had to be found the hard way.
Most of my efforts in decoding Doom's WAD file format (and later Quake's PAK file format) involved a hex editor for viewing and editing the raw files, and custom tools that I built along the way for making editing easier (or tools that I received from other people, like DEU 3.0 from Brendon Wyber). A key thing is also to share as much information as possible with other people who are progressing on the same front because you often get more in return than what you found by yourself. For WAD files, it was easy to find that the file was organized a bit like a tar archive: a header, a directory containing names of objects and offsets within the file, and the data for the objects. Then the trial and error starts: try to guess what an object might be, modify a few bytes, run the game and see what happens. If your changes produced something useful, write it down and share the info with others. If the game crashed, try again. Repeat until you have understood everything.
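For readers who want to see the "header, directory, data" layout concretely, here is a small Python sketch that reads a WAD directory. It assumes the layout as it is commonly documented (a 12-byte header with a 4-byte tag, lump count and directory offset, followed somewhere by 16-byte directory entries of offset, size and 8-byte name); the function name is made up, and the test file is synthetic rather than a real game WAD.

```python
import struct

def read_wad_directory(data):
    """Parse a WAD image: 12-byte header (magic, lump count, directory
    offset), then 16-byte directory entries (offset, size, 8-byte name)."""
    magic, numlumps, diroffset = struct.unpack_from("<4sii", data, 0)
    if magic not in (b"IWAD", b"PWAD"):
        raise ValueError("not a WAD file")
    entries = []
    for i in range(numlumps):
        filepos, size, name = struct.unpack_from("<ii8s", data, diroffset + 16 * i)
        entries.append((name.rstrip(b"\0").decode("ascii"), filepos, size))
    return entries

# Build a tiny synthetic PWAD in memory: one 4-byte lump named THINGS.
lump = b"\x01\x02\x03\x04"
header = struct.pack("<4sii", b"PWAD", 1, 12 + len(lump))
directory = struct.pack("<ii8s", 12, len(lump), b"THINGS")
print(read_wad_directory(header + lump + directory))
```

Once the directory is readable, the trial-and-error loop above becomes: pick a lump, change a few of its bytes, write the file back, and run the game.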
Sometimes, you will find data structures that you do not understand. That was the case for Doom's NODES, SEGS and SSECTORS data. If you share enough information with others, maybe someone will have an idea and find that the data structures are related to something that they know. This is exactly what happened for Doom: Alistair Brown and a group of students from Bradford suggested that the unknown data might be a BSP tree. After reading some papers on that topic (I didn't know anything about BSP trees), I was able to implement a first BSP builder in DEU. And then it became possible to create brand new levels for Doom, instead of only changing the textures and location of the monsters as we did in the first few months. Releasing the source code for the tools has probably helped a lot. Other people were able to create their own tools based on that, and then the next reverse-engineering steps became much easier when the other games based on the same engine were released (Doom II, Heretic, Hexen, Strife,...)
Ah well... The good old times... Sigh!
-Raphaël
First of all you need a target program, something that you'd like to reverse. Initially I'd suggest writing a smallish C/C++ program yourself, compiling it, then reversing that - I say this because it'll be small, and you should know how it works.
Once you have a program to reverse (around 20-40k would be a good size for a start), you'll need a disassembler. There are several around, mostly commercial ones, and some free ones.
Here are the few that I've heard of / used:-
Anyway by now you should be able to disassemble most executables, and study the assembly language.
Much of this is going to be strange to you, so try to separate out the different parts of the assembly - such as the startup code, the function calling, and the error handling.
After a bit of study you'll soon realise what a lot of the common code is doing.
Here's a small example of the sort of thing DIS.exe will produce:
:00402001 E8AA220000 call 004042B0
:00402006 83F801 cmp eax, 00000001
:00402009 7434 je 0040203F
:0040200B 6A00 push 00000000
:0040200D 68A0034100 push 004103A0
(StringData)"Startup Message"
:00402012 6878034100 push 00410378
(StringData)"Program Starting In Interactive Mode"
:00402017 6A00 push 00000000
:00402019 C705F839410000000000 mov dword[004139F8], 00000000
:00402023 FF1560644100 call dword[00416460]
:00402029 EB0A jmp 00402035
From this you can see the names of the win32 function calls that the program is making - this will help you "copy" the program back into C.
This is what I've done - with a good read of the assembly language you can see which Win32 API calls the program is making, and that should give you a good head start on reimplementing the code... *grin*
Of course if you are just interested in cracking, (Removing protection from programs, etc), then the same things apply - you just search through your listing till you find "Incorrect Serial", etc, and change the conditional jumps appropriately - But that's bad so I'm not going to encourage you.
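The (StringData) annotations in a listing like the one above come from runs of printable text stored in the executable. A minimal Python sketch of pulling such runs out of a binary, much like the Unix `strings` tool, might look like this; `find_strings` and the sample bytes are made up for illustration.

```python
import re

def find_strings(data, min_len=4):
    """Return (offset, text) for every run of printable ASCII bytes
    at least `min_len` long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]

# Made-up blob: two messages separated by non-text bytes.
blob = b"\x00\x01Startup Message\x00\xff\x6a\x00Hi\x00Program Starting\x00"
for offset, text in find_strings(blob):
    print("%08x %s" % (offset, text))
```

The offsets are the useful part: searching the disassembly for code that pushes one of those addresses leads you to where the message is used.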
Once you have your program, you can then try to translate it into C
Another approach, besides static analysis of a disassembly, is to study the program inside a debugger. Without a doubt NuMega's Soft-Ice is the best debugger - but it's also very, very terse, and quite hard to learn.
To give you some idea of the power of soft ice, when it is loaded you can set a breakpoint on a function such as "MessageBoxA", (Called from AfxMessageBox, et al), with
bpx MessageBoxA
Then when any running program calls this function Soft-Ice will pop up, allowing you to study / modify the running process.
Anyway that's enough encouragement for now. Just have patience and it will all come to you.
Steve
--
Okay...first of all, the most common reason for reverse-engineering something is to remove or bypass the copy protection scheme. I know this because I see the results float by every day on IRC channels. I bought every game Blizzard ever made, but yet I am extremely glad some talented person reverse-engineered their copy to get rid of the damn CD checks...which I just happened to acquire as a "offsite copy for backup purposes".
In the interest of education about reverse-engineering, I'm going to discuss a step-by-step process as it relates to the most popular use for it...copy protection. If you want to flame me, or moderate this down to -2, or post hateful comments go ahead...your local library has instructions on how to make bombs so I see no reason to feel guilty for teaching something that requires at least ten times the brain power of bomb making.
Not to mention, if you seriously think that someone who has never reverse-engineered a program in his or her life is going to somehow magically take the information I post here and never have to pay for software again, get real. Warez are just a search engine away so if someone actually takes the time to LEARN a new skill, I say good for them. Okay, here we go...
Required definitions:
1) PRC : Palm Resource File. Like an EXE. Contains app's code, graphics and forms
2) Form (FRM) : A Palm window filled with text, buttons or dropdowns
3) Alert (ALT) : Popup form, often used to comment on the validity of one's reg code
4) String (STR) : ASCII characters like "Registration Successful!"
5) Offset : Location in the PRC file where we will do some editing
6) ID : 2 byte hex code such as 05 DC that identifies a Resource
7) Trap : Palm function to perform a task such as sysTrapStrCompare
Required tools:
Yes, they are all for Windows, but if you are smart enough to read /. then you are smart enough to have access to a Windows box or know how to VMWare one.
1) PilotDis to thoroughly break down PRC files
2) Prc2Bin to untangle PRC files into Alerts, Forms and Strings
3) Palm Emulator (POSE) to run PRC's on your Windows machine for testing
4) Hex WorkShop to reach into PRC files and change the most delicate parts of them
5) UltraEdit to quickly find text occurrences in files
Now, you don't need to own a Palm to learn how to reverse engineer a Palm program, but the emulator isn't going to run without a PalmOS ROM file. If you can't figure out how to get a ROM file on the Internet, forget about learning to reverse engineer and instead learn how to use a search engine. Of course, if you own a Palm, or know someone who does, POSE has a button to download the ROM from it.
Fire up the Palm Emulator (POSE) and load the OS ROM to begin a new emulation session. Load up whatever program it is you want to reverse engineer. I recommend starting with a nice simple program like Yearly (stand-by for /. effect) because it is easy to understand.
Click the menu button and navigate to the Info menu where you'll find an About option. Choose that option and note the text "Unregistered Copy" (write this text down). Now choose the Register option and notice the text "Yearly Registration" (write this down too). Enter a bogus number like 111 and notice the message "Registration Failed: You entered a wrong code!"...yes, you need to write this down too.
Now, let's see where those resources are in the program file. Run PilotDis with the command "dis yearly.prc". Then run PRC2Bin with the command "prc2bin yearly.prc". If everything was done properly then you should have many .BIN files and a file called "yearly.prc.s".
We know that the "Registration Failed" window is an Alert because it pops up when we enter the wrong number. If you've installed UltraEdit then right-click on one of the Alert files like "Talt138c.bin" and open it. What do you see inside? It says "Registration Successful!" Check out the other Alerts. Open them one by one. You'll notice that A#138D (Alert ID #138D) contains the text "Registration Failed".
Now, where do these ID's show up in the program? Open up UltraEdit and load "yearly.prc.s". Search for $138D to locate calls to the Failed Alert.
Here is the code nearby the call:
00004a02 4e4fa0c5 TRAP #15,$A0C5 = sysTrapStrCopy
00004a06 6100bcf4 BSR L48
00004a0a defc000c ADDA.W #12!$c,A7
00004a0e 4a6c0028 TST.W 40(A4) ;What is this?
00004a12 6708 BEQ L607
00004a14 3f3c138c MOVE.W #5004!$138c,-(A7) ;Successful
00004a18 60000006 BRA L608
00004a1c 3f3c138d L607 MOVE.W #5005!$138d,-(A7) ;Failed
00004a20 4e4fa192 L608 TRAP #15,$A192 = sysTrapFrmAlert
It is called at x4A1C (Address 4A1C), right after the #5005. Right above it is a call to $138C after #5004. This is our Successful Alert. Where does it decide what Alert to branch to? See the instruction 'BEQ'? That means 'branch if the compare or test equals 0'. The TST.W 40(A4) code above it checks memory location 40(A4). Therefore, somewhere in the program, 40(A4) is set to a value and depending on the value, flags either Pass or Fail responses. In this case, a 0 means we've Failed the check. Let's take a look at the code immediately above it: L48 (label 48), the target of the BSR (Branch to Subroutine).
Here is the truncated routine L48, which you can find by searching for 'L48':
0000071e 3e06 MOVE.W D6,D7
00000720 9e40 SUB.W D0,D7
00000722 426c0028 CLR.W 40(A4) ;Our memory address!
~~~~~
000007fa 4e4fa0c8 TRAP #15,$A0C8 = sysTrapStrCompare
000007fe 4a40 TST.W D0
00000800 6606 BNE L53 ;Leave 0 or make 1?
00000802 397c00010028 MOVE.W #1,40(A4)
00000808 4cee04f8ffe8 L53 MOVEM.L -24(A6),D3-D7/A2
0000080e 4e5e UNLK A6
00000810 4e75 RTS
Notice that the instruction CLR.W 40(A4) refers to the key address? This makes the memory location equal to 0, which it remains until another instruction affects 40(A4). The only such instruction is at x0802, where 40(A4) may become 1. The BNE instruction just above x0802 steers the program away from the Pass outcome. Farther up, the sysTrapStrCompare call is a big tip-off that things are coming to a close in L48. Register D0 will hold a 0 if the two compared strings are equal and a nonzero value if they are not. The BNE instruction at x0800 means "branch if the compare or test does not equal 0". So, if we can ensure that the routine always sets 40(A4) to 1, the check will always Pass.
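If the assembly is hard to follow, the same control flow can be written out as a small Python model. This is only a reading of the two listings above, with made-up function names; it models the flag at 40(A4) and the alert choice, nothing else in the program.

```python
def l48_flag(codes_match, bne_removed=False):
    """Model of routine L48 as read from the listing: returns the value
    left in 40(A4)."""
    flag = 0                          # CLR.W 40(A4)
    d0 = 0 if codes_match else 1      # sysTrapStrCompare result (nonzero = differ)
    if d0 != 0 and not bne_removed:   # TST.W D0 / BNE L53
        return flag                   # branch taken: the MOVE is skipped
    flag = 1                          # MOVE.W #1,40(A4)
    return flag

def alert_id(flag):
    """Caller's choice of alert: BEQ L607 picks $138D when the flag is 0."""
    return 0x138D if flag == 0 else 0x138C

for match in (True, False):
    print(match, hex(alert_id(l48_flag(match))))
```

Setting `bne_removed=True` in the model corresponds to the branch never being taken, so the flag ends up 1 no matter what the comparison returned.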
Let's take the quickest path and plan to get rid of the BNE instruction, ensuring that we will always MOVE.W #1 into 40(A4). When you want to remove an instruction, the easiest thing to fill it with is a NOP, short for no operation. The 2 byte opcode for NOP is 4E 71.
"Huh?" Well, unfortunately, Palms use Motorola DragonBall processors and the list of instruction codes is copyrighted material. I can't provide a link to it here. If you are seriously interested in reverse engineering on the Palm platform, you'll have to contact Motorola and request a copy from them. I'm providing the NOP number here so that it's possible to learn how a reverse-engineering process works.
Anyway, at x0800 we want to place 4E 71. Because our BNE L53 instruction is also 2 bytes we only need one NOP. Open Hex Workshop or another hex editor and go to address x0800. In UltraEdit, type CTRL+G and type '0x0800'. You should find '66 06' there. Type over it with '4E 71' and save.
Now, reload the modified yearly.prc file into POSE. Try to register with any number. Does it work? Of course it does. Check the About screen. It says "Registered" now.
Thus ends the lesson. You now know why reverse-engineering is such a hot topic on the Internet today.
- JoeShmoe
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-- I wonder which will go down in history as the bigger failure: the War on Drugs or the War on Filesharing