Reverse Engineering?
codec7 asks: "Ever since I read the article about Australia legalizing reverse engineering, I've been curious -- How DO you reverse engineer software? I'm an average programmer really interested in computer graphics, and would love to get into some software packages to see how they work. Nothing underhanded, strictly educational. I get off on algorithms. Anyway, am I in over my head even contemplating it? I have a feeling that by the time I could really reverse engineer anything (even with help) the information would be grossly obsolete and I could pick up better tips and tricks from some gaming mags. I would appreciate any direction I could get from readers who know a little about this kind of stuff." I figure it's probably best to discuss this now while it is still legal someplace in the world.
There are at least two things that you can do when attempting to reverse engineer a piece of software. The first one (not legal in several countries) is to decompile the code: take a debugger or decompiler and check what instructions are executed. The second one (legal in most countries) is the "blackbox" approach: consider the software as something that produces some output(s) depending on its input(s), and try to guess what is inside.
This second approach is the "real" reverse engineering. By carefully crafting some inputs and observing the outputs, you can often draw some conclusions about how the software behaves. With some patience and a lot of trial and error on simple inputs, you can find some patterns in the software: stuff that does not change, stuff that changes depending only on one of the inputs, and so on.
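As a rough sketch of that kind of pattern hunting (not a tool from this thread), the snippet below compares several captured outputs byte by byte and separates the offsets that never change from those that do. The function name and the sample data are made up for illustration.

```python
def classify_offsets(samples):
    """Split byte offsets into (constant, varying) across all samples.

    Only the first min(len(sample)) bytes are compared, so samples of
    different lengths are handled by ignoring the extra tail.
    """
    length = min(len(s) for s in samples)
    constant, varying = [], []
    for i in range(length):
        values = {s[i] for s in samples}
        if len(values) == 1:
            constant.append(i)
        else:
            varying.append(i)
    return constant, varying


# Hypothetical outputs captured from three runs with slightly different inputs.
samples = [
    b"HDR\x01\x10\x00",
    b"HDR\x02\x20\x00",
    b"HDR\x03\x30\x00",
]
constant, varying = classify_offsets(samples)
print("constant offsets:", constant)
print("varying offsets:", varying)
```

Changing one input at a time and rerunning the comparison shows which of the varying offsets depend on that input alone.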
In the good old days (well, five years ago), I was the author of DEU (Doom Editing Utilities), the first program that was able to create new levels for Doom. I also contributed to Matt Fell's Unofficial Doom Specs and Olivier Montannuy's Unofficial Quake Specs, the documents that describe the WAD and PAK file formats and other internal details about Doom and Quake. Almost everything in the Unofficial Doom Specs was gathered by reverse-engineering. It was only later (with the release of Doom II) that id Software released some information to the community, presumably after they saw that editing Doom levels was a very popular activity. I am grateful for id Software's support of the editing community in their later games, but the first information about Doom had to be found the hard way.
Most of my efforts in decoding Doom's WAD file format (and later Quake's PAK file format) involved a hex editor for viewing and editing the raw files, and custom tools that I built along the way for making editing easier (or tools that I received from other people, like DEU 3.0 from Brendon Wyber). A key thing is also to share as much information as possible with other people who are progressing on the same front, because you often get more in return than what you found by yourself. For WAD files, it was easy to find that the file was organized a bit like a tar archive: a header, a directory containing names of objects and offsets within the file, and the data for the objects. Then the trial and error starts: try to guess what an object might be, modify a few bytes, run the game and see what happens. If your changes produced something useful, write it down and share the info with others. If the game crashed, try again. Repeat until you have understood everything.
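To make the "header, directory, data" layout concrete, here is a minimal sketch of a WAD directory reader. The exact field sizes (a 12-byte header with a 4-byte tag, lump count and directory offset, then 16-byte directory entries) are taken from the publicly documented WAD format rather than from the description above, and the function names and the tiny test file are invented for the example.

```python
import struct


def read_wad_directory(data):
    """Return a list of (name, offset, size) tuples for every lump in a WAD image."""
    # Header: 4-byte tag, number of lumps, offset of the directory (little-endian).
    magic, num_lumps, dir_offset = struct.unpack_from("<4sii", data, 0)
    if magic not in (b"IWAD", b"PWAD"):
        raise ValueError("not a WAD file")
    entries = []
    for i in range(num_lumps):
        # Directory entry: lump offset, lump size, 8-byte NUL-padded name.
        filepos, size, raw_name = struct.unpack_from("<ii8s", data, dir_offset + 16 * i)
        name = raw_name.split(b"\0", 1)[0].decode("ascii", errors="replace")
        entries.append((name, filepos, size))
    return entries


def build_tiny_wad():
    """Build a one-lump PWAD in memory so the reader can be tried without a game file."""
    lump = b"hello"
    header = struct.pack("<4sii", b"PWAD", 1, 12 + len(lump))
    directory = struct.pack("<ii8s", 12, len(lump), b"DEMO")
    return header + lump + directory


print(read_wad_directory(build_tiny_wad()))
```

With the directory in hand, the trial-and-error step becomes: pick one lump, change a few bytes inside its offset range, and run the game.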
Sometimes, you will find data structures that you do not understand. That was the case for Doom's NODES, SEGS and SSECTORS data. If you share enough information with others, maybe someone will have an idea and find that the data structures are related to something that they know. This is exactly what happened for Doom: Alistair Brown and a group of students from Bradford suggested that the unknown data might be a BSP tree. After reading some papers on that topic (I didn't know anything about BSP trees), I was able to implement a first BSP builder in DEU. And then it became possible to create brand new levels for Doom, instead of only changing the textures and location of the monsters as we did in the first few months. Releasing the source code for the tools has probably helped a lot. Other people were able to create their own tools based on that, and then the next reverse-engineering steps became much easier when the other games based on the same engine were released (Doom II, Heretic, Hexen, Strife,...)
Ah well... The good old times... Sigh!
-Raphaël
First of all you need a target program, something that you'd like to reverse. Initially I'd suggest writing a smallish C/C++ program yourself, compiling it, then reversing that - I say this because it'll be small, and you should know how it works.
Once you have a program to reverse (around 20-40k would be a good size for a start), you'll need a disassembler. There are several around, mostly commercial ones, and some free ones.
Here are the few that I've heard of / used:
Anyway, by now you should be able to disassemble most executables and study the assembly language.
Much of this is going to be strange to you, so try to separate out the different parts of the assembly, such as the startup code, the function calling, and the error handling.
After a bit of study you'll soon realise what a lot of the common code is doing.
Here's a small example of the sort of thing DIS.exe will produce:
:00402001 E8AA220000 call 004042B0
:00402006 83F801 cmp eax, 00000001
:00402009 7434 je 0040203F
:0040200B 6A00 push 00000000
:0040200D 68A0034100 push 004103A0
(StringData)"Startup Message"
:00402012 6878034100 push 00410378
(StringData)"Program Starting In Interactive Mode"
:00402017 6A00 push 00000000
:00402019 C705F839410000000000 mov dword[004139F8], 00000000
:00402023 FF1560644100 call dword[00416460]
:00402029 EB0A jmp 00402035
From this you can see the names of the win32 function calls that the program is making - this will help you "copy" the program back into C.
This is what I've done: with a good read of the assembly language you can see which Win32 API calls the program is making, and that should give you a good head start on reimplementing the code... *grin*
Of course if you are just interested in cracking (removing protection from programs, etc.), then the same things apply: you just search through your listing till you find "Incorrect Serial", etc., and change the conditional jumps appropriately. But that's bad, so I'm not going to encourage you.
Once you have your program, you can then try to translate it into C.
An alternative to decompiling via static analysis is to study the program inside a debugger. Without a doubt NuMega's SoftICE is the best debugger, but it's also very, very terse, and quite hard to learn.
To give you some idea of the power of SoftICE, when it is loaded you can set a breakpoint on a function such as "MessageBoxA" (called from AfxMessageBox, et al.) with
bpx MessageBoxA
Then when any running program calls this function, SoftICE will pop up, allowing you to study / modify the running process.
Anyway, that's enough encouragement for now. Just have patience and it will all come to you.
Steve
--
Okay...first of all, the most common reason for reverse-engineering something is to remove or bypass the copy protection scheme. I know this because I see the results float by every day on IRC channels. I bought every game Blizzard ever made, and yet I am extremely glad some talented person reverse-engineered their copy to get rid of the damn CD checks...which I just happened to acquire as an "offsite copy for backup purposes".
In the interest of education about reverse-engineering, I'm going to discuss a step-by-step process as it relates to the most popular use for it...copy protection. If you want to flame me, or moderate this down to -2, or post hateful comments, go ahead...your local library has instructions on how to make bombs, so I see no reason to feel guilty for teaching something that requires at least ten times the brain power of bomb making.
Not to mention, if you seriously think that someone who has never reverse-engineered a program in his or her life is going to somehow magically take the information I post here and never have to pay for software again, get real. Warez are just a search engine away, so if someone actually takes the time to LEARN a new skill, I say good for them. Okay, here we go...
Required definitions:
1) PRC : Palm Resource File. Like an EXE. Contains app's code, graphics and forms
2) Form (FRM) : A Palm window filled with text, buttons or dropdowns
3) Alert (ALT) : Popup form, often used to comment on the validity of one's reg code
4) String (STR) : ASCII characters like "Registration Successful!"
5) Offset : Location in the PRC file where we will do some editing
6) ID : 2 byte hex code such as 05 DC that identifies a Resource
7) Trap : Palm function to perform a task such as sysTrapStrCompare
Required tools:
Yes, they are all for Windows, but if you are smart enough to read /. then you are smart enough to have access to a Windows box or know how to VMWare one.
1) PilotDis to thoroughly break down PRC files
2) Prc2Bin to untangle PRC files into Alerts, Forms and Strings
3) Palm Emulator (POSE) to run PRC's on your Windows machine for testing
4) Hex WorkShop to reach into PRC files and change the most delicate parts of them
5) UltraEdit to quickly find text occurrences in files
Now, you don't need to own a Palm to learn how to reverse engineer a Palm program, but the emulator isn't going to run without a PalmOS ROM file. If you can't figure out how to get a ROM file on the Internet, forget about learning to reverse engineer and instead learn how to use a search engine. Of course, if you own a Palm, or know someone who does, POSE has a button to download the ROM from it.
Fire up the Palm Emulator (POSE) and load the OS ROM to begin a new emulation session. Load up whatever program it is you want to reverse engineer. I recommend starting with a nice simple program like Yearly (stand-by for /. effect) because it is easy to understand.
Click the menu button and navigate to the Info menu where you'll find an About option. Choose that option and note the text "Unregistered Copy" (write this text down). Now choose the Register option and notice the text "Yearly Registration" (write this down too). Enter a bogus number like 111 and notice the message "Registration Failed: You entered a wrong code!"...yes, you need to write this down too.
Now, let's see where those resources are in the program file. Run PilotDis with the command "dis yearly.prc". Then run PRC2Bin with the command "prc2bin yearly.prc". If everything was done properly then you should have many .BIN files and a file called "yearly.prc.s"
We know that the "Registration Failed" window is an Alert because it pops up when we enter the wrong number. If you've installed UltraEdit then right-click on one of the Alert files like "Talt138c.bin" and open it. What do you see inside? It says "Registration Successful!" Check out the other Alerts. Open them one by one. You'll notice that A#138D (Alert ID #138D) contains the text "Registration Failed".
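Opening the Alert files one by one works, but the same search can be scripted. The sketch below pulls runs of printable ASCII out of raw bytes, much like the Unix "strings" tool; it is only an illustration, and the function name and the sample bytes are invented (the real .bin files will have their own layout around the text).

```python
import re


def printable_strings(data, min_len=4):
    """Return every run of at least min_len printable ASCII characters found in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Made-up stand-in for the contents of one extracted Alert resource.
sample = b"\x00\x01Registration Failed\x00\x02ok\x00"
print(printable_strings(sample))
```

Running it over each extracted file (for example with pathlib's Path.read_bytes) and printing the file name next to any hit shows which resource holds a given message.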
Now, where do these ID's show up in the program? Open up UltraEdit and load "yearly.prc.s". Search for $138D to locate calls to the Failed Alert.
Here is the code nearby the call:
00004a02 4e4fa0c5 TRAP #15,$A0C5 = sysTrapStrCopy
00004a06 6100bcf4 BSR L48
00004a0a defc000c ADDA.W #12!$c,A7
00004a0e 4a6c0028 TST.W 40(A4)  ;What is this?
00004a12 6708 BEQ L607
00004a14 3f3c138c MOVE.W #5004!$138c,-(A7)  ;Successful
00004a18 60000006 BRA L608
00004a1c 3f3c138d L607 MOVE.W #5005!$138d,-(A7)  ;Failed
00004a20 4e4fa192 L608 TRAP #15,$A192 = sysTrapFrmAlert
The Failed Alert ID is pushed at x4A1C (address 4A1C), as #5005. Right above it, at x4A14, $138C is pushed as #5004. This is our Successful Alert. Where does it decide what Alert to branch to? See the instruction 'BEQ'? That means 'branch if the compare or test equals 0'. The TST.W 40(A4) code above it checks memory location 40(A4). Therefore, somewhere in the program, 40(A4) is set to a value and depending on the value, flags either Pass or Fail responses. In this case, a 0 means we've Failed the check. Let's take a look at the code immediately above it: L48 (label 48), the target of the BSR (Branch to Subroutine).
Here is the truncated routine L48, which you can find by searching for 'L48':
0000071e 3e06 MOVE.W D6,D7
00000720 9e40 SUB.W D0,D7
00000722 426c0028 CLR.W 40(A4)  ;Our memory address!
000007fa 4e4fa0c8 TRAP #15,$A0C8 = sysTrapStrCompare
000007fe 4a40 TST.W D0
00000800 6606 BNE L53  ;Leave 0 or make 1?
00000802 397c00010028 MOVE.W #1,40(A4)
00000808 4cee04f8ffe8 L53 MOVEM.L -24(A6),D3-D7/A2
0000080e 4e5e UNLK A6
00000810 4e75 RTS
Notice that the instruction CLR.W 40(A4) refers to the key address? This sets the memory location to 0, where it remains until another instruction affects 40(A4). The only such instruction is at x0802, where 40(A4) may become 1. The BNE instruction just above x0802 can steer the program away from the Pass outcome. Farther up, the call to sysTrapStrCompare is a big tip-off that things are coming to a close in L48. Register D0 will hold a 0 if the two compared strings are equal and a nonzero value if they are not. The BNE instruction at x0800 means "branch if the compare or test does not equal 0". So, if we can ensure that the routine always sets 40(A4) to 1, it will always Pass.
Let's take the quickest path and plan to get rid of the BNE instruction, ensuring that we will always MOVE.W #1 into 40(A4). When you want to remove an instruction, the easiest thing to fill it with is a NOP, short for no operation. The 2 byte opcode for NOP is 4E 71.
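For readers who want to check the listings by hand, the branch instructions can be decoded from their raw bytes. The sketch below assumes the standard 68000 branch encoding (top nibble 6, a 4-bit condition, then an 8-bit displacement, with a 16-bit displacement following when the 8-bit one is zero); the function name is invented, and only the addresses and opcodes quoted in the listings are used as inputs.

```python
# Condition field (bits 11-8 of the opcode) to mnemonic, per the 68000 encoding.
CONDITIONS = [
    "BRA", "BSR", "BHI", "BLS", "BCC", "BCS", "BNE", "BEQ",
    "BVC", "BVS", "BPL", "BMI", "BGE", "BLT", "BGT", "BLE",
]


def decode_branch(address, code):
    """Decode a 68000 branch at `address`; return (mnemonic, target address)."""
    if len(code) < 2:
        raise ValueError("need at least 2 bytes")
    opcode = int.from_bytes(code[:2], "big")
    if opcode >> 12 != 0x6:
        raise ValueError("not a branch instruction")
    mnemonic = CONDITIONS[(opcode >> 8) & 0xF]
    displacement = opcode & 0xFF
    if displacement == 0x00:
        # A zero byte means a signed 16-bit displacement follows the opcode.
        if len(code) < 4:
            raise ValueError("missing 16-bit displacement")
        displacement = int.from_bytes(code[2:4], "big", signed=True)
    elif displacement == 0xFF:
        raise ValueError("32-bit displacement (68020 and later) not handled")
    elif displacement >= 0x80:
        displacement -= 0x100  # sign-extend the 8-bit displacement
    # Displacements are relative to the address of the opcode plus 2.
    return mnemonic, address + 2 + displacement


for addr, raw in [(0x4A12, "6708"), (0x4A18, "60000006"), (0x0800, "6606")]:
    name, target = decode_branch(addr, bytes.fromhex(raw))
    print(f"{addr:08x} {raw:<8} {name} -> {target:08x}")
```

The decoded targets line up with the labels in the listings: 6708 at x4A12 lands on x4A1C (L607), and 6606 at x0800 lands on x0808 (L53).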
"Huh?" Well, unfortunately, Palms use Motorola DragonBall processors and the list of instruction codes is copyrighted material. I can't provide a link to it here. If you are seriously interested in reverse engineering on the Palm platform, you'll have to contact Motorola and request a copy from them. I'm providing the NOP number here so that it's possible to learn how a reverse-engineering process works.
Anyway, at x0800 we want to place 4E 71. Because our BNE L53 instruction is also 2 bytes we only need one NOP. Open Hex Workshop or another hex editor and go to address x0800. In UltraEdit, type CTRL+G and type '0x0800'. You should find '66 06' there. Type over it with '4E 71' and save.
Now, reload the modified yearly.prc file into POSE. Try to register with any number. Does it work? Of course it does. Check the About screen. It says "Registered" now.
Thus ends the lesson. You now know why reverse-engineering is such a hot topic on the Internet today.
- JoeShmoe
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-- I wonder which will go down in history as the bigger failure: the War on Drugs or the War on Filesharing