Reversing a Checksum Algorithm?
Todd asks: "Does anyone have good suggestions or advice regarding techniques for determining the checksum of a serial data protocol? I am currently working on a fun little project using a digital display sign which uses an unknown serial protocol. I cannot move on with the project until I figure this dreaded checksum out."
it's not these guys right? (you didn't say what phone # you tried so...)
c0w goes moo.
really.
I don't think you have much choice but brute-force it.
Collect as many as possible checksum -generators, and try them on portions of message.
Some more-or less trivial are 'LRC', CRC, CRC16, and CRC32 ones. Of course XOR too. difficulty is to determine which characters go to checksum, as it's not always that 'STX'/'ETX' are included.
fucktard is a tenderhearted description
If you also have the product whose protocol you are
:)
trying to emulate, let someone in a free country trace
it with a debugger (gdb, WinICE, Turbo Debugger...)
in order to find at least the assembly code out.
Then you can hire an assembly guru (like me
to write a routine which does the same.
My Karma isn't excellent, damn it! (And
Is it a very simple machine with a very simple processor and limited memory? If so, they may well have written their own with shifts and xors. On the other hand, if it is a modern processor with megabytes of memory, they probably have used a library function. Can you get access to the standard libraries for that chip?
How long is the checksum? What happens if you feed a wrong one? If it is just a single byte, and the box doesn't lock up completely on error, try all possible values for every packet!
Why do you think they put the checksum in the protocol? To catch a few dropped bits over a bad line, or to prevent tampering? If the later, watch out for some real cryptography. If only the former, assume they took the path of least resistance.
In Murphy We Turst
you said you have a DOS application that generates the data.
Well run it in a debugger and locate the part the does generates the checksum, you could rip the code out directly(or load the application up into memory and call the checksum bit), this will probably violate a few laws though.
So the best way is use there code to work out the algorythm and then write you own code(even better get someone else to do it) based on the algorythm
thank God the internet isn't a human right.
While looking at the test values that you sent to the sign, I noticed something about the 13-14 bits. Without more test cases, you can never be sure.
The 13-14 bits look like they change with the length of your message. With messages of 'TEST' or 'WICK' the value in 13-14 is 'DA BC'. With a message of 'T', the value in 13-14 is '5A BA'.
Neat project - keep us posted on the progress!
Hope this helps.
I just read your page fully(shoud have done this first!!)
It shouldn't be to hard to trace the part of the application that does the checksumming.
First of all there need to identify what area it's in, it must be somewhere between the bit that takes the data and the bit that sends the data out.
Hell it's only a few K, and looking at it in a hex editor about 1/3 of the app is data, no code.
Though it was years and years ago, I'd done some 6502 assembler programming. In fact, I just pulled out my copy of "Programming the 6502" by Rodnay Zaks (Sybex Inc.). I was never that good at it, but I had a lot of fun trying!
Unfortunately, when I tried to download the binary ROM image of the 6502 code on his web page I got an error: Cannot find the file specified.
If/when that gets fixed, the next step would be to get a computer do the work. A quick check of google using the search string: "6502 disassembler" brought up about 337 matches. Granted, some of the disassemblers are designed to run on older hardware (e.g. Atari, Commodore 64), but I saw links for unix, windows, and DOS, too.
looking at your page http://www.dickwick.com/sign/sign.htm The first three values differ from each other by 15 and the next differs from the others by a multiple of 15, the only number from this set that is missing is 61, which would appear to might have been generated by 0 if that was a valid value. The ones with bit seven set are also 15 apart but offset 1 from the first group. If you keep repeating the process for other bytes you should find a connection with every bit postition/group and the resulting values of the checksum.
Use a lot of test cases. In fact, try everything. Start with a message length one, and send all possible messages. Then try similar, with a length of two. You won't need to try literally every combination, but assuming it's not super complicated, you ought to be able to get it from this.
First, get a good debugger. An old DOS version of SoftICE would do the trick.
Next, set it up to send a test message, like "test 12345". Then set a breakpoint in the code that starts transmitting to the serial port. That shouldn't be hard to figure out with some old DOS references.
Then run the program, and when the breakpoint hits search memory for your test string. Then you find the memory location of the buffer being sent by the application. Then set a memory breakpoint so the debugger will break when that buffer is read, and continue the program and have it send the message again. Now you should be able to zero in on the crc code, as it reads the buffer to compute the checksum.
Once you find the loop of asm code that's computing it, you should be able to understand the algorithm pretty easy.
Since most of the good advice I can think of for reverse engineering the check sum has already been posted, I'll suggest something orthogonal:
You might consider black-boxing the original app using DosEmu and glue code of your own devising. Stuff goes in, stuff comes out, and you're done.
-- MarkusQ
P.S. I'm assuming, of course, that this is just a one-off project, not something you plan to replicate.
You can run the DOS app emulated (using dosemu) or interpreted (using Bochs). You can then automate this by using something like "expect" to simulate typing at the simulated DOS window.
I played around with the sample data you provided. I came across some funny business which I am almost certain is an error in your spreadsheet, which throws the whole thing into doubt. Did you mean to repeat the slot encoding 0x05 for both slots 5 and 6? I'm going to assume it was a typo. If so, I learned the following about the impact the slot number has on the checksum:
8-bit-number
Bits == 76543210
Csum(x) is the value of the checksum with slot #x and all else held constant.
Csum(x) = Csum(0) +
(Bit0(x) ? -0x0F : 0) +
(Bit1(x) ? -0x1E : 0) +
(Bit2(x) ? +0x3C : 0) +
(Bit3(x) ? -0x78 : 0) +
(Bit4(x) ? -0xF0 : 0)
Note that the values added/subtracted for each bit follow a pattern:
0x0F = 00001111
0x1E = 00011110
0x3C = 00111100
0x78 = 01111000
0xF0 = 11110000
More data might shed some light on the pattern. Whatever the case, I think this is reminiscent of a CRC16, since I don't think a checksum would have this kind of behavior -- a standard addition checksum or a XOR-based checksum (even with bit rotation) would make a bit always add/subtract the same amount, but it would be a power of 2 (I think).
So now you need to find the CRC's polynomial, and I don't know enough about that kind of thing to help you. (And there is a chance that everything I've said here is wrong, since this is not my specialty!)
Time flies like an arrow. Fruit flies like a banana.
Most of the checksum algorithms employed in that era were extremely light weight and not at all good by crytographic measures. The 6502 has a register set that makes x86 look positively beefy by comparison.
This could work against a simple reverse analysis. The algorithm might be table based to alleviate register pressure, and it's clear to me that the topic author is not up to the task of fleshing out such a table by differential techniques. If he had that kind of talent, he wouldn't be asking us.
On the other hand, working with a tool such as SoftIce, it's not at all difficult to capture the checksum code red handed. I used to play Falcon 3.0 under SoftIce so I could repair the system crashes on the fly (especially the guaranteed hang when you hit a ground target with one specific type of munition). "R2, see what you can to about the power."
And yes, I bypassed many kinds of crashes and checks using SoftIce. It was far easier than you would think and I enjoyed the small challenges as much as any game I played. Differential analysis is on a par with learning to play Go because you enjoy of the humility of probing your boundless incompetence.
At the other extreme, the 6502 instruction set can be learned in a couple of hours. People forget how simple things used to be. The 6502 is a four function calculator on steroids. The average configuration file these days (Apache, SMB) takes longer to master.
Don't start with messages like "test". Send things like AAAAA and see what actually comes out. Find the lowest char it accepts, probably space (somewhat useless), maybe bang, maybe digits. But figure out the lowest it accepts, and preferably as few bits as possible, such as @ 0x40. Reset it and send strings of this, see how they show. You do need to try lots of variations. Send zillions of AAAAA and see how many other chars show before it repeats.
Infuriate left and right
Have you tried emailing this guy? According to his resume, he worked at SilentRadio from 1980-1997, eventually ending up as vice president. And if he doesn't know the answer, maybe he can get you in touch with Mike Levin, the guy who started the company.
I put sign.exe thru IDA and identified the checksum algorithm. I found out that the only thing that goes thru the checksum is 35 35 00 then 0E DA is skipped and then the rest is put thru. The algorithm is a simple crc alike algo that adds the chars xors with the length and rotates some bits. You can find a perl program i wrote to calculate the checksum for a given range at: this location.
Good luck with your project,
Gijs
I'm no asm programmer (esp not 6502), but I'm guessing it's (counting from zero at start of rom)
(02A1) BEQ 02B0 and
(02AE) BEQ 02C5
that should be changed from BEQ to BNE (branch equal/not equal).
I spent some time in dosemu debug, but there didn't seem to be any simple algorithm (again, I don't know asm).
Brute forcing a checksum algorithm is pretty much impossible unless they've used a standard one. The best bet imho is to either grab it out of the sign.exe or disable it in the sign.
I assume you've found the following resources, but just in case...
8 &t=69 which might also provide some people who could help...
There's a thread the 'The LED Forum' about this sign. The people there might be able to help... http://www.eio.com/public/led/0219.html
Also, one message in that thread is for a message at SignIndustry.com http://www.signindustry.com/mb/read.php?f=19&i=13
While the information you are after is not on these boards, at least you will be able to contact some people who have the boards - maybe they've found some more information.
Other than that, I can only repeat what a couple of other people have said here - do some very methodical testing, starting with all single character messages from a-z, A-Z, 0-9 and punctuation. Then try a fair range of two character messages with the same options as the first run. (aa-az, aA-aZ, a0-a9, ba-bz, etc, but there's no real need to try ALL combinations.) Then try a fair sampling of 3 char messages. Next try some longer strings, modifying one character at a time and the same string modifying each property. After that, try a range of single characters (say 10-15 different characters) with each of the options. Finally, do some 2 char, 3 char and a longer message with a reasonable set of different options.
If you gather a truck-load of data (several hundred sets, not just 100) you stand a better chance of figuring it out. You really need the trivial cases (single and two character strings) and some non-trivial cases (a phrase, not just 'TEST') to make it easier to reverse-engineer the algorithm.
As others have mentioned, given the time frame and the fact that it is 6502-based, the algorithm is not likely to be particularly complex, but working it out can take a lot of time!
Mean-while, I might have a look at the 6502 code and your test data to see if anything jumps out at me...
Good luck!
The nytimes website has a number of partner websites like this through which you can access the nytimes content without having to register. The partner websites use a URL containing a checksum which the nytimes website verifies before giving access. The checksum access only works for a limited time like one week after creation.
A valid nytimes url with checksum in URL as value of 'en'
e urope/22EURO.html?ex=1028001600&en=9bcfc7fe702d6bd 0&ei=5040&partner=MOREOVER)
(http://www.nytimes.com/2002/07/22/international/
By the time you read this the URL may no longer work; I got it as a redirect from this. You can find other working nytimes URLs by visiting www.moreover.com/news and looking for nytimes stories. Just grab the URL you see after moreover.com re-directs your browser to nytimes.com.
I've already emailed you, but for the rest of the slashdot crew.... A friend and I solved this for our sign about 1.5 years ago.
http://www.kuci.org/~jburley/LED/
Hope that helps!
google said there is a crc reverse engineering tutorial at http://www1.lunarpages.com/biw/tuts/crctut1.htm (oh look, it's written by the "c00l guy" "by anarchriz "... ;). didn't look at it, but it at least should give some insight into general CRC strategies.
First of all, don't be listening to the guys that are telling you to try a variety of input. They've obviously never done this before.
If the sign doesn't use one of the standard polynomial CRC algorithms, you're only real option is to disassemble the binary you have to dig out the algorithm. Even if it does uses a standard algorithm (It's probably CRC16), you'll still have to figure out what data it's using. You can guess and check, and you may get lucky. Disassembling is a sure thing, and probably the quickest way to go, but check your licensing agreement.
I recently reverse engineered the CRC for my LG TP-5250 cell phone which, as it turns out, uses the standard CRC16 algorithm, but is creative about which data is uses to compute the checksum. Without a dissassembler it would have taken months to figure out what bytes were being used and in what order. You can read about how/what I did here.
Heh - I was just working on this exact same thing after finding someone's post on piclist looking to reverse-engineer the same program. And to think, you only beat me to the solution by a few hours...
Except that you didn't, completely. The trickiest part of this checksum is that it shows up TWICE in the packet format. As you state, the checksum is initialized after the ff attention byte, and then most of the packet is run through it. However, looking at the initial packet:
FF 35 35 00 0E DA BC 01 00 00 08 00 19 02 54 45 53 54 1F 8C 52
^^^^^ ^^^^^
both of the indicated portions are checksums. The first checksum covers the string "35 35 00 0E" and the second covers the string "35 35 00 0E 01 00 00 08 00 19 02 54 45 53 54 1F". The checksum is not re-initialized after it's output. Incidentally, here's my perl version - it uses bit operations in preference to the %-heavy code that you use, but it's the same algorithm:
#!perl
$check = 0xa55a;
$sum=0;
@bytes = (@ARGV);
for $pos (0..$#bytes) {
$sum += (hex($bytes[$i]) ^ ($pos & 0xff));
$sum = $sum & 0xffff;
$check += $sum;
$check = (($check >> 1) & 0x7fff) | (($check & 1) << 15);
}
printf "%04x\n", $check;
__END__
>First of all, don't be listening to the guys that are telling you to try a variety of input. They've obviously never done this before. Bah! I have reverse engineered a few (weak) checksum / password encryption schemes simply by charting a sequential pattern of input (generally start with AAA, AAB, AAC, AAD ... it may take a dozen and it may take a thousand or so) and just eyeballing the output. Sometimes it helped to chart the output but most the stuff today's VB script kiddies put together is painfully obvious - linear processions, character switching, 'code-wheeling' and progressive 'code-wheeling'. Well anything that doesn't involve bit switching ... once they start bit smashing it takes more than just eyeballing it to figure it out. Just because PGP level encryption exists doesn't mean everybody bothers to use it - programmers come in all shapes and flavors.
Try brute forcing it, sometimes you get lucky =)
Glonoinha the MebiByte Slayer