Reversing a Checksum Algorithm?
Todd asks: "Does anyone have good suggestions or advice regarding techniques for determining the checksum of a serial data protocol? I am currently working on a fun little project using a digital display sign which uses an unknown serial protocol. I cannot move on with the project until I figure this dreaded checksum out."
Is it a very simple machine with a very simple processor and limited memory? If so, they may well have written their own with shifts and xors. On the other hand, if it is a modern processor with megabytes of memory, they probably have used a library function. Can you get access to the standard libraries for that chip?
How long is the checksum? What happens if you feed a wrong one? If it is just a single byte, and the box doesn't lock up completely on error, try all possible values for every packet!
Why do you think they put the checksum in the protocol? To catch a few dropped bits over a bad line, or to prevent tampering? If the later, watch out for some real cryptography. If only the former, assume they took the path of least resistance.
In Murphy We Turst
looking at your page http://www.dickwick.com/sign/sign.htm The first three values differ from each other by 15 and the next differs from the others by a multiple of 15, the only number from this set that is missing is 61, which would appear to might have been generated by 0 if that was a valid value. The ones with bit seven set are also 15 apart but offset 1 from the first group. If you keep repeating the process for other bytes you should find a connection with every bit postition/group and the resulting values of the checksum.
I put sign.exe thru IDA and identified the checksum algorithm. I found out that the only thing that goes thru the checksum is 35 35 00 then 0E DA is skipped and then the rest is put thru. The algorithm is a simple crc alike algo that adds the chars xors with the length and rotates some bits. You can find a perl program i wrote to calculate the checksum for a given range at: this location.
Good luck with your project,
Gijs
I assume you've found the following resources, but just in case...
8 &t=69 which might also provide some people who could help...
There's a thread the 'The LED Forum' about this sign. The people there might be able to help... http://www.eio.com/public/led/0219.html
Also, one message in that thread is for a message at SignIndustry.com http://www.signindustry.com/mb/read.php?f=19&i=13
While the information you are after is not on these boards, at least you will be able to contact some people who have the boards - maybe they've found some more information.
Other than that, I can only repeat what a couple of other people have said here - do some very methodical testing, starting with all single character messages from a-z, A-Z, 0-9 and punctuation. Then try a fair range of two character messages with the same options as the first run. (aa-az, aA-aZ, a0-a9, ba-bz, etc, but there's no real need to try ALL combinations.) Then try a fair sampling of 3 char messages. Next try some longer strings, modifying one character at a time and the same string modifying each property. After that, try a range of single characters (say 10-15 different characters) with each of the options. Finally, do some 2 char, 3 char and a longer message with a reasonable set of different options.
If you gather a truck-load of data (several hundred sets, not just 100) you stand a better chance of figuring it out. You really need the trivial cases (single and two character strings) and some non-trivial cases (a phrase, not just 'TEST') to make it easier to reverse-engineer the algorithm.
As others have mentioned, given the time frame and the fact that it is 6502-based, the algorithm is not likely to be particularly complex, but working it out can take a lot of time!
Mean-while, I might have a look at the 6502 code and your test data to see if anything jumps out at me...
Good luck!