Reverse Engineering of a Graphics Format?
Jimbo God of Unix asks: "I recently purchased a color laser (Samsung CLP-500) because it claimed to have Linux compatibility. It does, mostly. However, I was irritated to find that the drivers are proprietary (splc, Samsung Printer Language - Color) and somewhat cranky to get working. I was hoping to find some good resources on reverse engineering the graphics format used to drive the printer. I've managed to mostly dissect the file format, so I think I can get the graphics data out, but I don't really know how to proceed to the next step. Are there any good resources for figuring out how to reverse engineer the graphics format? Are there any tools out there that will help me analyze the format (other than hexdump) or tell me if it's close to something else so I don't have to do as much work?"
"I have something of an advantage since I can compare the output from the Windows driver to the Linux driver, and I was able to dissect the Windows output file from the info gleaned by dissecting the Linux output file. But I'm kind of stuck at the moment and there don't seem to be too many documents or tools out there for dissecting graphics data.
I thought this might be useful for reverse engineering some of the proprietary image compression formats for web cams as well, but that's a project for another day."
I don't understand how companies can sell printers that don't support PostScript. On the other hand, this seems to be a case where a company heard complaints from its customers and corrected their bad practices (the toner issue, and PostScript support).
If it's a head control language or something you might be in trouble, but if it's simply an image being sent you should be able to figure it out eventually.
The best way to reverse engineer a graphics format is to use a collection of sample images to get a high level idea of what is going on. Choose the images in a way that will give you the most information.
Make sure the printouts are always the same size, layout, color depth, margins, etc. It does no good to compare an A4 grayscale image to a color letter-sized one.
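Since the question asked for something other than hexdump: here is a rough Python sketch of the kind of throwaway tool I mean. It takes two driver dumps that should differ in only one respect and reports the byte ranges where they differ. The function name and the example file names are my own, not part of any Samsung tooling.

```python
def diff_runs(a: bytes, b: bytes) -> list:
    """Return (start, end) offset pairs for each run of bytes that differs.

    Only the overlapping part is compared byte by byte. If the dumps have
    different lengths, the extra tail is reported as one final run.
    """
    runs = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i            # a new differing run begins here
        elif start is not None:
            runs.append((start, i))  # the run ended just before offset i
            start = None
    if start is not None:
        runs.append((start, common))
    if len(a) != len(b):
        runs.append((common, max(len(a), len(b))))
    return runs
```

To use it, read both dumps with something like open("page_a.prn", "rb").read() (hypothetical file names) and print the runs in hex. Short runs near the start of the file tend to be header fields; long runs are usually the image data.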
If you're operating under the assumption that it's a simple bitmap, the following may work.
1. Is it compressed?
Print out a page with some dots on a colored background.
Print out a page with more dots on it.
Are the two output files the same size?
If so, it's most likely a bitmap.
If not, it's probably compressed.
What type of compression is it?
Print out a page which is half white, half another color.
Print out another page which is checkered (with *very* small squares) half white, half the other color.
Is one smaller than the other? If so, it may be compressed. If it is, it *could* be Jim-Bob's compression algorithm, but programmers are lazy so it's most likely something off-the-shelf.
If it's the half-and-half print that is smaller, it's either RLE or something like JPG (most likely RLE as JPG is lossy -- compare a gradient print to find out if it's RLE or not).
If it's the checkered print then it's probably LZW.
If neither is smaller, re-evaluate your compression assessment.
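The checks in step 1 boil down to comparing file sizes, so they are easy to script. Below is a minimal Python sketch of those rules as I've stated them. The function names are made up for illustration. The entropy helper is an extra of mine: data close to 8 bits per byte is usually compressed (or encrypted), while raw bitmaps of simple test pages score much lower.

```python
import math
from collections import Counter

def size_verdict(few_dots: bytes, more_dots: bytes) -> str:
    """Step 1: same-size dumps suggest a plain bitmap."""
    if len(few_dots) == len(more_dots):
        return "same size: most likely a bitmap"
    return "different size: probably compressed"

def guess_scheme(half_and_half_size: int, checkered_size: int) -> str:
    """Rule of thumb from the half-and-half vs. checkered test pages."""
    if half_and_half_size < checkered_size:
        return "RLE or something JPG-like"
    if checkered_size < half_and_half_size:
        return "probably LZW"
    return "neither is smaller: re-evaluate"

def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 up to 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Keep in mind that a real dump has headers and maybe trailers around the image data, so sizes may differ by a few bytes even for an uncompressed format. Treat the verdicts as hints, not proof.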
2. Create a decompressor to test your decompression theory.
Print a colored page.
Print a second colored page with a couple of changes.
If you can't create two data dumps of (relatively) equal size from the input data, you're probably wrong about the compression algorithm.
If they are the same size you may be going in the right direction. (If they're exactly the same size be very happy).
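As an example of what a test decompressor can look like, here is a short Python sketch for PackBits, a common off-the-shelf RLE variant. I have no idea whether the printer actually uses PackBits; it is only a stand-in for whatever scheme you suspect. The point is the check at the end: two pages of the same dimensions should decode to the same number of bytes.

```python
def packbits_decode(data: bytes) -> bytes:
    """Decode PackBits-style RLE (an assumed scheme, not a confirmed one).

    Header byte n:
      0..127   -> copy the next n + 1 bytes literally
      129..255 -> repeat the next byte 257 - n times
      128      -> no-op
    """
    out = bytearray()
    i = 0
    while i < len(data):
        n = data[i]
        i += 1
        if n < 128:
            out += data[i:i + n + 1]   # literal run (may be cut short at EOF)
            i += n + 1
        elif n > 128:
            if i >= len(data):
                break                  # truncated input: nothing left to repeat
            out += bytes([data[i]]) * (257 - n)
            i += 1
    return bytes(out)

def same_decoded_size(dump_a: bytes, dump_b: bytes) -> bool:
    """True if both dumps decode to the same length under this theory."""
    return len(packbits_decode(dump_a)) == len(packbits_decode(dump_b))
```

Strip any file header before feeding data to the decoder, otherwise the header bytes get interpreted as run lengths and everything after them is garbage.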
3. Guesstimate packing.
Print a cyan* page, a yellow page, a magenta page and a white page. Take a look at the first four bytes or 16-bit words. If you've got clearly observable patterns (ff 0 0 0; 0 ff 0 0; 0 0 ff 0; etc.) you're in luck. If not, try to work out the packing order. Just keep in mind, if it's a bitmap, and you've got the decompression down, and the page is one color, *eventually* you will find a repeating pattern that represents that color.
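Hunting for that repeating pattern by eye gets old fast. A small Python sketch like the following finds the shortest repeating unit in the decoded data for a single-color page. The function names and the 16-byte search limit are arbitrary choices of mine.

```python
from typing import Optional

def smallest_period(data: bytes, max_period: int = 16) -> Optional[int]:
    """Return the shortest p such that data[i] == data[i - p] for all i >= p.

    Returns None if no period up to max_period fits, or if the data is too
    short to contain two full repetitions.
    """
    for p in range(1, max_period + 1):
        if len(data) < 2 * p:
            break
        if all(data[i] == data[i - p] for i in range(p, len(data))):
            return p
    return None

def repeating_unit(data: bytes, max_period: int = 16) -> Optional[bytes]:
    """Return the repeating byte pattern itself, or None if there isn't one."""
    p = smallest_period(data, max_period)
    return None if p is None else data[:p]
```

Run it on a slice from the middle of the image data, away from headers and margins. Comparing the unit you get for the cyan, yellow, magenta and white pages should show which byte (or bit field) belongs to which channel.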
4. Visualize the decompressed data.
The best way from this point is to find a way to visualize what you've got. In the past I had stock BMP code that I would use to generate a new displayable image, but I've also created custom apps to display it.
If the resulting image looks right but is a funky color, it's packing.
If the resulting image looks like it *could* be close but has a lot of shear, play with your assumed width and height.
If it looks like static and you've previously determined that you're dealing with 16 bit values, try changing the byte order and try again.
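If you don't have stock BMP code lying around, a binary PGM is about the least effort way to get raw bytes on screen, since the header is one line of text and most image viewers open it. Here is a minimal Python sketch. It assumes one byte per pixel for a single channel, which may well be wrong for your data. The width is a guess you pass in and keep adjusting until the shear goes away.

```python
def to_pgm(raw: bytes, width: int) -> bytes:
    """Wrap raw 8-bit samples in a binary PGM (P5) header.

    Height is derived from the data length. Any bytes left over after the
    last full row are dropped.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = len(raw) // width
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + raw[:width * height]

def swap16(raw: bytes) -> bytes:
    """Swap the two bytes of every 16-bit word (drops a trailing odd byte)."""
    even = len(raw) - (len(raw) % 2)
    out = bytearray(raw[:even])
    out[0::2] = raw[1:even:2]   # odd-offset bytes move to even offsets
    out[1::2] = raw[0:even:2]   # even-offset bytes move to odd offsets
    return bytes(out)
```

Write the result out with open("guess.pgm", "wb") (any file name will do) and look at it. If the picture leans to one side, the width is off by a little. If it's static and you suspect 16-bit values, run the data through swap16 first and try again.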
5. Lather, rinse and repeat.
Despite what the naysayers want you to do, don't give up. Figuring out someone's attempt to hide data from you is a reward you give yourself. Even if it takes days or weeks, when the light goes on and you think, "Ah ha! I've got you now you bastard!", it makes the time worth it -- at least for me it always did.
Besides, if you do get it working, you can release it and make Open Source better by your efforts.
* Remember, it's CMYK, not RGB.