Reverse Engineering of a Graphics Format?
Jimbo God of Unix asks: "I recently purchased a color laser (Samsung CLP-500) because it claimed to have Linux compatibility. It does, mostly. However, I was irritated to find that the drivers are proprietary (splc, Samsung Printer Language - Color) and somewhat cranky to get working. I was hoping to find some good resources on reverse engineering the graphics format used to drive the printer. I've managed to mostly dissect the file format, so I think I can get the graphics data out, but I don't really know how to proceed to the next step. Are there any good resources for figuring out how to reverse engineer the graphics format? Are there any tools out there that will help me analyze the format (other than hexdump) or tell me if it's close to something else so I don't have to do as much work?"
"I have something of an advantage since I can compare the output from the Windows driver to the Linux driver, and I was able to dissect the Windows output file from the info gleaned by dissecting the Linux output file. But I'm kind of stuck at the moment and there don't seem to be too many documents or tools out there for dissecting graphics data.
I thought this might be useful for reverse engineering some of the proprietary image compression formats for web cams as well, but that's a project for another day."
If there's a chance it's a plain bitmap, simple visualization can reveal lots of patterns not evident in a hex dump.
I believe the old wardialing tool Toneloc had a mode, or a companion program, to display logs this way. It was easy to see things like "numbers ending in -0100 never get answered" as a vertical red line, for instance.
The important things would be an adjustable margin, to "wrap" the pixels at varying widths, and adjustable bit depth, so you can discover odd packings that might not otherwise be apparent.
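A minimal sketch of that kind of viewer, assuming the dump is headerless raw samples packed most-significant-bit first. The function names are made up for illustration; the output is a binary PGM (P5), which most image viewers can open.

```python
def unpack_samples(data: bytes, bits: int) -> list:
    """Split each byte into `bits`-wide samples (MSB first), scaled to 0..255."""
    if bits not in (1, 2, 4, 8):
        raise ValueError("bits must be 1, 2, 4 or 8")
    per_byte = 8 // bits
    max_val = (1 << bits) - 1
    samples = []
    for byte in data:
        for i in range(per_byte):
            shift = 8 - bits * (i + 1)
            value = (byte >> shift) & max_val
            samples.append(value * 255 // max_val)
    return samples


def to_pgm(data: bytes, width: int, bits: int = 8) -> bytes:
    """Wrap the samples at `width` pixels per row and return a P5 PGM image."""
    samples = unpack_samples(data, bits)
    height = len(samples) // width
    if height == 0:
        raise ValueError("not enough data for one row at this width")
    # Drop the incomplete last row, if any.
    pixels = bytes(samples[: width * height])
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + pixels
```

Write the result to a file with different `width` and `bits` guesses and flip through them; when the width is right, diagonal smears tend to snap into vertical structure.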
If the data might be compressed, have a look at the article Hacking Data Compression for a great, if slightly dated, conceptual overview.
ERANAI (I am not a reverse engineer), but I hope this helps. Let us know if you have any luck!
and reside in the USA?
My next step would be to get a good lawyer to find out if what you are doing will open yourself up to potential legal action.
The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.
I can't help you with your question, I have no experience with reverse engineering.
But for others who don't want to have the same problem: you should have checked www.linuxprinting.org, which says of the Samsung CLP-500:
Samsung supports this printer with proprietary drivers which come with the printer on its driver CD or can be downloaded on the web sites of Samsung. Unfortunately, these drivers do not work necessarily with all Linux distributions and there are no free drivers available. As it is also not sure whether Samsung will update their drivers for future Linux versions, this printer cannot be recommended.
I would try to get the proprietary driver to work, basically by getting the distro it was made for, or at least finding out why it works there but not on your distro - probably it needs some specific kernel image that it was compiled with, which would suck...
I believe posters are recognized by their sig. So I made one.
Never buy anything that claims to work with Linux. Buy things that Linux supports.
Unless you're just adventurous that way, and want to write drivers.
I don't understand how companies can sell printers that don't support Postscript. On the other hand, this seems to be a case where a company heard complaints from its customers, and corrected their bad practices (the toner issue, and Postscript support).
If you cannot get specs you really only have one choice: trial, error, and compare. Print a blank page, then print a page with one pixel. Then print with two pixels. Start simple and make things more complex.
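The "compare" part is easy to automate. A rough sketch, assuming you have captured two driver outputs as byte strings; `diff_ranges` is a hypothetical helper, not part of any existing tool.

```python
def diff_ranges(a: bytes, b: bytes) -> list:
    """Return (start, end) offset pairs, end exclusive, where a and b differ.

    A trailing length difference is reported as its own final range.
    """
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i          # a differing run begins here
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        ranges.append((common, max(len(a), len(b))))
    return ranges
```

If the blank page and the one-pixel page differ in only a handful of small ranges, the data is probably not heavily compressed; if everything after some offset differs, suspect compression or a checksum.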
It helps greatly if you buy (or build if you can) some sort of hardware trace tool. I've used this for SCSI devices before, good ones will give you all the data that is transferred to/from the device in question.
If this was simple everyone would do it. However it is complex, and generally boring. A half functioning driver is worthless.
P.S. a better idea would be to return this printer now while you still can. Buy a printer that supports postscript. That hits the bottom line of companies who pull these tricks and in the end is worth more to the linux community.
PPM is the easiest format to reverse engineer out there; in its plain form it's an ASCII file with RGB values and some headers.
:-D
Useful for those wanting to muck with images directly from code. I learned about that last week, and I'm having fun with neural nets
.. such as a page full of checkerboard or something, and work from there.
.. and look for that pattern at each stage through the pipeline ...
if you can see the 'obvious' change in pattern in the file, you've got a lead. but the important thing is to start from the very beginning with something you know
; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --
The truth is, you're probably never going to reverse engineer a decent driver.
If the linux driver is flakey, it's probably because the printer's firmware is itself flakey, and the Windows driver just contains innumerable hacks to get around the problems that keep cropping up.
Take the thing back, complain that you can't get it working under Linux, and buy a different one.
If it's a head control language or something you might be in trouble, but if it's simply an image being sent you should be able to figure it out eventually.
The best way to reverse engineer a graphics format is to use a collection of sample images to get a high level idea of what is going on. Choose the images in a way that will give you the most information.
Make sure the printouts are always the same size, layout, color depth, margins, etc. It does no good to compare an A4 grayscale image to a color letter sized one.
If you're operating under the assumption that it's a simple bitmap, the following may work.
1. Is it compressed?
Print out a page with some dots on a colored background.
Print out a page with more dots on it.
Are they the same size?
If so, it's most likely a bitmap.
If not, it's probably compressed.
What type of compression is it?
Print out a page which is half white, half another color.
Print out another page which is checkered (with *very* small squares) half white, half the other color.
Is one smaller than the other? If so, it may be compressed. If it is, it *could* be Jim-Bob's compression algorithm, but programmers are lazy so it's most likely something off-the-shelf.
If it's the half-and-half print that is smaller, it's either RLE or something like JPG (most likely RLE as JPG is lossy -- compare a gradient print to find out if it's RLE or not).
If it's the checkered print then it's probably LZW.
If neither is smaller, re-evaluate your compression assessment.
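To see why the half-and-half page is the tell for RLE, here is a toy run-length encoder. It uses a made-up (count, value) byte-pair scheme purely for illustration; nothing here is known to match what the printer actually uses.

```python
def rle_encode(data: bytes) -> bytes:
    """Toy RLE: each run becomes a (count, value) pair, count capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


# Stand-ins for the two test pages: one long run of each value versus
# the same values alternating on every byte.
half_and_half = bytes([0xFF] * 512 + [0x00] * 512)
checkered = bytes([0xFF, 0x00] * 512)

print(len(rle_encode(half_and_half)), len(rle_encode(checkered)))
```

The solid halves collapse to a few pairs while the checkered data doubles in size, so a driver using plain RLE should show the same lopsided result between the two captures.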
2. Create a decompresser to test your decompression theory.
Print a colored page.
Print a second colored page with a couple of changes.
If you can't create two data dumps of (relatively) equal size from the input data, you're probably wrong about the compression algorithm.
If they are the same size you may be going in the right direction. (If they're exactly the same size be very happy).
3. Guestimate packing.
Print a cyan* page, a yellow page, a magenta page and a white page. Take a look at the first four bytes or 16 bit words. If you've got clearly observable patterns (ff 0 0 0; 0 ff 0 0; 0 0 ff 0; etc.) you're in luck. If not try to work out the packing order. Just keep in mind, if it's a bitmap, and you've got the decompression down, and the page is one color *eventually* you will find a repeating pattern that represents that color.
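Hunting for that repeating pattern can be scripted. A small sketch, assuming you have already stripped the headers and decompressed the data for a single-color page; `shortest_period` is a hypothetical helper name.

```python
def shortest_period(data: bytes, max_period: int = 64):
    """Return the smallest p such that data[i] == data[i - p] for all i >= p.

    Returns None if no period up to max_period (or half the data) fits.
    """
    limit = min(max_period, len(data) // 2)
    for period in range(1, limit + 1):
        if all(data[i] == data[i - period] for i in range(period, len(data))):
            return period
    return None
```

A period of 4 on a solid cyan page would hint at four bytes per pixel (for example one byte per CMYK channel), while a period of 1 across a whole block suggests the colors are stored as separate planes.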
4. Visualize the decompressed data.
The best way from this point is to find a way to visualize what you've got. In the past I had stock BMP code that I would use to generate a new displayable image, but I've also created custom apps to display it.
If the resulting image looks right but is a funky color, it's packing.
If the resulting image looks like it *could* be close but has a lot of shear, play with your assumed width and height.
If it looks like static and you've previously determined that you're dealing with 16 bit values, try changing the byte order and try again.
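Swapping the byte order of 16-bit values is a one-liner with slices. A sketch, assuming the data really is a sequence of 16-bit words with an even total length:

```python
def swap16(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit word (big-endian <-> little-endian)."""
    if len(data) % 2:
        raise ValueError("length must be a multiple of 2")
    swapped = bytearray(data)
    # Even positions take the old odd bytes and vice versa.
    swapped[0::2], swapped[1::2] = data[1::2], data[0::2]
    return bytes(swapped)
```

Run the swapped data through the same visualization again; if the static turns into a recognizable image, the byte order was the problem.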
5. Lather, rinse and repeat.
Despite what the naysayers tell you, don't give up. Figuring out someone's attempt to hide data from you is a reward you give yourself. Even if it takes days or weeks, when the light goes on and you think, "Ah ha! I've got you now you bastard!", it makes the time worth it -- at least for me it always did.
Besides, if you do get it working, you can release it and make Open Source better by your efforts.
* Remember, it's cmyk, not rgb.
They know printers. They know lots about printers, and printer languages. My guess is that they'll be thrilled to get an opportunity to hack another printer into working. I know when I bought a printer that has "PS Support", it had a postscript driver in software that talked a proprietary protocol to the printer. They would have gladly written the output driver for it, but they didn't know how it worked.
Maybe if you know how it works, you'll be able to get them to do something with it.
Kirby
There's a bunch of info on the CLP-500 here that might help. There are lots and lots of comments from users with both good and bad results and the distros they used.
Check this out:
Good Luck!
ANAIE == (I am not a reverse engineer)
and buy a printer that has proper postscript 3 support. your life will be much easier as will your prints. it looks like the 550 has postscript 3 for instance (yes it's more expensive but the lack of headaches will be worth the cost)...
really folks, when you buy a printer don't just look at features and speed. look at the printer languages it features. if it only features a proprietary language (like yours does), be prepared for what you are getting into. pcl5 is okay, but postscript 3 is where it's at.
especially because postscript 3 has native support for transparent images (ever wonder why when you print a page of text with graphics that have white borders from a low end printer it produces a box that has dithered crosshatches in it?) which is really important for b&w printers. having worked in a large copying and printing business i always chuckle when i see printers that can't halftone worth a damn...
Large print giveth, and the small print taketh away
Color laser printer, max. 1200x1200 dpi, this is a Paperweight
Doh!
an fft is usually quite useful in trying to deconstruct binary formats, all of the fixed-length parts of the encoding show up as frequency spikes. as someone else mentioned, if it employs compression at any level then you're pretty much sol.
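If you don't want to pull in an FFT library, a cruder stand-in (a byte-match count at each lag, which is a swapped-in technique and not an FFT) often finds the same fixed record lengths. A sketch with hypothetical helper names:

```python
def lag_scores(data: bytes, max_lag: int) -> dict:
    """For each lag, the fraction of positions where data[i] == data[i + lag]."""
    scores = {}
    for lag in range(1, max_lag + 1):
        pairs = len(data) - lag
        if pairs <= 0:
            break
        matches = sum(1 for i in range(pairs) if data[i] == data[i + lag])
        scores[lag] = matches / pairs
    return scores


def best_lag(data: bytes, max_lag: int):
    """Return the lag with the highest match fraction, or None for tiny inputs."""
    scores = lag_scores(data, max_lag)
    if not scores:
        return None
    return max(scores, key=scores.get)
```

A lag that scores well above its neighbours suggests a record or scanline length, because fixed markers and padding line up with themselves at that distance.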
seriously though, it's not worth it to write a driver for a single instance of a device. and if you don't have adequate documentation, the bar has to be even higher to make it worthwhile. if it's really that trivial for you to do, you should get a real job doing it, throw away the printer, and buy one that works.
(yes, i do reverse engineering professionally, and was stupid enough once to write a driver for a scsi tape that wasn't what i thought i was buying)
Probably JBIG bi-level compression (load jbigkit from the web -- it will give you a starting point). Probably separate maps for each of the colours. Embedded command language for the rest -- look on the cable for details.
'K? [I think that covers most of the current crop of printers]. Next time, buy a PostScript device.
Ratboy
Just another "Cubible(sic) Joe" 2 17 3061
"Whatever you do anyways always opens you to potential legal action."
I'm unzipping my pants.
As in: I Am Not A Reenigne?
Got time? Spend some of it coding or testing
I've RE'd a bunch of stuff, from DRM protection (http://crazney.net/programs/itunes/), Audio Codecs, network protocols and file formats. I use all sorts of nifty tools, most of which I wrote myself.
For a graphics format, however, I'd be inclined to go for disassembly of the proprietary driver. Perhaps you could try various test cases (scan a white sheet of paper, what's the data look like? Try a black, red, green, blue.. etc). But if it's compressed with some unknown algorithm (like the Audio codec that I've reversed) I don't like your chances of getting it that way.
There are a bunch of disassemblers around, I have written my own (which isn't available publicly cause it's still too shit) but I would highly recommend Datarescue's IDA. Old versions work fine in wine.
However, something to be mindful of: Just rewriting their binary driver in C is a copyright violation. Make sure you properly document the spec and then do a cleanroom implementation.
David.
stuff
If we spend all of our time looking over our shoulders, nothing gets done, the enemy has won and we might as well not even try.
So I become a martyr to some random cause? Even the Bible says some evil must come, but I hope one of the DMCA framers is reading this, so they can also read the followup to it, or this concise description of the RIAA's activities.
Got time? Spend some of it coding or testing
This might be irrelevant to graphics, but I think the French cafe analogy written by Andrew Tridgell, who developed Samba, is a good reference on how to do reverse engineering (or in his terms: network analysis or protocol analysis) in general.
Wow, you're right! Nearly a $3 difference!
CLP-500 Laser Printer (21 PPM, 1200x1200 DPI, Color, 64MB, PC) from $341.00
CLP-550 Laser Printer (1200x1200 DPI, Color, 64MB, PC/Mac) from $344.00
If you need high-quality wysiwyg printing, and are not afraid of a long wait, there is a "big crowbar" approach. Render your pages to a raster image via a decent rendering engine. Ghostscript does well on some fonts, not on others. Batik does XSL:FO well, though slowly. Render to a hi-res bitmap at the given resolution, and send that to the printer. Hopefully it can be sent as a stream, and not require buffering. It is slow and inefficient, but it can quite often work around difficult situations.
Wine is apparently able to use Windows printer drivers. I've never used this feature myself and there doesn't seem to be that much info about it but this may be worth examining.