Slashdot Mirror


Reverse Engineering of a Graphics Format?

Jimbo God of Unix asks: "I recently purchased a color laser (Samsung CLP-500) because it claimed to have Linux compatibility. It does, mostly. However, I was irritated to find that the drivers are proprietary (splc, Samsung Printer Language - Color) and somewhat cranky to get working. I was hoping to find some good resources on reverse engineering the graphics format used to drive the printer. I've managed to mostly dissect the file format, so I think I can get the graphics data out, but I don't really know how to proceed to the next step. Are there any good resources for figuring out how to reverse engineer the graphics format? Are there any tools out there that will help me analyze the format (other than hexdump) or tell me if it's close to something else so I don't have to do as much work?" "I have something of an advantage since I can compare the output from the Windows driver to the Linux driver, and I was able to dissect the Windows output file from the info gleaned by dissecting the Linux output file. But I'm kind of stuck at the moment and there don't seem to be too many documents or tools out there for dissecting graphics data.

I thought this might be useful for reverse engineering some of the proprietary image compression formats for web cams as well, but that's a project for another day."

12 of 62 comments

  1. A few general hints? by Myself · · Score: 4, Informative

    If there's a chance it's a plain bitmap, simple visualization can reveal lots of patterns not evident in a hex dump.

    I believe the old wardialing tool Toneloc had a mode, or a companion program, to display logs this way. It was easy to see things like "numbers ending in -0100 never get answered" as a vertical red line, for instance.

    The important things would be an adjustable margin, to "wrap" the pixels at varying widths, and adjustable bit depth, so you can discover odd packings that might not otherwise be apparent.
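
    As a rough sketch of that idea (an illustration only, not a description of any existing tool): the hypothetical helper `render_pgm` below wraps raw bytes at a chosen width and bit depth and emits a binary PGM, which most image viewers can open. It assumes samples are packed most-significant-bit first, which may not match the real data.

```python
def unpack_samples(data: bytes, bits: int) -> list[int]:
    """Split data into samples of `bits` bits each (MSB first), scaled to 0..255."""
    if bits not in (1, 2, 4, 8):
        raise ValueError("bits must be 1, 2, 4 or 8")
    per_byte = 8 // bits
    top = (1 << bits) - 1  # largest value a sample can hold
    samples = []
    for byte in data:
        for i in range(per_byte):
            shift = 8 - bits * (i + 1)
            samples.append(((byte >> shift) & top) * 255 // top)
    return samples


def render_pgm(data: bytes, width: int, bits: int = 8) -> bytes:
    """Wrap the samples at `width` pixels per row and return a binary PGM (P5)."""
    if width < 1:
        raise ValueError("width must be positive")
    samples = unpack_samples(data, bits)
    height = len(samples) // width  # a trailing partial row is dropped
    pixels = bytes(samples[: width * height])
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + pixels
```

    Writing one file per candidate width and flipping through them is a cheap way to spot the width at which diagonal streaks straighten into a recognizable image.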

    If the data might be compressed, have a look at the article Hacking Data Compression for a great, if slightly dated, conceptual overview.

    ERANAI (I am not a reverse engineer), but I hope this helps. Let us know if you have any luck!

    1. Re:A few general hints? by zero-one · · Score: 2, Informative

      I think Axe (http://www.jbrowse.com/products/axe/screenshots.shtml) can do something like that.

  2. Check linuxprinting.org first by Scarblac · · Score: 4, Informative

    I can't help you with your question, I have no experience with reverse engineering.

    But for others who don't want to have the same problem: you should have checked www.linuxprinting.org, which says of the Samsung CLP-500:

    Samsung supports this printer with proprietary drivers which come with the printer on its driver CD or can be downloaded on the web sites of Samsung. Unfortunately, these drivers do not work necessarily with all Linux distributions and there are no free drivers available. As it is also not sure whether Samsung will update their drivers for future Linux versions, this printer cannot be recommended.

    I would try to get the proprietary driver to work, basically by getting the distro it was made for, or at least finding out why it works there but not on your distro - probably it needs some specific kernel image that it was compiled with, which would suck...

    --
    I believe posters are recognized by their sig. So I made one.

  3. This isn't really helpful, but... by xenephon · · Score: 5, Informative

    You should have bought the CLP-550. I avoided the CLP-500 for just this reason (and the fact that I heard bad things about its OS X support, as well). The CLP-550 supports Postscript, and works fine from my Mac, my Linux box, and my Windows box. Another advantage to the CLP-550 over the 500 is that the 550 comes with full toner cartridges; the cartridges which ship with the 500 are only half full.

    I don't understand how companies can sell printers that don't support Postscript. On the other hand, this seems to be a case where a company heard complaints from its customers, and corrected their bad practices (the toner issue, and Postscript support).

  4. trial, error, and compare by bluGill · · Score: 3, Informative

    If you cannot get specs you really only have one choice: trial, error, and compare. Print a blank page, then print a page with one pixel. Then print with two pixels. Start simple and make things more complex.
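
    The "compare" part can be sketched as a byte-level diff of two captured print jobs. The function name `diff_ranges` is hypothetical, and the sketch assumes the two dumps line up offset for offset, which stops being true as soon as a compressed stream changes length.

```python
def diff_ranges(a: bytes, b: bytes) -> list[tuple[int, int]]:
    """Return (start, end) offset pairs, end exclusive, where a and b differ."""
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        # Bytes present in only one dump are reported as their own range
        ranges.append((common, max(len(a), len(b))))
    return ranges
```

    Diffing the blank page against the one-pixel page should leave only a handful of ranges, typically a length field or checksum in a header plus the image data itself.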

    It helps greatly if you buy (or build if you can) some sort of hardware trace tool. I've used this for SCSI devices before, good ones will give you all the data that is transferred to/from the device in question.

    If this were simple, everyone would do it. However, it is complex and generally boring. A half-functioning driver is worthless.

    P.S. A better idea would be to return this printer now while you still can. Buy a printer that supports PostScript. That hits the bottom line of companies who pull these tricks and in the end is worth more to the Linux community.

  5. PGM by Knights+who+say+'INT · · Score: 2, Informative

    PGM is the easiest format to reverse engineer out there; in its plain variant it's an ASCII file with gray values and some headers (the sibling PPM format holds RGB values).

    Useful for those wanting to muck with images directly from code. I learned about that last week, and I'm having fun with neural nets :-D
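
    For anyone who wants to try it, here is a minimal sketch of writing a plain (ASCII) PGM from code. The function name `plain_pgm` is made up for illustration. Note that the Netpbm documentation asks for lines of at most 70 characters, so wide images would need their rows wrapped, which this sketch does not do.

```python
def plain_pgm(rows: list[list[int]], maxval: int = 255) -> str:
    """Return a plain (ASCII, magic number P2) PGM for rows of gray values."""
    if not rows or not rows[0]:
        raise ValueError("need at least one non-empty row")
    height = len(rows)
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same width")
    # Header: magic number, then dimensions, then the maximum gray value
    lines = ["P2", f"{width} {height}", str(maxval)]
    for row in rows:
        lines.append(" ".join(str(v) for v in row))
    return "\n".join(lines) + "\n"
```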

  6. Be Methodical by maeglin · · Score: 5, Informative

    If it's a head control language or something you might be in trouble, but if it's simply an image being sent you should be able to figure it out eventually.

    The best way to reverse engineer a graphics format is to use a collection of sample images to get a high level idea of what is going on. Choose the images in a way that will give you the most information.

    Make sure the printouts are always the same size, layout, color depth, margins, etc. It does no good to compare an A4 grayscale image to a color letter-sized one.

    If you're operating under the assumption that it's a simple bitmap, the following may work.

    1. Is it compressed?

    Print out a page with some dots on a colored background.

    Print out a page with more dots on it.

    Are they the same size?

    If so, it's most likely a bitmap.

    If not, it's probably compressed.

    What type of compression is it?

    Print out a page which is half white, half another color.

    Print out another page which is checkered (with *very* small squares) half white, half the other color.

    Is one smaller than the other? If so, it may be compressed. If it is, it *could* be Jim-Bob's compression algorithm, but programmers are lazy so it's most likely something off-the-shelf.

    If it's the half-and-half print that is smaller, it's either RLE or something like JPG (most likely RLE as JPG is lossy -- compare a gradient print to find out if it's RLE or not).

    If it's the checkered print then it's probably LZW.

    If neither is smaller, re-evaluate your compression assessment.
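
    The size comparison above can be sketched in a few lines. Both function names are hypothetical, the 2% tolerance is an arbitrary assumption, and the entropy check is an extra heuristic that is not part of the steps above: data that is already compressed tends to sit close to 8 bits per byte, while a raw bitmap of a mostly blank page sits far lower.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def guess_from_sizes(half_size: int, checker_size: int,
                     tolerance: float = 0.02) -> str:
    """Apply the half-and-half versus checkered size comparison to two dump sizes."""
    bigger = max(half_size, checker_size)
    if bigger == 0 or abs(half_size - checker_size) <= tolerance * bigger:
        return "similar sizes: probably not compressed, or re-evaluate"
    if half_size < checker_size:
        return "half-and-half is smaller: RLE-like (or JPEG-like)"
    return "checkered is smaller: possibly LZW"
```

    The returned strings are hints, not verdicts; they only restate the rule of thumb so that it can be run over a folder of captured jobs.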

    2. Create a decompressor to test your decompression theory.

    Print a colored page.

    Print a second colored page with a couple of changes.

    If you can't create two data dumps of (relatively) equal size from the input data, you're probably wrong about the compression algorithm.

    If they are the same size you may be going in the right direction. (If they're exactly the same size be very happy).
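
    As an example of such a test decompressor, here is a sketch of a PackBits-style RLE decoder. PackBits is only a guess at a common off-the-shelf scheme; nothing in this thread says the printer uses it. The idea is to decode both dumps and compare the output lengths.

```python
def packbits_decode(data: bytes) -> bytes:
    """Decode PackBits-style run-length data (a hypothesis to test, not a known fact)."""
    out = bytearray()
    i = 0
    while i < len(data):
        n = data[i]
        i += 1
        if n < 128:
            # Literal run: copy the next n + 1 bytes unchanged
            if i + n + 1 > len(data):
                raise ValueError("truncated literal run")
            out += data[i : i + n + 1]
            i += n + 1
        elif n > 128:
            # Repeat run: the next byte is repeated 257 - n times
            if i >= len(data):
                raise ValueError("truncated repeat run")
            out += bytes([data[i]]) * (257 - n)
            i += 1
        # n == 128 is a no-op and is skipped
    return bytes(out)
```

    If `len(packbits_decode(dump_a))` and `len(packbits_decode(dump_b))` come out equal and match width times height times bytes per pixel, the theory is looking good; a `ValueError` or wildly different lengths suggests a different scheme.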

    3. Guesstimate packing.

    Print a cyan* page, a yellow page, a magenta page and a white page. Take a look at the first four bytes or 16 bit words. If you've got clearly observable patterns (ff 0 0 0; 0 ff 0 0; 0 0 ff 0; etc.) you're in luck. If not try to work out the packing order. Just keep in mind, if it's a bitmap, and you've got the decompression down, and the page is one color *eventually* you will find a repeating pattern that represents that color.
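
    Hunting for that repeating pattern can be automated. The sketch below (the name `smallest_period` and the 64-byte search limit are assumptions) returns the shortest period at which a decompressed single-color page repeats, which is a reasonable first guess at the number of bytes per pixel.

```python
from typing import Optional


def smallest_period(data: bytes, max_period: int = 64) -> Optional[int]:
    """Return the shortest p with data[i] == data[i + p] for every valid i, or None."""
    limit = min(max_period, len(data) // 2)
    for p in range(1, limit + 1):
        # Shifting the data by p bytes must reproduce the data itself
        if data[p:] == data[:-p]:
            return p
    return None
```

    For a solid page, `data[:p]` with the returned `p` is then the candidate pixel value to compare across the cyan, yellow, magenta and white dumps. Headers and row padding will break the periodicity, so it helps to slice out a chunk from the middle of the page first.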

    4. Visualize the decompressed data.

    The best way from this point is to find a way to visualize what you've got. In the past I had stock BMP code that I would use to generate a new displayable image, but I've also created custom apps to display it.

    If the resulting image looks right but is a funky color, it's packing.

    If the resulting image looks like it *could* be close but has a lot of shear, play with your assumed width and height.

    If it looks like static and you've previously determined that you're dealing with 16 bit values, try changing the byte order and try again.
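
    A sketch of those last two fixes, under loud assumptions: `swap16` flips the byte order of 16-bit words, and `cmyk_to_ppm` treats the data as 8-bit interleaved samples in a guessable channel order and converts them naively to RGB in a binary PPM. Both names are hypothetical, and a real printer stream may well use separate planes per color instead.

```python
def swap16(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit word (a trailing odd byte is dropped)."""
    even = len(data) - len(data) % 2
    out = bytearray(even)
    out[0::2] = data[1:even:2]
    out[1::2] = data[0:even:2]
    return bytes(out)


def cmyk_to_ppm(data: bytes, width: int, order: str = "CMYK") -> bytes:
    """Read 8-bit interleaved samples in `order` and return a binary PPM (P6)."""
    if width < 1 or sorted(order) != sorted("CMYK"):
        raise ValueError("width must be positive and order a permutation of CMYK")
    idx = {ch: order.index(ch) for ch in "CMYK"}
    height = len(data) // (4 * width)  # a trailing partial row is dropped
    rgb = bytearray()
    for p in range(width * height):
        px = data[4 * p : 4 * p + 4]
        c, m, y, k = (px[idx[ch]] for ch in "CMYK")
        for ink in (c, m, y):
            # Naive conversion: more ink or more black means a darker channel
            rgb.append((255 - ink) * (255 - k) // 255)
    return f"P6\n{width} {height}\n255\n".encode("ascii") + bytes(rgb)
```

    If the picture is recognizable but the colors are wrong, trying the other permutations of `order` is the quickest way to test a packing theory.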

    5. Lather, rinse and repeat.

    Despite what the naysayers want you to do, don't give up. Figuring out someone's attempt to hide data from you is a reward you give yourself. Even if it takes days or weeks, when the light goes on and you think, "Ah ha! I've got you now you bastard!", it makes the time worth it -- at least for me it always did.

    Besides, if you do get it working, you can release it and make Open Source better by your efforts.

    * Remember, it's cmyk, not rgb.

  7. Quick googling found this by martyb · · Score: 3, Informative

    There's a bunch of info on the CLP-500 here that might help. There are lots and lots of comments from users with both good and bad results and the distros they used.

    Check this out:

    http://www.linuxprinting.org/show_printer.cgi?recnum=Samsung-CLP-500

    Good Luck!

  8. Your Samsung Printer by ratboy666 · · Score: 3, Informative

    Probably JBIG bi-level compression (load jbigkit from the web -- it will give you a starting point). Probably separate maps for each of the colours. Embedded command language for the rest -- look on the cable for details.

    'K? [I think that covers most of the current crop of printers]. Next time, buy a PostScript device.

    Ratboy

    --
    Just another "Cubible(sic) Joe" 2 17 3061

  9. Disassemble the driver by crazney · · Score: 2, Informative

    I've RE'd a bunch of stuff, from DRM protection (http://crazney.net/programs/itunes/), Audio Codecs, network protocols and file formats. I use all sorts of nifty tools, most of which I wrote myself.

    For a graphics format, however, I'd be inclined to go for disassembly of the proprietary driver. Perhaps you could try various test cases (scan a white sheet of paper, what's the data look like? Try a black, red, green, blue.. etc). But if it's compressed with some unknown algorithm (like the Audio codec that I've reversed) I don't like your chances of getting it that way.

    There are a bunch of disassemblers around, I have written my own (which isn't available publicly cause it's still too shit) but I would highly recommend Datarescue's IDA. Old versions work fine in wine.

    However, something to be mindful of: Just rewriting their binary driver in C is copyright violation, make sure you properly document the spec and then do a cleanroom implementation.

    David.

    --
    stuff

    1. Re:Disassemble the driver by k98sven · · Score: 2, Informative

      Just to pick a few nits..

      However, something to be mindful of: Just rewriting their binary driver in C is copyright violation,

      Well, that may depend on the EULA. But let's assume either that the EULA doesn't forbid reverse engineering, or that you're willing to bet that the purpose of your work will qualify as 'intercompatibility' in court. (Which it should, but not everyone wants to take that risk.)

      Anyway, if you rewrite it without duplicating their code, you're not infringing their copyright.

      make sure you properly document the spec and then do a cleanroom implementation.

      That is not clean-room, since the developer is already 'tainted' by having seen the code. A real clean-room implementation would be to give the spec to someone else, who would then do the implementation without ever having seen the proprietary code.

  10. Using a Windows printer driver in Wine by enosys · · Score: 2, Informative

    Wine is apparently able to use Windows printer drivers. I've never used this feature myself and there doesn't seem to be that much info about it but this may be worth examining.