Slashdot Mirror


Gzip on a PCI card

steve writes "The German tech news site heise.de is reporting here (in German, of course) about a PCI card developed by the University of Wuppertal and Vigos AG being shown at CeBIT, which does Gzip compression in hardware, thus freeing the CPU to do other tasks. The PCI card can compress 32MB/sec, which is more than enough to compress a 100Mbit LAN in realtime. A future version will do 64MB/sec. The article mentions that this will be of particular interest for web servers. The card should be on sale by the end of the year."
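
The rate comparison in the summary can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal units (1 MB = 10**6 bytes) and compares raw rates, since the summary does not say whether 32MB/sec refers to the uncompressed input or the compressed output.

```python
# Figures quoted in the summary; decimal units are an assumption.
CARD_MB_PER_S = 32          # current card
FUTURE_CARD_MB_PER_S = 64   # announced follow-up version
LAN_MBIT_PER_S = 100        # line rate of a 100Mbit LAN


def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert megabits per second to megabytes per second."""
    return mbit_per_s / 8


lan_mb_per_s = mbit_to_mbyte(LAN_MBIT_PER_S)   # 12.5 MB/s
headroom = CARD_MB_PER_S / lan_mb_per_s        # 2.56x
print(f"LAN carries at most {lan_mb_per_s} MB/s; card headroom is {headroom:.2f}x")
```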

12 of 141 comments

  1. Useful for netbackups too by walt-sjc · · Score: 5, Insightful

    Seems this would be a great help to those doing backups over a LAN. Shouldn't take too much to alter a version of tar, rsync, etc. to use this card.

    1. Re:Useful for netbackups too by Bazzargh · · Score: 4, Informative

      rsync doesn't use gzip, or the deflate algorithm - it uses the Burrows-Wheeler Transform, same as used in bzip2. If you read Tridge's thesis you'll see that he actually proposes an rzip algorithm based on the BWT and his work on rsync that compresses better than gzip or bzip2 on typical files.

      -Baz

  2. bandwidth saving by buro9 · · Score: 5, Insightful

    the key to using gzip is really not to compress at too high a ratio... a low rate of compression offers a pretty sizeable saving in bandwidth for an acceptable CPU usage... once you edge up to the higher compression levels then you pay for it in the CPU and your app slows.

    i love the idea of a hardware based gzip... but i'd start by educating the software users on the cost vs benefit ratio of their existing configuration... i always seem to find that those who don't know what they're doing are the ones that have it set to maximum compression
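
The cost vs benefit point can be measured rather than guessed. A minimal sketch using Python's zlib module (the same deflate algorithm gzip uses); the payload is a hypothetical stand-in for a web page, and the timings will differ on every machine.

```python
import time
import zlib

# Hypothetical payload: repetitive HTML-like text standing in for a web page.
payload = b"<tr><td>row</td><td>value</td></tr>\n" * 2000


def measure(data: bytes, level: int):
    """Compress data at the given level; return (compressed size, seconds)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    return len(compressed), time.perf_counter() - start


sizes = {}
for level in (1, 6, 9):   # fastest, default, maximum
    size, seconds = measure(payload, level)
    sizes[level] = size
    print(f"level {level}: {size} bytes "
          f"({size / len(payload):.1%} of original) in {seconds * 1000:.2f} ms")
```

Running this against real traffic from your own server is the useful version of the experiment; the shape of the size/time curve depends heavily on the data.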

  3. A bzip2 version would be nice ... by geirt · · Score: 4, Insightful

    I try to avoid bzip2 because it is so slow, even on modern hardware. bzip2 compresses very well, much better than gzip. A bzip2 version of this card makes sense ....

    --

    RFC1925
    1. Re:A bzip2 version would be nice ... by arvindn · · Score: 5, Informative

      No, bzip2 is something that won't work for applications like serving web pages.

      gzip works with streams, producing output as it gets input. OTOH bzip2 treats the input as blocks. Thus it needs to get a whole block before it produces any output. Similarly the client needs to get a whole block of data before it can even start rendering the page. The man page of bzip2 says that the default block size is 900,000 (!) bytes. So while using bzip2 may improve bandwidth it will result in large latency.
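
The latency difference can be seen with Python's zlib and bz2 modules. This is a sketch, not a description of how any web server does it: the chunk is a hypothetical small write, zlib's default container is used instead of the gzip one (the deflate stream inside is the same), and a sync flush is one way, at some cost in ratio, to force deflate to emit what it has consumed so far.

```python
import bz2
import zlib

chunk = b"<p>first part of a page</p>" * 10   # hypothetical small write

# Deflate: a sync flush pushes out everything consumed so far.
deflater = zlib.compressobj()
deflate_out = deflater.compress(chunk) + deflater.flush(zlib.Z_SYNC_FLUSH)

# The receiver can already recover the chunk from this partial stream.
inflater = zlib.decompressobj()
recovered = inflater.decompress(deflate_out)

# bzip2: input smaller than one block is only buffered at this point.
bz = bz2.BZ2Compressor(9)      # level 9 selects 900k blocks
bz_out = bz.compress(chunk)

print(f"deflate emitted {len(deflate_out)} bytes, "
      f"recoverable: {recovered == chunk}; bzip2 emitted {len(bz_out)} bytes")
```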

    2. Re:A bzip2 version would be nice ... by ianezz · · Score: 4, Interesting

      gzip works with streams, producing output as it gets input. OTOH bzip2 treats the input as blocks.

      Gzip works with blocks of data too, but the block size is 32KB instead of nearly 1MB and it is not nearly as CPU intensive as bzip2, so this is why it appears to produce a continuous stream of compressed data (even if, strictly speaking, it doesn't).

      Gzip just seems to be a well-balanced compromise between resources and resulting compression ratio, plus it is Free Software (hint: bzip2 is Free Software too, but Rar isn't).

  4. Re:Hardware Gzip by Lord+Sauron · · Score: 4, Funny

    A hardware that does the dirty processing job while freeing the CPU ? Wow, that's new. I'm going to the USPTO to get my patent on this.

    Maybe I can even make some money on Intel, as they were in clear violation of my patent with their arithmetic coprocessor for use with the 80386SX family of microprocessors.

  5. How cute but useless. by _Eric · · Score: 5, Interesting

    The general trend in the industry is toward non-intelligent interconnections (Gigabit cards used to have a processor (Alteon), they don't anymore (see latest Intels)). I2O never took off because you don't really need to relieve a computer from computation when your computation power is plethoric.

    On a Xeon 2.8GHz, I just got 71 MB/s for gzip.

    What's the use for such hardware then?

    Plus it will eat the PCI bus because data has to go out of memory to processing card, back to memory, then to network card. You triple the PCI bus bandwidth. (Not true if the compression is embedded in the network card).
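
A rough tally of the three bus crossings described above. The 4:1 compression ratio is purely hypothetical, and the 133 MB/s figure is the theoretical peak of a 32-bit/33MHz PCI bus, which is an assumption here since neither the summary nor the comment says which bus the card uses.

```python
PCI_PEAK_MB_PER_S = 133.0   # assumption: 32-bit/33MHz PCI, theoretical peak


def pci_traffic(input_mb_per_s: float, ratio: float) -> dict:
    """Bus traffic in MB/s when compression happens on a separate PCI card."""
    compressed = input_mb_per_s / ratio
    return {
        "memory_to_card": input_mb_per_s,   # uncompressed data out to the card
        "card_to_memory": compressed,       # compressed result back to RAM
        "memory_to_nic": compressed,        # compressed result out to the NIC
        "total": input_mb_per_s + 2 * compressed,
    }


traffic = pci_traffic(input_mb_per_s=32.0, ratio=4.0)   # hypothetical 4:1 ratio
cpu_only = 32.0 / 4.0   # with CPU compression only the NIC transfer crosses the bus
print(f"with card: {traffic['total']:.0f} MB/s "
      f"({traffic['total'] / PCI_PEAK_MB_PER_S:.0%} of peak), "
      f"CPU only: {cpu_only:.0f} MB/s")
```

Under these assumptions the bus carries three transfers instead of one, and because the first of them is uncompressed, the total grows with the compression ratio.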

  6. Reconfigurable by KingPrad · · Score: 5, Interesting

    This is cool - dedicated chips can process monstrous amounts of data and much faster than a general purpose CPU. So it's a good idea to let this card do the heavy lifting of compression. Of course the use extends to many types of data analysis: encryption, scientific number crunching, graphics compression.

    The best idea would be to make the chip an FPGA not a specially-designed processor. Then you could load in different chip designs for whatever was currently needed. Need to do RSA encryption? The board reconfigures the FPGA for it. Same goes for Divx compression, gzip, SETI@Home, etc. FPGAs take a few milliseconds to reconfigure but when they operate as a dedicated signal processor they can leave a general purpose processor in the dust - leaving the main CPU to run the other apps, the desktop, etc.

    Check out the IEEE archives and journals, searching for "adaptive computing" or "reconfigurable computing".

    KingPrad

    --
    Stop the Slashdot Effect! Don't read the articles!
  7. Cool by arvindn · · Score: 5, Informative

    gzip was designed with such considerations in mind. Throughput of the algorithm took precedence over compression level. Good to see their farsightedness paying off. And the algorithm is pretty simple so that it can be implemented in hardware directly.

    Another thing about gzip is that it is asymmetric: decompression is much faster than compression. Again this is a nice feature, because most files will be decompressed many times but compressed only once. Thus for instance, all man pages are stored in gzipped form and decompressed on demand.
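
The asymmetry is easy to observe with Python's zlib module (deflate, as used by gzip). A minimal sketch; the input is a hypothetical man-page-like text and the timings depend entirely on the machine.

```python
import time
import zlib

# Hypothetical input: repetitive man-page-like text, roughly 2 MB.
data = b"NAME\n    gzip - compress or expand files\n" * 50000


def timed(fn, arg):
    """Call fn(arg) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(arg)
    return result, time.perf_counter() - start


packed, t_compress = timed(zlib.compress, data)
unpacked, t_decompress = timed(zlib.decompress, packed)

print(f"compress: {t_compress * 1000:.1f} ms, "
      f"decompress: {t_decompress * 1000:.1f} ms, "
      f"round trip ok: {unpacked == data}")
```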

    But I can't see the point of implementing it in a PCI card. Wouldn't it be better to integrate it with either the processor or the network interface?

  8. Not quite yet... by buzzbomb · · Score: 4, Informative

    The article mentions that this will be of particular interest for web servers.

    I'm assuming one is referring to something that will work with mod_gzip. That may be fine and dandy, but I just recently had to disable mod_gzip on my server. You can blame Microsoft.[1] It seems that both IE 5.5 and 6.0 have nasty little "sometimes" bugs[2] where they won't know what to do with gzipped content. I tried to disable by user agent header with no luck. If anyone else has some good pointers or perhaps even a link to a patched version of mod_gzip that'll avoid those two bugs, I would appreciate it.

    [1] No, really. This isn't a troll. They even admit the bugs.
    [2] Microsoft Knowledge Base Articles: Q313712 IE 5.5 Q312496 IE 6.0
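
For illustration only, the kind of per-request check being attempted might look like the sketch below. This is not mod_gzip configuration syntax: `should_gzip` and `AFFECTED_AGENTS` are hypothetical names, the substring test on Accept-Encoding ignores q-values, and, as the comment above reports, filtering on the user agent may not be enough to avoid the bugs.

```python
# Hypothetical markers for the browser versions named in the comment.
AFFECTED_AGENTS = ("MSIE 5.5", "MSIE 6.0")


def should_gzip(headers: dict) -> bool:
    """Decide whether to send a gzipped response for this request."""
    accept = headers.get("Accept-Encoding", "").lower()
    agent = headers.get("User-Agent", "")
    if "gzip" not in accept:
        return False          # client did not advertise gzip support
    return not any(marker in agent for marker in AFFECTED_AGENTS)


request = {
    "Accept-Encoding": "gzip, deflate",
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
}
print(should_gzip(request))
```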

  9. You have an important point... by mnmn · · Score: 4, Interesting


    When the PCI bus is taken, other stuff that the CPU needs to do will also be halted. And then the PCI bus is much slower than the FSB.

    I think what we need to push distributed computing more is altering the RAM and DMA channels. There should be many physical channels to the RAM capable of simultaneously reading/writing different parts of it. As in if the ram can output 200 MB per sec, 16 devices could attach themselves to the RAM via maybe EDMA (enhanced DMA?) and simultaneously be able to read at 200MB each. This might be done by:

    (1) Altering the addressing logic in the memory ICs, maybe put 16 different addressing systems and multiply their pins x16. Then have an external matrix, more advanced than the 802x DMA chip to allow simultaneity.

    (2) Separate the addressing schemes of each chip, so an OS kernel could smartly put data of important processes in the right chip to be worked on by external devices.. again also having an external matrix for the address multiplexing.

    This way such a PCI gzip device could have its PCI address space, IRQ as well as (EDMA?) address which it would use to access the data to gzip and put back into the RAM, at full speed, not taking up RAM bandwidth, PCI bandwidth, IRQs or the CPU at all.

    The AGP has achieved this by separating the AGP channel from PCI, but still using dedicated memory rather than smartly-shared memory. I understand multiprocessor systems technically do the same thing, but in this case we are treating the external devices like complete slaves, like the GPU, for only dedicated purposes, and I'm emphasizing the smart sharing of memory that doesn't exist in multiprocessor systems either. In this scheme, one could add CPU cards, maybe hot-plugged, and have insta-multiprocessor system or use it to offload kernel compilation, zipping, 3d transformations, or even take user tasks while the main CPU just works in supervisor mode.

    --
    "Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky