Gzip on a PCI card

ahem by Anonymous Coward · 2003-03-19 01:53 · Score: 0

"for sale", not "on sale"

Re:ahem by REBloomfield · 2003-03-19 03:45 · Score: 1

erm, no, i think on sale is fine :)
But of course, i may be wrong...

no bz2 by Anonymous Coward · 2003-03-19 01:58 · Score: 0

So why doesn't this card do bzip2? ;^)

Seriously, this is a funny development. The inverse of the winmodems.
Other interesting ideas for dedicated cards?

Re:no bz2 by stilwebm · 2003-03-19 03:35 · Score: 1

Because http browsers don't decode bzip2 on the fly, but can decode gzip on the fly. This is aimed at servers compressing their output on the fly. Offloading this task from the CPU frees more room for dynamic page processing or SSL calculations.
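To make the on-the-fly part concrete, here is a minimal sketch of the negotiation, assuming a simplified parse of the Accept-Encoding header that ignores q-values. The helper name `encode_body` and the sample page are made up for illustration and are not taken from any particular server.

```python
import gzip
from typing import Dict, Tuple

def encode_body(body: bytes, accept_encoding: str) -> Tuple[bytes, Dict[str, str]]:
    """Gzip the response body only if the client advertises gzip support."""
    # Simplified parsing (assumption): split on commas and drop any ";q=..." part.
    offered = {token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")}
    if "gzip" in offered:
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    # Client did not offer gzip: send the body untouched.
    return body, {}

# Hypothetical dynamically generated page.
page = b"<html>" + b"<p>hello world</p>" * 200 + b"</html>"
compressed, headers = encode_body(page, "gzip, deflate")
plain, no_headers = encode_body(page, "identity")
```

The compression step inside `encode_body` is the part a card like this would take off the CPU; the negotiation itself stays in the server software.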
Re:no bz2 by Anonymous Coward · 2003-03-20 14:38 · Score: 0

No kidding they can decode gzip on the fly. I hate that. Even worse is downloading foo.tar.gz, but having it end up saved as foo.tar.tar, but still be gzipped. IE started doing this to me one day, after I'd made no changes at all to anything I can remember. I don't use IE anymore.
Re:no bz2 by Anonymous Coward · 2003-03-21 11:18 · Score: 0

Right click the shortcut, open cmd window, wget right-click Enter.

Done.
Re:no bz2 by localghost · 2003-03-23 14:31 · Score: 1

Other interesting ideas for dedicated cards?

Off-topic, but anyway, it would be nice to see dedicated MPEG-2 decoders aimed at consumers. Something like that would be great for sticking in a MythTV box. Right now you need a fast CPU to do anything useful on those, only because of the encoding.
Re:no bz2 by Anonymous Coward · 2003-03-24 18:13 · Score: 0

There already are dedicated MPEG-2 decoders. They're called "DVD players."
Re:no bz2 by DoomGerbil · 2003-03-24 19:48 · Score: 1

Uhh...you realize that you just described a DVD decoder card, right? They've been around for at least 5 years. Try Creative's DxR3 or Sigma Designs' Hollywood cards.
Re:no bz2 by mabinogi · 2003-03-27 09:37 · Score: 1

There used to be, when CPUs couldn't handle it... but these days they're just an unnecessary expense.

Also, some video cards include mpeg2 decoding

--
Advanced users are users too!
Re:no bz2 by usotsuki · 2003-03-29 07:06 · Score: 1

Nyetscrape 4 does the opposite with ftp.

Try downloading a disk image from Asimov.net and watch. It'll save with the .gz extension, but the file will be the uncompressed 143,360 bytes. *groan*

-uso.
When saving a file, browsers should not touch its contents. That includes newline translation. :(

--
Dreams, dreams, don't doubt dreams, dreaming children's dreaming dreams. Sailor Moon SS
Re:no bz2 by fruey · 2003-03-31 23:44 · Score: 1

You're talking about ENCODING, not decoding. MPEG decoders are available all over the place on ATI, Creative and Hollywood style cards. It's the encoding part which is hard, really. Decoding takes less CPU (a little less).
Now, as for dedicated encoders, there must be some around. Like here, or maybe high speed here (just a quick Google turned these up).
Of course, another thing that can be improved is I/O bandwidth and compression tweaks. As storage gets cheaper and cheaper you can use HuffYUV lossless, and then have a background task to gradually compress to MPEG2 or DivX, I guess. Heck, you could even do that now, but then current generation CPUs can encode and decode at the same time anyways...
The problem with MythTV is that it uses a hacked (in the good sense) version of the NuppelVideo compression scheme... I quote from MythTV:
So, with the hardware able to handle what I required of it, it's back to the software. I significantly modified the NuppelVideo codec: I added the ability to compress the audio using the LAME encoder library, upgraded the RTjpeg code to the latest version, modified it to use the Xvideo extension to convert colorspaces in hardware, added a better de-interlacing filter (taken from MPlayer), and finally c++-ified it all into Recording and Playback classes, so I could easily use the functionality in applications.

MythTV really looks like it rocks. I just need a better machine to test it out (mine is an appallingly slow poor old thing :'( )

--
Conversion Rate Optimisation French / English consultant

Useful for netbackups too by walt-sjc · 2003-03-19 01:59 · Score: 5, Insightful

Seems this would be a great help to those doing backups over a LAN. It shouldn't take too much to alter a version of tar, rsync, etc. to use this card.

Re:Useful for netbackups too by Bazzargh · 2003-03-19 02:35 · Score: 4, Informative

rsync doesn't use gzip, or the deflate algorithm - it uses the Burrows-Wheeler Transform, same as used in bzip2. If you read Tridge's thesis you'll see that he actually proposes an rzip algorithm based on the BWT and his work on rsync that compresses better than gzip or bzip2 on typical files.

-Baz
Re:Useful for netbackups too by walt-sjc · 2003-03-19 03:25 · Score: 3, Insightful

Interesting, didn't know that. I just assumed it used the same code. Note that one of the cool things about open source is that you could swap out the compression code, which is exactly what I was suggesting, so it wouldn't really matter what algorithm the code originally used. (Of course it would no longer be compatible, but I'm also assuming that this wouldn't be an issue in this case for this application.) I normally don't use the built-in compression with rsync; instead I use the compression in ssh, which I believe IS gzip.

It would be very cool if the card supported multiple compression algorithms. Considering that GNU tar supports bzip2 as well, this would definitely be useful.
Re:Useful for netbackups too by stilwebm · 2003-03-19 03:42 · Score: 2, Interesting

Maybe you're thinking of dynamic linking against zlib or other compression libraries. This would use the same code, quite literally. That would be the most useful way to support a card like this. The zlib.so (or zlib.dll) could be modified to interface with the drivers for the card, so programs linked against zlib would transparently use the faster hardware acceleration. Few programs will be statically linked to zlib anyway, and those exceptions are likely to either be binaries you don't mind recompiling for speed (e.g. you linked it statically and tweaked the binary for speed already) or binaries on some rescue disk or small root filesystem where zlib.so may not be readable.
Re:Useful for netbackups too by walt-sjc · 2003-03-19 04:04 · Score: 1

Swapping out the dynamically linked zlib library would be easiest, but frequently backup and restore tools are statically linked for obvious reasons (at least all MINE are.)
Re:Useful for netbackups too by stilwebm · 2003-03-19 05:07 · Score: 1

It's probably more likely that the restore binary is statically linked but not the backup. In any case, it would absolutely be work work a recompile to speed things up.
Re:Useful for netbackups too by Anonymous Coward · 2003-03-19 05:12 · Score: 0

worth a recompile i mean

damn time for more caffeine
Re:Useful for netbackups too by Anonymous Coward · 2003-03-23 19:28 · Score: 0

Read the thesis (and maybe rsync source code too) more carefully. rsync actually does use deflate (gzip/zlib) not bzip2.
Re:Useful for netbackups too by pbarker · 2003-03-24 17:10 · Score: 0

Doesn't use gzip? Some files to remove, then....

---
pbarker@milligan:~/muck/rsync/zlib$ ls
CVS/ adler32.o dummy.in infcodes.o inflate.o infutil.o zutil.c
ChangeLog crc32.c infblock.c inffast.c inftrees.c trees.c zutil.h
Makefile deflate.c infblock.h inffast.h inftrees.h trees.h zutil.o
README deflate.h infblock.o inffast.o inftrees.o trees.o
README.rsync deflate.o infcodes.c inffixed.h infutil.c zconf.h
adler32.c dummy infcodes.h inflate.c infutil.h zlib.h
pbarker@milligan:~/muck/rsync/zlib$
---

bandwidth saving by buro9 · 2003-03-19 02:03 · Score: 5, Insightful

the key to using gzip is really not to compress at too high a ratio... a low rate of compression offers a pretty sizeable saving in bandwidth for an acceptable CPU usage... once you edge up to the higher compression levels then you pay for it in the CPU and your app slows.

i love the idea of a hardware based gzip... but i'd start by educating the software users on the cost vs benefit ratio of their existing configuration... i always seem to find that those who don't know what they're doing are the ones that have it set to maximum compression
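a rough way to check the cost vs benefit on your own data is sketched below. it uses python's zlib module (the same deflate code gzip uses) and synthetic, log-like sample text, which is an assumption; real pages will give different numbers, and the timings depend entirely on the machine.

```python
import time
import zlib

# Synthetic, log-like sample data (an assumption; substitute real pages or logs).
lines = [
    f"192.168.0.{i % 250} - - GET /page/{i % 37}.html 200 {1000 + (i * 7919) % 5000}\n"
    for i in range(20000)
]
data = "".join(lines).encode("ascii")

sizes = {}
for level in (1, 6, 9):
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    assert zlib.decompress(packed) == data  # round-trip sanity check
    sizes[level] = len(packed)
    print(f"level {level}: {len(packed):>7} bytes "
          f"({len(packed) / len(data):.1%} of original) in {elapsed * 1000:.1f} ms")
```

if the size difference between level 1 and level 9 is small next to the difference in time, the lower level is the better deal for on-the-fly use.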

Re:bandwidth saving by s0l0m0n · 2003-03-26 17:12 · Score: 1

If there are cards doing gzip at each end, that wouldn't be much trouble right?

Wouldn't the point of using a hardware version be that you don't have to worry about the CPU usage as much, in any case?

Hardware Gzip by brejc8 · 2003-03-19 02:05 · Score: 1

The methods I have seen of Gzip seemed to be made to make it possible to do it in hardware. I was under the impression that was intended.

As an aside, this could of course easily be done using an FPGA PCI card. One that can do anything you want. Reprogram it to accelerate seti at home or stick some routines used in quake into it. Much more versatile.
The only problems are standardisation and convincing developers to use them.

--
Mouse powered Chips, Open source Processors and Lego

Re:Hardware Gzip by Lord+Sauron · 2003-03-19 02:35 · Score: 4, Funny

A hardware that does the dirty processing job while freeing the CPU ? Wow, that's new. I'm going to the USPTO to get my patent on this.

Maybe I can even make some money on Intel, as they were in clear violation of my patent with their arithmetic coprocessor for use with the 80386SX family of microprocessors.
Re:Hardware Gzip by brejc8 · 2003-03-19 02:59 · Score: 1

I'm talking about reconfigurable FPGAs

--
Mouse powered Chips, Open source Processors and Lego
Re:Hardware Gzip by Anonymous Coward · 2003-03-19 03:40 · Score: 0

What's up with 386SX? 386DX needs an arithmetic coprocessor too. Even 486SX.
Re:Hardware Gzip by Anonymous Coward · 2003-03-19 21:34 · Score: 0

Yes, of course that could be possible. Someone could come up with like a PCI card with 1+ FPGAs on it and a couple DDR slots (for storage). Then it could be a configurable card that could do anything in hardware. Basically, that's what FPGAs are for, field programming, hence the FP in FPGA :) But I'm sure you know that, so the sarcastic remark stays!! :)

I suppose you could even put a couple external ports on it for something, like some general purpose I/O lines and maybe some ADC/DAC lines too. That would be kind of cool. I'm sure if something like that was fairly cheap, like a board that had up to 4 FPGA slots or something with like 2 DDR slots, and some external ports like I mentioned above, if that all could go for under $100 with 1 FPGA, and you could get a developer community going, it could take off. The only problem is, making the logic for some piece of h/w is a lot more complicated than taking the weekend to learn python for some simple scripting. The biggest problem in this, is that the compilers are proprietary and expensive.
Re:Hardware Gzip by mvdw · 2003-03-20 14:26 · Score: 1

I suppose you could even put a couple external ports on it for something, like some general purpose I/O lines and maybe some ADC/DAC lines too.
Don't get out much, do you?
This is relatively simple to do, and most of the major FPGA vendors offer "PCI development kits" which allow you to develop your own PCI card using their FPGAs. They're quite expensive, though, as they're aimed at OEMs.
The biggest problem in this, is that the compilers are proprietary and expensive.
That would be why FPGA Vendors like Altera, Xilinx and Lattice all offer free versions of their FPGA software that will place-and-route most of their lower-cost devices.
Learning to "program" an FPGA isn't all that hard, it's just a different paradigm from programming in a sequential language. Interfacing with the PCI bus is probably the hardest part, but most of the vendors will help you with that, too. They're in the market to sell as many FPGA chips as they can, after all...
Re:Hardware Gzip by Anonymous Coward · 2003-04-02 18:26 · Score: 0

Heh, I didn't mean learning to "program" an HDL language, I meant the compilers for the FPGA's themselves.

A bzip2 version would be nice ... by geirt · 2003-03-19 02:05 · Score: 4, Insightful

I try to avoid bzip2 because it is so slow, even on modern hardware. bzip2 compresses very well, much better than gzip. A bzip2 version of this card makes sense ....

--

RFC1925

Re:A bzip2 version would be nice ... by arvindn · 2003-03-19 04:00 · Score: 5, Informative

No, bzip2 is something that won't work for applications like serving web pages.
gzip works with streams, producing output as it gets input. OTOH bzip2 treats the input as blocks. Thus it needs to get a whole block before it produces any output. Similarly the client needs to get a whole block of data before it can even start rendering the page. The man page of bzip2 says that the default block size is 900,000 (!) bytes. So while using bzip2 may improve bandwidth it will result in large latency.
Re:A bzip2 version would be nice ... by ianezz · 2003-03-19 06:59 · Score: 4, Interesting

gzip works with streams, producing output as it gets input. OTOH bzip2 treats the input as blocks.
Gzip works with blocks of data too, but the block size is 32KB instead of nearly 1MB and it is not nearly as CPU intensive as bzip2, so this is why it appears to produce a continuous stream of compressed data (even if, strictly speaking, it doesn't).
Gzip just seems to be a well-balanced compromise between resources and resulting compression ratio, plus it is Free Software (hint: bzip2 is Free Software too, but Rar isn't).
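The latency difference can be seen with Python's standard zlib and bz2 modules. This is only a sketch: the page fragments are invented, and the sync flush after every fragment is an assumption about how a web server might push partial output, not a statement about how mod_gzip or the card works.

```python
import bz2
import zlib

# Hypothetical fragments of a page that is generated piece by piece.
chunks = [b"<p>paragraph %d of a dynamically generated page</p>\n" % i
          for i in range(5)]

# zlib (deflate): a sync flush after each fragment pushes out everything
# consumed so far, so the receiver can decode it immediately.
deflater = zlib.compressobj()
inflater = zlib.decompressobj()
decoded_chunks = []
for chunk in chunks:
    wire = deflater.compress(chunk) + deflater.flush(zlib.Z_SYNC_FLUSH)
    decoded_chunks.append(inflater.decompress(wire))

# bz2: input is buffered until a block fills up or the stream is finished,
# so these small fragments produce no output until flush().
bz = bz2.BZ2Compressor()
bz_partial = [bz.compress(chunk) for chunk in chunks]
bz_final = bz.flush()
```

Here every element of `bz_partial` is empty and the whole page only appears in `bz_final`, while each deflate fragment is usable as soon as it arrives. Frequent sync flushes do cost some compression ratio.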

Complete, Utter, Comprehension! by Beatbyte · 2003-03-19 02:07 · Score: 0

GZIP compression in hardware

A joint venture between the University of Wuppertal and the Hagen-based Vigos AG is showing the prototype of a "GZIP Accelerator Board" at CeBIT (Hall 11, D26). The PCI plug-in card relieves the processor of time-consuming compression and, in its current version, is already said to squeeze 32 MByte per second. That is enough to compress the network traffic of a 100 MBit line in real time; thanks to a modular design, up to 64 MByte per second is to be reached later.

Above all in web servers, the outgoing data volume is to be compressed on the fly, relieving both the CPU and the network connection: a welcome help for Internet providers who have to conserve resources. They are also the primary target group for the now-patented method, which is to be used in the first production devices at the end of 2003. By then the manufacturer also intends to have adapted the card's still very bulky layout to the conditions inside server cases. (Christopher Kunz) / (sun/iX)

oh OH!!!!! NOW it makes sense!

--
Get paid to code OSS

Re:Complete, Utter, Comprehension! by Specialist2k · 2003-03-19 02:15 · Score: 3, Informative

A translation: A joint venture between the University of Wuppertal and Vigos AG is showcasing the prototype of a "GZIP accelerator board" at CeBIT (Hall 11, D26). The PCI card removes the burden of performing time-consuming data compression tasks from the system CPU and already achieves a data throughput of 32 MB/s in its current development state. This is sufficient to compress the traffic generated by a 100 MBit LAN connection in real time; through the modular design, it will be possible to reach 64 MB/s in the future. [end of first paragraph] Specialist
Re:Complete, Utter, Comprehension! by bumby · 2003-03-27 20:09 · Score: 1

Google says:

GZIP compression by hardware a Joint venture of the University of Wuppertal with the Hagener Vigos AG points to the CeBIT (, D26 resounds to 11) the prototype of a "GZIP accelerator board". The PCI plug-in card removes the time-consuming compression from the processor and is in the current version already 32 MByte per second to compress together to be able. Thus the Netzwerktraffic of a 100-MBit-Leitung can be already compressed in real time; by a modular structure are to be achieved later up to 64 MByte per second. Particularly in Web servers so the outgoing volume of data is to be compressed on-the-fly and be relieved thus both the CCU and the network binding -- a welcome assistance for InterNet Provider, which must act resources-carefully. These are also the primary target group for the procedure patented meanwhile, which is to be used in first standard sets at the end of of 2003. Up to then the manufacturer wants to have adapted also the still very klobige layout of the map on the conditions in server housings. (Christopher Kunz)/(sun/iX)

--
Hey! That's my sig you're smoking there!

Comparison by Merlin42 · 2003-03-19 02:37 · Score: 3, Interesting

For comparison I ran gzip on two machines I happen to have immediate access to. I compressed a 32 MB file taken from /dev/urandom, which would probably be a worst-case scenario for a compressor.

dd if=/dev/urandom of=32m bs=1024k count=32 ; time gzip 32m

P4-1.8Ghz:
real 0m4.428s
user 0m4.220s
sys 0m0.170s

AthlonXP2200+
real 0m3.579s
user 0m3.310s
sys 0m0.160s

So 32MB/s sounds pretty good to me.
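Working the numbers out, using the wall-clock ("real") times above and the 32 MB/s figure quoted for the card. The dictionary keys are just labels, and this ignores that the card's settings and test data are unknown:

```python
FILE_MB = 32.0   # size of the test file
CARD_MBPS = 32.0  # throughput claimed for the prototype card

real_seconds = {"P4-1.8GHz": 4.428, "AthlonXP2200+": 3.579}

# Software throughput in MB/s, and how many times faster the card would be.
throughput = {cpu: FILE_MB / secs for cpu, secs in real_seconds.items()}
speedup = {cpu: CARD_MBPS / mbps for cpu, mbps in throughput.items()}

for cpu in real_seconds:
    print(f"{cpu}: {throughput[cpu]:.2f} MB/s in software, "
          f"card would be {speedup[cpu]:.2f}x faster")
```

That comes to roughly 7.2 MB/s and 8.9 MB/s in software on this test file.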

--
Thoughts on tech, Software Engineering, and stuff

Re:Comparison by Oggust · 2003-03-24 23:58 · Score: 1

Generating the data that comes out of urandom isn't cheap for the kernel. Try running top (or similar) while doing this, I bet you have a whole lot of system time.
Try saving the data to a file first, and then gzipping that.
/August

--
"An object declared as type _Bool is large enough to store the values 0 and 1." -- 6.1.2.5, C99 standard.
Re:Comparison by Merlin42 · 2003-03-25 00:18 · Score: 1

That is what I did, please note the placement of the semi-colon. The file was named 32m.

dd if=/dev/urandom of=32m bs=1024k count=32 ; time gzip 32m

please type
man dd
then type
man time
then type
man gzip

--
Thoughts on tech, Software Engineering, and stuff
Re:Comparison by Oggust · 2003-03-25 01:56 · Score: 1

Duh, yeah. Should have looked closer before replying, I guess. Sorry.
/August, better get some coffee...

--
"An object declared as type _Bool is large enough to store the values 0 and 1." -- 6.1.2.5, C99 standard.
Re:Comparison by kjd · 2003-03-25 17:17 · Score: 1

Random data is not compressible; check your before and after file sizes.

If you want to test compression, try something like large log files, which usually have a lot of repetition.
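A quick way to see the difference, sketched with Python's zlib module. The log line is invented, and repeating one identical line is an extreme case; real logs compress less well than this but still far better than random bytes.

```python
import os
import zlib

def ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)

# 1 MiB of random bytes, comparable to reading from /dev/urandom.
random_data = os.urandom(1 << 20)

# About 1 MiB of one made-up log line repeated (extremely repetitive on purpose).
log_line = b"Mar 19 02:37:01 host sshd[1234]: Accepted publickey for user from 10.0.0.5\n"
log_data = log_line * ((1 << 20) // len(log_line))

print(f"random data: {ratio(random_data):.3f}")
print(f"log data:    {ratio(log_data):.3f}")
```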
Re:Comparison by karlm · 2003-03-31 12:24 · Score: 1

Random data should give you the worst compression ratio, but not necessarily the worst speed. bzip2 starts out by using a poor compression algorithm (RLE) before the BWT because the worst case for the BWT is all of the bytes being the same. The BWT actually runs very fast with random inputs compared to all zeros, for instance.

--
Copyright Violation:"theft, piracy"::Anti-Trust Violation:"thermonuclear price terrorism"<-Overly dramatic language.

translation by steveheath · 2003-03-19 02:41 · Score: 1

Not a professional job, just babelfished..

GZIP compression by hardware A Joint venture of the University of Wuppertal with the Hagener Vigos AG points to the CeBIT (, D26 resounds to 11) the prototype of a "GZIP accelerator board". The PCI plug-in card removes the time-consuming compression from the processor and is in the current version already 32 MByte per second to compress together to be able. Thus the Netzwerktraffic of a 100-MBit-Leitung can be already compressed in real time; by a modular structure are to be achieved later up to 64 MByte per second. Particularly in Web servers so the outgoing volume of data is to be compressed on-the-fly and be relieved thus both the CCU and the network binding -- a welcome assistance for InterNet Provider, which must act resources-carefully. These are also the primary target group for the procedure patented meanwhile, which is to be used in first standard sets at the end of of 2003. Up to then the manufacturer wants to have adapted also the still very klobige layout of the map on the conditions in server housings. (Christopher Kunz)/(sun/iX)

Re:translation by gilesjuk · 2003-03-20 06:46 · Score: 1

If they're sensible designers it's a programmable CPU/DSP on a card, you could then write and upload any compression algorithm onto the card.

what about decompression??? by lastninja · 2003-03-19 02:42 · Score: 1

does the article mention anything about decompression? my german is lousy but it seems it doesn't. Is decompression really that fast so that it doesn't need dedicated hardware??

--
John Carmack fan, browsing at +5 since 1999.

Re:what about decompression??? by Anonymous Coward · 2003-03-19 03:46 · Score: 1, Informative

Decompression is distributed. The application of a gzip compression board is to compress HTTP data on high-load webservers. Most browsers accept gzip as transfer encoding, so gzipping the stream provides better bandwidth utilization for both the server and the clients.
Re:what about decompression??? by mvdw · 2003-03-20 14:08 · Score: 1

does the article mention anything about decompression?
I would imagine this card would be aimed at the server market, where the application is in serving dynamic data to a large number of clients. By compressing that data at the server side, the effective network bandwidth can be increased. The hit for real-time decompression is less for the client, since they are only decompressing one set of data, while the server needs hardware acceleration as it's compressing many data sets.
Another potential application that doesn't require hardware-assisted decompression could be doing backups. While performing the backups might require very fast compression, the use of those backups is infrequent enough to not require as fast decompression.
This is all speculation, but it seems reasonable enough to me.
Re:what about decompression??? by kjd · 2003-03-25 17:23 · Score: 1

Dedicated encryption/compression cards usually ship with replacement shared libraries for the system (e.g. SSL accelerator cards usually come with compatible replacement libraries that can drop into /usr/local/ssl). These replacements have the same API but take advantage of the hardware for computation.

Most likely any replacement for libz.so would try to use the hardware as much as possible, offloading compression and decompression. Ideally it'd be configurable by the administrator.
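The drop-in idea can be sketched as a wrapper that keeps the same call signature and falls back to software when no card is present. Everything hardware-related here is hypothetical: `hw_compress` and `HardwareUnavailable` are placeholders, not part of any real driver or of zlib.

```python
import zlib

class HardwareUnavailable(Exception):
    """Raised by the (hypothetical) driver binding when no card can be used."""

def hw_compress(data: bytes) -> bytes:
    # Placeholder for a vendor driver call (assumption). With no card
    # installed it always reports that the hardware is unavailable.
    raise HardwareUnavailable("no accelerator present")

def compress(data: bytes, level: int = 6) -> bytes:
    """Same interface as a software compress call; uses hardware when it can."""
    try:
        return hw_compress(data)
    except HardwareUnavailable:
        # Transparent fallback: callers never notice the difference.
        return zlib.compress(data, level)

payload = b"dynamic page content " * 100
packed = compress(payload)
```

Callers only ever see `compress`, which is the point of swapping the shared library rather than patching each program.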

Not a good comparison by TheSHAD0W · 2003-03-19 02:48 · Score: 2, Interesting

You're assuming the card is using the same settings as your version of gzip defaults to. More likely it's using a much lower compression level and a considerably slower processor.

Note that this isn't necessarily a bad thing; at the expense of maybe 5-10% less compression, you're getting that high throughput. Depending on your task, it's a good trade-off.

Re:Not a good comparison by Merlin42 · 2003-03-19 03:10 · Score: 3, Interesting

Good point ... let's test a little more:
P4-1.8Ghz: gzip -9
real 0m4.437s
user 0m4.200s
sys 0m0.210s
P4-1.8Ghz: gzip -1
real 0m4.366s
user 0m4.130s
sys 0m0.200s
AthlonXP2200+: gzip -9
real 0m3.387s
user 0m3.160s
sys 0m0.210s
AthlonXP2200+: gzip -1
real 0m3.427s
user 0m3.200s
sys 0m0.170s

The really funny part is that I ran the Athlon one several times and the gzip -9 was always just ever so slightly faster than the gzip -1 version.

Maybe random data is not the best for testing the different compression levels though, since if it is truly random it cannot be compressed no matter how hard you try.

Even if this is not a perfect(or even reasonable) "apples to apples" comparison, it is a good end-to-end system level comparison. While it may not be "4x faster than a 2Ghz CPU", when building a system that _needs_ to do compression, adding this card would _effectively_ boost my CPU speed.

--
Thoughts on tech, Software Engineering, and stuff
Re:Not a good comparison by The+Ego · 2003-03-19 03:57 · Score: 2, Interesting

If gzip -9 (a.k.a. gzip --best) is faster than gzip -1, it must be because you are IO limited, so writing a smaller file ends up as a wall-clock saving.

It clearly is a flawed test to compare the CPU loads of -9 and -1 but it is an excellent example that IO is often the bottleneck.
Re:Not a good comparison by Alien+Being · 2003-04-02 01:03 · Score: 1

Forget about the real (wall clock) and system times. user time tells the real story in this case.

Browser Compression by Kalak · 2003-03-19 02:57 · Score: 3, Informative

Most current browsers will automatically uncompress gzipped files sent to them, allowing things such as the mod_gzip module to compress web pages and have them rendered in the browser transparently. The bandwidth savings can be huge, with all the associated benefits (less bandwidth for the server, less for the clients and less congestion on the net). Without bzip2 support built into the browser, bzip2 hardware compression wouldn't be useful for general web traffic, as it couldn't be used for the pages being sent.

It'd be nice if I could convince my boss to get some of these for us, but our CPU usage is pretty low right now with the mod_gzip module installed, so it'd be an unnecessary luxury at this point for us.

--
I am, and always will be, an idiot. Karma: Coma (mostly effected by .hack)

How cute but useless. by _Eric · 2003-03-19 03:02 · Score: 5, Interesting

The general trend in the industry goes to non-intelligent interconnections (Gigabit cards used to have a processor (Alteon), they don't anymore (see latest Intels)). I2O never took off because you don't really need to relieve a computer from computation when your computation power is plethoric.

On a Xeon 2.8GHz, I just got 71 MB/s for gzip.

What's the use for such hardware then?

Plus it will eat the PCI bus because data has to go out of memory to processing card, back to memory, then to network card. You triple the PCI bus bandwidth. (Not true if the compression is embedded in the network card).
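A back-of-the-envelope sketch of that bus traffic, assuming every transfer crosses the same shared PCI bus and the compressed size is `ratio` times the original. The function name and the 0.25 ratio are made up for illustration.

```python
def bus_traffic_mb(payload_mb: float, ratio: float) -> dict:
    """MB crossing the PCI bus per payload, for three setups."""
    compressed = payload_mb * ratio
    return {
        # CPU compresses in memory; only compressed data goes memory -> NIC.
        "cpu_gzip": compressed,
        # memory -> card (raw), card -> memory (compressed), memory -> NIC (compressed).
        "card_gzip": payload_mb + compressed + compressed,
        # No compression at all: raw data goes memory -> NIC.
        "no_gzip": payload_mb,
    }

worst_case = bus_traffic_mb(32.0, 1.0)   # incompressible data
typical = bus_traffic_mb(32.0, 0.25)     # assumed 4:1 compression
print(worst_case)
print(typical)
```

The tripling is the incompressible worst case: 96 MB on the bus for 32 MB of payload, against a theoretical peak of about 133 MB/s for plain 32-bit/33 MHz PCI. With compressible data the card setup still moves more over the bus than CPU-side gzip does, but less than three times the payload.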

Re:How cute but useless. by sporty · 2003-03-19 03:12 · Score: 2, Insightful

Not really. Can you cheaply create a cluster of say.. 50 web servers, all that use mod_gzip for line compression?

Xeons aren't THAT cheap, but hey, 1ghz machines (or even 500mhz machines) with this card would easily match your Xeon once the 64MB/s cards come out. Or was that 64mb/s. Well, you get the point.

As for the bus latency, well.. you are right, it'd be better in the network card, but remember, that's layer 1 and 2 stuff you'd be meddling with, where gzip would end up in layer 4. Layer 3 is tcp/udp, 4 is app data, right?

--
-
ping -f 255.255.255.255 # if only
Re:How cute but useless. by ivan256 · 2003-03-19 04:20 · Score: 1

The general trend in the industry goes to non-intelligent interconnections (Gigabit cards used to have a processor (Alteon), they don't anymore (see latest Intels)). I2O never took off because you don't really need to relieve a computer from computation when your computation power is plethoric.

General purpose CPU power is still more expensive than specialized processing for compute heavy tasks. High level gzip compression still eats CPU on multi-ghz machines.

Besides, that's not the trend at all. The trend is typically to have a special purpose chip that can do things the CPU can't, then when CPUs get faster to offload the task, and finally, when the task becomes ubiquitous, it is included in the interface hardware. This is true for video, audio, networking....

The initial alteon boards had general purpose processing on them, while the newer cards use chips designed specifically for gigabit ethernet, and either way, it doesn't change the interconnect, just the interface. The newer cards are way more intelligent than the old ones though; many are starting to compute checksums on the card, and you can get cards with encryption processors.

On a Xeon 2.8GHz, I just got 71 MB/s for gzip.

That's great, so you can dedicate most of an expensive Xeon to gzipping, or you can plug one of these things in and free your Xeon up to, say, generate 64MB/s of data... Your Xeon can't be raytracing if it's gzipping.
Re:How cute but useless. by Lars+T. · 2003-03-19 05:07 · Score: 1

So who/what is doing the serving/generating of pages while your Xeon is busy gzipping them?

--
Lars T.
To the guy who modded me down from perfect to terrible Karma - Apple haters still suck
Re:How cute but useless. by _Eric · 2003-03-19 05:23 · Score: 1

You won't be gzipping faster than the bandwidth, which is the bottleneck (let's assume you double bandwidth with gzipping). Usually, serving a lot of requests involves a load balancer plus many machines serving the actual request, because the serving is complex. The gzipping will be negligible, I think. It could also be a task devoted to the load balancer itself, if load permits (on the other hand, the load balancer is critical).
Re:How cute but useless. by The+Apostrophe+Guy · 2003-03-19 06:39 · Score: 0

"Plus it will eat the PCI bus because data has to go out of memory to processing card, back to memory, then to network card. You triple the PCI bus bandwidth. (Not true if the compression is embedded in the network card)."
I didn't RTFA, but I assumed that the processing card was the network card. It must only cost them a few cents to stick 100Mbit ethernet on the same card.
Re:How cute but useless. by elphkotm · 2003-03-19 07:26 · Score: 1

Ethernet is Layer 2, IP is Layer 3, TCP is Layer 4, HTTP is Layer 7

--

<Amanda`> I just went out to the parking lot in my bathrobe to exchange warez CDs.
Re:How cute but useless. by sporty · 2003-03-19 07:53 · Score: 1

Aren't there two layer views of networking?

I know there's a 7 layer one, which is what I think you are describing.

--
-
ping -f 255.255.255.255 # if only
Re:How cute but useless. by Anonymous Coward · 2003-03-19 11:45 · Score: 0

"... smells like seven layers, that beaver eats taco bell!"
With apologies to Primus.
Re:How cute but useless. by UID30 · 2003-03-20 02:25 · Score: 1

wtf? this would be incredibly useful. what else did your Xeon 2.8Ghz do while gzipping at a sustained 71MB/s? For any web server running dynamic content (database backend of some flavor) & mod_gzip, any reduction in CPU consumption is a godsend ... PLEASE let me buy this cheap PCI card to extend the life of my server!

now if the limit of your use of gzip consists of sitting at a command prompt and typing
tar xvfz pr0n.tar.gz -C /home/luser/.pr0n/
then you are correct ... this card is not for you.

--
"Glory is fleeting, but obscurity is forever." - Napoleon Bonaparte
Re:How cute but useless. by _Eric · 2003-03-20 02:54 · Score: 1

The wrong thing is the cheap thing. The co-processing cards are never cheap because the market is small. An FPGA prototype is in the range of $100s. And the gzipping will be negligible with regard to the amount of work in the DB.

What kind of line are you serving if you want to do 10s of MB/s? If you do, don't you have a load balancer? What is the average throughput of an individual server then?

A hardware DB offload engine would definitely be more impressive, and I think much more useful.
Re:How cute but useless. by UID30 · 2003-03-20 03:26 · Score: 1

at $200 per card, i'd still say this was worth it. we installed mod_gzip quite a while back in order to save on network bandwidth and saw an immediate benefit there, however our web servers did register a hit on their cpu utilization in the order of 10-20%.

yes, we do run a load balancer with multiple servers ... and know we'd never approach the 32MB/s limits of the 1st version of this card on a per-server basis, but if i even suspected it'd give me back that 10-20% to serve more content per server, i'd jump at the chance to test it further.

being able to stream gzip 32MB/s is quite impressive by itself, but i'd still like to find out how it handles 100s of smaller transactions rather than one big one.

--
"Glory is fleeting, but obscurity is forever." - Napoleon Bonaparte
Re:How cute but useless. by heXXXen · 2003-03-31 12:14 · Score: 1

The OSI model has 7 layers, the TCP/IP model has 4.
Re:How cute but useless. by nhorton · 2003-04-02 10:30 · Score: 0

What would make this cool would be putting this functionality on NICs and network devices. NICs have gotten to be so cheap that really the cost difference between the compression card and the compression card + integrated NIC would be negligible and this would solve the PCI bandwidth issue.

Reconfigurable by KingPrad · 2003-03-19 03:07 · Score: 5, Interesting

This is cool - dedicated chips can process monstrous amounts of data and much faster than a general purpose CPU. So it's a good idea to let this card do the heavy lifting of compression. Of course the use extends to many types of data analysis: encryption, scientific number crunching, graphics compression.

The best idea would be to make the chip an FPGA not a specially-designed processor. Then you could load in different chip designs for whatever was currently needed. Need to do RSA encryption? The board reconfigures the FPGA for it. Same goes for Divx compression, gzip, SETI@Home, etc. FPGAs take a few milliseconds to reconfigure but when they operate as a dedicated signal processor they can leave a general purpose processor in the dust - leaving the main CPU to run the other apps, the desktop, etc.

Check out the IEEE archives and journals, searching for "adaptive computing" or "reconfigurable computing".

KingPrad

--
Stop the Slashdot Effect! Don't read the articles!

Re:Reconfigurable by kistel · 2003-03-19 04:02 · Score: 1

IBM has been selling a crypto module for quite a while now which can take all the crypto processing off the main CPU. Things like key generation, hashes, encryption/decryption etc. Think of an OpenSSL implementation, which simply forwards these requests to a hw module, and this way provides hw based SSL and such to applications...
Re:Reconfigurable by benjamindees · 2003-03-19 07:16 · Score: 1

Soekris Engineering also sells a crypto card for VPN applications. It says it also does compression.

--
"I assumed blithely that there were no elves out there in the darkness"
Re:Reconfigurable by UranusReallyHertz · 2003-03-19 07:50 · Score: 2, Interesting

I always thought it would be cool if some of the transistors in general purpose cpus could be used as an FPGA to serve as an "algorithm cache". When a program is run the most frequently used algorithms are automatically implemented in hardware on the FPGA, resulting in speedups anywhere between 10 and 1000 times. Seeing as how CPUs will have a billion or more transistors in the near future, this would seem like an excellent use for them.

--
Smoking is an expensive, slow, and unreliable method of suicide.
Re:Reconfigurable by dillon_rinker · 2003-03-23 15:05 · Score: 1

Read the CISC vs. RISC article on arstechnica. it addresses this sort of thing. It was found not to be as useful as it seems. HOWEVER...that was probably because chip designers were trying to predict the behavior of software. MMX and similar improvements to Intel CPUs resulted from analyzing actual software to see how it could be made faster in hardware.

Only useful for dynamic sites? by d-Orb · 2003-03-19 03:24 · Score: 2, Interesting

I guess that this would only be useful for dynamic sites, wouldn't it? Otherwise, static pages would be cached on the server, only needing compression the first time they are served :-?
At any rate, most of the visitors to my site rarely get the gzipped pages, as their browsers don't seem to support it :(
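
A rough sketch of the "compress once, then serve from cache" idea for static pages, with a plain fallback for browsers that don't advertise gzip. Everything here (`static_pages`, `gzipped_page`, `serve`) is made up for illustration and is not how mod_gzip or any particular server does it:

```python
import gzip

# Hypothetical in-memory "document root" for the example.
static_pages = {
    "/index.html": b"<html><body>" + b"<p>hello</p>" * 200 + b"</body></html>",
}

_gzip_cache = {}      # path -> gzipped body, filled on first request
compress_calls = 0    # counts how often we actually had to compress


def gzipped_page(path):
    """Return the gzipped body for path, compressing only on the first call."""
    global compress_calls
    if path not in _gzip_cache:
        compress_calls += 1
        _gzip_cache[path] = gzip.compress(static_pages[path])
    return _gzip_cache[path]


def serve(path, accept_encoding):
    """Return (headers, body); send gzip only if the client advertises it."""
    if "gzip" in accept_encoding.lower():
        return {"Content-Encoding": "gzip"}, gzipped_page(path)
    return {}, static_pages[path]


for _ in range(3):
    headers, body = serve("/index.html", "gzip, deflate")
print("compress calls after 3 gzip requests:", compress_calls)
```

With a cache like this the compressor only runs once per static page, which is why the card looks more interesting for dynamic output that changes on every request.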

Re:Only useful for dynamic sites? by Anonymous Coward · 2003-03-19 04:03 · Score: 0

Hello IE!!
Re:Only useful for dynamic sites? by Anonymous Coward · 2003-03-19 04:07 · Score: 0

Actually, some other poster just posted the fix from MS for the bug, cheers, you saved the day!
Re:Only useful for dynamic sites? by Anonymous Coward · 2003-03-23 19:41 · Score: 0

this would be great for an archive site. something like kernel.org or where there is lots of source code or other types of data that will be requested and would make better sense to get the data that is needed and not some bloated file. if you could request the files that are needed, and have them compressed on the fly while they are being transferred, you would save tonnes of bandwidth with more processing power being used. in today's world, processing power costs much less than raw bandwidth.

Cool by arvindn · 2003-03-19 03:49 · Score: 5, Informative

gzip was designed with such considerations in mind. Throughput of the algorithm took precedence over compression level. Good to see their farsightedness paying off. And the algorithm is pretty simple so that it can be implemented in hardware directly.

Another thing about gzip is that it is asymmetric: decompression is much faster than compression. Again this is a nice feature, because most files will be decompressed many times but compressed only once. Thus for instance, all man pages are stored in gzipped form and decompressed on demand.

But I can't see the point of implementing it in a PCI card. Wouldn't it be better to integrate it with either the processor or the network interface?

Re:Cool by Tailhook · 2003-03-23 19:31 · Score: 1

gzip was designed with such considerations in mind. Throughput of the algorithm took precedence over compression level. Good to see their farsightedness paying off.

I think that if one were planning to dedicate hardware to the task of compression, one would decide that space should take precedence over speed. Performance is the reason that hardware gets dedicated to a task. Why design something to be efficient with your CPU, and then solve the efficiency problem with dedicated CPUs?

And the algorithm is pretty simple so that it can be implemented in hardware directly.

Are there any compression algorithms so complex that they can't be implemented in hardware? I don't know the implementation details of this particular bit of dedicated hardware, but I seriously doubt that the gzip algorithm was actually implemented directly in the gates and capacitors of a silicon chip. More likely it is either re-configurable logic, or a general purpose CPU (ARM et al) with some firmware.

--
Maw! Fire up the karma burner!

Not quiet yet... by buzzbomb · 2003-03-19 03:57 · Score: 4, Informative

The article mentions that this will be of particular interest for web servers.

I'm assuming one is referring to something that will work with mod_gzip. That may be fine and dandy, but I just recently had to disable mod_gzip on my server. You can blame Microsoft.[1] It seems that both IE 5.5 and 6.0 have nasty little "sometimes" bugs[2] where they won't know what to do with gzipped content. I tried to disable by user agent header with no luck. If anyone else has some good pointers or perhaps even a link to a patched version of mod_gzip that'll avoid those two bugs, I would appreciate it.

[1] No, really. This isn't a troll. They even admit the bugs.
[2] Microsoft Knowledge Base Articles: Q313712 IE 5.5 Q312496 IE 6.0

Re:Not quiet yet... by arvindn · 2003-03-19 05:53 · Score: 2, Funny

You might want to try out mod_msff: the Microsoft-free friday apache module ;)
Re:Not quiet yet... by buzzbomb · 2003-03-19 06:32 · Score: 1

Now that is beautiful. However, I run a couple of e-commerce sites from that server. Blocking potential customers via that module would be...bad. It's also crazy to block those potential customers that don't have certain plug-ins installed.

For example, my own tests have revealed that Flash is installed in 70% or less of browsers that frequent one of these sites. That's 30+% of your users that you'd be locking out! That's also quite a bit smaller than the 93% that I've seen Macromedia claim; I wonder why they would lie.
Re:Not quiet yet... by arvindn · 2003-03-19 06:48 · Score: 1

Take it easy. That was said purely in jest, and with tongue firmly in cheek.
Re:Not quiet yet... by buzzbomb · 2003-03-19 07:36 · Score: 1

I understand. It just doesn't take much to get me on the soapbox regarding blocking users from sites...even if it is for a good reason. :)
Re:Not quiet yet... by Anonymous Coward · 2003-03-19 08:57 · Score: 0

How do idiots like you get +1 Karma??
Re:Not quiet yet... by Anonymous Coward · 2003-03-19 09:31 · Score: 0

you have to log in first
Re:Not quiet yet... by techy · 2003-03-19 15:30 · Score: 1

Actually, since the bug only affects the first 2048 bytes of content, and only when using the IE back button, one solution I have heard suggested before is to prepend the content with 2048 spaces.

This might sound counter-intuitive, but 2k spaces compresses *very* well (about 14 bytes according to a quick test).
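
For anyone who wants to repeat that quick test, here is a small sketch using Python's standard zlib and gzip modules. The exact byte counts depend on the compression level and on whether you count the gzip header and trailer, so this prints them rather than promising a specific number:

```python
import gzip
import zlib

padding = b" " * 2048  # the 2 KB of spaces suggested as a workaround

# Raw deflate stream: no gzip header or trailer, just the compressed data.
raw = zlib.compressobj(level=9, wbits=-15)
deflated = raw.compress(padding) + raw.flush()

# Full gzip member: same deflate data plus the gzip header and CRC/length trailer.
gzipped = gzip.compress(padding, compresslevel=9)

print(f"original:    {len(padding)} bytes")
print(f"raw deflate: {len(deflated)} bytes")
print(f"gzip member: {len(gzipped)} bytes")
```

Either way the padding shrinks to a few dozen bytes at most, so the cost on the wire is tiny.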

Of course, it's always a shame to have to put in a hack like this to get around IE's "features" (after being in as many versions as this has, it's hard to think of it as just a bug anymore) in the first place...
Re:Not quiet yet... by ptudor · 2003-03-22 11:51 · Score: 1

Yeah I have a good pointer, Windows Update. According to Q312496: "This problem was first corrected in Internet Explorer 6 Service Pack 1."
If the problem is with an MS dll and MS patches it, don't expect mod_gzip to work around it when your clients are the ones with the malfunctioning software.
Re:Not quiet yet... by Phroggy · 2003-03-29 06:59 · Score: 1

If the problem is with an MS dll and MS patches it, don't expect mod_gzip to work around it when your clients are the ones with the malfunctioning software.

It's still necessary to work around the malfunctioning software, since many of those users won't update for a long time.

--
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;

Moo by Chacham · 2003-03-19 05:52 · Score: 1, Informative

Yeah, I'm stupid. Correct me where I'm wrong.

This thing is going to sit on the PCI bus? Isn't that where your hard drives are too? On older computers which use a 33 megahertz bus, that would mean that compression @33 megahertz would keep the hard drive from receiving any of the data. So, it would actually have to compress it at a slower rate, unless it caches everything. Even at 133 megahertz, the hard drive would be both reading and writing when trying to compress, and that's without worrying about swap.

--
Have you read my journal today?

Re:Moo by Sunnan · 2003-03-19 07:08 · Score: 1

Maybe not so much for harddrive compression but for other purposes (maybe network-related).

Putting it on the PCI makes sense from a research perspective - later implementations may be in other places, say on the network card or the disc controller. Danger, but fun.
Re:Moo by benjamindees · 2003-03-19 07:30 · Score: 2, Informative

When the PCI bus is used in conjunction with a 32-bit CPU, the bandwidth is 132 Mbytes/s
That's Bytes, as in 8 bits. A 100 Mbit/sec NIC is only 12.5 MBytes/sec.
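
The arithmetic, spelled out as a small sketch. These are theoretical peak numbers (one full-width transfer per clock, no arbitration or protocol overhead), and the helper names are just made up for the example:

```python
def bus_bandwidth_mbytes(width_bits, clock_mhz):
    """Theoretical peak in MBytes/s: one transfer of width_bits per clock."""
    return width_bits / 8 * clock_mhz


def mbits_to_mbytes(mbits_per_s):
    """Convert a link speed in Mbit/s to MBytes/s (8 bits per byte)."""
    return mbits_per_s / 8


pci_32_33 = bus_bandwidth_mbytes(32, 33)   # plain desktop PCI
pci_64_66 = bus_bandwidth_mbytes(64, 66)   # 64-bit/66 MHz server PCI
fast_ethernet = mbits_to_mbytes(100)       # 100 Mbit/s NIC
card_rate = 32                             # MBytes/s quoted for the first card

print(f"32-bit/33 MHz PCI: {pci_32_33:.1f} MBytes/s")
print(f"64-bit/66 MHz PCI: {pci_64_66:.1f} MBytes/s")
print(f"100 Mbit NIC:      {fast_ethernet:.1f} MBytes/s")
print(f"card in + NIC out: {card_rate + fast_ethernet:.1f} MBytes/s of bus traffic")
```

Real buses deliver less than the peak, but the gap between the 32 MBytes/s card and the 132 MBytes/s bus is wide enough that the conclusion doesn't change.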

--
"I assumed blithely that there were no elves out there in the darkness"
Re:Moo by Anonymous Coward · 2003-03-20 14:21 · Score: 0

33Mhz 32 bit bus does 125MBytes/sec minus overhead.

I think it can handle the 32MBytes/sec plus typical HDD 4-50MBytes/sec easily.

However that is not what this is intended for.

Use with a Web server, 32MB/s to card, then compressed stream (anywhere from 16MB/s to 32MB/s) to the LAN card. Then it depends if the page comes from RAM, etc. You might have to add 30-40MB/s onto that for disk reads. Still fits with change.

Now the 64MB/sec card, RAID disks and Gigabit networking would give it problems, but most of those cards either support 66Mhz and/or 64bit PCI.

Waiting for PCI-X, or a proper workstation motherboard (uses desktop class memory, ie PC3200+, dual proc and 64bit and/or 66Mhz PCI slot(s)). ATM you have to choose between fast memory bus (desktop) or fast I/O (PCI) bus (server).
Re:Moo by Microsofts+slave · 2003-03-25 16:44 · Score: 1

When the hell are we gonna finally abandon these outdated ways of making a computer work? We could have performance coming out our ears if someone *cough intel* would finally abandon our current motherboard system and come up with something new. They are trying to replace the BIOS already, but what about the actual freaking motherboard? PCI should be gone, just like 8 bit computing.
Feel free to flame me, but i think the motherboard's days should be numbered.

--
Tragek
Re:Moo by mabinogi · 2003-03-27 09:45 · Score: 1

Actually, If you look at the design on modern intel chipsets, and compare it with the old Northbridge / Southbridge design that most non Intel chipsets use, then you'll see that there's already quite a step in the right direction.

--
Advanced users are users too!

Break out the ramdisk! by TheSHAD0W · 2003-03-19 06:12 · Score: 1

Good point, Ego.

Merlin? Mind running those tests one more time, this time to a ramdisk?

Re:Break out the ramdisk! by CableModemSniper · 2003-03-19 08:50 · Score: 1

well I did it on a ramdisk, but I was too lazy to change the default ramdisk size, so the file is just one meg...
It's on an athlon Tbird 1Ghz
time gzip -9 1m
real 0m2.403s
user 0m0.180s
sys 0m0.020s
time gzip -1 1m
real 0m1.813s
user 0m0.180s
sys 0m0.010s

yeah, I know pretty useless.

Now pretending like I can multiply this by thirty-two to get the rate for 32MB... 76.896s for gzip -9... hmm that can't be right. ah whatever. Someone with more ram than i have can figure it out.
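
One way to take both the disk and the ramdisk size out of the picture is to compress a buffer that already sits in memory and look at CPU time rather than wall-clock time (the "real" figures above include whatever else the box was doing). A rough sketch with Python's zlib, which uses the same deflate algorithm as gzip; the numbers it prints will of course depend on the machine:

```python
import os
import time
import zlib


def time_compress(data, level):
    """Return (cpu_seconds, compressed_size) for one zlib pass at `level`."""
    start = time.process_time()
    compressed = zlib.compress(data, level)
    return time.process_time() - start, len(compressed)


if __name__ == "__main__":
    size_mb = 32
    data = os.urandom(size_mb * 1024 * 1024)  # random data, like the dd test
    for level in (1, 9):
        cpu, out_len = time_compress(data, level)
        rate = size_mb / max(cpu, 1e-9)  # avoid dividing by zero on fast runs
        print(f"level {level}: {cpu:.2f}s CPU, {rate:.1f} MB/s, "
              f"{out_len} bytes out")
```

Random data is close to a worst case for the compressor, so text or HTML should come out both smaller and, usually, faster.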

--
Why not fork?
Re:Break out the ramdisk! by danbuhler · 2003-03-27 18:11 · Score: 1

why not just write to /dev/null?

Re:Not quite yet... by buzzbomb · 2003-03-19 06:38 · Score: 1

Oh, one more thing I found out in extensive tests: the MS IE patches don't always work as advertised. If they did, it would be easy to say "if you get garbage on these pages, install SP1 for your browser." They appear to fix it somewhat, but not always. The "sometimes" bug still exists in 5.5 SP1 and 6.0 SP1...and that is why mod_gzip is disabled now.

You have an important point... by mnmn · 2003-03-19 06:44 · Score: 4, Interesting

When the PCI bus is taken, other stuff that the CPU needs to do will also be halted. And then the PCI bus is much slower than the FSB.

I think what we need to push distributed computing more is altering the RAM and DMA channels. There should be many physical channels to the RAM capable of simultaneously reading/writing different parts of it. As in if the ram can output 200 MB per sec, 16 devices could attach themselves to the RAM via maybe EDMA (enhanced DMA?) and simultaneously be able to read at 200MB each. This might be done by:

(1) Altering the addressing logic in the memory ICs, maybe put 16 different addressing systems and multiply their pins x16. Then have an external matrix, more advanced than the 802x DMA chip to allow simultaneity.

(2) Separate the addressing schemes of each chip, so an OS kernel could smartly put data of important processes in the right chip to be worked on by external devices.. again also having an external matrix for the address multiplexing.

This way such a PCI gzip device could have its PCI address space, IRQ as well as (EDMA?) address which it would use to access the data to gzip and put back into the RAM, at full speed, not taking up RAM bandwidth, PCI bandwidth, IRQs or the CPU at all.

AGP has achieved this by separating the AGP channel from PCI, but still using dedicated memory rather than smartly-shared memory. I understand multiprocessor systems technically do the same thing, but in this case we are treating the external devices like complete slaves, like the GPU, for only dedicated purposes, and I'm emphasizing the smart sharing of memory that doesn't exist in multiprocessor systems either. In this scheme, one could add CPU cards, maybe hot-plugged, and have an insta-multiprocessor system or use it to offload kernel compilation, zipping, 3d transformations, or even take user tasks while the main CPU just works in supervisor mode.

--
"Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky

huh? by Anonymous Coward · 2003-03-19 07:08 · Score: 0

could someone tell me what this has to do with apache?

Re:huh? by Penguinoflight · 2003-03-30 16:16 · Score: 1

Given that the main usage for PCI cards (or NICs as they should be called), is web servers, this is very targeted towards apache. Apache uses more network bandwidth than anything else, and it would be a very big help, esp. if someone would test this on html code, instead of random data. Whenever you have lame html (html by idiots, or idiot-coded programs, or idiot's scripts), you generally get multiple unnecessary tags, and even in well formed html, there are recurring tags that could be compressed, conceivably even more than standard text.

So, it is definitely in the ball-park of apache.

--
"And we have seen and do testify that the Father sent the Son to be the Savior of the World"
1 John 4:14

PCI details and Addressing tricks by _Eric · 2003-03-19 07:46 · Score: 1

On current PCI architectures, you already have that implemented.

Here is the description of the Serverworks chipset (scroll down to the drawings). Intel's (e7500/7501) is very similar, in architecture at least.

The memory subsystem is one leg of the northbridge (center of the chipset), (two channels allows the chipset to double the bandwidth, but not the latency)

The CPU(s) sit on another bus.

The PCI busses are interconnected through hubs and specialised links. With this kind of architecture, you can reach 4 times 400 MB/s (1.6GB/s aggregate) using the busses in 64-bit/66MHz PCI mode. Even better can be expected with PCI-X interfaces.

About the address tricks, you can do that kind of things, but in this case, expect to have to write many things ad-hoc, and forget the general-purpose side of your system. You usually want a real-time system, and I see no point in doing that for a simple web server. RADAR systems, avionics, and stuff like that can be expected to use that kind of trick and optimisations a lot (lots of processing done, and in a very systematic way). 3D rendering seems a nice application for that as well, but I don't know what the state of the art is for high quality (movies) rendering.

The hard part of Web sites is usually database access, which implies complex algorithms that don't fit well in specialised hardware. Compression is, I think, anecdotal in a web server.

Warning by Anonymous Coward · 2003-03-19 07:54 · Score: 1, Funny

Running gzip on a PCI card could invalidate its warranty. Make a backup of /proc/bus/pci/(card number) first.

Mod_Gzip on a card.... by OneFix · 2003-03-19 08:56 · Score: 1

The article mentions that this will be of particular interest for web servers.

Why? Gzip already uses minimal processor time...and many sites already use Mod_Gzip...

So, as far as I'm concerned, unless the Mod_Gzip project supports this hardware, it's not gonna float...

Re:Mod_Gzip on a card.... by UID30 · 2003-03-20 02:38 · Score: 1

while i'm sure that support for this card doesn't currently exist in mod_gzip, it shouldn't be too awfully difficult to get it to work.

Gzip already uses minimal processor time

i think the definition of "minimal" might be useful here. if you have access to a relatively high volume web server (something on the order of 1mil+ hits per day), take a look at MRTG graphs with and without mod_gzip running ... you might be shocked.

if this card is in the sub$200 range, i'd outfit my server farm with it immediately (if not sooner).

--
"Glory is fleeting, but obscurity is forever." - Napoleon Bonaparte

Gcc by morcheeba · 2003-03-19 09:02 · Score: 1

Now, Gcc on a PCI card is something I'd pay for...

--
HIV Crosses Species Barrier... into Muppets

Re:Gcc by ryan+at+slipgate · 2003-03-19 11:04 · Score: 1

heh, that would be interesting. perhaps make a way to "flash" new gcc releases on to the card's rom?
Re:Gcc by usotsuki · 2003-03-29 07:26 · Score: 1

Yeah. DJGPP on a plug-in, compile Dapple ][ in record time *g*

-uso.
Maybe doing it on a ramdisk will do just as well. :\

--
Dreams, dreams, don't doubt dreams, dreaming children's dreaming dreams. Sailor Moon SS

*most* new PCs have a 33 MHz PCI bus by green+pizza · 2003-03-19 11:38 · Score: 1

As someone who has been working with a large number of new P4 and Athlon PCs, I can tell you that most new PCs still use one single 32 bit, 33 MHz PCI bus. Even wiz-bang mobos with onboard RAID controllers tend to use a single PCI bus of this type... a major I/O bottleneck if you plan on moving more than 100 MB/sec of data. (granted, RAM, AGP, and CPU still have lots of legroom) Keep this in mind when building your next server... you may want to consider a board with 64 bit, 66 MHz PCI or even 133 MHz PCI-X.

Re:*most* new PCs have a 33 MHz PCI bus by Anonymous Coward · 2003-03-23 19:09 · Score: 0

For someone involved with a large number of PC installations, you seem to lack the common knowledge to differentiate between megabytes/sec and megabits/sec. Furthermore, you should have read the post above yours before posting this nonsense.

For crying out loud! by Anonymous Coward · 2003-03-19 13:34 · Score: 0

Then implement bzip2 AND gzip on the same card! And while they're at it, include an implementation of lzip, rzip, s3tc, and flac on it as well. Isn't the point to have a dedicated micro-computer, ie small package, do all this? Hello! Microcomputer should do more than just gzip!

Ok, but by insanecarbonbasedlif · 2003-03-20 08:06 · Score: 1

When are they gonna offload something interesting, like 3-d rendering, to cards instead of abusing the poor cpu?!

Oh... wait...
Sorry about that; my computer date was set for January 3rd, 1987... let me get out my soldering iron and correct it

--
Just because I doubt myself does not mean I find your position compelling.

Sun machines use PCI busses, too. by Vengeful+weenie · 2003-03-20 08:18 · Score: 3, Insightful

A little late posting, but I did want to point out that modern Sun machines use PCI buses, and the Enterprise class [4000+] machines have a crap load of bandwidth through their backplanes.

I think it's a little naive to say "Oh, my 1000 hit a day web box, running on a cheap 686 wouldn't benefit from this, so it must suck." Hey, don't get mad! You said it! :P

Re:Sun machines use PCI busses, too. by Anonymous Coward · 2003-03-20 14:24 · Score: 0

There is a big difference between a 32bit 33Mhz PCI bus used in a desktop machine, and the 64bit 66Mhz PCI bus used in the Sun box you mentioned.

Here is a thoguht! by f00zbll · 2003-03-20 14:54 · Score: 2, Insightful

What if you run a website that gets say 5million+ page views a day and you generate around 2gigs of logs per day per machine across 8 machines. At night you setup an automated batch to zip the logs and ftp them to a log reporting server. Then a cron job kicks off log analysis of all 16gigs of logs. Wouldn't this hardware acceleration help? Now let's try to scale that up to 20million+ page views a day. Or what if you're Yahoo who gets 1billion page views a day. How many gigs of logs do you have to process now? Not everyone needs hardware acceleration, but I would hardly call it useless.

Re:First Ninnle Post! by Anonymous Coward · 2003-03-23 14:32 · Score: 0

The PCI GZip will work under Ninnle Linux, of course.

a sure way to profit... by perlchild · 2003-03-24 12:27 · Score: 1

Now to come out with a single "web server accelerator card"

that does both ssl/cram-md5/AES/etc.. and gzip/zlib/other compression

I can see my clients salivating already(saving the processors for those .jsp pages etc...)
well except for the IO-bound jobs...

Zipping backups? by phorm · 2003-03-24 20:39 · Score: 1

Sometimes I shudder when I hear of people zipping large volumes onto backup. Hopefully hardware compression won't aggravate this problem by making it easier.
One of the big problems with compressed backups, particularly if you are tar-gzipping something, is that any resulting damage/error in the file can render an entire archive unusable.

Hopefully, most people are into tar-clustering files (that is to say... tar'ing large archives as a group of files, then gzip'ing the grouped archive). You might save a little on CPU and grow the file a bit, but the saving in integrity and possibly speed can be worth it.

Re:Zipping backups? by The_K4 · 2003-03-27 09:36 · Score: 1

Unless you zip each file individually. The compression is only slightly less than doing it as a single big archive and an error only affects the 1 file in that zip file.
:)
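
A small sketch of that per-file idea, using Python's gzip module on in-memory "files" (the names and contents are invented for the example). One compressed blob gets a byte flipped on purpose; the other two still restore because each has its own independent gzip stream and checksum:

```python
import gzip
import zlib

# Hypothetical backup set: three files held in memory for the demo.
files = {
    "a.log": b"alpha " * 5000,
    "b.log": b"bravo " * 5000,
    "c.log": b"charlie " * 5000,
}

# Compress every file on its own instead of one big tar.gz.
per_file = {name: gzip.compress(body) for name, body in files.items()}


def corrupt(blob):
    """Flip all bits of one byte in the middle of the blob (simulated damage)."""
    damaged = bytearray(blob)
    damaged[len(damaged) // 2] ^= 0xFF
    return bytes(damaged)


def try_restore(blob):
    """Return the original bytes, or None if the gzip stream is damaged."""
    try:
        return gzip.decompress(blob)
    except (OSError, EOFError, zlib.error):
        return None


per_file["b.log"] = corrupt(per_file["b.log"])
restored = {name: try_restore(blob) for name, blob in per_file.items()}

for name, body in restored.items():
    print(name, "ok" if body == files[name] else "LOST")
```

The price is a little lost compression, since each stream starts with an empty dictionary, plus the per-file gzip header and trailer.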
Re:Zipping backups? by itwerx · 2003-03-27 15:56 · Score: 1

Most tape drives do hardware compression anyway...
Re:Zipping backups? by phorm · 2003-03-28 06:38 · Score: 1

Yes indeedy... and a hardware compressor would come in quite useful for this. It might be annoying untarring an archive and finding a whole bunch of gzip'ed files though... which is why clustering comes in handy (for example, clustering by subdir, or letter range). Archives shouldn't get tainted very often, if ever... but it can be very annoying if you've ever had to deal with it (keep those tapes away from magnets!)

Very interesting, but a little late by monish · 2003-03-25 00:11 · Score: 3, Interesting

We at Indra Networks developed a PCI based gzip accelerator a long time ago. It has been on sale for almost a year. The current version of the card is already at 50 MB/s and we have been shipping that since last September. A higher performance version is on the way.

The card is being sold on an OEM basis to manufacturers of load balancers and SSL accelerators. These boxes front-end multiple Web servers and have very high performance requirements. Also, the CPU has plenty of other work to do, for example TCP/IP processing. This is the application that needs hardware acceleration.

For a low performance site, mod_gzip is fine. But, if you have a busy site with hundreds of Web servers, you don't want to go around installing mod_gzip hundreds of times. It is a lot cheaper to buy a load balancer with gzip hardware acceleration.

bzip2 is irrelevant here as IE and Netscape would not understand bzip2 encoding anyway. But they understand gzip just fine (unless you have a version that is many years old).

Monish Shah
CTO, Indra Networks
www.indranetworks.com

Re:Very interesting, but a little late by JohnnySax · 2003-03-26 18:29 · Score: 1

Installing a consistent software image across "hundreds of web servers" would be necessary regardless of the presence of mod_gzip. I fail to see how eliminating the mod_gzip module would really help in this regard.

I would also think "hundreds of webservers" would be able to gzip at a composite rate higher than 50MB/sec, even while serving content. After all, with a hundred webservers, each webserver would only have to compress at an average rate of 0.5MB/sec to achieve 50MB/sec performance.

Look, I can see certain scenarios where it would be beneficial to do the compression in the load balancer. I just don't think the scenario you give is a valid one.
Re:Very interesting, but a little late by monish · 2003-03-26 19:41 · Score: 1

I suppose, if you are creating a brand new server farm and you install mod_gzip right from the start, that may not be much trouble. But, if you want to enable compression in an existing server farm, it is probably a lot easier to leave your Web servers alone and just deploy it in your load balancer.

As for the throughput, I agree that a hundred Web servers could outperform one of my cards. But, I'd be happy to sell them more than one card. :-)

The point is that administratively, it is easier to do this where you have concentration of bandwidth (e.g. at the load balancer). In that case, hardware assist is a must.

As for scenarios where this makes sense, if you have other ones, please do post them. I'd be happy to get more marketing material. :-)

Monish Shah
Indra Networks
http://www.indranetworks.com

dd yields much smaller file here by Beetjebrak · 2003-03-26 07:22 · Score: 1

When I execute your command I only get about 1.5MB of random data instead of 32MB. I'm running Gentoo Linux on this box.

--
Learn from the mistakes of others. There isn't enough time to make them all yourself.

interresting by SHEENmaster · 2003-03-26 10:19 · Score: 1

but do I remove my tv tuner card for it?

How would this be implemented in unix? Would there be a device to stream to and a replacement for the gzip command and compression libraries?

--
You can't judge a book by the way it wears its hair.

Why use Gzip? by fudgefactor7 · 2003-03-26 10:40 · Score: 1

When rar gives better compression? Since CPU speed won't be a factor anymore, it would make sense to go with a compression system that is more compact.

Using just the standard options, here's my results:

Original file: 732,921,856 bytes
.ZIP compressed: 725,244,234 bytes
.CAB compressed: 719,244,234 bytes
.RAR compressed: 719,855,409 bytes
.TAR compressed: 732,928,000 bytes
.BZ2 compressed: 732,884,505 bytes
.LHA/.LZH compressed: 725,886,696 bytes
.BH compressed: 725,251,468 bytes
.tar.gz compressed: 725,254,634 bytes

.CAB actually won, but that one has some problems (like being Windows), and of the remainder .RAR is best and is cross-platform. I would have used .RAR...

Re:Why use Gzip? by mabinogi · 2003-03-27 09:49 · Score: 1

so how is RAR going to help you accelerate serving web pages using mod_gzip?

Also, why on earth do you have .tar there?, tar is not a compressor, it's a container...

And finally, the use of this card is for compressing web pages. That's plain text of about 5 to 30k. Why on earth are you comparing 730 Meg of binary (and possibly already compressed due to the bad results from everything) to make your point?

--
Advanced users are users too!
Re:Why use Gzip? by fudgefactor7 · 2003-03-27 11:00 · Score: 1

.RAR = superior compression compared to gzip
.TAR thrown in for good measure
Why not go with the superior cross-platform compression? With a little work they could have done just that. That was the point.
Re:Why use Gzip? by Phroggy · 2003-03-29 07:27 · Score: 1

That can't be right. I've never seen gzip do better than bzip2, or bzip2 make a large file larger. And of course, as someone else mentioned, tar is NOT a compression format, and the result will ALWAYS be slightly larger than the original, not smaller.

The real answer to your question, though, is: #1) web browsers know how to decode gzip, not rar, so gzip is useful for a web server sending web pages while rar is useless for that purpose, and #2) somebody mentioned that gzip is designed to work with a stream of data, while bzip2 requires analyzing a whole block at once, so bzip2 can't be used on a stream of continuous data, while gzip can. I don't know if the same applies to rar or not, but if so, then rar wouldn't work for some applications where gzip would.
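
To illustrate the streaming point for gzip specifically, here is a minimal sketch with Python's zlib module: chunks are compressed as they arrive and the receiver can decode each piece immediately, without either side seeing the whole page first. The chunk contents and the `gzip_stream` helper are made up for the example, and it says nothing either way about what bzip2 or rar can do:

```python
import zlib


def gzip_stream(chunks):
    """Yield gzip-framed bytes as each input chunk arrives."""
    comp = zlib.compressobj(level=6, wbits=31)  # wbits=31 selects gzip framing
    for chunk in chunks:
        # Z_SYNC_FLUSH pushes out everything compressed so far.
        out = comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH)
        if out:
            yield out
    yield comp.flush()  # finish the stream and write the gzip trailer


chunks = [b"<html>", b"<body>hello</body>", b"</html>"]

decomp = zlib.decompressobj(wbits=31)
progress = []  # what the receiver has decoded after each piece
received = b""
for piece in gzip_stream(chunks):
    received += decomp.decompress(piece)
    progress.append(received)

print(progress)
```

Flushing after every chunk costs a few bytes each time, so a real server would flush less often; the point is only that the format allows it.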

--
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
Re:Why use Gzip? by k98sven · 2003-03-30 05:20 · Score: 1

.RAR compression uses blocks, like bz2 (See other posts about this) and is therefore not suitable for compressing streams.

Also, your comparison is flawed, looking at the compression factors you achieved (and the file size) I'm guessing that what you're trying to compress already is. (a DivX file?)
Re:Why use Gzip? by NtG · 2003-03-30 10:50 · Score: 2, Insightful

There are many many issues with this test, which has proved absolutely nothing:

a. It appears (as someone mentioned elsewhere) that you are compressing an already compressed file

b. You have not specified the options used when compressing, which can seriously alter the result

c. You have thrown in TAR, which can be overlooked, however tarring a single file before gzip compressing it is simply a waste of time unless there is some particularly pertinent permissions/directory structure data you want to preserve. Basically, you have inflated the gzip output by doing this

d. Each of these compression methods has its own benefits and shortfalls. Good compression ratio is not the be-all and end-all. Certainly many people have explained the whole block-compression theory and why gzip is so versatile.

e. You seem to be trying to prove here that RAR is a superior compression method. It is also not free. It certainly can't be used without licensing fees as gzip can.

f. Where is output such as time taken, i/o and cpu demands, etc?

You may want to rethink your research.
Re:Why use Gzip? by NtG · 2003-03-30 10:55 · Score: 1

I forgot to add that compressing an already compressed file or random binary data will NOT test the capabilities of a compression algorithm. If you really want to test each method, find something like a web server log file and compress it. At least the test will be somewhat realistic as well.
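
A rough sketch of the kind of test meant here, using Python's standard zlib and bz2 modules. The log lines are synthetic stand-ins (an assumption, not a real server log), so the exact ratios mean little; what matters is the gap between log-like text and already-compressed or random input.

```python
import bz2
import os
import zlib

def ratio(compress, data):
    """Compressed size as a fraction of the original size."""
    return len(compress(data)) / len(data)

# Synthetic stand-in for a web server log: repetitive ASCII text.
log = b"".join(
    b'10.0.0.%d - - [19/Mar/2003:01:53:%02d] "GET /index.html" 200 5120\n'
    % (i % 250, i % 60)
    for i in range(20000)
)
already_compressed = zlib.compress(log, 9)  # plays the role of a .gz or DivX file
random_bytes = os.urandom(len(log))

samples = [("log", log),
           ("compressed", already_compressed),
           ("random", random_bytes)]
for name, data in samples:
    print(f"{name:>10}: deflate {ratio(zlib.compress, data):.3f}"
          f"  bzip2 {ratio(bz2.compress, data):.3f}")
```

A ratio near (or above) 1.0 means the compressor gained nothing on that input.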
Re:Why use Gzip? by glwtta · 2003-03-31 14:07 · Score: 1

I'm no mathematician, but that's about a 0.7% difference - is that really worth changing a well established format over? Unless it has some other benefit in addition to the insignificantly smaller file size?
Of course it also largely depends on what it is you are compressing. Let's not forget that "real" compression is, after all, impossible.

--
sic transit gloria mundi

Much better: Reprogrammable Co-processors by pacc · 2003-03-27 22:54 · Score: 3, Informative

A lot of computing records over the years have been set by vector computers or other specialized hardware. Putting that power on a PCI card like this gzip solution, and in addition making the algorithm reprogrammable and reconfigurable, you get: Mitron Co-processor on a PCI card.

has been traditional areas for these kinds of devices, but with the new FPGA's and PCI-express on the horizon I can see it becoming usable for even more specialized applications.

Here is a crude translation of an article in Swedish (Source: Elektroniktidningen)

FPGA enhances PC
You don't have to be a logic designer to make use of FPGA chips. With a normal PCI card and a compiler from the innovation startup Flow Computing in Lund, programming in Flow's dialect of C is enough.
- We can make a normal PC do calculations that would otherwise have needed supercomputers or large Linux clusters, said Josef Macznik at Carlstedt Research & Technology, a company that has invested in and works together with Flow Computing.
The main idea is parallelism. That implies that the PC hardware has to be extended in some way, since normal PC processors work sequentially and normal programs are written to be executed that way.
Flow has chosen to use normal PCI cards. The cards are equipped with an FPGA chip from Xilinx with two million gates, but the size of the chip can be selected depending on requirements, according to Josef Macznik.
The corporate secret lies in the compiler. Software has to be written in Flow's own variety of C, and the compiler decides which processes gain the most from parallel execution, configuring the FPGA for maximum efficiency.
- The user doesn't see the FPGA chip and doesn't really have to know what kind of hardware is on the card. We are aiming at programmers - that's where the market is, said Josef Macznik.
Flow's solution is currently used by a bioinformatics company in Lund. But according to the company, the technology can be used wherever the computing power of a PC needs to be multiplied through parallelism and where the effort of adapting programs to the special variety of C is worthwhile.

gzip is a stream protocol, with bounded state by swine · 2003-03-28 11:48 · Score: 1

from memory, gzip takes in a byte at a time, and outputs a bitstream of huffman-like variable width tokens. As each input token (0..255 and EOF (say 256)) is applied to the engine sequentially, it is possibly replaced by a compound output token (numbered 257...2^N-1) encoding an already-seen sequence (old-output-token, new-input-token).

The 32KB blocking is mainly to simplify resync, IIRC.

There is *no* lookahead in gzip, just that one pending output token, and memory of past input.

Bzip2 OTOH analyzes the entire block (default 900kB) before outputting a single bit, and thus can do a better job with changing pattern space.

Go explore gzip and friends -- they're beautiful.
The trick is to modify the encoder's state (learning a better encoding for a sequence) only *after* that token has been emitted, so the decoder learns exactly the same lesson as the encoder has, just by watching the token stream.
No metadata has to be passed between the machines.

*Really* simple in hardware (for sufficiently complex values of 'simple')
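
For anyone who wants to see the "learn only after emitting" trick in code, here is a minimal sketch. Note the assumption: a table of compound tokens built this way is the LZW scheme (as used by Unix compress), whereas gzip's DEFLATE is LZ77 back-references plus Huffman coding, so treat this as an illustration of the idea rather than of gzip itself. The names `lzw_encode` and `lzw_decode` are made up, compound tokens start at 256 (there is no EOF token), and the output is a plain list of integers instead of a packed bitstream.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Emit one token per longest already-known sequence."""
    table = {bytes([i]): i for i in range(256)}
    out = []
    seq = b""
    for byte in data:
        cand = seq + bytes([byte])
        if cand in table:
            seq = cand                  # keep extending the known sequence
        else:
            out.append(table[seq])      # emit first ...
            table[cand] = len(table)    # ... then learn (old sequence + new byte)
            seq = bytes([byte])
    if seq:
        out.append(table[seq])
    return out

def lzw_decode(codes: list[int]) -> bytes:
    """Rebuild the same table just by watching the token stream."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = prev + prev[:1]     # token the encoder learned one step ago
        else:
            raise ValueError("invalid token")
        out.append(entry)
        table[len(table)] = prev + entry[:1]   # same lesson the encoder learned
        prev = entry
    return b"".join(out)

sample = b"TOBEORNOTTOBEORTOBEORNOT" * 4
codes = lzw_encode(sample)
print(len(sample), "bytes ->", len(codes), "tokens")
```

No table is ever transmitted: the decoder's table stays exactly one step behind the encoder's, which is why that one special case in the decoder is needed.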

--
^..^ OO (oo)

Hang on... by anthonyrcalgary · 2003-03-29 12:12 · Score: 1

Doesn't urandom need to wait to collect entropy before it produces output?

Also... who told you random data can't be compressed? That's completely wrong.

Consider a binary string X of length 1000000, where the contents of X are set randomly. It is possible for the contents of X to be all 0's.

I can describe the contents of X in fewer bits than the string itself. I shall do so now. "A binary string of length 1000000 where the contents are 0's.".

Since I can describe the string with another binary string of length less than 1000000, I can compress it. Since it's possible for that string to result if it's been randomly chosen, some random strings can be compressed.
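
The flip side is a counting argument, sketched here for binary strings of length n. There are fewer strings shorter than n bits than there are strings of exactly n bits:

```latex
\sum_{k=0}^{n-1} 2^{k} \;=\; 2^{n} - 1 \;<\; 2^{n}
```

So no lossless scheme can shorten every string of length n, because two inputs would have to share an output. The same count limits how many strings can be shortened by at least c bits:

```latex
\frac{\sum_{k=0}^{n-c} 2^{k}}{2^{n}} \;=\; \frac{2^{\,n-c+1} - 1}{2^{n}} \;<\; 2^{\,1-c}
```

With c = 8, fewer than 1 in 128 of the possible strings can be shortened by a byte or more. Some random strings compress, as argued above, but almost all of them do not.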

--
When someone might yell at me, it has to be OpenBSD.

That's a 32K window, not block by K-Man · 2003-04-02 07:48 · Score: 1

gzip finds repeats among the most recent 32K of the stream it's processing, using a hash table etc. to match its current position against previous ones.

IIRC it hashes the three bytes from its current position and looks for a match against hashes from 32k previous positions, then does a lookup in the hash bucket for as much as it can match following the initial 3 bytes.
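
A rough sketch of that match search, going only by the description above and not by gzip's actual source. The names are made up, the three bytes themselves serve as the dictionary key (Python's dict does the hashing), and the bucket scan is unbounded where a real implementation would cap it.

```python
from collections import defaultdict

WINDOW = 32 * 1024   # how far back a match may start
MIN_MATCH = 3        # bytes used as the lookup key at each position
MAX_MATCH = 258      # longest match DEFLATE can encode

def find_matches(data: bytes):
    """Greedy pass: yield ('lit', byte) or ('match', distance, length)."""
    buckets = defaultdict(list)          # 3-byte key -> earlier positions
    pos = 0
    while pos < len(data):
        key = data[pos:pos + MIN_MATCH]
        best_len, best_dist = 0, 0
        if len(key) == MIN_MATCH:
            for cand in reversed(buckets[key]):      # nearest position first
                if pos - cand > WINDOW:
                    break                            # older ones are out of range
                length = MIN_MATCH
                while (length < MAX_MATCH and pos + length < len(data)
                       and data[cand + length] == data[pos + length]):
                    length += 1
                if length > best_len:
                    best_len, best_dist = length, pos - cand
        if best_len >= MIN_MATCH:
            yield ("match", best_dist, best_len)
            step = best_len
        else:
            yield ("lit", data[pos])
            step = 1
        for p in range(pos, pos + step):             # index the bytes just covered
            if p + MIN_MATCH <= len(data):
                buckets[data[p:p + MIN_MATCH]].append(p)
        pos += step

def expand(tokens) -> bytes:
    """Decoder: replay literals and copy matches from earlier output."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            dist, length = tok[1], tok[2]
            for _ in range(length):
                out.append(out[-dist])   # byte by byte, so overlaps work
    return bytes(out)

sample = b"the quick brown fox. " * 50
tokens = list(find_matches(sample))
print(len(sample), "bytes ->", len(tokens), "tokens")
```

The encoder only ever looks backwards within the window, which is why this works on a stream of any length.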

The BWT actually sorts every position in the block. It's not streamable in any significant way.
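
A naive sketch of the transform shows why: nothing can be output until every rotation of the block has been sorted. This is the textbook version, far too slow for 900kB blocks and not how bzip2 implements it; the function names are made up.

```python
def bwt(block: bytes) -> tuple[bytes, int]:
    """Sort every rotation of the block; return last column and primary index."""
    n = len(block)
    if n == 0:
        return b"", 0
    # The sort key for position i is the rotation of the block starting at i.
    order = sorted(range(n), key=lambda i: block[i:] + block[:i])
    last = bytes(block[(i - 1) % n] for i in order)
    return last, order.index(0)   # row where the unrotated block ended up

def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Undo the transform using only the last column and the primary index."""
    n = len(last)
    # A stable sort of the last column links each sorted row to the next one.
    nxt = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = nxt[row]
        out.append(last[row])
    return bytes(out)

transformed, primary = bwt(b"banana")
print(transformed, primary)
```

The sort needs the last byte of the block before it can place the first, so the whole block has to be buffered up front.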

--
---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger

141 comments