Domain: compression.ca
Stories and comments across the archive that link to compression.ca.
Comments · 28
-
JPEG recompression
ACT has a JPEG recompression test which clearly shows a bunch of compressors making a JPEG smaller. Even better - there's a great paper by the author of packJPG talking about how to compress a JPEG losslessly using the technique teppples described...
-
Re:Seems like the distributor needs to be slapped
-
Re:tarballs
Solution: pbzip2. With 4+ cores it should be comparable to gzip.
-
Worthless padding
These tests include neither the Calgary Corpus nor the more recent Canterbury Corpus so there is no baseline to measure their fileset against.
Without that there is "Nothing to see", "Move along, move along"
Perhaps to http://compression.ca/act/act-calgary.html
-
Much more exhaustive test
http://compression.ca/act/ has a much more exhaustive test, and no ads either.
-
Archive Comparison Test
See also: the Archive Comparison Test. Covers 162 different archivers over a bunch of different file types.
It hasn't been updated in a while (5 years), but have the algorithms in popular use changed much? I remember caring about compression algorithms when I was downloading stuff from BBSs at 2400 baud, or trading software with friends on 3.5" floppies. But in these days of broadband, cheap writable CDs, and USB storage, does anyone care about squeezing the last few bytes out of an archive? zip/gzip/bzip2 are good enough for most people for most uses.
-
Re:"Enthusiast Megatasking" is a lousy catchphrase
Forget gzip. You can do SMP or cluster-based bzip though...
http://compression.ca/pbzip2/
http://www.mediawiki.org/wiki/Dbzip2
-
Re:"Enthusiast Megatasking" is a lousy catchphrase
Come on, where's dual-core gzip?
Well, I don't know of a dual-core gzip, but there is an SMP bzip2. Is that good enough? Their tests seem to indicate that the speedup is very close to linear up to around 30 processors.
-
Re:"Enthusiast Megatasking" is a lousy catchphrase
(Come on, where's dual-core gzip?)
Gzip is sufficiently fast that I suspect in most cases it's more limited by your hard drive speed than your CPU speed. There is, however, parallel bzip2, which most certainly does benefit from parallelism.
-
Re:parallel gzip/bzip2/etc
Parallel BZIP2 (PBZIP2) and bzip2smp are parallel implementations of bzip2. I've not looked for any similar gzip implementations.
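For anyone wondering how a parallel bzip2 can work at all, here is a minimal Python sketch of the general idea: split the input into blocks, compress each block independently, and concatenate the resulting bzip2 streams. This is not PBZIP2's or bzip2smp's actual code. The function name `parallel_bzip2`, the 900,000-byte block size, and the worker count are assumptions picked for illustration, and the speedup depends on the `bz2` module releasing the GIL while compressing, which I believe CPython does.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def parallel_bzip2(data: bytes, block_size: int = 900_000, workers: int = 4) -> bytes:
    """Compress independent blocks concurrently and concatenate the streams.

    Hypothetical helper for illustration; block_size and workers are
    arbitrary defaults, not values taken from any real tool.
    """
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        # Empty input still needs one valid (empty) bzip2 stream.
        return bz2.compress(b"")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the streams line up
        # with the original block order when joined.
        streams = list(pool.map(bz2.compress, blocks))
    return b"".join(streams)


if __name__ == "__main__":
    payload = b"compression.ca archive comparison test\n" * 100_000
    packed = parallel_bzip2(payload)
    # bz2.decompress() accepts concatenated streams (Python 3.3+).
    assert bz2.decompress(packed) == payload
    print(len(payload), "bytes ->", len(packed), "bytes")
```

The trade-off is that each block is compressed without knowledge of the others, so the output can be slightly larger than a single-stream bzip2 of the same data.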
-
Calgary / Canterbury corpus?
If they can't compress the canterbury corpus or calgary corpus beyond 3X, then it's a SCAM.
-
An even more thorough comparison site
Jeff Gilchrist's Archive Comparison Test has been around for years, and covers many more archivers and uses several different data sets, on several different platforms. It has even been cited in compression literature:
http://www.compression.ca/
-
Another compression test
I used to like this one: Archive Comparison Test, but unfortunately it hasn't seen updates since 2002 for general data compression. However, that's still in the post-WinRAR 3.00 era, and the Windows archiver summary explains a bit why WinRK may win here, but still not be too well-known. Good compression isn't everything -- one often has to keep the speed aspect in mind too. And when you've then picked an archiver with nice compression for the speed, you may start looking at the feature set. Again WinRK isn't state-of-the-art there. It's mostly a pure no-frills compressor where you can ignore durations, especially for large archives. Not nearly "an archiver for everyone".
Personally, after a couple of years of testing things out (OK, make that a decade -- time flies), I believe RAR by far exceeds most archivers' features nowadays, and also hits the sweet spot of good compression for reasonably good speeds. I think RAR trumps WinZIP 10, 7-zip, bzip2, and all other common archivers you throw at it in terms of features, and does really well in the compression field for being so all-around. It can decompress most common archive formats too, at a lower cost than WinZIP, while looking just as easy to use to me.
WinACE was once an archiver preferred by some over RAR, but it sort of died out due to a lack of updates, or at least a lagging behind by RAR's improvements. What once looked promising there now looks more like a rarely used RAR-wannabe to me.
7-zip is the one other archiver that has recently caught my attention because it's open source and generally compresses better than RAR, still at pretty good speeds. However, it's nowhere near RAR's feature set and still lacks too many features that matter to me for me to use it, but I keep an eye on it, I don't dislike it at all, and I can clearly understand why some prefer it. 7-zip has now replaced bzip2 (which in turn replaced gzip) as my favorite open source archiver, and its cross-platform support is looking better these days with OS X, Debian, Fedora, and Gentoo support, although unofficial, available directly from its home page.
-
Re:Question: What needs multiple threads?
Actually that's not necessarily true. It's definitely true right now though. Most developers haven't really been taught to think in terms of parallelism when designing software, but that's starting to change.
It's all about the algorithms. Once multi-core chips have been mainstream for a while, all the algorithms out there will start to get converted to take advantage of parallel processing. And there are already algorithms out there that do this... this page has a small repository of parallel implementations of common algorithms including QuickSort, hashing techniques (for super fast searching), string operations (which every application in existence uses), and more.
Now I know this isn't always possible, but in many cases it is. Almost every program out there uses search and sort algorithms. Your address book does it, your web browser does it. These algorithms can be implemented to take advantage of having multiple processors.
A lot of operations can actually be modified to take advantage of this stuff. See the pbzip2 project that achieves a near linear speed up per processor!
Almost every algorithm out there can be modified to take advantage of multiple cores. Things like video/audio decoding are prime candidates (a lot of research is currently happening in this area).
It may take a generation of programmers and then another generation or two of applications to start really taking advantage of parallelism, but mark my words: once this stuff is mainstream, you'll start to really see some performance like never before.
-
Re:Smart. Scary.
they'd have to do it lossy, and so now pics look like crap.
No, they wouldn't.
-
Re:Good news!
Of course, RAR is not the best either...
-
Hyperthreading is Overrated
Just look at the scaling graphs here for parallel bzip2. Note the almost linear scaling on proper SMP and NUMA architectures vs. the embarrassing curve on Intel's old-fashioned bus architecture with dual Pentium 4 Xeons with hyperthreading. Also notice the high clock frequency on the Intel processors compared with the performance achieved on the "slower" ones...
-
Re:A little factoid for you
Just a little note: A version of bzip does exist that scales linearly on SMP machines - you can find it here.
-
Re:Hey!
http://www.compression.ca/act-calgary.html
The leading compressor there gets 1.8226 bits per byte on a large archive consisting of several different data types (text, images, code).
If you're just talking plain english text, then a dedicated text compressor can do a bit better through the use of a large dictionary file. Not much better though, general compression algorithms are amazingly good.
-
Re:README: From the Authors
Nobody should use RAR. [...] gzip has better compression
Why don't you check the facts? gzip's compression is average at best. RAR is one of the best and it's even much more efficient than bzip.
-
Re:Winzip
Have a look at the Archive Comparison Test page. WinACE does better than RAR in some areas, but not in all. And neither of them is the absolute best (in terms of compressed file size) at anything.
-
Re:bzip
Uhh
... excuse me, but why aim at such a lowly achievement? As a piece of serious technology BZIP sucks serious ass in terms of compression ratio.
See here. There are over a dozen alternatives that compress better than bzip.
(And of course winrar won't create bzip files -- they are *bloated* in comparison to the .rar format. What would be the point??)
-
Re:bzip
For comparison purposes, I downloaded cs94_002.zip and recompressed it with the latest version of WinRAR (3.10 beta 3), set to maximum compression. The result:
cs94_002.rar (Source) 9.4MB (9,407,157 bytes)
WinRAR appears to compress much better than bzip2; however, it isn't free. Interestingly, as good as WinRAR is, even it doesn't come that close to having the best compression ratio out there.
For lots of useful statistics on the relative capabilities of virtually every compression engine in the world, check out Jeff Gilchrist's Archive Comparison Test. A lot of progress is still being made in compression technology, so the state of the art keeps changing.
-
Re:They're dumb at the same time.
(also the home of arj and other odd archivers that are still not as good or just as good as gzip+tar, too bad they've never heard of bzip2).
Yeah... nothing like stereotypes or popular thought to cloud hard facts, eh?
In the Sound (WAV) Compression Test on compression.ca the GZip 1.2.4 + TAR combo comes in at 7.29b/B (91%), bzip2 0.9.5d + TAR is at 7.01b/B (87%). RAR on the other hand, comes in at 5.65b/B (70%) and Monkey's Audio 3.96 rocks in at 5.01b/B (62%).
So my 10MB of WAV takes up 9.1MB after being GZipped and 7.0MB after compressing it with that odd archiver that [is] still not as good or just as good.
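The arithmetic behind those figures is simple: bits per byte (b/B) divided by 8 gives the compressed size as a fraction of the original. A small Python sketch, using only the b/B numbers quoted above; the function names are my own, and the percentages in the post look truncated rather than rounded, so the output differs slightly in the last digit.

```python
def percent_of_original(bits_per_byte: float) -> float:
    """Compressed size as a percentage of the original (8 b/B = 100%)."""
    return bits_per_byte / 8.0 * 100.0


def compressed_size_mb(original_mb: float, bits_per_byte: float) -> float:
    """Estimated compressed size in MB for a given original size."""
    return original_mb * bits_per_byte / 8.0


# b/B figures as quoted from the Sound (WAV) Compression Test above.
figures = {
    "GZip 1.2.4 + TAR": 7.29,
    "bzip2 0.9.5d + TAR": 7.01,
    "RAR": 5.65,
    "Monkey's Audio 3.96": 5.01,
}

if __name__ == "__main__":
    for name, bpb in figures.items():
        pct = percent_of_original(bpb)
        size = compressed_size_mb(10.0, bpb)
        print(f"{name}: {pct:.1f}% of original, 10MB -> {size:.2f}MB")
```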
GZip and bzip are *excellent* compression tools. But they are not - and have not been for a long time - the kings of the hill.
-
Compression Tech Link
The current state of the art in compression technology is benchmarked by Jeff Gilchrist at his site, which includes current benchmarks in image compression technology too.