Exhaustive Data Compressor Comparison

duh by Gen.+Malaise · 2007-04-22 14:12 · Score: 5, Funny

Nothing to see. High compression = slow and low compression = fast. umm duh?

Re:duh by timeOday · 2007-04-22 14:17 · Score: 2, Insightful

So you already knew WinRK gave the best compression? I didn't; never even heard of it. My money would have been on bzip2.
Re:duh by setirw · 2007-04-22 14:18 · Score: 5, Funny

High compression = slow and low compression = fast

You compressed the article into that statement. How long did it take to write the comment?

--
This message printed on 100% post-consumer recycled electrons.
Re:duh by kabeer · 2007-04-22 14:26 · Score: 2, Insightful

Compressing the article into that statement would technically be classed as a lossy compression e.g. jpeg.
Re:duh by aarusso · 2007-04-22 14:27 · Score: 3, Informative

Well, since the dawn of ages I saw ZIP v ARJ, bzip2 vs gzip.
What's the point? Same programs compressing same data on a different computer.

I use gzip for big files (takes less time)
I use bzip2 for small files (compresses better)
I use zip to send data to Windows people
I really, really miss ARJ32. It was my favorite on DOS Days.
Re:duh by Petrushka · 2007-04-22 14:39 · Score: 1

I take it you didn't look at the "Compression Efficiency" graph at the bottom of each page.

Of course they don't seem to reveal their methodology for calculating that graph, but even a glance at the other tables will show that, for example, Stuffit is almost always much faster and saves very nearly as much space as 7-Zip (sometimes more). That's why comparisons like this are interesting.
Re:duh by dotgain · 2007-04-22 14:42 · Score: 2, Funny

So you already knew WinRK gave the best compression? I didn't; never even heard of it.
Well thank heavens we have now! If there's one area of computing I've always felt I wasn't getting enough variety, it's compression algorithms and the associated apps needed to operate with them.
If there's one thing that brightens my day, it's a client sending me a PDF compressed with "Hey-boss-I-fucked-your-wife-ZIP" right on deadline.
Re:duh by dotgain · 2007-04-22 14:47 · Score: 1

I should give credit to Profane MuthaFucka (574406) for the "Hey, boss..." name he coined here.
Re:duh by Anonymous Coward · 2007-04-22 14:54 · Score: 0

I've done such a test about a month ago, but not with the exact same apps:

My quick summary:
-there's some very fast apps (like using the zip format), but compression is so low it's almost pointless to bother with it in the first place
-there's some higher compression formats, but the last few percent you get from them often means doubling the compression time or worse -- not worth the wasted time, unless bandwidth/file size is absolutely critical. It also usually means using compressors most people haven't heard of, which annoys them (like .7z archives, which most people will complain they can't open with winzip -- .ace and .arj were very popular back then, but nowadays...)
-my winner? Winrar. Why? Ultimate speed:compression ratio/tradeoff. I challenge anyone to find something (preferably without using some obscure format) that has a better compression ratio:time spent -- I haven't found one. It's almost as good as the most extreme compressors that take forever, but it's still very fast. Also, winrar's GUI is much nicer and intuitive than many of the others, it does multipart archives real well, it handles decompression of most formats, has decent shell integration and all. Not free though.

And when I need more "extreme" compression I use 7zip with the ultra setting, but it's ~8x slower than Winrar, for something around 10% smaller most of the time.
Re:duh by h2g2bob · 2007-04-22 15:19 · Score: 5, Funny

Oh, if only they'd compressed the article onto a single page!
Re:duh by morcego · 2007-04-22 16:03 · Score: 4, Informative

So you already knew WinRK gave the best compression? I didn't; never even heard of it. My money would have been on bzip2.

I agree with you on the importance of this article but ... bzip2 ? C'mon.
Yes, I know it is better than gzip, and it is also supported everywhere. But it is much worse than the "modern" compression algorithms.

I have been using LZMA for some time now for things I need to store longer, and getting good results. It is not on the list, but should give results a little bit better than RAR. Too bad it is only fast when you have a lot of memory.

For short/medium time storage, I use bzip2. Online compression, gzip (zlib), of course.

--
morcego
Re:duh by cbreaker · 2007-04-22 16:09 · Score: 2, Insightful

Hell yea. Although ARJ had slightly better compression, it allowed for *gasp* two files in the archive to be named the same!

Nowadays it's all RAR for the Usenet and Torrents and such. RAR is really great but it's piss slow compressing anything. It's just so easy to make multipart archives with it.

I really wish Stuffit would go away ..

--
- It's not the Macs I hate. It's Digg users. -
Re:duh by MillionthMonkey · 2007-04-22 16:12 · Score: 2, Funny

Server Error in '/' Application.
Server Too Busy
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
Exception Details: System.Web.HttpException: Server Too Busy
Source Error: An unhandled exception was generated during the execution of the current web request. Information regarding the origin and location of the exception can be identified using the exception stack trace below.
Stack Trace:
[HttpException (0x80004005): Server Too Busy]
System.Web.HttpRuntime.RejectRequestInternal(HttpWorkerRequest wr) +148
Version Information: Microsoft .NET Framework Version:1.1.4322.2300; ASP.NET Version:1.1.4322.2300

Someone should write an article about how you should always replace your default error screens and remove information identifying your server software and version.
Re:duh by Firethorn · 2007-04-22 16:37 · Score: 4, Interesting

Not only that, but you can sacrifice compression to create recovery capability in the case of lost/corrupted data, especially in the newer ones.

Missing part 3 of 10? No problem!

Of course, I'm a holder of a license for Rar from way back when. I like it.

--
I don't read AC A human right
Re:duh by yppiz · 2007-04-22 16:38 · Score: 2, Informative

Another problem is that gzip has compression levels ranging from -1 (fast, minimal) to -9 (slow, maximal), and I suspect he only tested the default, which is either -6 or -7.

I wouldn't be surprised if many of the other compression tools have similar options.
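
For anyone who wants to see what the levels actually buy, here is a rough Python sketch. It uses the standard zlib module (the same DEFLATE algorithm gzip uses) rather than the gzip binary itself, and the sample text is made up, so treat any numbers it prints as illustrative only.

```python
import random
import zlib

# Made-up vocabulary; an assumption for illustration, not data from the article.
WORDS = ["compress", "archive", "window", "entropy", "symbol", "block",
         "match", "literal", "stream", "buffer", "ratio", "speed"]


def sample_text(n_words=50000, seed=0):
    """Repeatable pseudo-text so that runs are comparable."""
    rng = random.Random(seed)
    return " ".join(rng.choice(WORDS) for _ in range(n_words)).encode("ascii")


def sizes_by_level(data, levels=(1, 6, 9)):
    """Compressed size in bytes for each zlib level (1 = fastest, 9 = smallest)."""
    return {level: len(zlib.compress(data, level)) for level in levels}


if __name__ == "__main__":
    data = sample_text()
    for level, size in sizes_by_level(data).items():
        print(f"level {level}: {size} bytes ({size / len(data):.1%} of original)")
```

Calling zlib.compress without a level gives the same output as level 6, which is zlib's default.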

--Pat
Re:duh by Anonymous Coward · 2007-04-22 17:03 · Score: 0

I completely agree with you. http://www.haabi.com/
Re:duh by AncientPC · 2007-04-22 17:04 · Score: 3, Insightful

They do test different compression levels. The problem is they haven't posted any of those results yet, which makes this article incomplete and useless.
Re:duh by timeOday · 2007-04-22 17:07 · Score: 4, Informative

I agree with you on the importance of this article but ... bzip2 ? C'mon.
Well, now I know.
Here's a scatterplot of resulting file sizes and compression times from the text compression data (lower is better), and as my luck would have it, bzip2 is really the only one that's out of line - i.e. the furthest from the pareto frontier. But then, looking at the same data with file sizes plotted in the range of [0.0, 1.0], it seems like there's a major case of diminishing returns for the expensive algorithms anyways. If you care at all about compression time, good ol' gzip is still a pretty decent choice!
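
For reference, picking out the Pareto frontier from a set of (time, size) results takes only a few lines. This is a minimal sketch: the function name, the compressor labels, and the numbers are all invented for illustration and are not the article's measurements.

```python
def pareto_frontier(results):
    """Names of results not beaten on both axes by a faster one.

    results: iterable of (name, seconds, size_ratio); lower is better for both.
    """
    frontier = []
    best_size = float("inf")
    # Walk from fastest to slowest; keep a result only if it is smaller
    # than everything that ran faster than it.
    for name, seconds, size in sorted(results, key=lambda r: (r[1], r[2])):
        if size < best_size:
            frontier.append(name)
            best_size = size
    return frontier


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show one dominated point.
    made_up = [
        ("fast", 1.0, 0.70),
        ("middle", 4.0, 0.60),
        ("outlier", 10.0, 0.65),  # slower AND bigger than "middle"
        ("slow", 20.0, 0.55),
    ]
    print(pareto_frontier(made_up))
```

Anything missing from the returned list is both slower and larger than some other entry, which is what "far from the frontier" means on the scatterplot.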
Re:duh by Compact+Dick · 2007-04-22 17:30 · Score: 4, Informative

LZMA ... is not on the list
7-Zip [included in the test] is based on LZMA.

--
Use ISO 8601 dates [YYYY-MM-DD]
Re:duh by OmnipotentEntity · 2007-04-22 17:40 · Score: 3, Informative

7zip's default compression is LZMA, FYI.

--
"Build a man a fire warm him for a day, set a man on fire and warm him for the rest of his life."
Re:duh by morcego · 2007-04-22 17:45 · Score: 1

Yes, but there are many settable parameters for LZMA. So, using LZMA Utils will often give better results than 7-Zip.

--
morcego
Re:duh by edwardaux · 2007-04-22 19:41 · Score: 2, Funny

High compression = slow and low compression = fast

You compressed the article into that statement. How long did it take to write the comment?

Meh, that's not compression.
2@Bcompression = #4low High @s#and #@fast

41 vs 50 chars. Clearly an 18% improvement :-)
--
edwardaux
Re:duh by jimicus · 2007-04-22 20:11 · Score: 1

Yes, I know it is better than gzip, and it is also supported everywhere.

"Supported everywhere" is pretty nice to have. Particularly if you can't guarantee what kind of system you'll be recovering data to in the event of having to go back to old backups.

I've stored things in obscure formats before. It's a PITA if you find your platform changes drastically 1 year later, the software which wrote the files isn't available on the new platform and you need to restore something from a pre-platform-change old backup 2 years later. Basically, it's a variation on the "but all our documents are .DOC!" issue which keeps so many people using Office but substantially worse because at least Office is ubiquitous enough that a lot of the hard work of reverse-engineering the format has been done.

That being said, if all the algorithms used here are free to use, documented and not patented, it's much less of an issue.
Re:duh by VON-MAN · 2007-04-22 20:24 · Score: 1

I have been using LZMA for some time now for things
Good choice. The Exhaustive Data Compressor Comparison for Linux won't be found in this article of course. And since the author tests exotic Windows software against stock linux tools in a rather amateurish way, this isn't interesting for linux users.

Try this http://www.linuxjournal.com/article/8051 article for a linux slant.
Re:duh by Anonymous Coward · 2007-04-22 21:46 · Score: 0

I wish LZMA Utils could at least handle 7z LZMA files. Then we could do without the laughably bad "portable" version of 7zip. As it is, the only 7z decompressors I've seen anywhere are either Windows only, unusable, or written in a bloody scripting language (Use Ruby to read 7z files? But of course!)
Re:duh by packeteer · 2007-04-22 21:47 · Score: 1

Small=slow Big=fast

There, I compressed it further. It's kind of like when I ZIP up all my ZIP files. I mean, you might as well, right? I'm pretty sure all the data in the world can be compressed down to a single 1 or 0 with enough computing power.
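
Joking aside, two things are easy to check in Python: there are fewer short bit strings than long ones, so no lossless scheme can shrink every input, and a second compression pass over already-compressed output buys next to nothing. A rough sketch using the standard zlib module; the helper names and the sample text are made up for illustration.

```python
import random
import zlib


def shorter_bit_strings(n):
    """How many distinct bit strings are strictly shorter than n bits."""
    return sum(2 ** k for k in range(n))  # works out to 2**n - 1


def sample_text(n_words=50000, seed=0):
    """Repeatable made-up text (an assumption, not real archive contents)."""
    rng = random.Random(seed)
    names = ["zip", "rar", "gzip", "bzip2", "arj", "ace"]
    return " ".join(rng.choice(names) for _ in range(n_words)).encode("ascii")


def compress_twice(data):
    """Return (size after one pass, size after compressing that output again)."""
    once = zlib.compress(data, 9)
    twice = zlib.compress(once, 9)
    return len(once), len(twice)


if __name__ == "__main__":
    n = 16
    print(f"{2 ** n} inputs of {n} bits, only {shorter_bit_strings(n)} shorter outputs")
    text = sample_text()
    once, twice = compress_twice(text)
    print(f"original {len(text)}, one pass {once}, two passes {twice}")
```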

--
unzip; strip; touch; finger; mount; fsck; more; yes; unmount; sleep
Re:duh by jnnnnn · 2007-04-22 22:44 · Score: 1

Actually, I've often wondered if a tool exists that will do exactly that - namely, take a large amount of text and cut out all the padding, leaving just the information. Does anyone know of one? I know MS Word has a "summarize" tool.. It would be quite useful when dealing with all those texts that actually contain no information at all.
Re:duh by MoHaG · 2007-04-22 23:38 · Score: 1

In many cases 7-zip gives better compression than RAR.... I just wish they would add the maximum compression results. (They are probably still waiting for WinRK to finish....)
Re:duh by Adhemar · 2007-04-22 23:44 · Score: 1

Not only that, but you can sacrifice compression to create recovery capability in the case of lost/corrupted data, especially in the newer ones.
Not only that, you can even increase compression rate and performance if you sacrifice the ability to recover/decompress losslessly. See this article on lzip, the lossy advanced file compression utility.
Re:duh by Firethorn · 2007-04-22 23:49 · Score: 1

LZip is not RAR, and I like lossless compression.

I like my programs to work.

--
I don't read AC A human right
Re:duh by Anonymous Coward · 2007-04-23 00:43 · Score: 0

It's not really computing power, it's complexity of zip utility.

Ultrazipper (patent pending):
if $data=1
  copy /usr/local/ultrazipper/allworldsdata.bin to $filename
if $data=0
  print "CRC ERROR"
end

You can only compress one particular file, but you can get a hell of a compression ratio!
Re:duh by maxwell+demon · 2007-04-23 02:25 · Score: 1

That's nothing. I can compress files to 0 bytes and decompress them again on Linux. :-)

Here are simple compression/decompression scripts (no error handling etc.):

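Compression:
#!/bin/bash
# Stash a hex dump of the file in an extended attribute, then truncate the file.
setfattr -n user.content -v "$(xxd "$1")" "$1"
cat </dev/null >"$1"
Decompression:
#!/bin/bash
# Rebuild the file from the stashed hex dump, then drop the attribute.
getfattr --only-values -n user.content "$1" | xxd -r - "$1"
setfattr -x user.content "$1"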
Don't try it on important files, of course.

--
The Tao of math: The numbers you can count are not the real numbers.
Re:duh by cbreaker · 2007-04-23 02:49 · Score: 1

Yea I haven't had much bad to say about 7-zip. I also like that it appears to be GPL (at least the SourceForge project says it is.)

RAR isn't open source, although they do *basically* give it away for free. You can use WinRAR indefinitely with nag screens only if you open the full program, and all the "unrar" tools available on other platforms are free. RAR really filled a need for an archiver that was friendly to multiple parts and didn't have some of the legacy limitations of ZIP.

But, I really do prefer GPL software when it comes to things like this. Who wants to be locked out of some old archives because there's no tool to extract it with your current version of OS-Whatever? You'll have to pull out old hardware or emulation to do it. With GPL stuff, chances are high that you'll always find a current version of the tool.

--
- It's not the Macs I hate. It's Digg users. -
Re:duh by toleraen · 2007-04-23 02:59 · Score: 1

Each efficiency graph shows "KB Saved /s" on the x-axis. Pretty straightforward...KB saved (not KiB) divided by time to compress.
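
Assuming that reading of the axis is right (the article does not spell out the formula), the metric boils down to something like the following. The function name and the example numbers are invented for illustration.

```python
def kb_saved_per_second(original_bytes, compressed_bytes, seconds):
    """Decimal kilobytes (1 KB = 1000 bytes) removed per second of compression."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (original_bytes - compressed_bytes) / 1000 / seconds


if __name__ == "__main__":
    # Hypothetical run: 1,000,000 bytes squeezed to 400,000 bytes in 2 seconds.
    print(kb_saved_per_second(1_000_000, 400_000, 2.0))
```

Note that a single number like this cannot distinguish a fast, weak compressor from a slow, strong one if they happen to save bytes at the same rate.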
Re:duh by Anonymous Coward · 2007-04-23 03:15 · Score: 0

I'd love to see some algorithm which was intelligent enough to crop out all the "story" fluff that goes into a lot of news (presumably as a gimmick to get people who aren't interested in technology or science to read the whole article). Although it has been done forever, it seems to be getting worse lately as more professional news writers take on a "weblog" approach to writing their articles.

Generally these articles start out with a bunch of unimportant details: what the weather was like, what they had for breakfast, transportation to the interview, etc. I don't care about every minor detail of the reporter's life, just give me the damn news.
Re:duh by yppiz · 2007-04-23 05:51 · Score: 1

Minor nit - they list the default compression for gzip as -5, but it's -6. From the man page:
-# --fast --best
Regulate the speed of compression using the specified digit #, where -1 or --fast indicates the fastest compression method (less compression) and -9 or --best indicates the slowest compression method (best compression). The default compression level is -6 (that is, biased towards high compression at expense of speed).
--Pat
Re:duh by Anonymous Coward · 2007-04-23 05:57 · Score: 0

Wow, what a misleading graph. The difference from 0.56 to 0.71 on the file size axis is only 27%; that's barely significant. Meanwhile, the time axis covers a factor of roughly 20x.

Pareto frontier my ass. There's a strict lower bound on the compression of a particular given data set, and there's also a lower bound on the average compression for a particular type of file. For example, pseudo-random binary data files and partially compressed files cannot be compressed as well as binary program files, which cannot be compressed as well as source code or English language text.

It will require infinite resources to reach the theoretical average case lower bound, which is not much different from the current state of the art in compression. As an engineer, my view is that they've already crossed the line between efficient and a waste of time for most applications. By looking at the graph I quickly conclude that nothing above or to the left of WinRar and Stuffit will ever be successful, but that those two are not sufficiently smaller than winzip/gzip, so they will never take off either.

In short, winzip/gzip will dominate because they're in wide use and they perform reasonably well. The "Z0MG my compression is 3.5% better than yours" crowd will still get its rocks off on WinRK and WinAce, but they'll eventually grow up and realize that spending 2x as much CPU to get 3.5% payoff is a waste of electricity, and nobody wants to have a dozen compression/decompression programs.

One question... by Anonymous Coward · 2007-04-22 14:14 · Score: 0

Which compression format are you going to send the article....

WOW! by vertigoCiel · 2007-04-22 14:15 · Score: 5, Funny

I never would have guessed that there was a tradeoff between the quality and speed of compression! No way! Next they'll be saying things like 1080p HD offers quality at the expense of computational power required!

Re:WOW! by seanadams.com · 2007-04-22 14:28 · Score: 1

I never would have guessed that there was a tradeoff between the quality and speed of compression! No way! Next they'll be saying things like 1080p HD offers quality at the expense of computational power required!

If you really mean quality (as opposed to compression ratio) you've got it backwards. Lossless compression algorithms are generally simpler than lossy ones, especially on the encode side. Lossy algorithms have to do a lot of additional work converting signals to the frequency domain and applying complex perceptual models.
Re:WOW! by renegadesx · 2007-04-22 16:05 · Score: 1

No, don't forget people are ridiculous in their claims, next they will say that 1080p takes up more disk space. Next thing you know Bill Gates will go on record saying you will actually need more than 640K of RAM

--
Make SELinux enforcing again!
Re:WOW! by Anonymous Coward · 2007-04-23 12:10 · Score: 0

Not to mention that you can indeed press F12 to continue if your keyboard is missing.
Re:WOW! by vertigoCiel · 2007-04-23 13:38 · Score: 1

I did mean compression ratio. That's the term I was looking for before I wrote "quality."

Screw speed, size reduction: gimme compatibility by xxxJonBoyxxx · 2007-04-22 14:17 · Score: 5, Insightful

Screw speed and size reduction. All I want is compatibility with other OSs (i.e., fewest things that have to be installed on a base OS to use it). For that, I'd have to say Zip and/or gzip wins.

small = slow by Anonymous Coward · 2007-04-22 14:17 · Score: 5, Funny

So that's why smaller computers are slower, right?

Re:small = slow by BrokenHalo · 2007-04-22 17:13 · Score: 1

The GP was right. Nothing to see here.

I just read TFA, and all the results were "TBA" and "coming soon!". Looks like a really exhaustive and comprehensive review, and I'll bear it in mind next time I'm using gzip and I need to make an extra 13 bytes available on my 5-terabyte hard drive.
Re:small = slow by Anonymous Coward · 2007-04-23 03:04 · Score: 0

I just read TFA, and all the results were "TBA" and "coming soon!".
Seems to be a work in progress. There's some interesting results, but I'd also like to see decompression times. Usually I'm less concerned about how long it takes to compress (happens once) and more about how long it takes to decompress (happens MANY times). I'd also be curious how these algorithms compare to NTFS file compression (easy and painless).
Re:small = slow by HTH+NE1 · 2007-04-23 04:29 · Score: 1

I'm more interested in how resilient they are to damage. If you corrupt one bit of an uncompressed file, you can usually fix it with little difficulty, but if it is a compressed file, you tend to lose the entire file, or at least all the contents at that bit and thereafter (depending on whether the decompresser will allow you to retain the work product of the partially decompressed file).
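
A quick way to see the difference, sketched in Python with the standard zlib module (a stand-in; other formats will behave differently, and the helper names and sample text are made up): flip one bit in a raw copy and in a compressed copy, then compare what survives.

```python
import zlib


def flip_bit(blob, bit_index):
    """Return a copy of blob with exactly one bit inverted."""
    damaged = bytearray(blob)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


def try_decompress(blob):
    """Decompressed bytes, or None if zlib rejects the stream."""
    try:
        return zlib.decompress(blob)
    except zlib.error:
        return None


def differing_bytes(a, b):
    """Count positions where two byte strings differ (up to the shorter length)."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    text = "".join(f"line {i}: the quick brown fox\n" for i in range(2000)).encode("ascii")
    packed = zlib.compress(text)
    middle = 8 * (len(packed) // 2)  # a bit somewhere mid-stream
    print("raw copy, bytes damaged:", differing_bytes(text, flip_bit(text, middle)))
    result = try_decompress(flip_bit(packed, middle))
    if result is None:
        print("compressed copy: rejected outright")
    else:
        print("compressed copy:", differing_bytes(text, result), "bytes differ")
```

With zlib, a damaged stream is typically rejected as a whole, either because decoding breaks or because the trailing checksum no longer matches; salvaging the part before the damage takes a streaming decompressor and some manual work.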

I have some GIFs that I've wanted to restore that have damage to multiple consecutive byte pairs in the file. The algorithm for GIF is simple enough that one could use trial and error to find what the damaged bytes were supposed to be (without trying the 65535 alternate possibilities) so that you could decode until the next damaged pair. Yet I still haven't found sufficient public documentation for this format due to the patents causing the code to be pulled from the net which haven't been republished since the patent expired. (I have less hope for restoring the damaged JPGs.)

They are images containing text from a website that no longer exists and which managed not to be preserved at archive.org. They are interlaced, so I wouldn't even need to repair the entire image to deduce the text. It was a short retelling of the story "The War of the Worlds" from a modern perspective of an IRC-like chat room. The story was entitled "The Last Chat Room". I was converting it to a Flash presentation when I tried hooking up an ATA drive as slave to a controller that didn't properly support both a master and slave drive (B&W G3), ironically for the purpose of making a backup.

--
Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?
Re:small = slow by blincoln · 2007-04-23 08:03 · Score: 1

There should be plenty of OSS image-viewing programs available for you to use as a reference for the GIF spec. If I can find example code for something strange and proprietary like Hitachi/Sega's VQ texture format (and I have), GIF should be a cakewalk.

--
"...always new atoms but always doing the same dance, remembering what the dance was yesterday." -Richard Feynman
Re:small = slow by ImaLamer · 2007-04-30 06:38 · Score: 1

I'm Swiss you insensitive clod!

--
Get your Unix fortune now!

I keep it simple by Anonymous Coward · 2007-04-22 14:17 · Score: 5, Funny

I fill an old station wagon with backup tapes, and then put it in the crusher.

Re:I keep it simple by Gordonjcp · 2007-04-22 22:17 · Score: 1

I compress my data like this...

Maximum compression? by Anonymous Coward · 2007-04-22 14:18 · Score: 1, Informative

http://www.maximumcompression.com/ ?

Not really by Toe,+The · 2007-04-22 14:19 · Score: 3, Insightful

Not every software achieves maximum efficiency. It is perfectly imaginable that a compressor could be slow and bad. It is nice to see that these compressors did not suffer that fate.

Skip the blogspam by Anonymous Coward · 2007-04-22 14:19 · Score: 5, Informative

as it's slashdotted

this site
http://www.maximumcompression.com/
has been up for years and performs tests on all the compressors with various input sources, much more comprehensive

Re:Skip the blogspam by Darkinspiration · 2007-04-22 14:33 · Score: 0

My god, this site lists more than 150 compression algos... never thought that there were so many of them. You learn a new thing every day.
Re:Skip the blogspam by mwilliamson · 2007-04-22 15:13 · Score: 1

CoralCDN, the poor man's slashdot effect countermeasure.
http://www.techarp.com.nyud.net:8090/showarticle.aspx?artno=4&pgno=0
Re:Skip the blogspam by xigxag · 2007-04-22 16:05 · Score: 1

maximumcompression.com is an excellent site but it just compares compression ratio, not speed. Hence for some people, it's of limited use.

And of course, there are other factors that these types of comparisons rarely mention or that are harder to quantify: Memory footprint, compression speed while multitasking, both foreground and background, single and dual core, OS/gui integration, cross-platform availability, availability of source code, cost (particularly for enterprise users), backup options (how quiet is quiet mode), processor load (to what extent will it interfere with the use of a multimedia app), spanning options, etc. Raw comparisons are fine, but once you've eliminated the ludicrously slow/inefficient programs, you need to actually try the remaining choices before committing to them.

--
There are two kinds of people: 1) those who start arrays with one and 1) those who start them with zero.
Re:Skip the blogspam by _|()|\| · 2007-04-22 16:18 · Score: 2, Interesting
After scanning MaximumCompression's results (sorted by compression time) the last time one of these data compression articles hit Slashdot, I gained a newfound appreciation for ZIP and gzip:
- they compress significantly better than any of the faster (and relatively obscure) programs
- the programs that compress significantly better take more than twice as long
- they're at the front of the pack for decompression time
If you have a hard limit, like a single CD or DVD, then the extra time is worth it. Otherwise, look no further than the ubiquitous ZIP.
Re:Skip the blogspam by Spikeles · 2007-04-22 17:19 · Score: 2, Insightful

maximumcompression.com is an excellent site but it just compares compression ratio, not speed. Hence for some people, it's of limited use.
See this page? http://www.maximumcompression.com/data/summary_mf.php
What are the headers along the top? let's see..

Pos, Program, Switches used, TAR, Compressed, Compression, Comp time, Decomp time, Efficiency

OMG!.. is that a "time".. as in speed column i see there?

--
I don't need to test my programs.. I have an error correcting modem.
Re:Skip the blogspam by xigxag · 2007-04-23 00:41 · Score: 1

I'll grant you that, but only a small subset of the tests compare the speed, and he didn't attempt any optimizations in that regard. In other words, he tried to eke out the maximum compression of each program (living up to the name of the site) but didn't try to do anything to compare the speed of the programs under various settings. Hence, for some people, it's of limited use. Which is what I said earlier. I'm not slamming the site, it's good for what its intended purpose is.

--
There are two kinds of people: 1) those who start arrays with one and 1) those who start them with zero.
Re:Skip the blogspam by bigbigbison · 2007-04-23 01:23 · Score: 1

While I prefer .7z it is true that zip's ubiquitousness is a great advantage. However, even within the zip format there is a pretty decent variation on how well various programs will compress. With the exception of the new non-standard versions of zip files that winzip has started using (and few if any other programs can open) the smallest and yet still totally compatible zip files I've been able to make have been with 7zip and pass=4 as the parameters. Beats winzip every time.

--
http://www.popularculturegaming.com -- my blog about the culture of videogame players
Re:Skip the blogspam by _|()|\| · 2007-04-23 04:31 · Score: 1

Thanks for pointing out the problems with WinZip. I no longer use it, but it's good to know that there may be non-standard ZIP archives out there.

Looking at the MaximumCompression results, 7-Zip takes more than twice as long as gzip to compress and decompress in each of the three tested modes, so I would generally not bother. However, the compression is significantly better, so I would consider it if space was really at a premium.

/. effect rears its ugly head once again! by Anonymous Coward · 2007-04-22 14:20 · Score: 0

s'all she wrote, Jim. Coral cache of it works, though.
http://www.techarp.com.nyud.net:8080/showarticle.aspx?artno=4&pgno=0

Re:/. effect rears its ugly head once again! by killa62 · 2007-04-22 14:23 · Score: 5, Funny

yep, looks like they're using WinRK on the fly to decompress the website from storage

This is nothing new by Anonymous Coward · 2007-04-22 14:20 · Score: 1, Informative

I remember people did MUCH more exhaustive (30+ programs) comparisons back in the BBS days. Yes... it was a much simpler time.

Re:This is nothing new by Starburnt · 2007-04-22 15:29 · Score: 4, Funny

So they've compressed it to 11. I'd say that's a step forward.

How quick does it compress when slashdotted? by syousef · 2007-04-22 14:22 · Score: 1

Bit hard to have a spoiler when the article isn't available.

--
These posts express my own personal views, not those of my employer

What about LHA, TAR by Anonymous Coward · 2007-04-22 14:23 · Score: 2, Insightful

These two formats are still widely used out there, and why are we compressing MP3's?

Re:What about LHA, TAR by 644bd346996 · 2007-04-22 14:42 · Score: 4, Informative

TAR is not a compressor.
Re:What about LHA, TAR by SirSlud · 2007-04-22 14:43 · Score: 3, Funny

TAR for compression? I woulda thought you were trolling if you didn't have LHA up there. Too bad you're anonymous, you'll never get to find out how unqualified you are for participating in this discussion.

--
"Old man yells at systemd"
Re:What about LHA, TAR by Ant+P. · 2007-04-22 20:38 · Score: 1

Actually, it can compress sparse files.
Re:What about LHA, TAR by SashaM · 2007-04-22 22:21 · Score: 1

TAR for compression?
Well, it'd certainly take 1st place for speed.
Re:What about LHA, TAR by Anonymous Coward · 2007-04-22 23:06 · Score: 0

Another aspect to consider: fs slack space.
How much space do 10 000 files 1 byte each require on your file system? Might be 10 000 bytes plus metadata but anything above 10 000 x 512 (5 120 000) would be more typical, as far as I know. Tar them and you get a single file some 15 000 bytes in size. You just saved at least 5 000 000 bytes.
Now go and count slack space on your windows partition.
Re:What about LHA, TAR by Anonymous Coward · 2007-04-23 00:26 · Score: 0

Well, it'd certainly take 1st place for speed.
If you're I/O-bound it almost certainly won't.
Re:What about LHA, TAR by Anonymous Coward · 2007-04-23 02:08 · Score: 0

Too bad you're anonymous, you'll never get to find out how unqualified you are for participating in this discussion.
Why does anonymity suggest one cannot get feedback on a post? Are you aware that posts have their own unique identifier?

Interesting, needs better graphs by MBCook · 2007-04-22 14:23 · Score: 4, Informative

I read this earlier today through the firehose. It was interesting, but the graphs are what struck me. It seems to me all the graphs should have been XY plots instead of pairs of histograms. That way you could easily see the relationship between compression ratio and time taken. Their "metric" for showing this, basically multiplying the two numbers, is pretty bogus and isn't nearly as easy to compare. With the XY plot the four corners are all very meaningful. One is slow with no compression, one each is good compression or good time alone, and one is the sweet spot of good compression and good time. It's easy to tell those on two opposing corners apart (good compression vs good time), whereas with the article's metric they could look very similar.

Still, interesting to see. The popular formats are VERY well established at this point (ZIP in Windows and Mac (stuffit seems to be fading fast), and GZIP and BZIP2 on Linux). They are so common (especially with ZIP support built into Windows since XP and also built into OS X) I don't think we'll see them replaced any time soon. Of course, with CPU power getting cheaper and cheaper we are seeing formats that are more and more compressed (MP3, H264, Divx, JPEG, etc) so these utilities are becoming less and less necessary. I no longer need to stuff files on floppies (I've got the net, DVD-Rs, and flash drives). Heck, if you look at some of the formats they "compressed" (at like 4% max) you almost might as well use TAR.

--
Comment forecast: Bits of genius surrounded by a sea of mediocrity.

Re:Interesting, needs better graphs by TubeSteak · 2007-04-22 14:57 · Score: 1

Heck, if you look at some of the formats they "compressed" (at like 4% max) you almost might as well use TAR.
For high bandwidth websites, saving 4% means saving multiple GBs of traffic

And I still zip up multiple files for sending over the internets.

--
[Fuck Beta]
o0t!
Re:Interesting, needs better graphs by karnal · 2007-04-22 15:02 · Score: 1

Of course, with CPU power getting cheaper and cheaper we are seeing formats that are more and more compressed (MP3, H264, Divx, JPEG, etc) so these utilities are becoming less and less necessary.
You do realize that you're talking about two different kinds of data set when you compare something like .zip with something like .mp3??? The more and more compressed options you spoke of only work well because they're for specific applications - and they're lossy to boot; the typical compression tools are lossless and for any data set.

I don't think common compression libraries/utilities will ever fade; where there's a data set, there's always a need to get it just a little smaller....

--
Karnal
Re:Interesting, needs better graphs by Anonymous Coward · 2007-04-22 15:29 · Score: 0

You do realize that you're talking about two different datasets whether you're talking something like .zip and then something like .mp3??? The more and more compressed options you spoke of only work well because they're for specific applications - and they're lossy to boot; the typical compression tools are lossless and for any data set.
His point is that as the former become more widely used (because of CPUs that can handle them on-the-fly) the latter become less relevant. You don't need to compress mp3's like you did with wav's.
Re:Interesting, needs better graphs by timeOday · 2007-04-22 17:12 · Score: 2, Informative

It was interesting, but the graphs are what struck me. It seems to me all the graphs should have been XY plots instead of pairs of histograms.
Yup..
Re:Interesting, needs better graphs by Petrushka · 2007-04-22 17:24 · Score: 1

That's very interesting. Looks like WinRAR is sitting in a pretty sweet spot in that hyperbola.

Which ones of these run cross platform by rminsk · 2007-04-22 14:23 · Score: 1

Which compressors on the list run on non-Windows platforms?

Re:Which ones of these run cross platform by vertigoCiel · 2007-04-22 14:31 · Score: 1

Only gzip, bzip2, and Stuffit run multi-platform, although other programs to uncompress most of the file types used are available on most platforms.
Re:Which ones of these run cross platform by Lehk228 · 2007-04-22 14:36 · Score: 1

http://p7zip.sourceforge.net/ 7zip does as well

--
Snowden and Manning are heroes.
Re:Which ones of these run cross platform by metamatic · 2007-04-22 16:17 · Score: 1

Only gzip, bzip2, and Stuffit run multi-platform, although other programs to uncompress most of the file types used are available on most platforms.

That's a bit misleading. For example, PKzip may not be multi-platform, but there are good native Zip compression and decompression programs available for every major platform.

--
GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
Re:Which ones of these run cross platform by Anonymous Coward · 2007-04-22 22:08 · Score: 0

Bleurgh. p7zip is an awful hack; basically the Win32 7zip sources with a bunch of hacks and wrappers to make it "portable". This is as stable and effective as you might imagine. p7zip isn't portable in the sense that it can be built on any POSIX system, it's portable in the sense that it compiles on Linux and maybe one or more BSD's, depending on the direction of the wind and what's broken this week. It's horrendously unstable and is hopefully only a stopgap until someone puts a proper 7z frontend on LZMA Utils.
Re:Which ones of these run cross platform by Anonymous Coward · 2007-04-22 22:47 · Score: 0

Off the top of my head... .ACE (unace) and ARJ (unarj) archives can be decompressed on *Nix. RAR (rar/unrar), .zip (zip/unzip), 7zip (7zip), and gzip/bzip2 (gzip/bzip2) can be both created and decompressed on *Nix.
Stuffit was common on classic Mac OS; I wasn't aware that it was still commonly used.

I haven't heard of the rest.

no best compression results? by Uksi · 2007-04-22 14:23 · Score: 1

You have gotta be kidding me, article is posted and there are no best compression test results! Lame!

Re:Screw speed, size reduction: gimme compatibilit by Nogami_Saeko · 2007-04-22 14:24 · Score: 4, Insightful

Nice comparison, but there's really only two that matter (at least on PCs):

ZIP for cross-platform compatibility (and for simplicity for less technically-minded users).

RAR for everything else (at 3rd in their "efficiency" list, it's easy to see why it's so popular, not to mention ease of use for splitting archives, etc).

--
"Nothing strengthens authority so much as silence." - Charles de Gaulle

Poor article. by FellowConspirator · 2007-04-22 14:24 · Score: 5, Insightful

This is a poor article on several points. First, the entropy of the data in the files isn't quantified. Second, the strategy used for compression isn't described at all. If WinRK compresses so well on very high entropy data, there must be some filetype specific strategies used.
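For anyone wondering what "quantifying the entropy" could look like in practice, one crude option is the zero-order (byte frequency) Shannon entropy. The Python sketch below is only an illustration: the sample inputs are made up, and this measure ignores repeated phrases and other structure, so it is a rough indicator and not what any of the tested tools actually models.

```python
import math
import os
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Zero-order Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Made-up sample inputs, purely for illustration.
text = b"the quick brown fox jumps over the lazy dog " * 200
noise = os.urandom(len(text))  # stand-in for already-compressed data

print("text :", round(byte_entropy(text), 2), "bits/byte")
print("noise:", round(byte_entropy(noise), 2), "bits/byte")
```

Data that scores close to 8 bits per byte leaves a general-purpose compressor very little to work with, which is why big gains on such files would suggest filetype-specific tricks.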

Versions of the programs aren't given, nor the compile-time options (for the open source ones).

Finally, Windows Vista isn't a suitable platform for conducting the tests. Most of these tools target WinXP in their current versions and changes to Vista introduced systematic differences in very basic things like memory usage, file I/O properties, etc.

The idea of the article is fine, it's just that the analysis is half-baked.

Re:Poor article. by j00r0m4nc3r · 2007-04-22 14:41 · Score: 0

Entropy isn't what it used to be.
Re:Poor article. by RedWizzard · 2007-04-22 15:10 · Score: 5, Insightful

I've got some more issues with the article. They didn't test filesystem compression. This would have been interesting to me because often the choice I make is not between different archivers, but between using an archiver or just compressing the directory with NTFS' native compression.
They also focused on compression rate when I believe they should have focused on decompression rate. I'll probably only archive something once, but I may read from the archive dozens of times. What matters to me is the trade-off between space saved and extra time taken to read the data, not the one-off cost of compressing it.
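A rough way to measure that trade-off yourself is to time both directions separately. The Python sketch below uses only the standard-library codecs (zlib, bz2, lzma) as stand-ins, not the tools from the article, and the sample data is made up; real numbers depend heavily on the data and the machine.

```python
import bz2
import lzma
import time
import zlib


def time_codec(name, compress, decompress, data):
    """Time one compress/decompress round trip and report size ratio."""
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    unpacked = decompress(packed)
    t2 = time.perf_counter()
    if unpacked != data:
        raise ValueError(name + " round trip failed")
    return {
        "codec": name,
        "ratio": len(packed) / len(data),
        "compress_s": t1 - t0,
        "decompress_s": t2 - t1,
    }


# Made-up sample: mildly repetitive log-like text.
sample = b"".join(
    b"line %d: some mildly repetitive log text\n" % i for i in range(50000)
)

results = [
    time_codec("zlib", zlib.compress, zlib.decompress, sample),
    time_codec("bz2", bz2.compress, bz2.decompress, sample),
    time_codec("lzma", lzma.compress, lzma.decompress, sample),
]

for r in results:
    print(r)
```

If you read an archive dozens of times, the decompress_s column is the one to weigh against the space saved.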
Re:Poor article. by cpaglee · 2007-04-22 15:41 · Score: 1

The website is riddled with annoying ads and no way to print the article. And Windows only: there is zero information on which programs use compression algorithms supported in Linux.

The data is in a pretty useless format. The data should definitely be charted in compression vs. speed format with identical scales to measure the significance of the compression. Compression of 7% for video is really not that interesting to me. I don't want to be bothered with the time it takes to decompress. For different users different levels of compression are significant. For me if I can't compress by 20% then I don't bother.

It seems almost like this article was written to get Slashdotted. The article is a complete waste of time.
Re:Poor article. by fireboy1919 · 2007-04-22 16:39 · Score: 1

They didn't test filesystem compression.

No, they didn't. They really should have tested that. Personally, I like 7-zip's compressed filesystem better than WinZip's, but I haven't really tried any of the others.

Hold on...I've just been handed a note. Apparently you don't get to make any real choices in that area - it's zip or nothing. Further, the details of compressing and decompressing are handled whenever the filesystem feels like it, so it can't really be judged against traditional programs. So I guess that was a silly idea - more of a topic for people who work on reverse engineering the various Windows filesystems.

They also focused on compression rate when I believe they should have focused on decompression rate.

While IMHO this is more important for exactly the reason you said, it's also less interesting. Pretty much all of these algorithms are of the same class and will take about the same length of time to decompress as all of the rest. You don't get the same kind of dramatic results that way.

I always just assume that there are very, very few cases where more compression isn't always better if the only thing you're losing is the time it takes to compress. Trading compression for robustness isn't ever worth it, though.

If losing a portion of a file means that the entire archive is unrecoverable, I don't want it.

--
Mod me down and I will become more powerful than you can possibly imagine!

What's the point of compressing JPEG,MP3,DivX etc by mochan_s · 2007-04-22 14:26 · Score: 5, Insightful

What's the point of compressing JPEG, MP3, DivX, etc., since they already do the compression? The streams are close to random (with max information) and all you could compress would be the headers between blocks in movies or the ID3 tag in MP3.
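The "close to random" effect is easy to see with a general-purpose compressor run twice. The Python sketch below uses zlib on made-up word soup as a stand-in; it does not model JPEG or MP3 internals, and it says nothing about format-aware recompressors that decode the file first.

```python
import random
import zlib

# Made-up, compressible input: random words from a small vocabulary.
rng = random.Random(0)
vocab = [b"compress", b"archive", b"ratio", b"speed", b"zip", b"file",
         b"block", b"entropy", b"stream", b"header", b"data", b"fast"]
text = b" ".join(rng.choice(vocab) for _ in range(40000))

once = zlib.compress(text, 9)    # first pass removes most redundancy
twice = zlib.compress(once, 9)   # second pass has almost nothing left

print("original     :", len(text))
print("compressed   :", len(once))
print("recompressed :", len(twice))
```

The first pass shrinks the input a lot; the second pass gains essentially nothing and may even add a few bytes of overhead.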

Hmm... by neonstz · 2007-04-22 14:27 · Score: 1

They didn't think their cunning plan to create more ad revenue by creating a shitload of pages all the way through...

english language is mostly fluff by Blue+Shifted · 2007-04-22 14:27 · Score: 4, Funny

the most interesting thing about text compression is that there is only about 20% information in the english language (or less). yes, that means that 4/5ths of it is meaningless filler. filled up with repetitive patterns. as you can see, i really didn't need four sentences to tell you that, either.

i wonder how other languages compare, and if there is a way to communicate much more efficiently.

Re: english language is mostly fluff by Anonymous Coward · 2007-04-22 14:43 · Score: 0

Y.
Re:english language is mostly fluff by maxume · 2007-04-22 14:46 · Score: 1

Does efficiency derive directly from the size of the textual representation of the words? I would think it would have to include things like robustness, expressiveness, clarity, as all of those things have a significant effect on how completely a given message is transmitted, which seems to be the numerator in the efficiency calculation.

--
Nerd rage is the funniest rage.
Re:english language is mostly fluff by demonlapin · 2007-04-22 15:17 · Score: 1

Yes, you can communicate much more efficiently. Much of the length of English words is related to defining part of speech - we use "-ing" for adjectival forms of verbs, "-ly" for adverbs, etc. It's just an attribute that is expressed by a clearly recognizable pattern. As such, it's easily comprehended by the reader who can identify parts of words rather than letter-by-letter reading. This is the essence of true speed-reading.
An anecdotal observation: my wife is better than I am at linguistic tasks, but not a lot better if the words are spoken. Reading is an entirely different matter. I'm a letter-by-letter reader, I still sound out words in my head, and I'm pretty good at it - I read pulp fiction at around 100 pages an hour. She is one of those people who can read a line - or sometimes a paragraph - at a time, and routinely reads pulp books at 250 pages an hour with total recall. She can identify the pieces of words by sight and digests them instantly.
If you can figure out what she's doing, you've vastly multiplied the efficiency of human communication.
Re:english language is mostly fluff by Anonymous Coward · 2007-04-22 21:39 · Score: 0

>> i wonder how other languages compare, and if there is a way to communicate much more efficiently.
Many surprisingly long English sentences can be compressed to one or two words, which are highly inflected and have multiple prefixes/infixes/postfixes.
Re:english language is mostly fluff by Anonymous Coward · 2007-04-22 21:47 · Score: 0

BTW, the parent post is about the constructed Ithkuil language, which is designed to be as precise and efficient as possible. (Slashdot ate a part of the comment)
http://home.inreach.com/sl2120/Ithkuil/
Re:english language is mostly fluff by Viol8 · 2007-04-22 23:25 · Score: 1

"routinely reads pulp books at 250 pages an hour with total recall"

Bullshit. Someone might be able to skim read the pages at that speed (if they have strong eye muscles that didn't suffer strain) , but total recall? I don't think so. Not unless she's some sort of 1 in a billion savant.
Re:english language is mostly fluff by Spurion · 2007-04-22 23:56 · Score: 1

That redundancy is probably pretty helpful when it comes to working out what was actually meant, instead of what was said/written.

--
Any sufficiently self-referential snowcloned .sig is indistinguishable from nonsense.
Re:english language is mostly fluff by Eivind+Eklund · 2007-04-23 01:51 · Score: 1

The redundancy is there to make it possible to understand speech. You could use a more compact format for printed text (which is high quality to begin with); for speech, we already have misunderstandings fairly often (even with the redundancy and the high amount of context people use to guess at what was said).
Eivind.

--
Doubting the existence of evolution is like doubting the existence of China: It just shows that you're uninformed.
Re:english language is mostly fluff by IwantToKeepAnon · 2007-04-23 04:11 · Score: 1

OMG! LOL!

Can't imagine how to communicate more compactly :o)

--
"Happy families are all alike; every unhappy family is unhappy in its own way." -- Anna Karenina by Leo Tolstoy
Re: english language is mostly fluff by Sketch · 2007-04-23 05:54 · Score: 2, Funny

Your response is 50% larger than necessary

--
-- OpenVerse Visual Chat: http://openverse.com
Re:english language is mostly fluff by Anonymous Coward · 2007-04-23 12:46 · Score: 0

Hmmm..

"the most interesting thing about text compression is that there is only about 20% information in the english language (or less). yes, that means that 4/5ths of it is meaningless filler. filled up with repetitive patterns. as you can see, i really didn't need four sentences to tell you that, either. i wonder how other languages compare, and if there is a way to communicate much more efficiently."

= normal language contains _20%_ real information.

- hey you were right.
Re:english language is mostly fluff by demonlapin · 2007-04-25 00:56 · Score: 1

I'd have said the same thing, if I hadn't watched it happen too many times to count. She reads each line in (max) two glances - I've watched her eyes as she reads, and there is at most one horizontal saccade (jump of the eyes) per vertical saccade.
Not surprisingly, she was an English major.

Depends on the application by Toe,+The · 2007-04-22 14:27 · Score: 1

Some people are sending huge graphics files and paying for bandwidth and/or sending to people with slow connections, so they actually have a use for maximal compression.

I have to agree that for most people (myself included), compatibility is all that matters. I'm so glad Macs now can natively zip. But there are valid reasons to want compression over compatibility.

7zip by Lehk228 · 2007-04-22 14:33 · Score: 4, Insightful

7-zip cribsheet:

weak on retarded things to zip like WAV files (use FLAC), mp3's, jpegs and divx movies.

7zip does quite well in documents (2nd) and ebooks (2nd), 3rd on MPEG video, 2nd in PSD

also i expect 7zip will improve in higher end compressions settings, when possible i give it hundreds of megs and unlike commercial apps 7zip can be configured well into the "insane" range

--
Snowden and Manning are heroes.

Re:7zip by MobyDisk · 2007-04-23 02:25 · Score: 1

7-zip can also compress small files, while most archivers just store them uncompressed. Do any of the others do this?
Re:7zip by drinkypoo · 2007-04-23 04:07 · Score: 1

7-zip can also compress small files, while most archivers just store them uncompressed. Do any of the others do this?

You can create a "SOLID" RAR archive which rather than compressing all files separately compresses them together. I don't know if it's even possible to remove a file from the middle of the archive without processing all the data in the archive from the beginning, like a gzipped tar - I don't think you can, I think it's the same story as getting a file out of the middle of a compressed tar. But it does provide substantially more compression over stock RAR. This review is way light on details.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:7zip by Sketch · 2007-04-23 06:06 · Score: 1

also i expect 7zip will improve in higher end compressions settings, when possible i give it hundreds of megs and unlike commercial apps 7zip can be configured well into the "insane" range
You are correct. I downloaded a set of router firmware that was distributed in a 2.7MB 7zip and was annoyed to need yet another compressor. I figured it was probably no better than .tar.bz2, so I re-compressed the data to test my theory. The .tar.bz2 archive was 10 times larger (27MB!) than the original .7zip archive. The router firmware archive contained 10 slightly different ~2.7MB ROM files. With the maximum block size of bzip2 being 900K, it couldn't reach the point where it was seeing the same redundant information in the 2.7MB files. With the larger block size support of 7zip, it was able to compress them into essentially 1 copy of the file + the minor differences. That's not really even in the insane range, but it's still a 10x improvement in a special case.
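The block-size effect described here can be reproduced on a smaller scale. The Python sketch below is an approximation under stated assumptions: random bytes stand in for an already-dense firmware image, the sizes and the variant() helper are made up, and Python's lzma module (xz, same LZMA family as 7-Zip, 8 MiB dictionary at preset 6) stands in for 7-Zip itself.

```python
import bz2
import lzma
import os

BLOB_SIZE = 1_500_000  # bigger than bzip2's maximum 900 kB block
COPIES = 4             # scaled down from the 10 ROM files in the story

# Random bytes: a stand-in for one firmware image that is already dense.
base = os.urandom(BLOB_SIZE)


def variant(blob: bytes, seed: int) -> bytes:
    """Hypothetical helper: copy of blob with 16 bytes flipped."""
    out = bytearray(blob)
    for k in range(16):
        pos = (seed * 7919 + k * 104729) % len(out)
        out[pos] ^= 0xFF
    return bytes(out)


# Concatenate the near-identical images, like files inside a tarball.
archive = b"".join(variant(base, i) for i in range(COPIES))

bz2_size = len(bz2.compress(archive, compresslevel=9))
xz_size = len(lzma.compress(archive, preset=6))

print("input:", len(archive))
print("bz2  :", bz2_size)
print("xz   :", xz_size)
```

bzip2 never sees two copies inside one block, so it ends up storing every image in full, while the LZMA dictionary reaches back far enough to encode the later images as references to the first one plus the small differences.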

--
-- OpenVerse Visual Chat: http://openverse.com
Re:7zip by Lehk228 · 2007-04-23 06:58 · Score: 1

the largest compression i ever did was with the emporio release of HL2 a few years ago, i saved over a gigabyte after compression (i think i went from 3.8 gigs to 2 gigs) and that was feeding it something like 600 megs of RAM for a night. haven't tried it on my laptop where i could spare over a gig just to see what kind of results i would get.

--
Snowden and Manning are heroes.

Doesn't really matter by 644bd346996 · 2007-04-22 14:35 · Score: 2, Informative

These days, file compression is pretty much only used for large downloads. In those instances, you really have to use either gzip, pkzip, or bzip2 format, so that your users can extract the file.

Yes, having a good compression algorithm is nice, but unless you can get it to partially supplant zip, you'll never make much money off it. Also, most things these days don't need to be compressed. Video and audio are already encoded with lossy compression, web pages are so full of crap that compressing them is pointless, and hard drives are big enough. Although, I haven't seen any research lately about whether compression is useful for entire filesystems to reduce the bottleneck from hard drives. Still, I suspect that it is not worth the effort.

Re:Doesn't really matter by alphamugwump · 2007-04-22 19:03 · Score: 1

Compressing filesystems is the wave of the future, b/c it sacrifices processor time for IO speed. According to Reiser, anyway, it gave an overall speedup.
Re:Doesn't really matter by Maltheus · 2007-04-23 09:14 · Score: 1

hard drives are big enough

What kind of a geek are you? Take it back!
Re:Doesn't really matter by 644bd346996 · 2007-04-23 09:42 · Score: 1

I was quite serious. A complete linux distro, with all available source code, easily fits on a recent hard drive. The only people who are actually using more than 100GB are people who are backing up large collections of DVDs and music. Other than the multimedia, desktops don't really need more than 10GB. (That's not to say they don't use more than 10GB. They do. But that qualifies as bloat.)

You might want an interface. by twitter · 2007-04-22 14:38 · Score: 1

All I want it compatibility with other OSs (i.e., fewest things that have to be installed on a base OS to use it). For that, I'd have to say Zip and/or gzip wins.

Sure, but there's also the issue of finding the files you really want to share and there KDE has very nice front ends. There's a nice find in Konqueror, with switches for everything including click and drool regular expressions. Krename copies or links files with excellent renaming. Finally, Konqueror has an archive button. The slick interface does not preclude the use of command line tools because the rename and archive programs will take piped input. The GUI is nice for review of the output and easy further processing.

--

Friends don't help friends install M$ junk.

Re:You might want an interface. by Anonymous Coward · 2007-04-22 15:23 · Score: 0

Erm, sorry but what's your point, this is completely off topic, we aren't discussing how bloated and unintuitive KDE's file manager is here, the article is about archive formats.
Re:You might want an interface. by Anonymous Coward · 2007-04-23 05:29 · Score: 0

Say there, Ali-Baba, have you seen the keyboard?

Re:What's the point of compressing JPEG,MP3,DivX e by Lehk228 · 2007-04-22 14:38 · Score: 1

because then they can use those graphs to pump their sponsor (WinRK)

--
Snowden and Manning are heroes.

Exhaustive? don't forget flac.. by Anonymous Coward · 2007-04-22 14:44 · Score: 0

UM, yeah, the dataset includes WAV files. Try flac. Then you will have exhausted a little more of the compression programs available.

Re:Exhaustive? don't forget flac.. by moronoxyd · 2007-04-22 15:22 · Score: 2, Informative

> UM, yeah, the dataset includes WAV files. Try flac [sourceforge.net].
> Then you will have exhausted a little more of the compression programs available.

You are aware that all the tools tested are general purpose compressors, and FLAC is not, aren't you?

Otherwise, you would also have to talk about WavPack, Monkey's Audio, Shorten and others.
And those are only the lossless audio codecs. What about lossy codecs?

What about all those different formats for pictures? They compress data as well.
And what about the different video codecs? ...
Re:Exhaustive? don't forget flac.. by Anonymous Coward · 2007-04-22 17:07 · Score: 0

Ie cred te bt compron algthm er, t prides 90%preson, unnately, is it lsy.

Moo by Chacham · 2007-04-22 14:46 · Score: 0, Offtopic

This is the First Post compressed really well, so it took until after a few posts to show up.

--
Have you read my journal today?

Alternate Compressor Comparisons by Anonymous Coward · 2007-04-22 14:49 · Score: 0

I read the article, got shocked at the time spent comparing the compression of MP3s and DiVX, and didn't read much further.

Google's top hit turns up this site which is chock full of data on every compressor you ever & never heard of:
http://www.maximumcompression.com/index.html

Wikipedia has nice charts to quickly see features and OS support for a handful of common compressors:
http://en.wikipedia.org/wiki/Comparison_of_file_archivers

The newsgroup comp.compression has been around awhile, and is maintaining an excellent FAQ:
http://datacompression.dogma.net/index.php?title=Comp.compression_FAQ

Re:Screw speed, size reduction: gimme compatibilit by NMerriam · 2007-04-22 14:51 · Score: 2, Interesting

Screw speed and size reduction. All I want it compatibility with other OSs (i.e., fewest things that have to be installed on a base OS to use it). For that, I'd have to say Zip and/or gzip wins.

I have to admit I switched over/back to ZIP about a year ago for everything for exactly this reason. yeah, it meant a lot of my old archives increased in size (sometimes by quite a bit), but knowing that anything anywhere can read the archive makes up for it. ZIP creation and decoding is supported natively by Mac and Windows and most Linux distros right from the GUI, so it makes it brain-dead simple to deal with.

--
Recursive: Adj. See Recursive.

Coralized and Hutter Prized by Baldrson · 2007-04-22 14:51 · Score: 1

A coralized link for the slashdotted.

Meanwhile, I noticed they didn't include the latest winner of the Hutter Prize, which is unfortunate since its latest entry looks like it will come in at nearly a 10% improvement over all prior text compressors using novel semantic modeling techniques.

--
Seastead this.

What about rzip? by Anonymous Coward · 2007-04-22 14:52 · Score: 0

rzip is like bzip2 on steroids. Works great for me.
See http://en.wikipedia.org/wiki/Rzip

I use a single finger by Anonymous Coward · 2007-04-22 14:52 · Score: 0

I use a single finger

Re:I use a single finger by Anonymous Coward · 2007-04-22 15:57 · Score: 0

I use a single finger
I don't want to hear about your anal stimulation preferences.

Re:L-Zip by Anonymous Coward · 2007-04-22 15:00 · Score: 2, Funny

The L-Zip project at http://lzip.sourceforge.net/ seems to be down right now but it should be included in any file compression comparison. It could reduce files to 0% of their original size and it was quick too.

It was so good at what it did that I bet Microsoft bought them out and are going to incorporate the technology into Windows.

Mirrors! by antdude · 2007-04-22 15:02 · Score: 1

Looks like the server was /.'ed. Mirrors: MirrorDot and Network Mirror.

--
Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).

I've got you all so beat by turing_m · 2007-04-22 15:04 · Score: 0, Flamebait

I use MS DOS 6 with doublespace doubling my hard drive space. I store stuff on both C and H drives, zipping, arjing and rarring all my jpgs and divx files. I figure with the amount of compression I'm using, I'll have roughly 20 times as much room as regular plebs. Suckers!

You should also see my 133t power strip setup. I don't need extra sockets, with daisy chaining I can fit as many devices as I want! LOL LOL Unfortunately my faulty circuit breaker keeps switching off at the most inconvenient times, I'll have to get that seen to.

--
If I have seen further it is by stealing the Intellectual Property of giants.

Re:I've got you all so beat by Anonymous Coward · 2007-04-22 15:27 · Score: 0

All that effort and you're not even using XTRATANK? Lame!

You can't even BEGIN to claim maximal ub3r-133t compressionality without the first, and best, disk space doubler.

Archive Comparison Test by Repton · 2007-04-22 15:09 · Score: 4, Insightful

See also: the Archive Comparison Test. Covers 162 different archivers over a bunch of different file types.

It hasn't been updated in a while (5 years), but have the algorithms in popular use changed much? I remember caring about compression algorithms when I was downloading stuff from BBSs at 2400 baud, or trading software with friends on 3.5" floppies. But in these days of broadband, cheap writable CDs, and USB storage, does anyone care about squeezing the last few bytes out of an archive? zip/gzip/bzip2 are good enough for most people for most uses.

--
Repton.
They say that only an experienced wizard can do the tengu shuffle.

Much more exhaustive test by Anonymous Coward · 2007-04-22 15:11 · Score: 0

http://compression.ca/act/ has a much more exhaustive test, and no ads either.

Exhaustive?! by jagilbertvt · 2007-04-22 15:12 · Score: 5, Informative

It seems odd that they didn't include executables/dlls in the comparison (where maximumcompression.com does). I also find it odd that they are compressing items that normally don't compress very well with most data compression programs (divx/mpegs/jpegs/etc). I'm guessing this is why 7-zip ranked a bit lower than most.

I did some comparison last year, and found 7-zip to do the best job for what I needed (great compression ratio without requiring days to complete). The article also doesn't take into account the network speed at which the file is going to be transmitted. I use 7-zip for pushing application updates and such to remote offices (most over 384k/768k WAN links). Compressing w/ 7-zip has saved users quite a bit of time compared to winrar or winzip.

I would definitely recommend checking out maximumcompression.com (As others have, as well) over this article. It goes into a lot greater detail.

Re:Exhaustive?! by yoshi_mon · 2007-04-22 22:05 · Score: 1

Proper link for the lazy.

--

Really, I know what I'm doing...Ohhhh, look at the shiny buttons!

Pizzachish: setting a new standard in languages by pizzach · 2007-04-22 15:24 · Score: 2, Interesting

I have been thinking about creating a new language with about 60 or so words. The idea is that you don't need a lot of words when you can figure out the meaning by context. Strong points are that the language would be very easy to pick up, and you would get that invigorating feeling of talking like a primitive cave man.

As an example of the concept, we have the words walk and run. They are a bit too similar to be worth wasting one of our precious few 60 words. Effectively, one could be dropped with the other taking on a broader meaning without any real repercussions. The words sit and shit are also fairly similar. When you have a guest over, you can say something like, "Please, shit down." Because of context, it would be all okay. Just remember, there is a difference between shitting on the toilet and shitting in the toilet.

--
Once you start despising the jerks, you become one.

Re:Pizzachish: setting a new standard in languages by Kandenshi · 2007-04-22 15:48 · Score: 1

If you're going to do that, I'd suggest considering the way some languages deal with modified versions of words like warm and hot.

"warm warm" = "hot"
"walk walk" = "walk fast/run"

It could greatly reduce the number of adjectives and verbs(and other stuff) you need in the language.
Re:Pizzachish: setting a new standard in languages by matts-reign · 2007-04-22 15:53 · Score: 1

This idea is double-plus-good!

--
Waffles rock.
Re:Pizzachish: setting a new standard in languages by maxume · 2007-04-22 16:12 · Score: 1

With sixty words, you would be lucky if 'shit body area' meant shitting on the toilet(where body area is your stand in for bathroom). I doubt you could express sitting on a toilet. If you disagree, consider that a poor vocabulary for a native English speaker is something like 20,000 words, average is 50,000 and people that speak a lot of jargon commonly have 75,000 words.

--
Nerd rage is the funniest rage.
Re:Pizzachish: setting a new standard in languages by wall0159 · 2007-04-22 16:38 · Score: 2, Interesting

you might be interested in this:

http://www.tokipona.org/
Re:Pizzachish: setting a new standard in languages by TheLink · 2007-04-22 17:01 · Score: 1

yeah, those double plus ungood languages ;).
--
- Too many replies beneath your current threshold
Re:Pizzachish: setting a new standard in languages by dodobh · 2007-04-22 18:09 · Score: 1

That is doubleplusinteresting..

--
I can throw myself at the ground, and miss.
Re:Pizzachish: setting a new standard in languages by Viol8 · 2007-04-22 23:30 · Score: 1

"he idea is that you don't need a lot of words when you can figure out the meaning by context."

And how do you express the context? Unless you're assuming all contexts can be boiled down to a small subsection. With 60 words you don't even make a dent in all the nouns, never mind verbs, adjectives, adverbs etc. How would you describe a goose in a language with 60 words that also had to describe asteroid, linux, sugar, dreaming etc etc. Your post is either an attempt at being tongue in cheek (not very funny if it was) or you're just talking rubbish.
Re:Pizzachish: setting a new standard in languages by Anonymous Coward · 2007-04-23 10:17 · Score: 0

Better not ask her to sit on your face!

Re:What's the point of compressing JPEG,MP3,DivX e by trytoguess · 2007-04-22 15:25 · Score: 5, Interesting

Er... did ya check out the comparisons? As you can see here here jpeg at least can be compressed considerably with Stuffit. According to this the program can "(partially) decode the image back to the DCT coefficients and recompress them with a much better algorithm then default Huffman coding." I've no idea what that means, but it does seem to be more thorough and complex than what you wrote.

Re:What's the point of compressing JPEG,MP3,DivX e by ampathee · 2007-04-22 15:30 · Score: 1

Mod parent up! I noticed that too, very interesting - I wonder whether a jpg compressed as efficiently as the JPEG standard allows could still be improved upon by StuffIt, or whether it just takes advantage of the inefficiency of most jpg compression code..

Re:Screw speed, size reduction: gimme compatibilit by Deliveranc3 · 2007-04-22 15:35 · Score: 1

With Quantum computing perhaps we'll start to see really elegant compression, like 2d checksums with bitshifting. If you can make all the data relate to each other then each bit of compressed file cuts the possibilities in half, get it down to maybe 1,000,000,000 possibilities and then tell it that it needs to be able to play in winamp and... well, use a lot of processing power.

Re:Screw speed, size reduction: gimme compatibilit by BluhDeBluh · 2007-04-22 15:39 · Score: 2, Informative

It's closed source and proprietary though. Someone needs to make an open-source RAR compressor - the problem is you can't use the official code to do that (as it's specifically in the licence), but you could use unrarlib as a basis...

Backups by Craig+Ringer · 2007-04-22 15:40 · Score: 1

File compression is also very important for backups, both for capacity and backup/restore speed. But you know what? In backups, you want to ensure that the archives are going to be recognisable and readable by as wide a variety of software as possible, so your disaster recovery options are open. Sure, you probably encrypt them, but there, portable and fairly standard tools are also a good idea rather than some compression & archival app's built-in half-baked password protection.

As for compressing whole file systems, it doesn't work well because data compresses by variable amounts. It's hard to get a layout that handles this well - when a program overwrites a few blocks of a file, those blocks might grow and force everything to move, or force fragmentation of the file. That sort of thing. You might say to compress the data but store it in the original block layout - which works and solves the above problem, but loses you your performance gains because the drive will generally read a whole block if part of it is needed, so you have no net change. This doesn't mean that efficient read/write compressing file systems aren't possible, just that they are hard, and probably won't perform as well as you might initially expect. They'll also have very _different_ performance characteristics because of the changes required to make them work without insane levels of fragmentation or lots of block copying.

Compressing file systems are amazing for backups, though, where files are written, read, or truncated, but rarely appended to or partially overwritten. I'd LOVE a widely supported r/w compressing FS for our backups here, but have to make do with compressed archives at the moment. Tape drives compress, but I don't have the cash for an SDLT here and we need that kind of capacity.

poor sample data choices by SideshowBob · 2007-04-22 15:47 · Score: 1, Redundant

It's a waste of time using a general purpose compressor on data that's already been compressed by domain specific audio or video compressors.

Re:What's the point of compressing JPEG,MP3,DivX e by athakur999 · 2007-04-22 15:53 · Score: 3, Insightful

Even if the amount of additional compression is insignificant, ZIP, RAR, etc. are still very useful as container formats for MP3, JPG, etc. files since it's easier to distribute 1 or 2 .ZIP files than it is 1000 individual .JPG files. And if you're going to package up a bunch of files into a single file for distribution, why not use the opportunity to save a few kilobytes here and there if it doesn't require much more time to do that?
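To make the container point concrete, here is a minimal sketch using Python's standard zipfile module. The helper names bundle and unbundle are made up for illustration; the compress flag switches between plain storage (container only) and deflate.

```python
import io
import zipfile


def bundle(files: dict[str, bytes], compress: bool = True) -> bytes:
    """Pack name -> content pairs into one in-memory ZIP archive."""
    # ZIP_STORED just wraps the files; ZIP_DEFLATED also tries to shrink them.
    method = zipfile.ZIP_DEFLATED if compress else zipfile.ZIP_STORED
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()


def unbundle(archive: bytes) -> dict[str, bytes]:
    """Read every member of a ZIP archive back into a dict."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}
```

Either way the recipient gets one file to download, and the member files come back byte for byte.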

--
"People that quote themselves in their signatures bother me" - athakur999

Compress big files before -ALL- File-Transfers by Anonymous Coward · 2007-04-22 15:54 · Score: 0

I am forever amazed that originating servers & mirrors of oft-released (minor releases of large) EXEs, ISOs, etc. do NOT - by default - compress their files, ie, before the first-requested transfer happens.

PROPOSAL (not likely to be so new, I suppose):

Whenever a requested file is NOT already compressed:

1. On the Server-Side:

- [FTP or HTTP] file-transfer programs/protocols should (by default) compress them (using the best compressor for that type of file), and

- save the now-compressed version of the big file on the server (in case of future requests for the same file), and

2. On the Client-Side:

- User can be asked (unless there's been a default reply saved) in which form the file should be saved (ie, compressed or decompressed), and

- the received file is saved in the form requested by User.
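A rough sketch of what those steps could look like, using gzip from the Python standard library as a stand-in for "the best compressor for that type of file". The function names compressed_copy and save_on_client are hypothetical and not part of any existing FTP or HTTP server.

```python
import gzip
import shutil
from pathlib import Path


def compressed_copy(original: Path) -> Path:
    """Server side: return a gzip'd copy, creating it on the first request only."""
    cached = original.with_name(original.name + ".gz")
    if not cached.exists():
        with open(original, "rb") as src, gzip.open(cached, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return cached


def save_on_client(received_gz: Path, target: Path, keep_compressed: bool) -> Path:
    """Client side: keep the download compressed or expand it, as the user chose."""
    if keep_compressed:
        out = target.with_name(target.name + ".gz")
        shutil.copyfile(received_gz, out)
        return out
    with gzip.open(received_gz, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

The second and later requests for the same file reuse the cached .gz, so the compression cost is paid once per file rather than once per download.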

We, in Australia, need such compression, as we've recently had significant INCREASES in our Internet service DATA costs... either because ISPs are just beginning to need to invest in ADSL-2+ DSLAMS -or- we're now using data for VoIP applications (and ISPs figure they're entitled to some of the $'s we save) -or- due to greed?

Others may also have high data costs.

In any case, I'm sure no one would mind some server-side changes that would reduce the sheer quantity of data that needs to be transferred.

Re:small = slow. Tunning UPX (Ultimate Packer eXe) by Anonymous Coward · 2007-04-22 15:56 · Score: 1, Informative

Book: "Digital Compression for Multimedia". PRINCIPLES & STANDARDS. Morgan Kaufmann Publishers Inc.

Interesting algorithms: I suppose that the patents have expired. Key items:

Tail-biting LZ77.
Lempel-Ziv-Yokoo LZY 1992, Kiyohara and Kawabata 1996.
LZ78SEP.
LZWEP.
LZYEP.

No War, Peace Again!

Re:What's the point of compressing JPEG,MP3,DivX e by maxume · 2007-04-22 16:03 · Score: 1

Yes. Jpeg includes lossless compression. First, it discards information (which is the part that you can tune), and then it losslessly compresses the result of that step. Stuffit backs out the standard lossless compression and uses some other better algorithm. If you are worried about it, use Jpeg 2000 or something similar; they are better at discarding information.

--
Nerd rage is the funniest rage.

ARTICLE TEXT (Conclusions only) by Anonymous Coward · 2007-04-22 16:03 · Score: 1, Informative

Save yourself 24 pages of crap, here's the punchline:

Aggregate Results

Overall, WinRK was the champion at compressing the filesets. It had an average compression rate of 23.2%. It was 9% better at overall compression than its closest rival, SBC Archiver which had an average compression rate of 21.3%.

The poorest compressors overall, at default settings, were the trio of WinZip, gzip and ARJ32. They only had average compression rates of about 13%. ...

However, gzip was the undisputed speed champion. It only took just over 121 seconds to completely process the complete fileset collection which weighed in at over 1.6GB. It was over a third faster than the runner-ups, ARJ32 and WinZip.

The other compressors were pretty slow at their normal compression settings. However, WinRK was extremely slow, compared to the others. It took almost 1.5 hours to compress the entire fileset collection. ...

The most efficient data compressor for the aggregated results was gzip. Its super-fast compression speed, coupled with its average compression rate allowed it to become the undisputed overall efficiency champion. ARJ32 and WinZip were also very efficient compressors. They were more than twice as efficient as their nearest rivals, StuffIt and bzip2.

The other compressors may have been good at certain files, but overall, they were pretty inefficient. The most inefficient compressor overall was WinRK by a large margin. No matter how good it was at compressing files, its extremely slow compression speed totally killed its efficiency ratings.

Conclusion

WinRK was the best compressor in most filesets it encountered. So, it was not surprising that it was the overall compression champion. However, its performance was offset by its abysmally slow performance. Even with a really fast system, it still took ages to compress the filesets. On several occasions, it took more than 18 minutes to compress just 200MB of files. Thanks to this flaw, it had the dubious honour of being the most inefficient compressor as well.

SBC Archiver, which was just slightly poorer than WinRK at compression was much faster at the job. Although it was nowhere near the top of the speed rankings, its faster speed allowed it to attain a moderate efficiency ranking.

WinRAR, which is a favourite of many Internet users, displayed a surprisingly bland performance at default settings. Although it had a pretty good overall compression rate of just under 19%, it was very slow at its default settings. That made it the third most-inefficient compressor. Surprising, isn't it?

In contrast, another perennial favourite, WinZip which had a lower overall compression rate of 13% managed to attain a much higher efficiency rating because it was able to compress the filesets much faster than WinRAR. Quite surprising since many users have abandoned it for WinRAR in view of its rather dated compression algorithm.

StuffIt is a dark horse. It has a pretty good compression rate overall but with an unimpressive compression speed. However, its amazing performance with JPEG files cannot be denied. JPEG files are undeniably StuffIt's forte. No other compressor even comes within a light year of it.

gzip and ARJ32 are both the fastest and the worst compressors of the lot. They have unimpressive overall compression rates but more than make up for it with their tremendous compression speeds. Therefore, it isn't surprising to see them garner the top two spots in compressor efficiency. However, we would still recommend GUI alternatives like WinZip. It is almost as efficient as gzip and ARJ32 and far more user-friendly.

Based on our results, we can only come to one conclusion. If you do not like to change the settings of your data compressors and want a good, fast and user-friendly data compressor, then WinZip is the best one for the job.

So there you have it - the results of the Normal Compression Test.

Re:ARTICLE TEXT (Conclusions only) by Spikeles · 2007-04-22 17:10 · Score: 1

Based on our results, we can only come to one conclusion. If you do not like to change the settings of your data compressors and want a good, fast and user-friendly data compressor, then WinZip is the best one for the job.
Sounds to me like WinZip astroturfing.

--
I don't need to test my programs.. I have an error correcting modem.
Re:ARTICLE TEXT (Conclusions only) by Vintermann · 2007-04-22 21:34 · Score: 1

The PAQ data compressor (best of the open source ones in most comparisons, sometimes overall best) is available from http://www.cs.fit.edu/~mmahoney/compression/paq8l.zip

Yes, that's right. They seem to agree with you...

--
xkcd is not in the sudoers file. This incident will be reported.

My response to this article... by wbren · 2007-04-22 16:08 · Score: 0, Troll

PK....k..6]..Y..Q...zip.huSmk.A..~..&.K!..3...GYo. s../..w.^..3...rw.na.sT.9..,$z..Tf..K..os..r.i.saS ..a..O.7...*.._BP.8.W!.`9..*..k..R;.".0.^..;.'..*. o.~L_.7.. T(w.J...6t..i..X.]...u.+..W..?.r..K...Y.O..{.."}.. *,.;..Zp..WZ).YQ.0~2)xE..59C..m+.Vk..t

--
-William Brendel

Re:My response to this article... by The+MAZZTer · 2007-04-22 16:48 · Score: 0, Troll

01000010011010010110111001100001011100100111100100 10000001100101011110000111000001100001011011100111 00110110100101101111011011100010000001110111011000 01011100110111010001100101011100110010000001111001 01101111011101010111001000100000011100110110000101 11011001100101011001000010000001110011011100000110 00010110001101100101001000010010000001000110010101 000101011100100001

Default options and stuffit by Rosyna · 2007-04-22 16:10 · Score: 2, Informative

By default, Stuffit won't even bother to compress MP3 files. That's why it shows an increase in file size (for the archive headers) and why it has the fastest throughput (it's not trying to compress). If you change the option, the results will be different.

I imagine some other codecs also have similar options for specific file types.

Re:ATTN: SWITCHEURS! by Anonymous Coward · 2007-04-22 16:15 · Score: 0

The only thing more pathetic than a PC user is a PC user trying to be a Mac user. We have a name for you people: switcheurs.

We have a name for you people too. Unfortunately, it can't be repeated in mixed company.

And slashdotting == no compression at all by EmbeddedJanitor · 2007-04-22 16:21 · Score: 1

Server Too Busy

Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

Exception Details: System.Web.HttpException: Server Too Busy

--
Engineering is the art of compromise.

Re:What's the point of compressing JPEG,MP3,DivX e by slim · 2007-04-22 16:22 · Score: 2, Insightful

"the program can "(partially) decode the image back to the DCT coefficients and recompress them with a much better algorithm then default Huffman coding." Whew, that makes me feel a bit dirty: detecting a file format an applying special rules. It's a bit like firewalls stepping out of their network-layer remit to mess about with application-layer protocols (e.g. to make FTP work over NAT).

Still, in both cases, it works; who can argue with that.

rar? by EmbeddedJanitor · 2007-04-22 16:25 · Score: 1

wtf? How is this highly compatible? gzip has a much larger install base.

--
Engineering is the art of compromise.

Re:rar? by Petrushka · 2007-04-22 17:19 · Score: 4, Funny

I take it you come from a planet where very few people use Windows. Please, I'm curious to know, what are things like there?
Re:rar? by jawtheshark · 2007-04-22 17:53 · Score: 1

I don't use RAR, I use 7Zip. It seems to be able to open RAR files fine though. You cannot create them, but that's okay with me.

--
Ahhh...the great dumpster continuum. Many a free computer will be found there. -- sowth (748135)
Re:rar? by Anonymous Coward · 2007-04-23 02:30 · Score: 0

People are happier here.
Re:rar? by krunk7 · 2007-04-23 02:43 · Score: 1

I suppose you're suggesting that zip and rar are the only way to go. With zip, yeah. It's the only format supported by default if I remember right.
When it comes to rar though, it has no claim to superior ease of use or cross compatibility. You have to install a 3rd party app on windows to use it as you do on other platforms like OS X, linux, or *bsd. gzip is supported by default on all but windows platforms and requires the same single app third party install that winrar would, so in a mixed environment where compatibility is key it would make more sense to install a single program on your windows images, use gzip, and have 100% support across the board with the least effort.
That's only assuming cross compatibility is important. There's also the benefit of openness. Any IT guy that's had to find some program to open the latest and greatest "never going away" proprietary format of 8 years ago knows how important open formats are.
Re: rar? by Anonymous Coward · 2007-04-23 04:26 · Score: 0

I take it you come from a planet where very few people use Windows. Please, I'm curious to know, what are things like there?

WONDERFUL! ~7K Unix (Sun, HP, IBM, others) boxes at work to play on, Unix (Mac OS, others) at home.

Sure I have a PC on my desktop, but I don't need to compress on it, the PC doesn't matter, only the (gzip'd when possible, Unix compress'd when have to) files from the real computers

Agreed completely. by Kadin2048 · 2007-04-22 16:28 · Score: 5, Interesting

Back in the early/mid 90s I was pretty obsessed with data compression because I was always short on hard drive space (and short on money to buy new hard drives with); as a result I tended to compress things using whatever the format du jour was if it could get me an extra percentage point or two. Man, was that a mistake.

Getting stuff out of some of those formats now is a real irritation. I haven't run into a case yet that's been totally impossible, but sometimes it's taken a while, or turned out to be a total waste of time once I've gotten the archive open.

Now, I try to always put a copy of the decompressor for whatever format I use (generally just tar + gzip) onto the archive media, in source form. The entire source for gzip is under 1MB, trivial by today's standards, and if you really wanted to cut size and only put the source for deflate on there, it's only 32KB.

It may sound tinfoil-hat, but you can't guarantee what the computer field is going to look like in a few decades. I had self-expanding archives, made using Compact Pro on a 68k Mac, thinking they'd make the files easy to recover later, which didn't help me at all now -- a modern (Intel) Mac won't touch it (although to be fair a PPC Mac will run OS 9 which will, and allegedly there's a Linux utility that will unpack CPP archives, although maybe not self-expanding ones).

Given the rate at which bandwidth and storage space are expanding, I think the market for closed-source, proprietary data compression schemes should be very limited; there's really no good reason to use them for anything that you're storing for an unknown amount of time. You don't have to be a believer in the "infocalypse" to realize that operating systems and entire computing-machine architectures change over time, and what's ubiquitous today may be unheard of in a decade or more.

--
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."

Re:Agreed completely. by maxume · 2007-04-22 16:38 · Score: 2, Insightful

7zip, and thus any format it supports, is as reliable as sourceforge. It's not a guarantee, but it isn't exactly 'you never know' territory either.

--
Nerd rage is the funniest rage.
Re:Agreed completely. by Anonymous Coward · 2007-04-22 22:18 · Score: 0

I'm just left wondering why people don't realize this before they start. It's not like it's a problem that's easy to miss; it starts showing itself the first time you need to share a file with someone else or you have to reinstall your machine. This is just like the people who use all sorts of audio and video codecs, don't they realize that it's just going to be harder to find the codecs later?
Re:Agreed completely. by Bearhouse · 2007-04-23 08:54 · Score: 1

But even if you keep the software, (especially source), that you used to make the compressed file, will you be able to run / compile it in the future?

This is a major issue for people looking at the electronic archiving of documents. I used to be a big fan of Pagis Pro (came out of Xerox), which used a proprietary format (remember *.xif?) that was very efficient and effective for scanned documents - it essentially stored the text as (OCR-recognized) text, and images as JPEGs. Much better quality, and much smaller files.

http://www.guides.sk/scantips2/pagis1.html

Then Pagis was purchased, and XIF was abandoned in favour of PDF. An open-source alternative, 'DjVu', exists:

http://djvu.org/

But the main product for creating the files is commercial and Windows-only...

UHA by dj245 · 2007-04-22 16:39 · Score: 2, Insightful

There's also another, rather uncommon format that wasn't tested that is somewhat important. UHARC - file extension UHA. It is dog slow, but offers better compression than probably any of the others. It is still used by software pirates with their custom install scripts, and I have seen it in official software install routines as well.

You can keep Rar and zip and toss out the others, but the UHA extension (or a dummy extension) will probably exist on your computer at some point in time.

--
Even those who arrange and design shrubberies are under considerable economic stress at this period in history.

Re:Screw speed, size reduction: gimme compatibilit by Jeff+DeMaagd · 2007-04-22 16:44 · Score: 2, Insightful

RAR irritates me though. It's rare enough that I usually have to dig up a decompressor for it and install it special for just one file and then I never use it again. I just don't like having to deal with files that require me to install new software just so I can use that one file. In that vein, I really don't think the article is relevant. I certainly won't use novelty file formats unless it looks like they have "legs". It's not like I want to make a file that becomes useless when the maintainer of the decompression utility loses interest and it goes away.

Re:Screw speed, size reduction: gimme compatibilit by zippthorne · 2007-04-22 16:45 · Score: 1

Why is "ease of splitting archives" considered to be important? You can do it with zip automatically, or any other archive format you care to choose by using, for instance, split -d -b 2048m - filename (GNU split, reading from a pipe), to split the output stream of any compressor into files no larger than 2 gig, with names starting with filename00.

How many systems don't have any form of cat?

--
Can you be Even More Awesome?!

Few things missing by Anonymous Coward · 2007-04-22 16:45 · Score: 0

There is no mention of the file format that is being used for the compression. I would have liked to see the test done comparing all the common formats as well as each program's specialty format.

Re:Few things missing by aivankovic · 2007-04-22 18:19 · Score: 1

Info I need most is not covered:

- handling the archives larger than 2GB and files in archive larger than 2GB
- crossplatform compatibility

Few percent or seconds up or down does not matter to me at all.

It's actually pretty sensible by DoofusOfDeath · 2007-04-22 16:57 · Score: 1

Exhaustive Data Compressor Comparison

It makes a lot of sense, considering how my eyelids feel after reading what the article is about.

Didn't have Tridge's rzip... by agristin · 2007-04-22 17:06 · Score: 2, Interesting

Andrew Tridgell's rzip wasn't on there either.

http://samba.org/junkcode/

Tridge is one of the smart guys behind samba. And rzip is pretty clever for certain things. Just ask google.

Zip? by Anonymous Coward · 2007-04-22 17:32 · Score: 0

Is TFA available in zip format?

[offtopic] Pagination by dotwaffle · 2007-04-22 17:49 · Score: 0, Troll

This site uses *23* pages. Does anyone else hate pagination? Sure, it has its uses (decreasing bandwidth consumption) if you're planning on using it as a reference, but if it's an ARTICLE meant for one-off reading, please, for Pete's sake, just use single page format and save us all the hassle of waiting 5 annoying seconds as the next page loads with all your shitty design!

At the end of the day, if you want to use a "quick link" system to quickly get someone to the conclusion, use this handy thing invented back in "the day". It's called Hypertext - use a hyperlink, use it in a table of contents, and save us all a crapload of wasted time.

Re:Screw speed, size reduction: gimme compatibilit by jawtheshark · 2007-04-22 17:58 · Score: 1

Well, on Windows I use 7Zip. It is registered to the following extensions: 001, 7z, arj, bz2, cab, cpio, deb, gz, iso, rar, rpm, tar, z and zip. All those I have used worked just fine for decompression (meaning arj, bz2, gz, iso, rar, tar, z and zip). It can only create 7z, zip and tar though.

--
Ahhh...the great dumpster continuum. Many a free computer will be found there. -- sowth (748135)

Re:Screw speed, size reduction: gimme compatibilit by jawtheshark · 2007-04-22 18:01 · Score: 1

For those that don't know how to join files on Windows, it would be:

copy /b file1 + file2 + file3 + .... + fileN resultfile

The /b parameter is very important because it indicates to join the files in binary format. That said, I do not know how to split files on Windows.
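For the splitting half, one portable option is a few lines of Python. This is only a sketch: split_file and join_files are made-up names, the .000/.001 suffix scheme is an arbitrary choice, and each chunk is read into memory, so keep chunk_size modest.

```python
from pathlib import Path


def split_file(path: Path, chunk_size: int) -> list[Path]:
    """Write path.000, path.001, ... each holding at most chunk_size bytes."""
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            part = path.with_name(f"{path.name}.{index:03d}")
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts


def join_files(parts: list[Path], target: Path) -> None:
    """Binary concatenation, the same job copy /b or cat does."""
    with open(target, "wb") as dst:
        for part in parts:
            dst.write(part.read_bytes())
```

The pieces produced by split_file can also be rejoined with the copy /b command above, since they are plain byte slices of the original.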

--
Ahhh...the great dumpster continuum. Many a free computer will be found there. -- sowth (748135)

how about non-windows platforms anyone? by sofar · 2007-04-22 18:18 · Score: 3, Insightful

The article conveniently forgets to mention whether the compression tools are cross-platform (OSX, Linux, BSD) and/or open source or not.

That makes a lot of them utterly useless for lots of people. Yet another windows-focussed review, bah.

Re:how about non-windows platforms anyone? by amokk · 2007-04-22 20:20 · Score: 0, Flamebait

Yes, imagine that...
A review done on software that more than 90% of the personal computing world uses. It's an absolutely useless review.

Fucking retard.

--
I think, therefore I am an Atheist.
Re:how about non-windows platforms anyone? by Anonymous Coward · 2007-04-22 21:18 · Score: 0

Fucking windows weenie cunt
Re:how about non-windows platforms anyone? by sofar · 2007-04-23 02:49 · Score: 1

Thanks for staying polite and maintaining a proper level of attitude. You've demonstrated that you truly are better than single-celled organisms. Way to go.
Re:how about non-windows platforms anyone? by xtracto · 2007-04-23 03:40 · Score: 1

Agreed.

I used to love (and currently kind of love) RAR files. I have Winrar in my windows partition (with the corresponding license) of course but as it is closed source, there is no decent Linux program w/GUI that can handle them (not just decompressing but compressing with all the options).

Besides that, I think the Winrar interface and shell extensions are the best ones I've seen in a long time. And I haven't seen anything similar in Linux (I am specifically thinking of cascaded context menus in the Windows Explorer with auto .rar file creation/naming and also auto file naming and creation when selecting some files in a folder). But still, closed source and even if it is usable with wine, it has no integration with any Linux file manager... so it is a no no.

--
Ubuntu is an African word meaning 'I can't configure Debian'

Best compression for what? by Joce640k · 2007-04-22 18:37 · Score: 1

The "winners" have special compression modes for .wav files, etc. (lossless audio algorithms) so of course they "won" for datasets which include those files.

On the other hand, "zip" works everywhere. I can send a zip file to my granny and she could find things inside it.

--
No sig today...

What the blurb SHOULD link to: by imsabbel · 2007-04-22 18:44 · Score: 1

http://www.maximumcompression.com/
THERE is the most exhaustive data compressor comparison. Including many different tests, and scores of compressors.

--
HI O WISE PRINCE. WHT TOOK U SO DAM LONG?

Easily the WORST article.... by Joce640k · 2007-04-22 18:48 · Score: 1

This is easily the WORST article I've ever seen on data compressors.

Does anybody zip their mp3 or avi files?

As for their "Compression Efficiency" calculation, that's about as much use as a chocolate teapot.

--
No sig today...

overflows by OriginalArlen · 2007-04-22 19:04 · Score: 1

Does it address the number of overflows, smashed stacks, tap-dancing on the heap vulnerabilities? I monitor software vulnerabilities for my employer, and there has been a steady flow of exploitable bugs in archiving software (everything from zlib to Winzip). (Who knew Wz includes an ActiveX control, allowing users to be owned via a website?!) Many anti-virus apps have also been vulnerable to issues in unpackers of various flavours.

--

Everything I needed to know about life, I learnt from Blake's Seven

"Exhaustive" by streepje · 2007-04-22 19:09 · Score: 1

Poor article. Even the Wikipedia article is more "exhaustive." http://en.wikipedia.org/wiki/Data_compression

Even two minutes googling for "data compression" will get you more useful and better "compressed" information.

http://www.ics.uci.edu/~dan/pubs/DataCompression.html

http://datacompression.info/

http://www.maximumcompression.com/

http://www.compression-links.info/Link/248_Markov_Predictive_Coders_PPMZ.htm

for italian reader... by netvandal · 2007-04-22 19:18 · Score: 1

I found a similar article written in Italian: http://www.amdplanet.it/archivio/articoli/131/ The results are the same...

Don't care by Schraegstrichpunkt · 2007-04-22 19:28 · Score: 1

I don't care about which compression mechanism works the fastest or produces the smallest files. I care about usefulness. The format has to be open and widely-used, and the algorithms have to be reasonably fast. That means I either use .zip, .tar.gz, or .tar.bz2. Goofy formats like .rar and .ace just aren't worth the headache.

--
http://outcampaign.org/

Exhaustive? by Daniel+Phillips · 2007-04-22 19:30 · Score: 1

This is easily the best article I've seen comparing data compression software

Hardly exhaustive. There is no mention of rzip.

--
Have you got your LWN subscription yet?

How about by DrSkwid · 2007-04-22 19:34 · Score: 4, Funny

Give it an MD5 hash and a file length and it will compute all the possible files that could have produced the hash. Automatically filter out the invalid files and the set you're left with can't be that large.

--
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter

Re:How about by hab136 · 2007-04-23 03:08 · Score: 1

Give it an MD5 hash and a file length and it will compute all the possible files that could have produced the hash. Automatically filter out the invalid files and the set you're left with can't be that large.

I thought of that, and even created a proof of concept program. It's ridiculously slow.

Imagine a 1 KB file. That's 1024 kilo bytes, or 8192 bits (1 byte = 8 bits). That's 2^8192 (1.09x10^2466) different combinations of 1 and 0 to test.

Assuming you could test each combination in one instruction, and processed 2,000,000,000 instructions per second (2 GHz), you'd have to run for 5.45x10^2456 seconds. That's 1.72 x 10^2449 years.

A long time for 1 KB!
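The figures above can be checked with exact integer arithmetic. This is just a sanity-check sketch: the 2 GHz rate and the one-test-per-instruction figure are the assumptions already stated above, a 365-day year is assumed, and sci is a throwaway helper that truncates rather than rounds.

```python
# Number of distinct 1 KB (8192-bit) files to try.
BITS = 8192
candidates = 2 ** BITS

# One candidate per instruction at 2,000,000,000 instructions per second.
seconds = candidates // 2_000_000_000

# Assumes a 365-day year.
years = seconds // (365 * 24 * 60 * 60)


def sci(n: int) -> str:
    """Format a huge positive integer as 'd.dd x 10^e' (truncated, no floats)."""
    digits = str(n)
    return f"{digits[0]}.{digits[1:3]} x 10^{len(digits) - 1}"


print("combinations:", sci(candidates))
print("seconds:     ", sci(seconds))
print("years:       ", sci(years))
```

Python integers are arbitrary precision, so nothing here overflows or loses digits the way a float would.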

On a similar note, Plan 9's Venti filesystem uses SHA1 to avoid duplicating data on disk.
Re:How about by hab136 · 2007-04-23 03:10 · Score: 1

Imagine a 1 KB file. That's 1024 kilo bytes, or 8192 bits (1 byte = 8 bits).

Bah, the math is right, but this is wrong - should be 1KB = 1024 bytes = 8192 bits.
Re:How about by Surt · 2007-04-23 03:16 · Score: 1

You'd be surprised at just how large that set would be. Particularly since you presumably can't really rule out most non-binaries without significant difficulties.

For the math: Assume you used a 256 bit version of MD5 with perfect distribution. You encode some 1k byte (8192 bit) file. How many possible files must collide on the given hash?

How many if we can also prove that roughly half the bytes (leaving 4096 bits) must fit some odd file format you've picked?

(The answers to both questions are unfortunately uselessly large numbers).
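For anyone who wants the numbers, a pigeonhole count is enough. The sketch below assumes the idealised hash from the questions above (a perfectly even 256-bit hash, which is not real MD5); average_preimages is a made-up helper name.

```python
def average_preimages(free_bits: int, hash_bits: int) -> int:
    """Average number of candidate files sharing one hash value.

    Assumes 2**free_bits possible inputs spread perfectly evenly over
    2**hash_bits hash values, as in the thought experiment.
    """
    if free_bits <= hash_bits:
        return 1  # roughly one candidate (or fewer) per hash value
    return 2 ** (free_bits - hash_bits)


# Any 8192-bit file, 256-bit hash.
whole_file = average_preimages(8192, 256)

# Half the bits pinned down by the file format, 4096 bits still free.
half_known = average_preimages(4096, 256)

print(len(str(whole_file)), "decimal digits")
print(len(str(half_known)), "decimal digits")
```

That is 2^7936 candidates in the first case and 2^3840 in the second, so filtering out "invalid" files does not bring the set anywhere near a manageable size.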

--
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
Re:How about by fracai · 2007-04-23 03:55 · Score: 1

Here you go:

4630b6d609b92a872f85e959c103c8468c291691 403

(I should note that this is sha1, not md5)

happy decompressing

--
-- i am jack's amusing sig file
Re:How about by drinkypoo · 2007-04-23 04:03 · Score: 1

You'd be surprised at just how large that set would be. Particularly since you presumably can't really rule out most non-binaries without significant difficulties.

Would providing the beginning of the byte stream, perhaps some percentage of it, significantly reduce the search space?

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:How about by slazzy · 2007-04-23 04:23 · Score: 1

Very nice movie - thank you!

--
Website Just Down For Me? Find out
Re:How about by Surt · 2007-04-23 05:51 · Score: 1

No, unfortunately, the search space is always 2^unforced bits in size, which is always going to be a huge number for any reasonable file. So unfortunately the suggested strategy can pretty much never be made useful. This is essentially because MD5 (or any hash) is just an extreme example of a lossy compression algorithm. And it's extremely lossy.

--
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
Re:How about by nbritton · 2007-04-23 11:01 · Score: 1

SHA-256 would work better* for this... and even though you were joking... This could be useful for recovering corrupted files.

*Less collisions etc.
Re:How about by Anonymous Coward · 2007-04-23 12:00 · Score: 0

..just when I thought I had a foolproof goatse filter.
Re:How about by DrSkwid · 2007-04-23 14:23 · Score: 1

I know about Venti, I'm running it. Score 1 for glenda (or rather 723cae95e0e6f0f8ecebc64ab41c4ece93c60362)

You don't even have to be running Plan 9. The Plan9 port has venti stuff

for instance backing up non-plan9 file systems
http://swtch.com/plan9port/man/man8/vbackup.html

or just plain venti itself. http://swtch.com/plan9port/man/man7/venti.html

Russ just did a new version today with better performance.

http://groups.google.com/group/comp.os.plan9/browse_thread/thread/88a9b4cf365e8246/5c4c506b1c5fc92d

--
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
Re:How about by GWBasic · 2007-04-24 09:08 · Score: 1

Give it an MD5 hash and a file length and it will compute all the possible files that could have produced the hash. Automatically filter out the invalid files and the set you're left with can't be that large.

The various replies state that such a technique is computationally impossible. What would be interesting is if you sent every 3rd byte and used MD5 to fill in the gaps.
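
A toy sketch of that idea, assuming a 3-byte "file" where only every 3rd byte (index 0) is sent along with the MD5 digest. The recover() helper is hypothetical, not any real tool. The search cost is 256 to the power of the number of missing bytes, so this only terminates quickly because two bytes are missing.

```python
import hashlib
from itertools import product


def recover(known, length, digest):
    """Brute-force the unknown bytes; known maps index -> byte value."""
    missing = [i for i in range(length) if i not in known]
    buf = bytearray(length)
    for i, b in known.items():
        buf[i] = b
    matches = []
    # 256 ** len(missing) candidates: fine for 2 bytes, hopeless for 20.
    for combo in product(range(256), repeat=len(missing)):
        for i, b in zip(missing, combo):
            buf[i] = b
        if hashlib.md5(buf).digest() == digest:
            matches.append(bytes(buf))
    return matches


original = b"zip"
digest = hashlib.md5(original).digest()
known = {i: original[i] for i in range(0, len(original), 3)}  # every 3rd byte
candidates = recover(known, len(original), digest)
print(candidates)
```

With two of every three bytes missing, a file of n bytes needs on the order of 256^(2n/3) hash computations, which is why the replies call the approach computationally impossible for real files.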

--
No, I will not work for your startup

There Are Only A Few Really Useful Algorithms by MCTFB · 2007-04-22 19:36 · Score: 2, Informative

for general purpose lossless compression. Most modern compression utilities out there mix and match the same algorithms which do the same thing.

With the exception of compressors that use arithmetic coding (which has patents out the wazoo covering just about every form of it), virtually all compressors use some form of Huffman compression. In addition, many use some form of LZW compression before executing the Huffman compression. That is pretty much it for general purpose compression.

Of course, if you know the nature of the data you are compressing you can come up with a much better compression scheme.

For instance, with XML, if you have a schema handy, you can do some really heavy optimization since the receiving side of the data probably already has the schema handy which means you don't need to bother sending some sort of compression table for the tags, attributes, element names, etc.

Likewise, with FAX machines, run length encoding is used heavily because of all the sequential white space that is indicative of most fax documents. Run length encoding of white space can also be useful in XML documents that are pretty printed.
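
As a rough illustration of run-length encoding on a mostly white scanline, here is a minimal sketch. It is a generic byte-oriented RLE with (count, value) pairs, not the actual coding used by fax machines, and it assumes one byte per pixel purely for readability.

```python
from itertools import groupby


def rle_encode(data: bytes) -> bytes:
    """Encode runs as (count, value) byte pairs; counts are capped at 255."""
    out = bytearray()
    for value, run in groupby(data):
        n = len(list(run))
        while n > 0:
            chunk = min(n, 255)
            out += bytes((chunk, value))
            n -= chunk
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Invert rle_encode by expanding each (count, value) pair."""
    out = bytearray()
    for i in range(0, len(encoded), 2):
        out += bytes([encoded[i + 1]]) * encoded[i]
    return bytes(out)


# 1728 "pixels": a long white run, a short black mark, then white again.
scanline = b"\x00" * 1700 + b"\xff" * 12 + b"\x00" * 16
packed = rle_encode(scanline)
print(len(scanline), "->", len(packed), "bytes")
```

The same trick applies to indentation runs in pretty-printed XML, although it only helps where the data really consists of long runs of one value.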

Most compression algorithms that are very expensive to compress are usually pretty cheap to decompress. If you are providing a file for millions of people to download, it doesn't matter if it takes 5 days to compress the file if it still only takes 30 seconds for a user to decompress it. However, when doing peer to peer communication with rapidly generated data, you need the compression to be fast if you use any at all.
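
One way to see the asymmetry for yourself is a quick timing sketch like the one below. It uses Python's lzma module at its highest preset on made-up sample data; the actual timings depend entirely on the machine and the input, so no numbers are claimed here.

```python
import lzma
import time

# Made-up sample data: numbered, fairly repetitive log-style lines.
data = b"".join(b"line %d of a fairly repetitive log file\n" % i
                for i in range(20000))

t0 = time.perf_counter()
packed = lzma.compress(data, preset=9)   # slow, thorough setting
t1 = time.perf_counter()
unpacked = lzma.decompress(packed)
t2 = time.perf_counter()

print("compress:   %.3f s" % (t1 - t0))
print("decompress: %.3f s" % (t2 - t1))
print("ratio:      %.1f:1" % (len(data) / len(packed)))
```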

Nevertheless, most general purpose lossless compression formats are more or less clones of each other once you get down to analyzing what algorithms they use and how they are used.

Re:There Are Only A Few Really Useful Algorithms by kyz · 2007-04-22 22:55 · Score: 3, Informative

Wow, are you speaking beyond your ken. When you say "some form of LZW compression", you should have said "some form of LZ compression" - either Lempel and Ziv's 1977 (sliding window) or 1978 (dictionary slots) papers on data compression by encoding matched literal strings. LZW is "some form" of LZ78 compression which, apart from GIFs, almost nobody uses. It's too fast and not compressy enough. Most things use LZH (LZ77 + Huffman), specifically DEFLATE, the kind used in PKZIP, firstly because the ZIP file format is still very popular, and because zlib is a very popular free library that can be embedded into anything.

Fax machines use a static Huffman encoding. They've never used run-length encoding. Run-length encoding is nothing compared to how efficiently LZ77 or LZ78 would handle pretty-printed XML.
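
For anyone who wants to try DEFLATE on pretty-printed XML, a minimal sketch using Python's zlib binding follows. The XML is invented sample data, and the sizes printed will vary with the input.

```python
import zlib

# Invented sample: indented, highly regular XML.
xml = b"<items>\n" + b"".join(
    b"    <item>\n        <id>%d</id>\n    </item>\n" % i for i in range(1000)
) + b"</items>\n"

deflated = zlib.compress(xml, 9)       # DEFLATE: LZ77 matching plus Huffman coding
restored = zlib.decompress(deflated)

print(len(xml), "->", len(deflated), "bytes")
```

The LZ77 stage matches whole repeated tags as well as the indentation, which is something a whitespace-only run-length pass cannot do.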

Compression algorithms vary on both their compression and decompression speed. LZ77 is slow to compress and fast to decompress. Arithmetic coding and PPM are slow both compressing and decompressing.

--
Does my bum look big in this?
Re:There Are Only A Few Really Useful Algorithms by Anonymous Coward · 2007-04-23 01:13 · Score: 0

And strictly speaking Huffman encoding and arithmetic encoding are not compression algorithms. They are encoding techniques that need to be paired with a statistical model in order to make a compression algorithm.
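
A minimal sketch of that split, assuming the simplest possible model (order-0 symbol counts over the whole input). The huffman_code() helper is illustrative only; real formats also have to store or transmit the model.

```python
import heapq
from collections import Counter


def huffman_code(freqs):
    """Build a prefix code {symbol: bitstring} from a {symbol: count} model."""
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {s: "0" for s in heap[0][2]}
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)   # two lightest subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


text = "abracadabra"
model = Counter(text)        # the statistical model: order-0 symbol counts
code = huffman_code(model)   # the coder: turns the model into bit patterns
encoded = "".join(code[ch] for ch in text)
print(code, len(encoded), "bits")
```

Swapping in a better model (for example, counts conditioned on the previous symbol) changes the compression without touching the coder, which is the point being made above.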
Re:There Are Only A Few Really Useful Algorithms by Anonymous Coward · 2007-04-23 07:16 · Score: 0

Nevertheless, most general purpose lossless compression formats are more or less clones of each other once you get down to analyzing what algorithms they use and how they are used.

You mean like PPM, BWT and LZMA?
Re:There Are Only A Few Really Useful Algorithms by splutty · 2007-04-24 02:08 · Score: 1

And to add to that: LZ in for example ARJ archives (if you select the Huffman coding that is), is only used to build the initial statistical tables that are then used by the Huffman algorithm. Although you can get ARJ to just 'store' files and inflate/deflate them without using Huffman purely in LZ. (and that gives horrible compression rates)

--
Coz eternity my friend, is a long *ing time.

What I'd like to know is.... by Bazman · 2007-04-22 19:36 · Score: 1

How well do the compression algorithms compress the other compression algorithms files? :)

NTFS Compression is a horror story by rdebath · 2007-04-22 19:38 · Score: 2, Insightful

Because NTFS filesystem compression is horrible.
It has poor compression and slows down the filesystem viciously, mostly due to fragmentation; I've seen 200000 fragments in a single file!
I think the compression algorithm it uses is LZW; you're lucky to get 1.5:1 in the best cases.

There are other issues, like a 20Gb compressed file giving fake disk errors (on a drive with 40Gb of free space) but generally the poor compression and performance is enough to ensure that you don't want to use it.

Re:NTFS Compression is a horror story by Anonymous Coward · 2007-04-23 01:10 · Score: 0

What I am wondering about mostly is, if you compress data files on your disks, do you get a potential performance increase because files are compressed, & not just because the files are read faster up from disk (into RAM for working on them)?

E.G./I.E.-> Since the filesize on disk is smaller on a NTFS compressed volume, it gets read up from disk faster, a speed increase there results.

(This I do not think there is much question of, and since disks are the SLOWEST part of a PC, this is actually quite a large gain (compared to operations that occur in RAM for loading the files to run there uncompressed, & processed by the CPU, today's very fast ones, for compression/decompression)).

I.E.-> Since today's CPU's and memory are SO fast (especially by comparison to say, 10 yrs. ago or more), does this also get "further enhanced for speed" (more) by the fact that CPU's & RAM are so much faster?

I think so, because, the gain in init. diskread up from disk since the file is tinier to read up from disk is even MORE an overall gain because the file is smaller on disk and the cpu & memory is much faster so the decompression time in memory (done by LZ compression/decompression on the CPU level) is negligible?

I hope you see my possible point here. Faster today @ least, than yesteryear running NTFS compression, but also because today's disks are faster as well, but mainly because decompression stages take far less time also, so the init. read from the slowest part of the PC, the diskdrives, is less than ever because files are tinier while NTFS compressed, and the stage to decompress them for a read now is FAR smaller due to faster CPU's and RAM today.
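
A back-of-the-envelope model of that argument, as a sketch only. Every number below is hypothetical, and the model assumes reading and decompressing do not overlap and ignores the fragmentation cost mentioned in the parent post.

```python
def read_seconds(size_mb, disk_mb_s, ratio=1.0, decompress_mb_s=None):
    """Time to get size_mb of usable data into RAM.

    ratio is the compression ratio on disk (2.0 means half the size);
    decompress_mb_s is the decompressor's output rate, or None if the
    file is stored uncompressed.
    """
    seconds = (size_mb / ratio) / disk_mb_s
    if decompress_mb_s is not None:
        seconds += size_mb / decompress_mb_s
    return seconds


# Hypothetical figures: 100 MB file, 50 MB/s disk, 2:1 ratio, 400 MB/s decompressor.
plain = read_seconds(100, disk_mb_s=50)
packed = read_seconds(100, disk_mb_s=50, ratio=2.0, decompress_mb_s=400)
print("uncompressed: %.2f s, compressed: %.2f s" % (plain, packed))
```

In this model the compressed read wins whenever the decompressor is faster than disk speed times ratio / (ratio - 1), so a faster CPU helps and a poor ratio hurts.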

"Because NTFS filesystem compression is horrible." - by rdebath (884132) on Monday April 23, @03:38AM (#18837889)

I don't know about the last time you've used it, but I've been using it consistently since Windows 2000, thru XP, into Windows Server 2003 SP #2 currently on my datastorage disks, with no hassles thusfar in that timeframe (1999-2007 currently). Perhaps I've been lucky, but I think it works and well, w/out the types of errors you mention. Mileage here, and luck, may vary.

"It has poor compression and slows down the filesystem viciously, mostly due to fragmentation; I've see 200000 fragments in a single file!" - by rdebath (884132) on Monday April 23, @03:38AM (#18837889)

I've noted it's "bad" in this capacity when you first compress a disk that was NOT initially compressed (running the command line to convert FAT to NTFS is another operation that can get in your face here too, iirc, and cause some fragmentation, but not as badly as an initial NTFS compression run over a disk that already has data on it).

The cure (for both)? Defrag. Pretty simple in that case.

Compression, and the mileage you get out of it, varies by the types of files compressed (text does great for instance, way over 2:1, whereas executables and already compressed formats like media files (jpg, mpg, etc.) do not do very well) as I am sure you know.

Re:Linux is fading away? by HeroreV · 2007-04-22 20:06 · Score: 1

Explain this.

Compressing already compressed formats by Anonymous Coward · 2007-04-22 20:27 · Score: 0

MP3 (aka MPEG-1 Layer 3), MPEG-1/2/4 video, DivX, JPEG, and non-PCM WAV are all compressed media files. Using a file compression utility meant for documents on these formats is a silly waste of time.

I mean, why not compress your already compressed media with all eleven tools in a chain while you're at it?
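
A quick way to check this is to compress something twice, as in the sketch below. The input is made-up pseudo-random "text" with a fixed seed, and zlib stands in for the already compressed media format; exact sizes will differ for real files.

```python
import bz2
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
letters = "abcdefghijklmnopqrstuvwxyz"
vocab = ["".join(rng.choice(letters) for _ in range(rng.randint(2, 9)))
         for _ in range(500)]
text = " ".join(rng.choice(vocab) for _ in range(40000)).encode("ascii")

once = zlib.compress(text, 9)    # first pass removes most of the redundancy
twice = bz2.compress(once, 9)    # second pass has almost nothing left to find

print("original:      ", len(text))
print("zlib:          ", len(once))
print("zlib then bz2: ", len(twice))
```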

"Mine goes to eleven."

bofh by losec · 2007-04-22 20:30 · Score: 1

/dev/null beats them all in time and space.

Worthless padding by rdebath · 2007-04-22 20:43 · Score: 1

These tests include neither the Calgary Corpus nor the more recent Canterbury Corpus so there is no baseline to measure their fileset against.

Without that there is "Nothing to see", "Move along, move along"

Perhaps to http://compression.ca/act/act-calgary.html

Re:Screw speed, size reduction: gimme compatibilit by moonbender · 2007-04-22 20:45 · Score: 1

Maybe RAR includes special features for multi-part archives: seeing the archive contents when you only have a single archive, or even extracting as much as possible from only a subset of the archives. Or even something like PAR, letting you get all of the data when you are missing one or two archives by adding error correction data. I don't know that RAR has any of these features, though, except for the first one.

--
Switch back to Slashdot's D1 system.

Don't forget not to go too far by gr8dude · 2007-04-22 20:59 · Score: 3, Funny

This reminds me of... pkunzip.zip

--
The saddest poem

I did one of these about 15 years ago by steevc · 2007-04-22 21:01 · Score: 1

Back then it was a case of trying to compress all the source for a project (in Turbo Basic) onto a single floppy for a quick backup. I vaguely remember that ARJ gave the best compression then. I suspect we were comparing with ZIP and LHA.

We also went through various sorts of DOS (MS, DR) trying to find the one that gave us the maximum free RAM so we could compile the project.

Happy days :)

Re:ATTN: SWITCHEURS! by wdnsdy · 2007-04-22 21:21 · Score: 2, Funny

*imagines parent comment spoken in the voice of comicbook store guy off the simpsons* . . .

heh

LZMA by Ed+Avis · 2007-04-22 21:25 · Score: 1

Yes, LZMA is good, and more importantly it's free (would you really trust your data to some binary blob implementing a secret algorithm?). On Windows there's the excellent 7-zip (also free) and on Unix you can use LZMA Utils to get a gzip-style single file compressor, though it's still a bit developmental and it doesn't have gzip or bzip2's advantage of being well-known and installed everywhere.

However, the very best lossless compression, not mentioned in the article, is probably lrzip which combines LZMA compression with a pre-compression stage of shuffling around the data somehow (a bit technical I know, but bear with me). It likes to gobble memory but it tends to be either much smaller than bzip2, much faster than bzip2, or both.

--
-- Ed Avis ed@membled.com

Another comparison by AnuradhaRatnaweera · 2007-04-22 21:25 · Score: 1

Here is another comparison on the Linux Journal which compares tools such as rzip, lzop, lzma and 7za in addition to bzip2 and gzip.

What about tar+gzip and tar+bzip2 by fintux · 2007-04-22 21:33 · Score: 1

Gzip and bzip2 compress only one file into one package*, and the common method thus is to use tar+gzip or tar+bzip2. While this may not make any difference for video and audio, I think that it makes at least some difference for the documents. The article does not say anything about tar, so I wonder if that might have changed the results, at least a bit. The man page of gzip at least says that using an archiver such as tar before packaging improves the results.

On the other hand, tar could be used in combination with other packers as well - maybe it would have changed their results, too..? But still, .tar.gz and .tar.bz2 are such a common combination that, in a sense, they can be considered an archive format, imho.

*) Although for bzip2, you can concatenate the packages (you can concatenate even gzipped files, but they can't be uncompressed into separate files, and you don't usually want that). I haven't tried this ever, though, but that's what the man page says.
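
Both points can be tried with Python's standard library, as in this sketch. The two files are invented and deliberately share a large block of content; how much tar helps on real documents depends on how much they have in common and on the compressor's window size.

```python
import gzip
import io
import random
import tarfile

rng = random.Random(1)  # fixed seed so the run is repeatable
shared = bytes(rng.choice(b"abcdefghij \n") for _ in range(20000))
files = {"a.txt": shared + b"first file\n",
         "b.txt": shared + b"second file\n"}

# Separate gzip streams: each file is compressed with no knowledge of the other.
separate = sum(len(gzip.compress(body)) for body in files.values())

# tar first, then gzip: one stream, so repeats across files can be matched.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for name, body in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(body)
        tar.addfile(info, io.BytesIO(body))
combined = len(buf.getvalue())

# Concatenated gzip members decompress to one joined stream, not separate files.
joined = gzip.decompress(gzip.compress(b"one\n") + gzip.compress(b"two\n"))

print("separate:", separate, "tar+gzip:", combined, "joined:", joined)
```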

Re:Screw speed, size reduction: gimme compatibilit by Anonymous Coward · 2007-04-22 21:44 · Score: 0

It was more important back in the floppy days where you could also do things like 'split to use all the free space on this disk'. RAR was the first one (or first one I saw) that implemented that right. Might still be useful e.g. to split huge datasets to write to multiple DVDs.

cat: not all compressors/decompressors accept streams (e.g. rzip), and cat won't work across volumes: taking the DVD dataset example again you couldn't use a streaming cat decompress unless you've got as many DVD drives as you have data discs whereas using RAR et al you could use a single drive that would let you switch DVDs in and out. Or I suppose you could write a disk-change-capable cat, though.

Re:Screw speed, size reduction: gimme compatibilit by Anonymous Coward · 2007-04-22 21:54 · Score: 0

It may not be an Open Source license, but its source is available and it is portable. ftp://ftp.rarlabs.com/rar/unrarsrc-3.7.3.tar.gz.

Re:Screw speed, size reduction: gimme compatibilit by fsiefken · 2007-04-22 22:15 · Score: 0

7-zip, multiplatform, superior speed and compression, open source.

SMP hardware? by MrNemesis · 2007-04-22 22:16 · Score: 2, Insightful

I only skimmed the article but what with all the hullabaloo about dual/quad core chips, why didn't they use "exhaustive" as an excuse to check out the parallelisability (if that's a word) of each compression algorithm? IIRC they didn't list the hardware they used or any of the switches they used, which is a glaring omission in my book.

Of all the main compression utils I use, 7-zip, RAR and bzip2 (in the form of pbzip2) all have modes that will utilise multiple chips, often giving a pretty huge speedup in compression times. I'm not aware of any SMP branches for gzip/zlib but seeing as it appears to be the most efficient compressor by miles it might not even need it ;)

It's mainly academic for me now though, since almost all of the compression I use is inline anyway, either through rsync or SSH (or both). Not sure if any inline compressors are using LZMA yet, but the only time I find myself making an archive is for emailing someone with file size limits on their mail server. All of the stuff I have at home is stored uncompressed because a) 90% of it is already highly compressed and b) I'd rather buy slightly bigger hard drives than attempt to recover a corrupted archive a year or so down the line. Mostly I'm just concerned about decompression time these days.

--
Moderation Total: -1 Troll, +3 Goat

Re:SMP hardware? by Anonymous Coward · 2007-04-23 02:39 · Score: 0

It's not that they didn't list the hardware or settings, it's just that you are too blind to notice it. It's all stated here [techarp]. If you took as much time to read as you did to write that pointless comment, you would have noticed it.
Re:SMP hardware? by justthinkit · 2007-04-23 03:51 · Score: 1

the parallelisability (if that's a word) of each compression algorithm?

And perhaps just as important, the non-parallelisability of each program. On Intel HT cpus it is nice when a program runs in just one core. Then, even if at 100% of that core, my machine runs cooler and quieter. Yes, it might be slightly (but probably not 100%) faster if it used two cores -- but it will definitely be twice as hot and use twice as much electricity.

--
I come here for the love

Size v Speed by ThirdPrize · 2007-04-22 22:34 · Score: 1

I suppose while CPUs are becoming faster the things that we want to zip are becoming larger. I think the average time spent zipping things up has stayed the same over the years and it's just the .zip files that have grown bigger.
With CPUs these days is there any reason not to default to max compression?

--
I have excellent Karma and I am not afraid to Troll it.

Re:What's the point of compressing JPEG,MP3,DivX e by kyz · 2007-04-22 22:36 · Score: 4, Interesting

While the main thrust of JPEG is to do "lossy" compression, the final stage of creating a JPEG is to do lossless compression on the data. There are two different official methods you can use: Huffman Coding and Arithmetic Coding.

Both methods do the same thing: they statistically analyse all the data, then re-encode it so the most common values are encoded in a smaller way than the least common values.

Huffman's main limitation is that each value compressed needs to consume at least one bit. Arithmetic coding can fit several values into a single bit. Thus, arithmetic coding is always better than Huffman, as it goes beyond Huffman's self-imposed barrier.
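
A small worked example of that limit, as a sketch: take a two-symbol source where one symbol occurs 99% of the time (the probabilities are made up). Huffman must spend one whole bit per symbol, while the entropy, which an arithmetic coder can approach, is far lower.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits per symbol for a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


probs = [0.99, 0.01]   # hypothetical, heavily skewed two-symbol source
huffman_bits = 1.0     # two symbols: codes "0" and "1", one bit each
ideal_bits = entropy_bits(probs)

print("Huffman: %.3f bits/symbol" % huffman_bits)
print("Entropy: %.3f bits/symbol" % ideal_bits)
print("Symbols per bit at the entropy limit: %.1f" % (1 / ideal_bits))
```

Here the entropy works out to roughly 0.08 bits per symbol, i.e. about a dozen symbols per bit. This ignores the small overhead a real arithmetic coder adds, and the gap is much narrower for less skewed distributions.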

However, Huffman is NOT patented, while most forms of arithmetic coding, including the one used in the JPEG standard, ARE patented. The authors of Stuffit did nothing special - they just paid the patent fee. Now they just unpack the Huffman-encoded JPEG data and re-encode it with arithmetic coding. If you take some JPEGs that are already compressed with arithmetic coding, Stuffit can do nothing to make them better. But 99.9% of JPEGs are Huffman coded, because it would be extortionately expensive for, say, a digital camera manufacturer, to get a JPEG arithmetic coding patent license.

So Stuffit doesn't have remarkable code, they just paid money to get better compression that 99.9% of people specifically avoid because they don't think it's worth the money.

--
Does my bum look big in this?

Best article? by Frozen+Void · 2007-04-22 22:50 · Score: 1

http://www.maximumcompression.com/ is better source

Simpletron by Anonymous Coward · 2007-04-22 22:54 · Score: 0

I have been thinking about creating a new language with about 60 or so words. The idea is that you don't need a lot of words when you can figure out the meaning by context. Strong points are that the language would be very easy to pick up, and you would get that invigorating feeling of talking like a primitive cave man.

Simpletron may be what you require, pizzach. It refactors the English language such that there is only one word for any given concept. All colours are purple, all distances are a mile, all numbers are seven, etc. It's quite handy!

You bike math, Johnson. Math is Johnson biked so seven maths are seven math. Seven purples are purple, seven miles are a mile, seven sevens are seven. It's purple.

Re:Linux is fading away? by fatphil · 2007-04-22 22:58 · Score: 1

But by that metric opera's doing well.

http://www.google.com/trends?q=firefox%2C+internet+explorer%2C+opera&ctab=0&geo=all&date=all

By any other metric it isn't doing particularly well, despite it being a fairly competent browser.

--
Also FatPhil on SoylentNews, id 863

Wrong Ordering in Graphs by ror · 2007-04-22 23:07 · Score: 2, Insightful

In its efficiency graphs they order the negative scoring ratios wrong! After all, they consider something that adds 1MB in 2 seconds to be worse than one that increases the size by 1MB in 2 minutes. So doing the same thing *slower* actually ranks it ABOVE the other one. Plus, what matters, even for large files, is NOT the time for compression. What you REALLY want to compare is the ratio and the time for EXTRACTION on those settings. Any file will be compressed once, decompressed thousands of times. A minute longer to produce means little. A minute longer to extract for everyone extracting it matters a lot.
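
The ordering problem is easy to reproduce. The article does not publish its formula, so this sketch assumes efficiency means megabytes saved per second; the two compressor names are invented.

```python
# Two hypothetical compressors that both make the file 1 MB LARGER.
results = [
    {"name": "fast_bloater", "saved_mb": -1.0, "seconds": 2.0},
    {"name": "slow_bloater", "saved_mb": -1.0, "seconds": 120.0},
]


def efficiency(result):
    """Assumed metric: MB saved per second (negative when the file grows)."""
    return result["saved_mb"] / result["seconds"]


ranked = sorted(results, key=efficiency, reverse=True)
for r in ranked:
    print("%-13s %8.4f MB/s" % (r["name"], efficiency(r)))
```

Dividing a negative saving by a longer time gives a number closer to zero, so the slower program sorts above the faster one even though it did the same damage more slowly.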

Re:Wrong Ordering in Graphs by maxwell+demon · 2007-04-23 01:27 · Score: 1

Well, whether compression time matters depends on how you intend to use it. If you want to have compression on the fly, you certainly care about compression speed. OTOH if you just want to build up an archive and put it up for download, compression speed is of course largely irrelevant. Also, in some situations decompression speed might be largely irrelevant, e.g. say you are sending data over an extremely slow link (in which case a better compression rate will save you more time than a faster algorithm; however, in that case you would care whether decompression can start before you have received the whole file).

--
The Tao of math: The numbers you can count are not the real numbers.

Re:Linux is fading away? by HeroreV · 2007-04-22 23:25 · Score: 1

That was my point. Google Trends is a really great way of seeing what people are searching for on Google, but it isn't a good way to learn what software people are using.

Re:What's the point of compressing JPEG,MP3,DivX e by Anonymous Coward · 2007-04-22 23:25 · Score: 0

In many jurisdictions, mathematical processes are absolutely not patentable. Are payware compression tools using "patented" algorithms cheaper in those countries where zero royalties are owed, or are British consumers being shafted up the arse again?

Re:small = slow. Tunning UPX (Ultimate Packer eXe) by Anonymous Coward · 2007-04-22 23:26 · Score: 0

I only need a faster decompressor and slower compressor, not a slower decompressor and faster compressor.

zip: OK.
gzip : OK.
bzip2: Fatal.
7zip: OK.

7-Zip is right where I like it to be by QuietLagoon · 2007-04-22 23:43 · Score: 1

Reasonably fast, reasonably good compression, and free (as in beer).

With disk space becoming less and less expensive with each passing week, any of the compressors would work fine for nearly everyone's need.

What are they actually measuring? by Marcion · 2007-04-22 23:55 · Score: 2, Interesting

The article seems to be measuring the compression speed of each program with its native algorithm, it would have been better to do a set of programs with each algorithm first. As the article is comparing two variables at once, how good the algorithm is and how good the implementation in that program is, the results are slightly meaningless.

Having said that, do I really care in practice that much about if algorithm A is 5% faster than algorithm B? I personally do not, I care if the person receiving them can open them. So the second problem with the article is that it is one computer user on his own, in the real world you would just distribute .zip and .tar.gz because you know people will be able to open them. Proprietary algorithm X may be really efficient but if no one can open it, who cares?

--
My little Linux and tech blog

Re:Compress big files before -ALL- File-Transfers by Derwood5555 · 2007-04-23 00:01 · Score: 1

Apache 2 has mod_deflate for compressing data on the fly as it's sent. There are caveats with that though. Some browsers don't support it well, and like anything else, compressing files that are already compressed doesn't buy you anything.

Thanks to the people who posted helpful comments by pizzach · 2007-04-23 00:16 · Score: 1

I was actually expecting to get moderated funny instead of interesting...but there is still time. I suppose my post had shown even more thought than I had even thought it did. Choosing the number 60 was a bit extreme and random, but I was trying to emphasize what I am aiming for. It's called shooting for the stars and landing on the moon. I don't know how many words the language would come out to be if/when I start creating it.

One thing that I didn't expect was the stream of very informative posts. Thanks to the people who replied with constructive comments! 118 word toki pona shows that you can do a lot with a little. Simpleton is a bit more extreme than I was aiming for though. (laugh) You don't have to have a concept for everything in a language. Klingon is a language whose words tend to relate to war. Toki pona focuses on "the good things in life."

The idea for a language where one word can have a lot of meanings actually came from studying Japanese. Here are some examples from the Japanese English edict dictionary on Jim Breen's site:

seisan (n) confidence in success; (P)
seisan (n) hydrocyanic (prussic) acid; (P)
seisan (n,vs) exact calculation; squaring of accounts; (P)
seisan (n,vs) liquidation; settlement; (P)
seisan (n,vs) production; manufacture; (P)
seisan (adj-na,n) ghostliness; gruesomeness
seisan ghastliness; gruesomeness; luridness
seisan (n) banquet; formal dinner
seisan (n) celebrant
seisan (n) Emperor's age

yaru (v5r,vt) (uk) (col) to do; to have sexual intercourse; to kill; to give (to inferiors, animals, etc.); to dispatch (a letter); to despatch; to send; to study; to perform; to play (sports, game); to have (eat, drink, smoke); to row (a boat); to run or operate (a restaurant); (P)

--
Once you start despising the jerks, you become one.

Compression is for wimps by Gothmolly · 2007-04-23 00:16 · Score: 1

Real men upload their code to public FTP servers and let the world mirror it.

--
I want to delete my account but Slashdot doesn't allow it.

Re:Compression is for wimps by Anonymous Coward · 2007-04-25 08:05 · Score: 0

And my hovercraft is full of eels.

Here on Non-Windows planet by DrYak · 2007-04-23 00:18 · Score: 3, Funny

Please, I'm curious to know, what are things like there?

Things are less blue.
(And I'm not speaking of the sky)

--
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]

KGB by BenEnglishAtHome · 2007-04-23 00:50 · Score: 1

What is this kgb compression I'm starting to see? I found the site and couldn't install from source under Linux. The Windows software installed but I haven't been able to open a damn thing with it. The docs are pretty crappy.

Anybody had any good experience with kgb-compressed files?

Re:KGB by imsabbel · 2007-04-23 01:22 · Score: 1

It's a paq8 clone. Like all of them, it compresses incredibly well, and incredibly slowly.
And "you haven't been able to open a damn thing with it"? Are you, well, a rather dim bulb, or didn't you notice that you need files compressed with that compressor?

--
HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
Re:KGB by BenEnglishAtHome · 2007-04-23 02:05 · Score: 1

I'm frequently a very dim bulb, but (at least in this case) my lights aren't completely out.

Yes, I have several archives supposedly compressed with kgb. The file name extensions are right (though I'm at work and I don't remember all the permutations off the top of my head). Generally, I get two different errors. Either the software reports that the file isn't really a kgb archive and can't open it or, in the case of password protected files, the provided passwords are *always* wrong. In the forums where these files are being posted, others report success with them though no one is willing to help with technical questions (which are considered offtopic and usually just get a rousing "RTFM" in response, even though there really is no FM to R).

duh! by Stooshie · 2007-04-23 01:03 · Score: 1

... gives the best compression but operates slowest ... is fastest but compresses least ...

Hardly a spoiler!

--
America, Home of the Brave. ... .and the Squaw.

No mention of PAQ ? by Hobart · 2007-04-23 01:07 · Score: 1

As linked by other folks on this thread, maximumcompression.com will show that WinRK (proprietary) and PAQ8 (GPL) take the crown in compression. The free PAQ series (wiki, homepage) kick some serious butt...

(Tested on a Project Gutenberg text "The Man who was Thursday")
79105   thurs.paq8l-7
79112   thurs.paq8l
96495   thurs.bz2-9
96708   thurs.rz
107583  thurs.7z
123847  thurs.gz-9
320553  thurs.txt

--
Slashcode bug # 497457 - unfixed since December 2001 - Go look it up!

--
o/~ Join us now and share the software ...

Re:Screw speed, size reduction: gimme compatibilit by snemarch · 2007-04-23 01:19 · Score: 1

...and then you have to join the split-up files again before you can extract. RAR (and other archivers with split archive support) automagically extracts without wasting time and disk space on a 'join' operation.

--
Coffee-driven development.

Re:Thanks to the people who posted helpful comment by Viol8 · 2007-04-23 01:20 · Score: 1

"The idea for a language where one word can have a lot of meanings actually came from studying Japanese"

You don't need japanese for that: http://dictionary.reference.com/browse/set

Re:ATTN: SWITCHEURS! by Anonymous Coward · 2007-04-23 01:29 · Score: 0

The Mac may have started out as a machine for artists, but along the way Apple figured out artists never have much money and hang on to computers for decades past when they should have been replaced.

So Apple started selling computers to regular people and business people and sysadmins and anyone else who can afford one.

Starving artists? Who gives a shit about them when Apple can sell a 4 grand PC to somebody who just wants to have one? Compression method? It doesn't fucking matter WHAT compression method they use. SIT, DMG, ZIP, whatever.

Use the one that suits your needs and shut the hell up. Oh and get a real job you artist bum.

Ever try a multi threaded compression program? by drusifer2 · 2007-04-23 01:33 · Score: 1

Hey I was having to deal with slow compression on large files so I wrote a multi threaded compression program. On a multi CPU machine you get double/quadruple/... the speed with the same level of compression. I used bzip2 since it allowed me to compress the file in chunks that could be handled in separate threads (not all compression algos can work this way). Check out my results and try out my compression program here:

http://code.google.com/p/zipmt/
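
The general chunking idea can be sketched in a few lines of Python. This is not the code of the program linked above, just an illustration under the assumption that each chunk becomes an independent bzip2 stream; parallel_bzip2() and its parameters are invented names.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def parallel_bzip2(data, chunk_size=900_000, workers=4):
    """Compress fixed-size chunks in worker threads and join the streams."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps the chunks in order, so the output can simply be joined.
        streams = list(pool.map(bz2.compress, chunks))
    return b"".join(streams)


payload = b"".join(b"record %d\n" % i for i in range(200_000))
packed = parallel_bzip2(payload)
restored = bz2.decompress(packed)   # handles concatenated bzip2 streams
print(len(payload), "->", len(packed), "bytes")
```

Threads are enough here because the bz2 module does its compression work outside Python's global lock. The result is a concatenation of ordinary bzip2 streams, so the compression level per chunk is unchanged, though very small chunk sizes would cost some ratio.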

Re:L-Zip by Anonymous Coward · 2007-04-23 01:39 · Score: 0

Wow, that's great compression. What's the status of the decompression support?

Re:Linux is fading away? by maxwell+demon · 2007-04-23 01:39 · Score: 1

Well, maybe it's related to this! :-)

--
The Tao of math: The numbers you can count are not the real numbers.

Re:Screw speed, size reduction: gimme compatibilit by maxume · 2007-04-23 01:43 · Score: 1

http://www.freebyte.com/hjsplit/#win32

--
Nerd rage is the funniest rage.

AJR32 is fastest by bogomipz · 2007-04-23 02:19 · Score: 0, Troll

No shit! AJR32 isn't even the name of a program. No wonder it executes fast, and your faith must be stronger than Yoda's if it has any effect on your file size at all.

C:\>AJR32 "My Crappy Document.doc"
'AJR32' is not recognized as an internal or external command,
operable program or batch file.

C:\>

Re:Screw speed, size reduction: gimme compatibilit by Fweeky · 2007-04-23 02:24 · Score: 2, Informative

RAR has recovery records (settable percentage of each archive dedicated to ECC, default off) and recovery volumes (dedicated files with PAR-like recovery capabilities). "Keep broken files" can be used to extract from broken or truncated archives.
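
To show the principle behind recovery volumes, here is a deliberately simplified sketch using plain XOR parity. PAR files use Reed-Solomon codes, which can survive several missing volumes; this stand-in can rebuild only one, and the helper names are invented.

```python
def xor_parity(volumes):
    """Return a parity block: the byte-wise XOR of all volumes."""
    parity = bytearray(max(len(v) for v in volumes))
    for volume in volumes:
        for i, b in enumerate(volume):
            parity[i] ^= b
    return bytes(parity)


def rebuild(surviving, parity, length):
    """Rebuild the single missing volume from the survivors and the parity."""
    missing = bytearray(parity)
    for volume in surviving:
        for i, b in enumerate(volume):
            missing[i] ^= b
    return bytes(missing[:length])


volumes = [b"part-one", b"part-two", b"part-3"]
parity = xor_parity(volumes)

# Pretend the second volume was lost in transit.
restored = rebuild([volumes[0], volumes[2]], parity, len(volumes[1]))
print(restored)
```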

They fucked up JPEGs by DrYak · 2007-04-23 02:26 · Score: 1

The "winners" have special compression modes for .wav files, etc.

The thing that I find the strangest is that modern compressors also have special modes for JPEG files.
Either they detect them quickly to completely avoid trying to compress them and achieve superior speed.
Or some compressors use a special mode, where the software decompresses the JPEG data back to the DCT stage and then uses some more modern and efficient algorithm than the original Huffman code to store the DCT data.

It's strange because although their suite of software included StuffIt, they completely failed to demonstrate it.
(Instead, apparently StuffIt went for the "avoid compression to gain speed" route)

--
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]

Re:They fucked up JPEGs by Lars+T. · 2007-04-23 04:36 · Score: 1

The "winners" have special compression modes for .wav files, etc.

The thing that I find the strangest is that modern compressors also have special modes for JPEG files.
Either they detect them quickly so they can skip trying to compress them and achieve superior speed.
Or some compressors use a special mode, where the software decompresses the JPEG data back to the DCT stage and then uses a more modern and efficient algorithm than the original Huffman code to store the DCT data.

It's strange because although their suite of software included StuffIt, they completely failed to demonstrate it.
(Instead, apparently StuffIt went for the "avoid compression to gain speed" route)

This could be because they used StuffIt 9.5; the JPEG compressor came in 10, and the most recent version is 11. All other compressors but WinZip (11.0 instead of 11.1, which came out last week) use the latest versions. Gee, I wonder how that happened.

--
Lars T.
To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

exact file extraction by Anonymous Coward · 2007-04-23 03:00 · Score: 0

I'm really sorry but a compression program that "re-encodes" the file and produces an extracted file different from the original when decompressing is not a compressor. Call it a re-encoder or what you will, but it is not a compressor.

Put it another way, if I compress a file whose SHA1 sum is:

712be36b6f1f3c7ecfd5572a8e51324d06d01182

I do want the decompressed file to give the same SHA1 sum. Otherwise we are not talking about compression/decompression and it's really comparing apples to oranges.
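The check described above is easy to script. A minimal sketch, assuming Python's standard hashlib and zlib modules stand in for whatever archiver is under test (the helper names sha1_hex and roundtrip_is_lossless are made up for illustration):

```python
import hashlib
import zlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as a hex string."""
    return hashlib.sha1(data).hexdigest()


def roundtrip_is_lossless(original: bytes,
                          compress=zlib.compress,
                          decompress=zlib.decompress) -> bool:
    """Compress, decompress, and compare digests before and after.

    compress/decompress default to zlib only as a stand-in; any pair of
    functions mapping bytes to bytes can be passed in instead.
    """
    restored = decompress(compress(original))
    return sha1_hex(restored) == sha1_hex(original)


if __name__ == "__main__":
    sample = b"the quick brown fox " * 1000
    print(sha1_hex(sample), roundtrip_is_lossless(sample))
```

If the function ever returns False for a given archiver, that archiver is a re-encoder in the sense above, not a lossless compressor.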

Another thing: JPEG is (usually) lossy.

Say I've got my nice picture and save it from Photoshop (or The Gimp) using "80% efficiency". And I've got a lossy JPEG that I consider OK. The last thing I want is a compressor adding *additional* compression artefacts to my pictures. I know JPEG in my case is lossy, but that doesn't mean I want to have a picture crappier than my already lossy "original". So do these fake compressors (as explained before, if the uncompressed file isn't bit-for-bit identical to the original, we're not talking about a compression program) add other artefacts to JPEG pictures, or are they smart enough to "replace" the Huffman stage with another algo that produces exactly the same output?

Otherwise not only are we not talking about a compression program but we're not talking either about a program correctly encoding a picture.

One mistake in the test by DeVilla · 2007-04-23 03:35 · Score: 1

I don't know if I'd call foul, but according to the article, the "default" configuration they tested gzip with was 'gzip -5 -v', whereas according to http://www.gnu.org/software/gzip/manual/gzip.html and every other version of the gzip manual I've read, the default compression level is -6. Testing at -5 makes gzip's default setting appear to compress less and run faster than the real default does, so those numbers are not the real defaults.
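For anyone who wants to see how much one level matters on their own data, here is a rough sketch. It assumes Python's zlib binding (the same DEFLATE family gzip uses, but not the gzip command itself), and the helper name deflate_sizes is made up for illustration:

```python
import zlib


def deflate_sizes(data: bytes, levels=(1, 5, 6, 9)) -> dict:
    """Map each compression level to the size of the compressed output."""
    return {level: len(zlib.compress(data, level)) for level in levels}


if __name__ == "__main__":
    # Any file of your own would be a better sample than this filler text.
    sample = b"Experiment after experiment reports different ratios. " * 2000
    for level, size in deflate_sizes(sample).items():
        print(f"level {level}: {size} bytes")
    # zlib's "default" level (-1) currently resolves to 6, matching the manual.
    print(zlib.compress(sample) == zlib.compress(sample, 6))
```

The sizes and timings you get depend entirely on the input, so treat this as a way to reproduce the comparison, not as a result.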

Re:What's the point of compressing JPEG,MP3,DivX e by phasm42 · 2007-04-23 03:37 · Score: 1

Does this mean that the jpeg you put into a Stuffit archive may not match the jpeg you pull out of it, or does it recode to Huffman when you extract?

--
"No one likes working in a hamster wheel, and your shop smells of cedar shavings from here." - TaleSpinner

Mostly useless... by gweihir · 2007-04-23 03:40 · Score: 1

The person that did the comparison does not understand the significance of compression in practice. Files that are not really compressible by general-purpose compressors do not form a useful benchmark set. For example, trying to compress JPEG is an exercise in futility and only demonstrates an astonishing degree of incompetence. One typical thing that should have been done is to eliminate all tests where compression to at least 70% of the original size was not reached, since compression makes no sense in these cases at all. Another thing that definitely should have been in there is compression of a typical HDD backup! That is where I, and probably many other people, use compression most.

Bottom line: 90% useless, at least 90% of the useful test-cases missing.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Re:L-Zip by inviolet · 2007-04-23 03:48 · Score: 1

The L-Zip project at http://lzip.sourceforge.net/ [sourceforge.net] seems to be down right now but it should be included in any file compression comparison. It could reduce files to 0% of their original size and it was quick too.

I have written one such compression algorithm myself. And it really does work. I'm going to ship it, just as soon as I fix this one lingering bug in the decompression routine . . .

I also have a different but equally revolutionary compression algorithm under development. It can compress a file of any size down to one byte. In my proof-of-concept tests, it successfully compressed and then decompressed 256 different files, some of which were over 100GB in size! I'm working on adding support for more than 256 files, but I've got more research to do first.

--
FATMOUSE + YOU = FATMOUSE

7Zip by XSforMe · 2007-04-23 03:57 · Score: 1

Try 7 Zip. The 7z compression format is totally unknown, but 7Zip will manage to compress and uncompress into a wide variety of formats (including zip, rar). It has a decent GUI and shell integration and to top it all, it is open source.

--
My other OS is the MCP!

there are others by Anonymous Coward · 2007-04-23 04:05 · Score: 0

for example bzip uses the burrows wheeler transform, which was published in 1994, and is interesting to read about. basically it transforms blocks (say 200k at a time) so the block is "sorted" and stores the extra index required to reverse this. then applies a simple runlength/delta style compression to the sorted block. this turns out to produce compression ! huffman and arithmetic compression are about as old as information theory itself.

http://en.wikipedia.org/wiki/Burrows-Wheeler_transform

one advantage is that if there is corruption in a block then you only lose that block, instead of the entire file.
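to make the "sorted block plus the info needed to reverse it" idea concrete, here is a naive sketch of the transform in Python. it sorts all rotations of the block directly, which is far too slow for real 200k blocks and is not how bzip2 itself is implemented; the function names bwt and inverse_bwt are made up for illustration.

```python
def bwt(block: bytes) -> tuple[bytes, int]:
    """Return (last column of the sorted rotations, row of the original)."""
    n = len(block)
    if n == 0:
        return b"", 0
    # Sort rotation start offsets by the rotation they denote.
    order = sorted(range(n), key=lambda i: block[i:] + block[:i])
    last_column = bytes(block[(i - 1) % n] for i in order)
    return last_column, order.index(0)


def inverse_bwt(last_column: bytes, index: int) -> bytes:
    """Rebuild the original block from the last column and the row index."""
    n = len(last_column)
    # A stable sort of positions by byte value links each row to the row
    # that holds the same rotation shifted left by one byte.
    link = sorted(range(n), key=lambda i: (last_column[i], i))
    out = bytearray()
    row = index
    for _ in range(n):
        row = link[row]
        out.append(last_column[row])
    return bytes(out)


if __name__ == "__main__":
    transformed, index = bwt(b"banana")
    print(transformed, index)
    print(inverse_bwt(transformed, index))
```

the output groups equal bytes together ("banana" becomes "nnbaaa"), which is what makes the later runlength-style stage effective.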

of course there are many ways of encoding redundancy/recovery info. some much more sophisticated. i remember doing a math unit which involved the group properties of elliptic curves. which are cubic in one var and quadratic in the other. they look like a circle and a hyperbola from memory and you define a group operation by using the property that two points form a line that intersects somewhere else (then you flip on an axis). they can be used for encryption, but also to provide "global" "hologram style" redundancy. i think i have read about using it for some next gen optical disks (they use several layers of error correction), this was years ago, so its possible it has already been implemented in hddvd or bd.

Dude! by roelbj · 2007-04-23 04:10 · Score: 1

Thanks for the free pr0n!

Exhaustive? by brianary · 2007-04-23 04:13 · Score: 1

Text? XML? Source code? Executable code? GIF? PNG? AAC? Flash SWF? JAR files? Actual E-Books (MS LIT, eReader, Plucker)? Which version of Office docs are those? Are they XML? Quality level of JPEGs? Could the ratios of HTML and Office documents be any more arbitrary?

Re:What's the point of compressing JPEG,MP3,DivX e by kyz · 2007-04-23 04:31 · Score: 1

Yes, it recodes it into Huffman after extraction. Here is Aladdin's white paper on how they do it.

--
Does my bum look big in this?

Re:What's the point of compressing JPEG,MP3,DivX e by bigbigbison · 2007-04-23 04:32 · Score: 1

I'm no expert, but I don't think this is accurate. When you use StuffIt on a jpeg you get a StuffIt file (.sitx), not a jpeg. They are using their own algorithm to compress the jpeg, not simply changing it to another form of jpeg. A small project like PAQ8 has similar (but not as good) jpeg compression, but I seriously doubt that they paid a patent fee. Both are compressing the data with their own algorithms and then decompressing it and converting it back to the original file.

Again, I'm no expert, so if I'm wrong please let me know.

--
http://www.popularculturegaming.com -- my blog about the culture of videogame players

LZMA is used in 7-zip by Anonymous Coward · 2007-04-23 04:35 · Score: 1, Interesting

It's true that LZMA often flies below the radar and not many people are aware of it (just try Googling for it, or looking for research papers about it--there's not much).

However, it is the algorithm used in 7-Zip. It is represented in this test.

Speaking as a person with interest in 64K intros, LZMA is an awesome, awesome algorithm if you need fast decompression and *small decompression code*. A carefully hand-tuned implementation of an LZMA decompressor would be less than 2K of assembly code, and could perhaps be crammed into 1K by a sufficiently clever hacker. This is an order of magnitude smaller than most algorithms that can give comparable compression performance.

The high compression of LZMA comes from combining two basic, well-proven compression ideas: sliding dictionaries (i.e. LZ77/78) and Markov models (i.e. the thing used by every compression algorithm that uses an arithmetic encoder or similar order-0 entropy coder as its last stage). LZMA is awesome because the contexts used in its model are segregated according to what the bits are used for. Folding that knowledge right into the model results in a simple but very effective compression scheme.

The Tick by HTH+NE1 · 2007-04-23 04:44 · Score: 1

Basically, it's a variation on the "but all our documents are .DOC!" issue which keeps so many people using Office

"[I c]an't lose my name. It's on all my stationery!"

--
Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?

HEY, ELITIST SNOBS! by Anonymous Coward · 2007-04-23 06:20 · Score: 0

Come on a powertrip where you can claim to be trendy, hip and even avant garde for the more advanced.

You get to pretend to be interested in, nay like computers, yet at the same time deride, insult and attempt to belittle your fellow comrades that share your interests. Behold the wonders of modern technology as you too can become a power user without actually having to care about computers, algorithms or hacking! You don't need to learn any obscure and arcane mumbo jumbo to run your new limited range of overpriced junk.. *ahem* feature-full all in one complete deluxe no-need-to-ever-upgrade top of the line hardware.

So don't wait any longer. Tilt your head high and stare down your nose at the pathetic dweebs that have no class. Get superiority, snobbishness and a dash of self loathing as a Mac geek!

--

(apologies to real Mac users)

Re:What's the point of compressing JPEG,MP3,DivX e by kyz · 2007-04-23 06:44 · Score: 1

Yes, they convert JPEGs to .SITX files. But the data inside the .SITX file is basically that JPEG data unpacked and then repacked with an arithmetic coder. The actual method used is called "Arsenic" and uses arithmetic coding, RLE and a block sorting compressor based on the Burrows-Wheeler Transform.

--
Does my bum look big in this?

Re:What's the point of compressing JPEG,MP3,DivX e by Anonymous Coward · 2007-04-23 06:44 · Score: 0

Huffman's main limitation is that each value compressed needs to consume at least one bit. Arithmetic coding can fit several values into a single bit. Thus, arithmetic coding is always better than Huffman, as it goes beyond Huffman's self-imposed barrier.

huh i don't get this. since a bit is a computer's smallest unit of information, how does one fit several values into it?
seriously i would like to know that

Re:What's the point of compressing JPEG,MP3,DivX e by Anonymous Coward · 2007-04-23 07:12 · Score: 0

It seems PAQ8L does something similar to Stuffit while being free. Did they pay for the patent as well, but decided to distribute their code for free anyways?

someone tell me by Khashishi · 2007-04-23 07:56 · Score: 1

What I don't get is, why is the *nix community attached to tar+gzip or tar+bzip? It seems like a couple of outdated formats with bad archiving features. Why would anyone want to go through the trouble of processing files twice just to compress a directory?

Re:Screw speed, size reduction: gimme compatibilit by jawtheshark · 2007-04-23 08:40 · Score: 1

I was more thinking of a native way ;-)

--
Ahhh...the great dumpster continuum. Many a free computer will be found there. -- sowth (748135)

Re:What's the point of compressing JPEG,MP3,DivX e by Myopic · 2007-04-23 09:07 · Score: 1

I'm having a hard time following you, as I am not a compression algorithm expert. Can you explain how to "fit several values into a single bit"? Where I come from, that would be considered a very good trick indeed.

Re:What's the point of compressing JPEG,MP3,DivX e by kyz · 2007-04-23 09:49 · Score: 1

The basic premise is that a specific sequence of symbols, based on probability, boils down into a thin fractional range. The Wikipedia article on arithmetic coding explains it quite well.

When you add a symbol, sometimes the binary representation is not precise enough to represent that range, so you add a bit (or several bits) to make it more precise. At other times, the binary fraction is already precise enough to represent the updated range after you've added a symbol, and in those cases you don't need to add any more bits to the output value.

--
Does my bum look big in this?

Data is wrong? by Alomex · 2007-04-23 10:03 · Score: 1

Looking at their compression rate for documents the data looks highly suspicious. Experiment after experiment reports text compression in the order of 20-30% of original size using bzip2, yet they only get 60% of original size??? This is half as good as widely reported figures!

really matters for datastreams by hguorbray · 2007-04-23 10:04 · Score: 1

>>file compression is pretty much only used for large downloads

not entirely correct -compression is also used to reduce bandwidth/maximize data in streaming data applications such as financial market trade data.

The company I work for is switching backend feeds, and although the new feed is 'richer' we need to have it compressed so that clients who have been using dedicated 512kbps circuits will not have to upgrade to fractional T-3 circuits at great expense.

So we are compressing on the fly at the server end and decompressing at the client side -which except for a slight delay is transparent to the customers.
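For the curious, compress-at-the-server, decompress-at-the-client can be sketched in a few lines. This is a generic illustration using Python's zlib module with a gzip container, not our actual feed code; the function name stream_roundtrip and the sample tick format are made up:

```python
import zlib


def stream_roundtrip(messages):
    """Push messages through one long-lived compressor/decompressor pair.

    Returns (messages as seen by the client, total bytes on the wire).
    """
    # wbits=31 selects the gzip container on both ends.
    compressor = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=31)
    decompressor = zlib.decompressobj(wbits=31)
    received = []
    wire_bytes = 0
    for msg in messages:
        # Z_SYNC_FLUSH forces out everything buffered so far, so the client
        # can decode this message without waiting for the next one.
        packet = compressor.compress(msg) + compressor.flush(zlib.Z_SYNC_FLUSH)
        wire_bytes += len(packet)
        received.append(decompressor.decompress(packet))
    return received, wire_bytes


if __name__ == "__main__":
    ticks = [b"IBM 101.25 300 2007-04-23T10:04:00\n"] * 50
    seen, sent = stream_roundtrip(ticks)
    print(seen == ticks, sent, sum(len(t) for t in ticks))
```

Flushing after every message costs some compression, which is the trade for keeping the delay small.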

We were going to use PKzip, but the license was too dear, so we settled for gzip

-I'm just sayin'

Re:What's the point of compressing JPEG,MP3,DivX e by swilver · 2007-04-23 10:06 · Score: 2, Informative

Read up on arithmetic encoding. Basically it works by creating a huge floating point number. For example, let's say you want to encode a stream like this: "ABBBBABBBB". Statistically, A has a 20% chance of occurring, while B has an 80% chance of occurring. With Huffman you could encode this (obviously) as 0111101111, which takes 10 bits. Huffman encoding, being limited to whole bits per symbol, has no way to take advantage of the fact that the "B" occurs four times as often as the "A".

With arithmetic encoding, however, you'd encode each character according to the exact probability it has of occurring and write it as a fractional number between 0 and 1. For example, if you want to encode an "A", you'd pick a number between 0.0 and 0.2 (the lower 20% of our number); if you want to encode a "B", you'd use a number between 0.2 and 1.0 (the upper 80% of our number).

What you keep track of during encoding is the upper and lower bound of this number. So, when I want to encode the first "A", my lower bound is 0.0 and the upper bound is 0.2. The next character to encode is "B". We already know the range we can pick from is 0.0 - 0.2, but to encode a "B" we need to pick a number in the upper 80% of this range, so 0.04 to 0.20 (picking a number between 0.0 and 0.04 would encode another "A").

The next letter, another "B", would use a range 0.072 - 0.200. The 3rd "B" would narrow the range to 0.0976-0.2000. The 4th "B" narrows it to 0.11808 - 0.2000.

At some point, the upper and lower bound will have a few most significant digits in common that cannot change anymore. When this occurs, you can start writing these out as part of your compressed stream. For example, when we encode the 6th character (the 2nd "A"), the range becomes 0.118080 - 0.134464. The first two digits (0.1) can't change anymore now, so we can write them out, and just continue narrowing the range further for subsequent data to be compressed.

At some point, there'll be no more data to be compressed, and you then just pick a number (as convenient as possible) between the upper and lower bound you have established, write it out and end the stream. The process is the same when doing this with binary floating point numbers.
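The range narrowing above can be reproduced with a short sketch. It uses exact fractions where a real coder uses fixed-width integers and writes out digits as it goes, so it only illustrates the interval arithmetic; the names PROBS and narrow are made up for the example:

```python
from fractions import Fraction
import math

# Symbol probabilities from the example: A = 20%, B = 80%.
PROBS = [("A", Fraction(1, 5)), ("B", Fraction(4, 5))]


def narrow(message, probs=PROBS):
    """Return the (low, high) interval that identifies message."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low
        start = Fraction(0)  # cumulative probability below this symbol
        for candidate, p in probs:
            if candidate == symbol:
                high = low + width * (start + p)
                low = low + width * start
                break
            start += p
        else:
            raise ValueError(f"unknown symbol {symbol!r}")
    return low, high


if __name__ == "__main__":
    for prefix in ("A", "AB", "ABB", "ABBB", "ABBBB", "ABBBBA"):
        low, high = narrow(prefix)
        print(prefix, float(low), float(high))
    low, high = narrow("ABBBBABBBB")
    # The narrower the final interval, the more bits it takes to name a
    # number inside it.
    print(-math.log2(float(high - low)))
```

For the full ten-character message the last line comes out at roughly 7.2 bits, against the 10 bits of the one-bit-per-symbol Huffman encoding.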

Re:Screw speed, size reduction: gimme compatibilit by nbritton · 2007-04-23 10:36 · Score: 1

Nice comparison, but there's really only two that matter (at least on PCs): ...

That's not the conclusion I reached! I just tried SBC and it managed to compress a 619MB ISO to 293MB! For comparison WinRAR compressed it to 353MB, and WinZip to 382MB.

SBC archiver is worth the extra hassle... if you're dealing with billable network transfers. Someone needs to reverse engineer the application so we can implement it on *nix systems.

UHARC is missing by mariushm · 2007-04-23 12:38 · Score: 1

UHARC (http://en.wikipedia.org/wiki/UHarc) is missing from that list. It's best known for game rips; it compresses multimedia files really well but also takes a lot of time to do it.

And this is called a report these days? by Anonymous Coward · 2007-04-23 22:15 · Score: 0

Well, the first thing I notice in this so-called report is the constantly rotating adverts. They refresh the page every second. So I believe this was the only reason for writing this report.

Then, let's take a look at the data. WinRAR - the fastest method was "fastest", ok, but why the "default" was also "fastest" and not "Normal" as it is in reality? And where is its "Slowest" method?

My opinion - this report was bought by WinRK or what is its name. Typical FUD.

Unimpressive results by clupus5150 · 2007-04-24 03:42 · Score: 1

So what I take from those figures is that the best in the field are only just better than half again as effective at compression as my old stalwart compressors (gzip and a copy of WinRAR I bought 4 years ago), and less than 10% better if you take speed into account.

Not really an incentive to buy these 'advanced' and considerably more expensive compressors. I'll stick with the free and old versions thanks.

Not all accept streams.. by zippthorne · 2007-04-24 19:00 · Score: 1

Yes, but if you design a compressor to split files well, it will do that, and if you want it to do something else, you'll have to program that in as well.

If you design it to work on streams, it can do anything with streams that you have utilities for. Including splitting across volumes. For instance, iirc, you can pipe through tar and get the ability to change media. Or, growisofs will, i believe, do the same thing for DVDs, with a bit of command line fu.
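A rough sketch of that idea in Python: the archiver only ever sees a stream, and a separate piece decides where the volume boundaries go. The VolumeWriter class and both function names are hypothetical, and in-memory buffers stand in for real media:

```python
import io
import tarfile


class VolumeWriter:
    """Hypothetical sink that cuts a byte stream into fixed-size volumes."""

    def __init__(self, volume_size: int):
        self.volume_size = volume_size
        self.volumes = [bytearray()]

    def write(self, data) -> int:
        data = bytes(data)
        written = len(data)
        while data:
            room = self.volume_size - len(self.volumes[-1])
            if room == 0:
                # Current volume is full: "change media".
                self.volumes.append(bytearray())
                room = self.volume_size
            self.volumes[-1] += data[:room]
            data = data[room:]
        return written


def archive_to_volumes(files: dict, volume_size: int) -> list:
    """Write a gzip-compressed tar stream, split across volumes."""
    sink = VolumeWriter(volume_size)
    # "w|gz" is tarfile's non-seekable stream mode, so any writer will do.
    with tarfile.open(fileobj=sink, mode="w|gz") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return [bytes(v) for v in sink.volumes]


def restore(volumes: list) -> dict:
    """Concatenate the volumes and read the stream back."""
    joined = io.BytesIO(b"".join(volumes))
    with tarfile.open(fileobj=joined, mode="r|gz") as tar:
        return {member.name: tar.extractfile(member).read() for member in tar}
```

Neither tar nor gzip knows anything about volumes here, which is the point: the splitting lives in one small component and works for any stream.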

--
Can you be Even More Awesome?!

Slashdot Mirror

305 comments