Text Compressor 1% Away From AI Threshold

Posted by kdawson on Monday July 9, 2007 @06:10PM from the second-hutter-prize dept.

Baldrson writes "Alexander Ratushnyak compressed the first 100,000,000 bytes of Wikipedia to a record-small 16,481,655 bytes (including the decompression program), thereby not only winning the second payout of The Hutter Prize for Compression of Human Knowledge, but also bringing text compression within 1% of the threshold for artificial intelligence. At 1.319 bits per character, this result makes it likely that the next winner of the Hutter Prize will reach the threshold of human performance (between 0.6 and 1.3 bits per character) estimated by the founder of information theory, Claude Shannon, and confirmed by Cover and King in 1978 using text prediction gambling. When the Hutter Prize started, less than a year ago, the best performance was 1.466 bits per character. Alexander Ratushnyak's open-sourced GPL program is called paq8hp12 [rar file]."
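For readers checking the numbers, here is a minimal sketch of where the 1.319 figure comes from, assuming each input byte counts as one character: the total compressed size (decompressor included) in bits is divided by the 100,000,000 input bytes. The second function is a hedged illustration of the gambling idea as I understand it. It assumes a 27-symbol alphabet (26 letters plus space) paid at fair 27-for-1 odds, with the bettor splitting wealth in proportion to her predicted probabilities. Under those assumptions the entropy estimate reduces to the average of -log2 of the probability given to each actual next character, which is also the cost per character an ideal arithmetic coder pays with the same predictor. The names `bits_per_character` and `gambling_entropy_estimate` are illustrative, not taken from paq8hp12.

```python
import math


def bits_per_character(compressed_bytes: int, original_bytes: int) -> float:
    """Compressed size in bits divided by input length, treating 1 byte as 1 character."""
    return compressed_bytes * 8 / original_bytes


ORIGINAL_BYTES = 100_000_000  # first 10^8 bytes of Wikipedia, per the story
RECORD_BYTES = 16_481_655     # paq8hp12 total, decompression program included

bpc = bits_per_character(RECORD_BYTES, ORIGINAL_BYTES)
print(f"paq8hp12: {bpc:.3f} bits per character")
print(f"gap to the 1.3 bpc upper estimate: {bpc - 1.3:.4f} bits per character")


def gambling_entropy_estimate(probs_on_actual, alphabet_size=27):
    """Entropy estimate (bits/char) from a bettor's probabilities on the symbols that occurred.

    Assumption: fair odds of alphabet_size-for-1, wealth split in proportion to the
    bettor's predictions, starting wealth 1. After n bets, wealth
    S_n = alphabet_size**n * prod(q_i), so the estimate
    log2(alphabet_size) - log2(S_n) / n equals -(1/n) * sum(log2(q_i)).
    Every q_i must be > 0 (a zero bet on the actual symbol loses everything).
    """
    n = len(probs_on_actual)
    log2_wealth = n * math.log2(alphabet_size) + sum(math.log2(q) for q in probs_on_actual)
    return math.log2(alphabet_size) - log2_wealth / n


# Hypothetical bettor: probabilities she assigned to four characters that actually appeared.
print(gambling_entropy_estimate([0.5, 0.25, 0.5, 1.0]))
```

A bettor who always spreads her wealth uniformly never gains or loses, so her estimate is the full log2(27), roughly 4.75 bits per character. Sharper predictions raise her wealth and lower the estimate.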

2 of 442 comments

Program size is 1.02 MB! by seanadams.com · 2007-07-09 18:51 · Score: 0, Troll

The program is included in the size calculation, but this raises the question of how much data you'd really want to compress with such a program. It might be quite reasonable to use a decompressor which is, say, 100MB in size if it gives you a better net compression ratio on several GB of text.
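The trade-off is easy to put into numbers. The sketch below charges a fixed decompressor size against a payload that grows linearly with the corpus and finds the corpus size at which a bigger, better-modelling program starts to win. Every figure here is hypothetical: a roughly 1 MB decompressor coding at 1.32 bits per character versus a 100 MB one coding at 1.20. The helper names `overall_bpc` and `break_even_corpus` are made up for illustration.

```python
import math


def overall_bpc(corpus_bytes, decompressor_bytes, payload_bpc):
    """Bits per input character once the decompressor's own size is counted."""
    total_bytes = decompressor_bytes + payload_bpc * corpus_bytes / 8
    return 8 * total_bytes / corpus_bytes


def break_even_corpus(small, big):
    """Corpus size in bytes above which the bigger decompressor wins overall.

    Each argument is (decompressor_bytes, payload_bpc). Solving
    d_small + r_small*N/8 == d_big + r_big*N/8 for N gives the result.
    """
    (d_small, r_small), (d_big, r_big) = small, big
    if r_big >= r_small:
        return math.inf  # a bigger program that models no better never pays off
    return 8 * (d_big - d_small) / (r_small - r_big)


SMALL = (1_000_000, 1.32)    # hypothetical ~1 MB decompressor
BIG = (100_000_000, 1.20)    # hypothetical 100 MB decompressor with a better model

for n in (10**8, 10**9, 10**10):
    print(f"{n:>14,} bytes: small {overall_bpc(n, *SMALL):.3f} bpc, big {overall_bpc(n, *BIG):.3f} bpc")
print(f"break-even corpus: {break_even_corpus(SMALL, BIG) / 1e9:.1f} GB")
```

On a 100 MB test set the hypothetical 100 MB program is hopeless, since its own size alone costs 8 bits per character. With these made-up rates it overtakes the small one at about 6.6 GB, which is the "several GB" regime the comment has in mind.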

100MB of input text seems kind of small and might rule out more useful or more creative solutions to this problem. It also calls into question the relevance of Shannon's estimate: what size data set was _he_ talking about?

Re:interesting program name by arivanov · 2007-07-09 20:36 · Score: 0, Troll

And I also suggest revisiting this load of horrid tripe recently featured prominently on Slashdot: http://developers.slashdot.org/article.pl?sid=07/07/08/0547234
Compare it to the reasons behind this guy's achievement. Sit back. Reminisce. Enjoy der blinkenlichten.

--
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/