Text Compressor 1% Away From AI Threshold

Posted by kdawson on Monday July 9, 2007 @06:10PM from the second-hutter-prize dept.

Baldrson writes "Alexander Ratushnyak compressed the first 100,000,000 bytes of Wikipedia to a record-small 16,481,655 bytes (including the decompression program), thereby not only winning the second payout of The Hutter Prize for Compression of Human Knowledge, but also bringing text compression within 1% of the threshold for artificial intelligence. At 1.319 bits per character, this result makes the next winner of the Hutter Prize likely to reach the threshold of human performance (between 0.6 and 1.3 bits per character) estimated by the founder of information theory, Claude Shannon, and confirmed by Cover and King in 1978 using text prediction gambling. When the Hutter Prize started, less than a year ago, the best performance was 1.466 bits per character. Alexander Ratushnyak's open-sourced GPL program is called paq8hp12 [rar file]."
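
For readers who want to check the arithmetic, here is a minimal sketch of how the 1.319 figure follows from the sizes quoted in the summary. It assumes one character equals one byte of the input file; the function and constant names are made up for illustration and are not taken from paq8hp12.

```python
# Figures quoted in the summary above.
ORIGINAL_BYTES = 100_000_000    # first 100,000,000 bytes of Wikipedia
COMPRESSED_BYTES = 16_481_655   # archive size, including the decompressor
PREVIOUS_BPC = 1.466            # best result when the prize started


def bits_per_character(compressed_bytes: int, original_bytes: int) -> float:
    """Compressed bits per input byte (assumes one character == one byte)."""
    return compressed_bytes * 8 / original_bytes


bpc = bits_per_character(COMPRESSED_BYTES, ORIGINAL_BYTES)
# Relative improvement over the starting point quoted in the summary.
improvement = (PREVIOUS_BPC - bpc) / PREVIOUS_BPC

print(f"{bpc:.3f} bits per character")
print(f"{improvement:.1%} fewer bits per character than at the start of the prize")
```

The same function can be pointed at any compressor's output size to compare it against the 0.6 to 1.3 range cited for human prediction.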

but... by Hes+Nikke · 2007-07-09 18:15 · Score: -1, Redundant

... does it run linux?

--
Don't call me back. Give me a call back. Bye. So yeah. But bye our, well, but alright we are on a shirt this chill.

Just one question by Anonymous Coward · 2007-07-09 18:19 · Score: -1, Redundant

Will this work on Linux?

derrrrr by stewbacca · 2007-07-09 18:56 · Score: -1, Redundant

This entire thread just points out how stupid I am on the Grand-Nerd scale of things ;-)

how does compression relate to AI? by timmarhy · 2007-07-09 19:02 · Score: 0, Redundant

So if we compress Google, we will give birth to Skynet? How the fuck does a compression program == AI?

--
If you mod me down, I will become more powerful than you can imagine....

Re:That's cool.. by Anonymous Coward · 2007-07-10 03:53 · Score: -1, Redundant

Note that you didn't eliminate the spaces - you changed their form into requiring the next character to have a "preceding character is a space" bit.

This increases the required bits for each character by 1. Six bits are required for lowercase letters + numerals (36 characters), so each character would take 7.

In the long run, it probably comes out better that way (for that text sample), since the average 'word' length is under 6 characters, but it doesn't do much.

One of these days, I should actually create an account here, I'm really just an Anonymous Lazy Lurker Person...
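
The break-even point in the comment above can be sketched in a few lines. This is only one reading of the scheme, since the parent post is not shown here: it assumes fixed-width codes, a 36-character alphabet, and one space per word, and both function names are hypothetical.

```python
import math

ALPHABET_SIZE = 36  # lowercase letters + numerals, as stated in the comment


def bits_with_space_symbol(word_len: int) -> int:
    """Space is a 37th symbol; a word costs its letters plus one space."""
    bits_per_symbol = math.ceil(math.log2(ALPHABET_SIZE + 1))  # 37 symbols fit in 6 bits
    return (word_len + 1) * bits_per_symbol


def bits_with_space_flag(word_len: int) -> int:
    """No space symbol; every character carries a 1-bit 'preceded by space' flag."""
    bits_per_symbol = math.ceil(math.log2(ALPHABET_SIZE)) + 1  # 6 bits + 1 flag bit
    return word_len * bits_per_symbol


# Columns: word length, bits with a space symbol, bits with a per-character flag.
for n in range(1, 9):
    print(n, bits_with_space_symbol(n), bits_with_space_flag(n))
```

Under these assumptions the flag scheme is smaller only for words shorter than 6 characters, ties at exactly 6, and loses beyond that, which matches the comment's remark that it helps when the average word length is under 6.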

Re:I'll be reading the source... by Megatronium · 2007-07-10 11:42 · Score: -1, Redundant

However, you'd want a lexicon including phonetic and textual transcription, and some probabilistic sound-to-phoneme mappings. And then maybe an ontology and a semantic parser would help.
Well, if you're gonna do all that, you may as well be creating AI. Which I guess is the point. :)