Compress Wikipedia and Win AI Prize

Posted by ryuzaki0 on Sunday August 13, 2006 @10:50AM from the what-does-this-mean dept.

Baldrson writes "If you think you can compress a 100M sample of Wikipedia better than paq8f, then you might want to try winning some of a (at present) 50,000 Euro purse. Marcus Hutter has announced the Hutter Prize for Lossless Compression of Human Knowledge, the intent of which is to incentivize the advancement of AI through the exploitation of Hutter's theory of optimal universal artificial intelligence. The basic theory, for which Hutter provides a proof, is that after any set of observations the optimal move by an AI is to find the smallest program that predicts those observations and then assume its environment is controlled by that program. Think of it as Ockham's Razor on steroids. Matt Mahoney provides a writeup of the rationale for the prize, including a description of the equivalence of compression and general intelligence."
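
The link between prediction and compression mentioned in the summary can be illustrated with a minimal sketch. It assumes a simple adaptive order-0 byte model with add-one smoothing, which is an illustrative choice and not the model used by paq8f or by Hutter's theory. It computes the ideal code length that an arithmetic coder driven by that model would approach. The function name `ideal_code_length_bits` is hypothetical.

```python
import math

def ideal_code_length_bits(data: bytes) -> float:
    """Sum of -log2 p(byte) under an adaptive order-0 model (assumed model)."""
    counts = [1] * 256   # add-one smoothing: every byte value starts at 1
    total = 256
    bits = 0.0
    for b in data:
        # Cost of coding this byte given everything seen so far.
        bits += -math.log2(counts[b] / total)
        # Update the model only after the byte has been "coded".
        counts[b] += 1
        total += 1
    return bits

predictable = b"a" * 1024          # easy to predict
varied = bytes(range(256)) * 4     # every byte value equally frequent

print(ideal_code_length_bits(predictable) < ideal_code_length_bits(varied))
```

The better the model predicts the next byte, the fewer bits the data costs, which is the sense in which a better predictor is a better compressor.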

7 of 324 comments


Comparison by ronkronk · 2006-08-13 11:08 · Score: 2, Informative

There are some amazing compression programs out there; the trouble is that they tend to take a while and consume lots of memory. PAQ gives some impressive results, and the latest benchmark figures are regularly improving. Let's not forget that compression is no good unless it is integrated into a usable tool. 7-zip seems to be the new archiver on the block at the moment. A closely related, but different, set of tools is the archivers, of which there are lots, with many older formats still not supported by open source tools.
Re:for those who rtfa by kfg · 2006-08-13 11:15 · Score: 2, Informative

a) how big the compressed size was

18MB

b) how many bytes was wikipedia before it was compressed

A sample of 100MB

Your goal:
.

KFG
Wrong contest by Baldrson · 2006-08-13 11:55 · Score: 3, Informative

That's another contest that is useless for the reason you cite.
The Hutter Prize requires the compressed corpus to be a self-extracting archive -- or, failing that, the size of the compressor is added to the size of the compressed corpus.
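
The scoring rule described above can be sketched as follows. This is a minimal illustration, assuming the counted size is either the byte count of a self-extracting archive or the compressed data plus the program that goes with it. The function name `contest_size` and the sizes in the example are hypothetical, and the official rules remain authoritative.

```python
from typing import Optional

def contest_size(compressed_bytes: int, program_bytes: Optional[int] = None) -> int:
    """Total size counted against an entry (assumed reading of the rule).

    A self-extracting archive already contains its own program, so pass
    program_bytes=None. Otherwise the program size is added on top.
    """
    if compressed_bytes < 0 or (program_bytes is not None and program_bytes < 0):
        raise ValueError("sizes must be non-negative")
    if program_bytes is None:
        return compressed_bytes
    return compressed_bytes + program_bytes

# Hypothetical numbers: a program that simply embeds the whole corpus
# gains nothing, because its own size is counted.
honest_entry = contest_size(18_000_000, 500_000)
cheating_entry = contest_size(1, 100_000_000)
print(honest_entry < cheating_entry)
```

Counting the program closes the loophole of hiding the data inside the tool, which is the objection raised against contests that measure only the output file.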

--
Seastead this.
Re:It's a big world out there by DrJimbo · 2006-08-13 12:13 · Score: 4, Informative

Harmonious Botch said:
This - in humans, at least - can lead to the cyclic reinforcement of one's belief system. The belief system that explains observations initially is used to filter observations later.
I encourage you to read E. T. Jaynes' book: Probability Theory: The Logic of Science. It used to be available on the Web in pdf form before a published version became available.

In it, Jaynes shows that an optimal decision maker shares this same tendency of reinforcing existing belief systems. He even gives examples where new information reinforces the beliefs of optimal observers who have reached opposite conclusions (due to differing initial sets of data). Each observer believes the new data further supports their own view.

Since even an optimal decision maker has this undesirable trait, I don't think the existence of this trait is a good criterion for rejecting decision making models.
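
A toy Bayesian calculation shows how this divergence can arise. The model and all numbers below are assumptions chosen for illustration and are not taken from Jaynes's book: a source asserts that hypothesis H is true, and each observer holds a different prior about H and about whether the source is honest.

```python
def posterior(prior_h: float, trust: float, accuracy: float = 0.9) -> float:
    """P(H | source asserts H) when the source is either honest or deceptive.

    prior_h  : observer's prior probability that H is true
    trust    : observer's prior probability that the source is honest
    accuracy : chance an honest source reports the truth, and a deceptive
               source reports the opposite (assumed value)
    """
    # Likelihood of hearing "H is true" under each state of the world.
    p_report_given_h = trust * accuracy + (1 - trust) * (1 - accuracy)
    p_report_given_not_h = trust * (1 - accuracy) + (1 - trust) * accuracy
    numerator = prior_h * p_report_given_h
    return numerator / (numerator + (1 - prior_h) * p_report_given_not_h)

# Same report, different starting points (hypothetical priors).
believer = posterior(prior_h=0.7, trust=0.9)  # leans toward H, trusts the source
skeptic = posterior(prior_h=0.3, trust=0.1)   # leans against H, distrusts the source
print(f"believer: {believer:.3f}  skeptic: {skeptic:.3f}")
```

Both observers apply Bayes' rule correctly to the same report, yet the first ends up more confident in H than before and the second less confident, because they disagree about what the report is evidence of.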

--
We don't see the world as it is, we see it as we are.
-- Anais Nin
Re:Can it be "lossy" compression? by KiloByte · 2006-08-13 12:37 · Score: 2, Informative

Why so? The test file is exactly 10^8 bytes.
I downloaded the corpus, and indeed, you're right -- it's 10^8 bytes. The article is incorrect: it says 100M where it means 95.3M.

This inconsistency doesn't have any effect on the challenge, though -- that 50kEUR[1] is offered for compressing the given data corpus, not for compressing a string of 100MB.

[1] 1kEUR=1000EUR. 1M EUR=1000000EUR. 1KB=1024B. 1MB=1048576B.
And by the way, what about fixing Slash to finally allow Unicode -- either natively or at least as HTML entities?
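
The size discrepancy discussed in this comment is a matter of unit arithmetic and can be checked directly. The sketch below uses the binary definition from the footnote (1MB = 1048576B); the variable names are illustrative.

```python
CORPUS_BYTES = 10**8      # size of the test file reported in the comment
MB_BINARY = 1024 * 1024   # 1MB = 1048576B, per the footnote

size_in_binary_mb = CORPUS_BYTES / MB_BINARY
shortfall_bytes = 100 * MB_BINARY - CORPUS_BYTES

print(f"{size_in_binary_mb:.2f} binary MB")
print(f"{shortfall_bytes} bytes short of 100 binary MB")
```

The result is about 95.37 binary megabytes, which the comment truncates to 95.3M.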

--
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
Re:It's a big world out there by Baldrson · 2006-08-13 12:49 · Score: 2, Informative

In it, Jaynes shows that an optimal decision maker shares this same tendency of reinforcing existing belief systems. He even gives examples where new information reinforces the beliefs of optimal observers who have reached opposite conclusions (due to differing initial sets of data). Each observer believes the new data further supports their own view.
I think what Hutter has shown is that there is a solution which unifies the new data with the old within a new optimum, which is most likely unique. I think it is based on the idea that Kolmogorov complexity is a unique value for any string and is most likely represented by a single optimum program (the "self-extracting archive" of the string).
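
Kolmogorov complexity is defined relative to a fixed reference machine and is not computable in general, so in practice it can only be bounded from above. A minimal sketch of that idea, assuming Python's standard-library compressors as stand-ins (none of them is the contest's compressor, and the size of the decompressor itself is ignored here):

```python
import bz2
import lzma
import zlib

def complexity_upper_bounds(data: bytes) -> dict:
    """Compressed sizes in bytes under three general-purpose compressors.

    Each value is only an upper bound on the information content of
    `data`, up to the fixed size of the matching decompressor.
    """
    return {
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data)),
    }

# Hypothetical sample: highly repetitive text compresses far below its length.
sample = b"the quick brown fox jumps over the lazy dog. " * 200
bounds = complexity_upper_bounds(sample)
best = min(bounds.values())
print(len(sample), bounds, best)
```

A better compressor tightens the bound, but no compressor can certify that it has reached the true minimum, which is one reason the prize is framed as an open-ended competition.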

--
Seastead this.
Barebones Windows or Linux by Baldrson · 2006-08-13 14:24 · Score: 2, Informative

See the detailed rules for specifics, but generally the rules are just what you would expect: The program runs (and completes in a reasonable time) on a relatively recent system running Windows (currently XP) or Linux with no external inputs, e.g., no dynamically loaded libraries not included in the submission, no net communication and no disk I/O that isn't generated by the program itself.
Points are not awarded for attempting to circumvent the intent of the competition. I expect such attempts would result in future submissions from the same source being ignored.

--
Seastead this.