Compress Wikipedia and Win AI Prize
Baldrson writes "If you think you can compress a 100M sample of Wikipedia better than paq8f, then you might want to try winning some of a (at present) 50,000 Euro purse. Marcus Hutter has announced the Hutter Prize for Lossless Compression of Human Knowledge, the intent of which is to incentivize the advancement of AI through the exploitation of Hutter's theory of optimal universal artificial intelligence. The basic theory, for which Hutter provides a proof, is that after any set of observations the optimal move by an AI is to find the smallest program that predicts those observations and then assume its environment is controlled by that program. Think of it as Ockham's Razor on steroids. Matt Mahoney provides a writeup of the rationale for the prize, including a description of the equivalence of compression and general intelligence."
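To make the link between prediction and compression concrete, here is a minimal sketch. It is not paq8f and not Hutter's method; it is an adaptive order-0 byte model (an assumption chosen for brevity) that charges each byte -log2(p) bits, where p is the probability the model gave that byte before seeing it. The function name `ideal_code_length_bits` is hypothetical. The better the model predicts, the fewer bits an arithmetic coder driven by it would need.

```python
import math

def ideal_code_length_bits(data: bytes) -> float:
    """Bits an ideal arithmetic coder would need under an adaptive order-0 model.

    Hypothetical helper for illustration; the model starts with a count of 1
    for every byte value (Laplace smoothing) and updates after each byte.
    """
    counts = [1] * 256   # one pseudo-count per possible byte value
    total = 256          # sum of all counts
    bits = 0.0
    for b in data:
        p = counts[b] / total    # probability predicted before seeing the byte
        bits += -math.log2(p)    # cost of coding a symbol of probability p
        counts[b] += 1           # learn from the observation
        total += 1
    return bits

if __name__ == "__main__":
    predictable = b"a" * 1000
    unpredictable = bytes(range(256)) * 4
    print(ideal_code_length_bits(predictable) / len(predictable))      # bits per byte
    print(ideal_code_length_bits(unpredictable) / len(unpredictable))  # bits per byte
```

A run of identical bytes costs well under 8 bits per byte because the model quickly learns to expect it, while evenly spread byte values cost at least 8 bits per byte under this model.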
I'd love to be able to have the whole Wikipedia available on my iPod (or cell phone), but without destroying
info.edu.org - Speedy information and news from the Top 10 educational organisations.
Marcus Hutter has announced the Hutter Prize for Lossless Compression of Human Knowledge the intent of which is to incentivize the advancement of AI through the exploitation of Hutter's theory of optimal universal artificial intelligence.
But captain, if we reverse the tachyon inverter drives then we will have insufficient dilithium crystals to traverse the neutrino warp.
For the love of god, proofread!
If it's not on fire, it's a software problem.
Using the same data lossy compressed, with an algorithm that was able to permute data in a similar way to the human mind, seems like it would come closer to real intelligence than the lossless compression would
There. All of wiki, in 31 bytes.
I am very small, utmostly microscopic.
Man, WinRar is taking its bloody time. But oh god, when it's done, I'll be rich!
Convert it to AOL! tis wikpedia, teh fri enpedia . teh bst in da wrld.
Stupidity is like nuclear power, it can be used for good or evil. And you don't want to get any on you.
There are some amazing compression programs out there, trouble is they tend to take a while and consume lots of memory. PAQ gives some impressive results, but the latest benchmark figures are regularly improving. Let's not forget that compression is not good unless it is integrated into a usable tool. 7-zip seems to be the new archiver on the block at the moment. A closely related, but different, set of tools are the archivers, of which there are lots with many older formats still not supported by open source tools
"The basic theory...is that after any set of observations the optimal move by an AI is to find the smallest program that predicts those observations and then assume its environment is controlled by that program." In a finite discrete environment (like SHRDLU: put the red cylinder on top of the blue box) that may be possible. But in the real world the problem is knowing that one's observations are all - or even a significant percentage - of the possible observations.
This - in humans, at least - can lead to the cyclic reinforcement of one's belief system. The belief system that explains observations initially is used to filter observations later.
TFA is a neat idea theoretically, but its progeny will never be able to leave the lab.
--
I figured out how to get a second 120-byte sig! Mod me up and I'll tell you how you can have one too.
a) how big the compressed size was
18MB
b) how many bytes was wikipedia before it was compressed
A sample of 100MB
Your goal:
.
KFG
Given that the hypothesis is valid (which is arguable), it seems to me that compressing wikipedia is a fairly useless way of supporting it. It seems like an abstraction error: Wikipedia is *not* a set of rules that predict the observations in it. It's a list of observations, sure, but there's no ruleset involved. Now, someone/thing who can read and parse language can get educated based on the knowledge in wikipedia, but then the intelligence is providing the ruleset, just training itself with the raw data in wiki.
It really seems like one of those mistaking-the-map-for-the-territory errors.
-b
If I wanted a sig I would have filled in that stupid box.
Hmmm...well in that case, someone go edit the Wikipedia entry on "computers" and allow them to store data at the bit level. Also, I heard somewhere where computers in Africa have tripled in the past six months!
I have a program that compresses 100M of Wikipedia to one bit with no loss at all. The program is somewhat special-purpose, and at 100,024,076 bytes, a little chunkier than I'd like.
Rediculous: A word indicating the writer is ridiculously ignorant.
Removing all the incorrect and inaccurate data from the Wikipedia sample should "compress" it down to at least 20 MB.
Then just apply your personal favourite compression utility.
I like lharc, which according to Wikipedia was invented in 1904 as a result of bombarding President Lincoln, who plays Commander Tucker in Star Trek: Enterprise with neutrinos.
Sorry, anything which uses the word "incentivize" does not involve intelligence, natural or artificial.
The contest for the Hutter Prize requires the compressed corpus to be a self-extracting archive -- or failing that to add the size of the compressor to the compressed corpus.
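As a rough sketch of the rule just described (not the official scoring code; the function name `contest_size` and its parameters are hypothetical), the size that counts can be computed like this:

```python
def contest_size(archive_bytes: int,
                 decompressor_bytes: int = 0,
                 self_extracting: bool = True) -> int:
    """Size that counts toward the prize, per the rule described above.

    Hypothetical helper: a self-extracting archive is judged on its own
    size; otherwise the decompressor's size is added to the archive's.
    """
    if archive_bytes < 0 or decompressor_bytes < 0:
        raise ValueError("sizes must be non-negative")
    if self_extracting:
        return archive_bytes
    return archive_bytes + decompressor_bytes

if __name__ == "__main__":
    # The "one bit" entry from earlier in the thread: a 1-byte archive
    # (one bit rounded up) plus a 100,024,076-byte special-purpose program.
    print(contest_size(1, 100_024_076, self_extracting=False))
```

Under this rule the "one bit" entry comes to 100,024,077 bytes, which is why hiding the data inside the decompressor gains nothing.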
Seastead this.
printf '#!/bin/sh\nwget en.wikipedia.org/enwiki/\n' > archive
Mine wins as it is roughly 40 bytes total. To get your results, you simply need to run the self-extracting archive, and wait. Be warned, it will take a while, but that is the cost of such a great compression scheme!
DYWYPI?
I would argue that lossless compression really is not the best measure of intelligence. Humans are inherently lossy in nature. Everything we see, hear, feel, smell, and taste is pared down to its essentials when we understand it. It is this process of discarding irrelevant details and making generalizations that is truly intelligence. If our minds had lossless compression we could regurgitate textbooks, but never be able to apply the knowledge contained within. If we really understand, we could reproduce what we've read, but not verbatim. A better measure of intelligence would be lossy text compression that still retains the knowledge contained within the corpus.
A (good) sign of the times, I guess.
This inconsistency doesn't have any effect on the challenge, though -- that 50kEUR[1] is offered for compressing the given data corpus, not for compressing a string of 100MB.
[1] 1kEUR=1000EUR. 1M EUR=1000000EUR. 1KB=1024B. 1MB=1048576B.
And by the way, what about fixing Slash to finally allow Unicode -- either natively or at least as HTML entities?
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
that the entire knowledge of the world could simply be compressed without loss to
yeah, you guessed it..
42...
He who would trade liberty for some temporary security, deserves neither liberty nor security
So, we need a WikiCast - remember folks, you heard it here first!
antipaucity
... now all we need is a dictionary for nudity and we could save a lot of bandwidth on the Internet!
Points are not awarded for attempting to circumvent the intent of the competition. I expect such attempts would result in future submissions from the same source being ignored.
Seastead this.
Human poker players address this issue by deliberately introducing slight randomness into their play. I think a "Hutter AI" will make better real-world decisions if it does the same (see Game Theory).
Occam's razor is also highly suspect. There's the issue of cultural bias when counting assumptions. And all programmers will be aware of how they fixed "the bug" that caused all the problems in an application, only to find there were other bugs that caused identical symptoms.
Reduce, reuse, cycle
Can't I just punch the monkey for $20 instead?
Of course he was joking. If he was serious he would've said "verbificate".