Slashdot Mirror


Compress Wikipedia and Win AI Prize

Baldrson writes "If you think you can compress a 100 MB sample of Wikipedia better than paq8f, then you might want to try winning some of a (currently) 50,000-euro purse. Marcus Hutter has announced the Hutter Prize for Lossless Compression of Human Knowledge, the intent of which is to incentivize the advancement of AI through the exploitation of Hutter's theory of optimal universal artificial intelligence. The basic theory, for which Hutter provides a proof, is that after any set of observations the optimal move by an AI is to find the smallest program that predicts those observations and then assume its environment is controlled by that program. Think of it as Ockham's Razor on steroids. Matt Mahoney provides a writeup of the rationale for the prize, including a description of the equivalence of compression and general intelligence."
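One rough way to see the compression/prediction link: an arithmetic coder driven by a predictive model spends close to -log2 p bits on each symbol the model predicted with probability p, so a model that predicts better produces a smaller file. The sketch below is only an illustration under stated assumptions: a simple adaptive order-k byte model with add-one smoothing and a hypothetical helper name `adaptive_code_length`. It is not paq8f and not part of the prize rules.

```python
import math
from collections import defaultdict


def adaptive_code_length(data: bytes, order: int) -> float:
    """Ideal code length, in bits, of `data` under an adaptive order-k model.

    Hypothetical helper for illustration: each byte is predicted from the
    preceding `order` bytes using add-one (Laplace) smoothed counts and
    costs -log2(p) bits, which an arithmetic coder approaches in practice.
    """
    counts = defaultdict(lambda: [1] * 256)  # per-context symbol counts, all starting at 1
    totals = defaultdict(lambda: 256)        # per-context sum of those counts
    bits = 0.0
    for i, symbol in enumerate(data):
        context = data[max(0, i - order):i]  # shorter context near the start
        bits -= math.log2(counts[context][symbol] / totals[context])
        counts[context][symbol] += 1         # update after coding, as a decoder could
        totals[context] += 1
    return bits


if __name__ == "__main__":
    sample = b"the cat sat on the mat. " * 1000
    for k in (0, 1, 2, 3):
        print(f"order {k}: {adaptive_code_length(sample, k) / 8:,.0f} bytes")
```

On repetitive text the higher-order model, which "understands" more of the structure, needs far fewer bits than the order-0 model; that gap is the sense in which better prediction and better compression are the same thing.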

17 of 324 comments

  1. But captain by Anonymous Coward · · Score: 5, Funny

    Marcus Hutter has announced the Hutter Prize for Lossless Compression of Human Knowledge the intent of which is to incentivize the advancement of AI through the exploitation of Hutter's theory of optimal universal artificial intelligence.

    But captain, if we reverse the tachyon inverter drives then we will have insufficient dilithium crystals to traverse the neutrino warp.

    1. Re:But captain by Anonymous Coward · · Score: 5, Funny

      You left out the part involving the deflector shield. Remember, the first rule of Star Trek technobabble is to always involve the deflector in some way.

  2. lossy compression by RenoRelife · · Score: 5, Insightful

    Lossily compressing the same data, with an algorithm able to permute data in a way similar to the human mind, seems like it would come closer to real intelligence than lossless compression would.

    1. Re:lossy compression by swillden · · Score: 4, Insightful

      You just need to re-create a file that matches the md5sum and still follows the rules of a Linux kernel. It is extremely unlikely that any other file that can be recognized as some kind of Linux kernel also matches. Of course there are countless blocks of data that still match, but very few will follow the ruleset of "ELF kernel executable" structure, which can be deduced numerically.

      Mmmm, no. You were fine up until you said "very few will follow the ruleset". That's not true. To see that it's not true, take your kernel, which consists of around 10 million bits. Now find, say, 512 of those bits that can be changed, independently, while still producing a valid-looking kernel executable. The result doesn't even have to be a valid, runnable kernel, but it wouldn't be too hard to do it even with that restriction.

      So you now have 2^512 variants of the Linux kernel, all of which look like a valid kernel. But there are only 2^128 possible hashes, so, on average, there will be 2^384 kernels for each hash value, and the odds are very, very good that your "real" kernel's hash is also matched by at least one of them. If by some chance it isn't, I can always generate a whole bunch more kernel variants. How about 2^(2^10) of them?

      A hash plus a filter ruleset does not constitute a lossless compression of a large file, even if computation power/time is unbounded.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
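swillden's counting argument can be checked on a scaled-down model. The sketch below makes two hypothetical simplifications so it runs in a second: MD5 is truncated to 8 bits (256 possible hash values instead of 2^128), and a stand-in 1 KiB "file" has 16 bits assumed to be freely flippable instead of 512. With 2^16 variants and 2^8 hash values, roughly 2^16 / 2^8 = 256 variants should share the original's hash, so the hash plus a "looks valid" filter still cannot single out the original.

```python
import hashlib

HASH_BITS = 8  # hypothetical: keep only the top 8 bits of MD5 so collisions are common


def short_hash(data: bytes) -> int:
    """MD5 truncated to HASH_BITS bits (a scaled-down stand-in for the full 128-bit digest)."""
    digest = int.from_bytes(hashlib.md5(data).digest(), "big")
    return digest >> (128 - HASH_BITS)


def variants(base: bytes, free_bits: list[int]):
    """Yield every copy of `base` with some subset of the given bit positions flipped."""
    for mask in range(1 << len(free_bits)):
        buf = bytearray(base)
        for j, bit in enumerate(free_bits):
            if (mask >> j) & 1:
                buf[bit // 8] ^= 1 << (bit % 8)
        yield bytes(buf)


def count_matches(base: bytes, free_bits: list[int]) -> int:
    """How many variants (the unmodified original included) share the original's short hash."""
    target = short_hash(base)
    return sum(1 for v in variants(base, free_bits) if short_hash(v) == target)


if __name__ == "__main__":
    base = bytes(range(256)) * 4                 # hypothetical 1 KiB "kernel"
    free_bits = [8 * 37 * i for i in range(16)]  # 16 spread-out bits assumed harmless to flip
    print(count_matches(base, free_bits), "of", 2 ** len(free_bits), "variants match")
```

Scaling the same arithmetic back up, 2^512 valid-looking variants spread over 2^128 MD5 values leaves about 2^384 candidates per hash value.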
  3. Who'da thunk... by blueadept1 · · Score: 5, Funny

    Man, WinRAR is taking its bloody time. But oh god, when it's done, I'll be rich!

  4. It's a big world out there by Harmonious Botch · · Score: 4, Interesting

    "The basic theory...is that after any set of observations the optimal move by an AI is find the smallest program that predicts those observations and then assume its environment is controlled by that program." In a finite discrete environment (like SHRDLU: put the red cylinder on top of the blue box) that may be possible. But in the real world the problem is knowing that one's observations are all, or even a significant percentage, of the possible observations.
    This - in humans, at least - can lead to the cyclic reinforcement of one's belief system. The belief system that explains observations initially is used to filter observations later.

    TFA is a neat idea theoretically, but it's progeny will never be able to leave the lab.

    --
    I figured out how to get a second 120-byte sig! Mod me up and I'll tell you how you can have one too.

    1. Re:It's a big world out there by gardyloo · · Score: 4, Funny

      TFA is a neat idea theoretically, but it's progeny will never be able to leave the lab.

      Your use of "TFA" is a good compressional technique, but you could change "it's" to "its" and actually GAIN in meaning while losing a character! You're well on your way...

    2. Re:It's a big world out there by DrJimbo · · Score: 4, Informative
      Harmonious Botch said:
      This - in humans, at least - can lead to the cyclic reinforcement of one's belief system. The belief system that explains observations initially is used to filter observations later.
      I encourage you to read E. T. Jaynes' book: Probability Theory: The Logic of Science. It used to be available on the Web as a PDF before the published version came out.

      In it, Jaynes shows that an optimal decision maker shares this same tendency of reinforcing existing belief systems. He even gives examples where new information reinforces the beliefs of optimal observers who have reached opposite conclusions (due to differing initial sets of data). Each observer believes the new data further supports their own view.

      Since even an optimal decision maker has this undesirable trait, I don't think the existence of this trait is a good criterion for rejecting decision-making models.

      --
      We don't see the world as it is, we see it as we are.
      -- Anais Nin
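A toy Bayesian model shows how two observers who both update correctly can be pushed further apart by the same report. This is not Jaynes's own example; the model and the numbers are assumptions for illustration: a source asserts that a claim is true, an honest source always reports the truth, a dishonest one always reports the opposite, and the two observers differ in their priors about both the claim and the source.

```python
def posterior_claim_true(prior_true: float, p_honest: float) -> float:
    """P(claim is true | source asserts it is true).

    Toy assumptions: an honest source always reports the truth, a dishonest
    one always reports the opposite, and honesty is independent of the claim.
    """
    like_if_true = p_honest          # P(assertion | claim true): only an honest source says so
    like_if_false = 1.0 - p_honest   # P(assertion | claim false): only a dishonest source says so
    numerator = prior_true * like_if_true
    return numerator / (numerator + (1.0 - prior_true) * like_if_false)


if __name__ == "__main__":
    # Hypothetical observers: A leans "true" and trusts the source; B leans "false" and distrusts it.
    a = posterior_claim_true(prior_true=0.7, p_honest=0.9)
    b = posterior_claim_true(prior_true=0.3, p_honest=0.1)
    print(f"A: 0.70 -> {a:.3f}")  # A: 0.70 -> 0.955
    print(f"B: 0.30 -> {b:.3f}")  # B: 0.30 -> 0.045
```

Both observers apply Bayes' rule to the same report, yet each ends up more confident in the view they started with; the divergence comes entirely from their different priors about the source.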
  5. Re:Can it be "lossy" compression? by richdun · · Score: 4, Funny

    Hmmm...well in that case, someone go edit the Wikipedia entry on "computers" and allow them to store data at the bit level. Also, I heard somewhere that computers in Africa have tripled in the past six months!

  6. Re:Can it be "lossy" compression? by Bill Kilgore · · Score: 5, Funny

    I have a program that compresses 100M of Wikipedia to one bit with no loss at all. The program is somewhat special-purpose, and at 100,024,076 bytes, a little chunkier than I'd like.

    --
    Rediculous: A word indicating the writer is ridiculously ignorant.
  7. Solution. by Funkcikle · · Score: 5, Funny

    Removing all the incorrect and inaccurate data from the Wikipedia sample should "compress" it down to at least 20 MB.

    Then just apply your personal favourite compression utility.

    I like lharc, which according to Wikipedia was invented in 1904 as a result of bombarding President Lincoln, who plays Commander Tucker in Star Trek: Enterprise with neutrinos.

  8. Incentivize? by noidentity · · Score: 5, Funny
    the intent of which is to incentivize the advancement of AI

    Sorry, anything which uses the word "incentivize" does not involve intelligence, natural or artificial.

  9. I'll try: by dcapel · · Score: 5, Funny

    printf '#!/bin/sh\nwget en.wikipedia.org/enwiki/\n' > archive && chmod +x archive

    Mine wins as it is roughly 40 bytes total. To get your results, you simply need to run the self-extracting archive and wait. Be warned, it will take a while, but that is the cost of such a great compression scheme!

    --
    DYWYPI?
  10. Re:WikiPedia on iPod! by Asztal_ · · Score: 4, Funny

    Umm... which of the 5 thousand links is the article?

  11. Would be useful for images by aliquis · · Score: 4, Funny

    ... now all we need is a dictionary for nudity and we could save a lot of bandwidth on the Internet!

  12. Compress Wikipedia and win a prize? by Dachannien · · Score: 4, Funny

    Can't I just punch the monkey for $20 instead?

  13. Re:Painful to read by PeeAitchPee · · Score: 4, Funny

    He did, but Slashdot's AI compressed it for him.

    :-D