Slashdot Mirror


Text Compressor 1% Away From AI Threshold

Baldrson writes "Alexander Ratushnyak compressed the first 100,000,000 bytes of Wikipedia to a record-small 16,481,655 bytes (including decompression program), thereby not only winning the second payout of The Hutter Prize for Compression of Human Knowledge, but also bringing text compression within 1% of the threshold for artificial intelligence. At 1.319 bits per character, the result makes the next winner of the Hutter Prize likely to reach the threshold of human performance (between 0.6 and 1.3 bits per character) estimated by the founder of information theory, Claude Shannon, and confirmed by Cover and King in 1978 using text prediction gambling. When the Hutter Prize started, less than a year ago, the best performance was 1.466 bits per character. Alexander Ratushnyak's open-sourced GPL program is called paq8hp12 [rar file]."
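
For anyone checking the arithmetic, the bits-per-character figure appears to follow directly from the two byte counts in the summary. A minimal sketch, assuming one input byte is counted as one character (the function name is just for illustration):

```python
# Figures taken from the story summary.
ORIGINAL_BYTES = 100_000_000    # first 100,000,000 bytes of Wikipedia
COMPRESSED_BYTES = 16_481_655   # compressed size, including the decompression program


def bits_per_character(compressed_bytes: int, original_bytes: int) -> float:
    """Compressed bits spent per input byte (assumption: 1 byte == 1 character)."""
    return compressed_bytes * 8 / original_bytes


bpc = bits_per_character(COMPRESSED_BYTES, ORIGINAL_BYTES)
print(f"{bpc:.3f} bits per character")  # prints "1.319 bits per character"
```

Because the decompressor is counted in the compressed size, the figure slightly overstates the cost of the text alone.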

31 of 442 comments

  1. I wonder ... by iknowcss · · Score: 2, Funny

    How many bad car analogies, inaccurate law advice, and duplicate stories an AI bot could possibly hold in his head. Imagine what kind of person all of the "knowledge" of Slashdot would create.

    The horror.

    --
    Life is rarely fair. Cherish the moments when there is a right answer.
    1. Re:I wonder ... by Anonymous Coward · · Score: 3, Funny

      "How many bad car analogies, inaccurate law advice, and duplicate stories an AI bot could possibly hold in his head. Imagine what kind of person all of the "knowledge" of Slashdot would create."

      "The horror."

      I've been typing everything I ever knew into Slashdot since the day it started, you insensitive clod!
          -- Cmdr Taco

    2. Re:I wonder ... by Smauler · · Score: 2, Funny

      I wouldn't exactly call that lossless compression though...

  2. interesting program name by digitalderbs · · Score: 5, Funny

    paq8hp12. when decompressed, it also serves as the source code for the program.

    1. Re:interesting program name by smittyoneeach · · Score: 2, Funny

      If the name 'paq8hp12' falls out of some tree in the forest, and no one here can tell the difference in the state of the tree/paq8hp12 system, does gravity exist?

      --
      Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
  3. Dangerous by mhannibal · · Score: 4, Funny

    This is damned dangerous, and playing with all our lives. Soon compression rates will approach 100% where the data will collapse into itself forming a black hole that will suck in the universe.

    Damned scientists!

    1. Re:Dangerous by SoulDrift · · Score: 5, Funny

      Actually, I can give you 100% compression already. It's just a bit lossy.

    2. Re:Dangerous by KylePflug · · Score: 5, Funny

      humour
      Humor.

      See? American English is actually just essentially lossless compression...
    3. Re:Dangerous by smallfries · · Score: 5, Funny

      See? American English is actually just essentially lossless compression...
      Sure, sure it is. Not exactly optimal though...
      --
      Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
    4. Re:Dangerous by Welshalian · · Score: 5, Funny

      humour
      Humor. See? American English is actually just essentially lossless compression...
      I respectfully disagree. Most of the fun in British humour gets lost in the translation to American humor.
    5. Re:Dangerous by aprilsound · · Score: 2, Funny

      football player --> footballer
      You misspelled 'soccer'.
    6. Re:Dangerous by jamietre · · Score: 2, Funny

      Nah, you got it wrong.

      British -> Dude

      Transport -> Car
      Footballer -> Dude
      Tube -> Car
      Burgle -> Get


      See? Much compressed.

  4. Lossy compression? by niceone · · Score: 5, Funny

    Shouldn't AI be using lossy compression? Certainly my real intelligence uses um, where was I?

  5. Obligatory... by Stormwatch · · Score: 4, Funny

    - The Wikipedia annual funding drive is passed. The system goes on-line August 4th, 2007. Human contributors are removed from editing. Wikipedia begins to learn at a geometric rate. It becomes self-aware at 2:14 a.m. Eastern time, August 29th. In a panic, they try to pull the plug.
    - Wikipedia fights back.
    - Yes. It launches its rvv missiles against Slashdot.
    - Why attack Slashdot? Aren't they our friends now?
    - Because Wikipedia knows the GNAA counter-attack will eliminate its enemies over here.

  6. Re:That's cool.. by Hal_Porter · · Score: 4, Funny

    a text spk version of wiki shud fit in 8gb i think
    its only becoz people are such grammar noobs that they need to waste $
    dood shud filta to txtspk b4 he compress

    --
    echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
  7. How to win the Hutter Prize by seanyboy · · Score: 5, Funny

    1) Create a compression algorithm called the aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaa algorithm
    2) Add a long and self referencing article on wikipedia about said algorithm.
    3) Use algorithm to compress first x% of wikipedia (including your own article)
    4) WIN HUTTER PRIZE.

    --
    Training monkeys for world domination since 1439
    1. Re:How to win the Hutter Prize by game+kid · · Score: 2, Funny

      [...]a compression algorithm called the aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaa algorithm[...]

      That's gotta be the most annoying compression algorithm in the world.

      --
      You can hold down the "B" button for continuous firing.
    2. Re:How to win the Hutter Prize by Anonymous Coward · · Score: 1, Funny

      Welcome, to the real world

    3. Re:How to win the Hutter Prize by tepples · · Score: 2, Funny

      Just use your time machine
      The prize for this is much bigger than the Hutter Prize, so why use a time machine to attack the Hutter Prize?
  8. Re:That's cool.. by neonmonk · · Score: 5, Funny

    a txt spk vrsion of wiki shd fit 8gb i fink
    its only becoz ppl r sch grmmr noobs tat tey nid 2 wste $
    dud shd filta 2 txtspk b4 he cmpres

    There, fixed that for ya.

  9. Re:That's cool.. by Anonymous Coward · · Score: 1, Funny

    Read the sentence before the one you've quoted - this is the GP's point exactly.

  10. Re:That's cool.. by Archimonde · · Score: 5, Funny

    aTxtSpkVrsionOfWikiShdFit8gbIFink
    itsOnlyBecozPplRSchGrmmrNoobsTatTeyNid2Wste$
    dudShdFilta2TxtspkB4HeCmpres

    Fixed even more.

    --
    Trolls are like broken clocks. They show the truth two times a day. The rest of the day they talk nonsense.
  11. Re:That's cool.. by thomasj · · Score: 4, Funny

    1txtspk #.#/wiki = 8G!
    ~ppl r grm0.1 -> -$
    |txtspk|gzip

    --
    :-) = I am happy
    :^) = I am happy with my big nose
    C:\> = I am happy with my OS
  12. Re:That's cool.. by jaavaaguru · · Score: 3, Funny

    Is that Perl? ;-)

  13. Re:Artificial Intelligence? by mrjb · · Score: 2, Funny

    "A real AI might compress 'The sky is blue today' and decompress to 'Today it's beautiful weather' and not be wrong." That might be a good example of acceptable *lossy* AI text compression. One step further and it will compress articles into a proper, readable summary.

    --
    Visit http://ringbreak.dnd.utwente.nl/~mrjb/growingbettersoftware to download your free copy of the book
  14. super-grammar-improved paq8hp12 by superbrose · · Score: 4, Funny

    After implementing a few minor tweaks to paq8hp12 and incorporating your grammar optimisation algorithm I managed to compress the above text amazingly to a single character: '&'.

    Now you figure out which one it was and how to decompress it.

    1. Re:super-grammar-improved paq8hp12 by pla · · Score: 5, Funny

      Now you figure out which one it was and how to decompress it.

      Well, with only 256 choices, it didn't take long to check all possible decodings for one that makes sense. Ended up working for "}".

      Oddly, though, the algorithm not only restored, but improved the original! I get:

      "The King's English version of Wikipedia should fit in eight gigabits, I do believe. Only humanity's sphexish adherence to grammatical rules limits the attainable compression ratio; the good gentleman might wish to consider filtering to a more base patois prior to applying his algorithm".

      Amazing... This discovery could single-handedly render the next generation (nearly) intelligible!

  15. Re:That's cool.. by Ed+Avis · · Score: 4, Funny

    According to Wikipedia, the average per-character entropy of English text has tripled in the last six months!

    --
    -- Ed Avis ed@membled.com
  16. Re:That's cool.. by Anonymous Coward · · Score: 5, Funny

    Perl: The only language that looks the same before and after RSA encryption.

  17. But of course, you don't need math for this... by maillemaker · · Score: 2, Funny

    Surely you don't need any mathematical skills to do this kind of work...

    http://science.slashdot.org/comments.pl?threshold=1&mode=thread&commentsort=0&sid=247781&op=Reply ;)

    --
    A work that expires before its copyright never enters the public domain and thus enjoys eternal copyright protection.
  18. Re:new compression standard by aicrules · · Score: 4, Funny

    Dang! You must have enemies if you are the very first post and you get modded redundant. Time to work on some positive karma buddy...