1st International Longest Tweet Results
Dr_Evil6_6_6 writes "Slashdot had a story about the 1st International Longest Tweet Contest last month, and the winners have just been announced." The winner is impressive.
I don't get it.
If they are asking what can be stored in the 4339 bits available, then you can store 4339 arbitrary bits. It's a rule of compression. If they are asking for an English-language compression program, there are plenty of better ones out there. Also, if the goal is compression of English text and they aren't including the program size in the tweet, then the competition can easily be cheated by putting a dictionary in the program and looking words up in it.
As for the winner, it's not a particularly good compression algorithm. It doesn't even seem to take Bayesian probability of characters into account. I can't see any arithmetic coding (mathematically the perfect entropy encoder) either.
Well, his algorithm is dead set simple. It encodes a 4339-bit block of data into a series of valid tweetable characters and back again. It does nothing more, and it could have been written in a much simpler way.
The maths at the top refer to the number of unique tweet-able messages. If there are 2^4339 total unique tweet-able messages then that means there are 4339 bits available.
Except for the fact the algorithms he has submitted have NOTHING to do with compression, and are just a method of mapping the 4339 bits into the allowable Unicode character set over 140 x 32 bit character "slots", i.e. encoding / decoding only.
With 4339 bits, hell in theory the longest actual tweet you could make is 2^4339 of any single character you choose, using the 4339 bits just to represent a (very large) counter of how many times to repeat the character.
Considering that 2^4339 is approximately 10^1306, and there are probably only 10^82 atoms in the whole universe, that's one bloody long tweet.
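For anyone who wants to sanity-check that magnitude, here is a minimal Python sketch (Python is my choice here, not anything from the contest). It only relies on Python's built-in arbitrary-precision integers; the variable names are made up for illustration.

```python
import math

BITS = 4339  # bit count quoted in the thread

# Python integers are arbitrary precision, so 2**4339 can be built exactly.
repeat_count = 2 ** BITS

# Number of decimal digits in the exact value.
decimal_digits = len(str(repeat_count))

# Same estimate via logarithms: 2**BITS == 10**(BITS * log10(2)).
exponent = BITS * math.log10(2)

print(decimal_digits)
print(round(exponent, 2))
```

The digit count comes out at 1307, which puts the repeat count at roughly 1.5 x 10^1306.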
1) Every "character" in Twitter, in this algorithm, can store one of 2146369536 values, since the other 1114112 run into problems.
2) 2^4339 2146369536^140
3) Thus, any 4339-bit integer in base 2 can be converted into a 140-digit integer in base 2146369536. To encode, do this conversion. To decode, do it backwards.
Sorry, Slashdot removed my mathematical symbols. Number 2 should read "2^4339 is less than 2146369536^140"
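The base conversion in steps 1 to 3 can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the winning entry: `encode`, `decode` and the constant names are hypothetical, the digits are left as plain integers, and the mapping from a digit to an actual tweetable character is not shown.

```python
ALPHABET_SIZE = 2 ** 31 - 1114112  # 2146369536 usable values per character slot
SLOTS = 140                        # characters in one tweet
BITS = 4339                        # payload size in bits


def encode(n):
    """Turn an integer below 2**4339 into 140 base-ALPHABET_SIZE digits,
    most significant digit first."""
    if not 0 <= n < 2 ** BITS:
        raise ValueError("payload must fit in 4339 bits")
    digits = []
    for _ in range(SLOTS):
        n, digit = divmod(n, ALPHABET_SIZE)
        digits.append(digit)
    return digits[::-1]


def decode(digits):
    """Inverse of encode: rebuild the integer from its digits."""
    n = 0
    for digit in digits:
        n = n * ALPHABET_SIZE + digit
    return n


payload = 2 ** BITS - 1  # largest 4339-bit value
assert decode(encode(payload)) == payload
```

The round trip works for every payload because 2^4339 is less than 2146369536^140, so 140 digits are always enough.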
For those wondering about a better way:
// Output the lowest 31 bits of our 4339-bit block of data
for( i = 4339; i > 0; i -= 31 ) {
    output((wxchar)(bigInt & 0x7fffffff));  // mask keeps exactly 31 bits
    bigInt = bigInt >> 31;                  // Shift down
}
Reverse:
// Add the 31 bits to the current bigInt
while( curInput = input() ) {
    bigInt = bigInt << 31;                  // Shift up
    bigInt += curInput;
}
You've got it almost right.
Your encoder is encoding the original 31-bit "words" from right to left, but your decoder is decoding the original 31-bit "words" from left to right.
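To make the word-order point concrete, here is a rough Python sketch of a shift-based split and join that agree on direction. The names `to_words` and `from_words` are hypothetical, and the sketch deliberately sidesteps the question of which 31-bit values are actually tweetable.

```python
WORD_BITS = 31
MASK = (1 << WORD_BITS) - 1  # 0x7fffffff, keeps the lowest 31 bits
N_WORDS = 140                # 140 * 31 = 4340 bits, enough for 4339


def to_words(n):
    """Split n into 140 words of 31 bits, least significant word first."""
    words = []
    for _ in range(N_WORDS):
        words.append(n & MASK)
        n >>= WORD_BITS
    return words


def from_words(words):
    """Rebuild n. The words arrive least significant first, so walk them
    in reverse and shift the accumulator up before adding each one."""
    n = 0
    for word in reversed(words):
        n = (n << WORD_BITS) | word
    return n


block = 2 ** 4339 - 12345
assert from_words(to_words(block)) == block
```

Dropping the `reversed` call reproduces the mismatch: the first word read would end up in the most significant position.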
"His name was James Damore."
Sorry, Slashdot removed my mathematical symbols.
We should have a 1st International Longest Slashdot Post competition. Same rules as the Twitter competition, except you have to deal with Slashdot's draconian input stripping.
http://www.steike.com/code/useless/zip-file-quine/ ...infinite compression.
45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
Ah, so someone has finally determined the absolute breadth of the Twittersphere. If the world ran on tweets, maybe we wouldn't ever need more than 64kB of memory.
Solution - Just tweet the following picture of a swimming fish:
".`.`..`.>"
Given that 1 word is 16 bits, and a picture is equal to 1,000 words, :-)
that makes my above tweet 16,000 bits of information (fitting
several pictures in a tweet may extend this further)
(.)(.)
(.Y.)
d^_^b
48000bits!
Congratulations, he discovered the Chinese Remainder Theorem. While it's a reasonable encoding method, it offers little to no actual compression of the data.
It would make far more sense to first compress the data (LZW, for example) and then encode with CRT. That would give you about a 4600-byte (roughly 4.5K) ZIP file you could send. With typical 85% compression on English-language files, that means the decompressed output could be about 30K in length.
Life, the Universe, and Everything... in my image.
The contest required a scheme that would work for arbitrary data. Compressing random data with LZW can result in a file that's larger than the input.
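A quick way to see this effect, as a sketch: Python's standard library has no LZW, so this uses zlib (DEFLATE) as a stand-in, and the pseudo-random block is seeded so the run is repeatable. The helper name `compressed_size` is made up for the example.

```python
import random
import zlib

BLOCK_BYTES = 543  # 4339 bits rounded up to whole bytes


def compressed_size(data):
    """Size in bytes of data after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


# Seeded pseudo-random bytes stand in for "arbitrary data".
rng = random.Random(0)
random_block = bytes(rng.randrange(256) for _ in range(BLOCK_BYTES))

# Repetitive English-like text of the same length, for contrast.
text_block = (b"the quick brown fox jumps over the lazy dog " * 20)[:BLOCK_BYTES]

print(len(random_block), compressed_size(random_block))
print(len(text_block), compressed_size(text_block))
```

The random block comes out larger than it went in, because the compressor still has to add its own framing, while the repetitive text shrinks a lot.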
Give me Classic Slashdot or give me death!
Long tweet is looooooooooong.
It's not that simple. You can't use the full 31 bits per character, and therefore you can't use base-2 shifting. You can use *almost* 31 bits, but since the first 1114112 characters (the valid Unicode characters) are unavailable for use, you really only get ~30.99925 bits per character.
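The per-character figure can be checked with a couple of lines of Python. This is just the arithmetic from the paragraph above, with constant names I made up for the example.

```python
import math

CODE_SPACE = 2 ** 31  # values a 31-bit character slot could hold
RESERVED = 1114112    # values described above as unavailable
SLOTS = 140           # characters per tweet

usable_values = CODE_SPACE - RESERVED
bits_per_char = math.log2(usable_values)
total_bits = math.floor(SLOTS * bits_per_char)

print(usable_values)
print(round(bits_per_char, 5))
print(total_bits)
```

Rounding down 140 times the per-character figure is where the 4339 comes from; a plain 31-bit shift would assume 4340.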
I eventually gave in and read TFA; they actually describe the winning algorithm... in the contest description. The contest was just to implement it (sort of). And (apparently) no one attempted to use the valid Unicode characters as well. They just avoided them (like the contest bloggers) because they weren't sure there wasn't some arbitrary string of characters that would mess up the message.
I suppose that the contest could continue on that basis alone: how many more bits can you encode by using the printable characters, without choking on arbitrary data? It is more likely that it will vanish in a poof of apathy instead, since no one bothered to do that this time.
"It's kind of in bad web etiquette to ninja that entire post from Ksplice."
Actually, AC, it's very common on /. to post an article in case a site gets slashdotted.
my karma will be here long after I'm gone