1.7 Billion Digits Of Pi On CD
H0ek writes "Not that there is any use for this whatsoever, but there is a torrent available for 1.7 billion digits of pi on a CD. The data is everything after the '3.' on one line, bzipped. There are a couple of the Cygwin tools on the disk as well as source for a small search tool (because grep just didn't cut it this time). Inside the ISO there are links to the source of the data, in case you want the rest of the 4.2 billion digits available. Wear your geek badge with pride! Be the first kid on your block to have the entire set!"
What? They couldn't fit the '3' on the disc???
00101010
I had the same thought. To put it in dirt-simple terms, they're only using 10 out of the 256 possible values in every byte, due to the ASCII encoding. This is how bzip2 is able to find any redundancy; pi itself has none.[*]
So the best compression ratio (just compressed size / uncompressed size, right? So lower is better) is ln(10) / ln(256) = 41.5%. On a 700 MiB CD with no filesystem and nothing but pi, this means 700 * 2^20 * ln(256) / ln(10) = 1.77 billion digits (1767655840, with almost room for one more).
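A quick way to check the arithmetic above, sketched in Python. It computes the ideal ratio ln(10)/ln(256), the number of whole digits a raw 700 MiB disc could hold at that ratio, and, for comparison, what bzip2 actually achieves. Assumptions: the disc capacity is taken as exactly 700 * 2^20 bytes with no filesystem, and since pi's digits aren't bundled here, seeded uniformly random digits stand in for them (on the premise that pi's digits look statistically random). The bzip2 figure will vary a little with the sample, so no exact value is claimed for it.

```python
import bz2
import math
import random

def best_ratio() -> float:
    """Lower bound on compressed/uncompressed size for ASCII decimal digits."""
    # A byte can carry log(256) worth of information; a random digit carries only log(10).
    return math.log(10) / math.log(256)

def max_digits(capacity_bytes: int) -> int:
    """Whole digits that fit in capacity_bytes at the ideal ratio."""
    return math.floor(capacity_bytes * math.log(256) / math.log(10))

def bzip2_ratio(n_digits: int = 100_000, seed: int = 1) -> float:
    """Measured bzip2 ratio on random ASCII digits (a stand-in for pi's digits)."""
    rng = random.Random(seed)
    text = "".join(rng.choice("0123456789") for _ in range(n_digits)).encode("ascii")
    return len(bz2.compress(text)) / len(text)

CD_BYTES = 700 * 2**20  # assumption: 700 MiB of raw space, no filesystem overhead

print(f"ideal ratio: {best_ratio():.1%}")        # ideal ratio: 41.5%
print(f"digits per CD: {max_digits(CD_BYTES)}")  # digits per CD: 1767655840
print(f"bzip2 ratio: {bzip2_ratio():.1%}")       # somewhat above the ideal ratio
```

The exact product is about 1767655840.9, which is where "almost room for one more" comes from.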
You'd do better than bzip2 by just using fixed blocks of N bytes to represent M digits. (Larger blocks would get you closer to that best ratio; smaller blocks would mean less work to decode each one, which might make seeking more practical and reduce memory requirements.) This would be superior to bzip2 in that it'd get somewhat better compression, use a lot less CPU time, and be seekable. You could encode and decode with a one-line Perl script.
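Here is a minimal sketch of the fixed-block idea in Python rather than a Perl one-liner. The choice of M = 12 digits per N = 5 bytes is an assumption for illustration: 10^12 - 1 fits in 40 bits, giving a ratio of 5/12, about 41.7%, just above the 41.5% bound. (A 64-bit word holding 19 digits would be about 42.1%.) Each block is stored as a big-endian integer, so any digit can be decoded by reading only the one block that contains it.

```python
M = 12  # digits per block (assumed): 10**12 - 1 < 2**40, so 12 digits fit in 40 bits
N = 5   # bytes per block

DIGITS = set("0123456789")

def encode(digits: str) -> bytes:
    """Pack a string of ASCII digits (length a multiple of M) into N-byte blocks."""
    if len(digits) % M or not set(digits) <= DIGITS:
        raise ValueError(f"expected a multiple of {M} ASCII digits")
    return b"".join(int(digits[i:i + M]).to_bytes(N, "big")
                    for i in range(0, len(digits), M))

def decode_block(data: bytes, block: int) -> str:
    """Decode one block in isolation; this is what makes the format seekable."""
    chunk = data[block * N:(block + 1) * N]
    return str(int.from_bytes(chunk, "big")).zfill(M)  # restore leading zeros

def decode(data: bytes) -> str:
    """Decode every block back into the original digit string."""
    return "".join(decode_block(data, b) for b in range(len(data) // N))

def digit_at(data: bytes, index: int) -> str:
    """Digit at 0-based position index, where 0 is the first digit after '3.'."""
    return decode_block(data, index // M)[index % M]
```

For example, `encode("141592653589793238462643")` packs 24 digits into 10 bytes, and `digit_at(data, 23)` returns `"3"` after decoding only the second block. A real disc image would also need to pad or separately store a final partial block, which this sketch simply rejects.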
[*] - I suppose you could simply include the algorithm they used to generate the digits...but it'd take a long time to run, negating the whole point of putting pi on a CD.
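To illustrate the footnote: the submission doesn't say which program produced the digits, so this Python sketch swaps in Jeremy Gibbons' unbounded spigot algorithm, chosen only because it is short, not because it is what was used. It streams digits forever using exact integer arithmetic, but its integers keep growing, so each new digit costs more than the last. A few thousand digits is quick; a billion-plus this way would be hopeless, which is rather the footnote's point.

```python
from itertools import islice

def pi_digits():
    """Gibbons' unbounded spigot: yields 3, 1, 4, 1, 5, 9, ... indefinitely."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit n is now certain: emit it and rescale the state.
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not enough precision yet: fold in one more term of the series.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

def first_digits(count: int) -> str:
    """The first count digits of pi as a string, including the leading 3."""
    return "".join(map(str, islice(pi_digits(), count)))

print(first_digits(20))  # 31415926535897932384
```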