Slashdot Mirror


1.7 Billion Digits Of Pi On CD

H0ek writes "Not that there is any use for this whatsoever, but there is a torrent available for 1.7 billion digits of pi on a CD. The data is everything after the '3.' on one line, bzipped. There are a couple of the Cygwin tools on the disc as well as source for a small search tool (because grep just didn't cut it this time). Inside the ISO there are links to the source of the data, in case you want the rest of the 4.2 billion digits available. Wear your geek badge with pride! Be the first kid on your block to have the entire set!"

2 of 202 comments

  1. No 3? by SilkBD · Score: 5, Funny
    The data is everything after the '3.' on one line, bzipped.

    What? They couldn't fit the '3' on the disc???

    --
    00101010
  2. Re:Shouldn't compress well by slamb · Score: 5, Interesting
    At first, I was thrown off by the idea of compressing something like pi, as it shouldn't compress. The answer is that they're storing ASCII decimal digits, which require less than 4 bits per digit, instead of 8. So you should get at least a 50% compression ratio, which would be 850 million bytes. But it's actually about 3.32 bits (log2 of 10) of information per byte, so they're able to fit it on a CD. I would be surprised if bzip could do any better than that.

    I had the same thought. To put it in dirt-simple terms, they're only using 10 out of the 256 possible values in every byte, due to the ASCII encoding. This is how bzip2 is able to find any redundancy; pi itself has none.[*]

    So the best compression ratio (just compressed size/uncompressed size, right? so lower is better) is ln(10) / ln(256) = 41.5%. On a 700 MiB CD with no filesystem and nothing but pi, this means 700 * 2^20 * ln(256) / ln(10) = 1.77 billion digits (1767655840, with almost room for one more).
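    To check that arithmetic, here is a minimal Python sketch. The names CD_BYTES, best_ratio and max_digits are made up for illustration, and it assumes the full 700 MiB is usable for digits (no filesystem overhead).

```python
import math

# Assumption: all 700 MiB of the disc hold packed digits and nothing else.
CD_BYTES = 700 * 2**20


def best_ratio() -> float:
    """Ideal compressed/uncompressed size for ASCII decimal digits.

    Each digit carries log2(10) bits but occupies 8 bits as ASCII,
    so the ratio is log2(10) / 8, which equals ln(10) / ln(256).
    """
    return math.log(10) / math.log(256)


def max_digits(num_bytes: int) -> int:
    """Whole decimal digits that fit in num_bytes at the ideal density."""
    return math.floor(num_bytes * math.log(256) / math.log(10))


if __name__ == "__main__":
    print(f"bits per digit: {math.log2(10):.3f}")
    print(f"best ratio:     {best_ratio():.1%}")
    print(f"digits on CD:   {max_digits(CD_BYTES)}")
```

    The last line should reproduce the 1767655840 figure quoted above.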

    You'd do better than bzip2 by just using fixed blocks of N bytes to represent M digits. (Larger choices would get you closer to that best ratio; smaller choices would take less work to decode each block, which might make seeking more practical and reduce memory requirements.) This would be superior to bzip2 in that it'd get somewhat better compression, use a lot less CPU time, and be seekable. You could encode and decode with a one-line Perl script.
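    A rough sketch of that fixed-block idea, in Python rather than Perl. The choice of M = 12 digits in N = 5 bytes is only an assumption for illustration (it works because 10^12 < 2^40, and gives 5/12 = 41.7%, close to the 41.5% ideal). The function names are hypothetical, and the last block is zero-padded, so the caller has to keep the true digit count somewhere.

```python
DIGITS_PER_BLOCK = 12  # M: illustrative choice, valid because 10**12 < 2**40
BYTES_PER_BLOCK = 5    # N


def encode(digits: str) -> bytes:
    """Pack decimal digits into 5-byte big-endian blocks of 12 digits each.

    The final block is right-padded with '0'; the true length is not stored.
    """
    out = bytearray()
    for i in range(0, len(digits), DIGITS_PER_BLOCK):
        block = digits[i:i + DIGITS_PER_BLOCK].ljust(DIGITS_PER_BLOCK, "0")
        out += int(block).to_bytes(BYTES_PER_BLOCK, "big")
    return bytes(out)


def decode(data: bytes, count: int) -> str:
    """Unpack every block and return the first `count` digits."""
    chunks = []
    for i in range(0, len(data), BYTES_PER_BLOCK):
        value = int.from_bytes(data[i:i + BYTES_PER_BLOCK], "big")
        chunks.append(f"{value:0{DIGITS_PER_BLOCK}d}")
    return "".join(chunks)[:count]


def digit_at(data: bytes, index: int) -> str:
    """Seek: decode only the block that holds digit `index` (0-based)."""
    block_no, offset = divmod(index, DIGITS_PER_BLOCK)
    start = block_no * BYTES_PER_BLOCK
    value = int.from_bytes(data[start:start + BYTES_PER_BLOCK], "big")
    return f"{value:0{DIGITS_PER_BLOCK}d}"[offset]


if __name__ == "__main__":
    sample = "14159265358979323846264338327950288"  # digits after the "3."
    packed = encode(sample)
    print(len(sample), "digits ->", len(packed), "bytes")
    print(decode(packed, len(sample)) == sample)
```

    Because every block has the same size, finding digit k is just a divide and one 5-byte read, which is the seekability bzip2 can't offer.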

    [*] - I suppose you could simply include the algorithm they used to generate the digits...but it'd take a long time to run, negating the whole point of putting pi on a CD.
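
    For a sense of how small such a generator can be, here is a sketch of Gibbons' unbounded spigot algorithm. It is picked purely for illustration and is an assumption on my part; it is not necessarily what was used to produce the data on the CD. It also gets slower with every digit as its integers grow, which is exactly the problem with shipping an algorithm instead of the digits.

```python
from itertools import islice


def pi_digits():
    """Yield the decimal digits of pi one at a time: 3, 1, 4, 1, 5, ...

    Gibbons' unbounded spigot, using only Python's arbitrary-precision ints.
    """
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is settled: emit it and rescale the state.
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not enough precision yet: fold in one more term of the series.
            q, r, t, k, n, l = (
                q * k,
                (2 * q + r) * l,
                t * l,
                k + 1,
                (q * (7 * k + 2) + r * l) // (t * l),
                l + 2,
            )


if __name__ == "__main__":
    print("".join(str(d) for d in islice(pi_digits(), 20)))
```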