To decipher the math-speak on that page for the less mathematically inclined, here's my explanation of what a normal number is, geared towards a programmer.
Say you generated a number by randomly picking digits 0-9. After generating 100 digits, you'd expect close to 10 of them to be "7" (1/10). After generating 1000 digits, you'd expect about 100 to be "7" (1/10 again), but you'd expect only about 10 copies of the string "57" (10/1000 = 1/100), since there are 100 possible two-digit strings ("00", "01",..." 99") and there are about 1000 length-2 substrings in a string of 1000 digits (999, to be precise). In general, for such a string of length N, we'd expect about 1/10th of the digits to be "7" and 1/100th = 1/10^2 of the substrings to be "57". If we made N very large we would also expect these estimates to get closer and closer to the truth.
You might get some strange abberations by random number generation. For instance, with astronomical bad luck you might generate 0 each time, and then your estimated fraction of "5"'s would be completely wrong. Still, the above properties are pretty good measures of how "well mixed" the digits of a number are, and they're taken (with mild generalizations) as the defining conditions of a normal number.
Specifically, for a given number x, imagine writing out its (infinitely many) digits in base b. Pick a substring of length m that you're interested in--say an encoding of Shakespeare's complete works in the original Klingon. In the first N digits, we would like to require the fraction of substrings matching our given string to be 1/b^m in analogy with the above (1/10^2 came about from b=10, m=2). That's too much to ask, so instead specify a small tolerance above and below 1/b^m. The key condition for normality is that if we look at the first N digits where N is larger than some number (which depends on the tolerances, the substring we picked, and x itself), the actual fraction of matching substrings will be within our tolerances of 1/b^m. A normal number is one where you can perform this operation in any base, with any substring, and with any tolerances.
If pi were normal, there would have to be at least one (indeed, infinitely many) occurrence of a given encoding of Shakespeare's works, since otherwise for N large enough the number of matching substrings would be near 0, and we could specify our tolerances to be between, say, 1/2 * 1/b^m and 3/2 * 1/b^m, which is strictly greater than the fraction of matches for N large enough since that fraction tends to 0, so it can't be within these bounds.
It's not too surprising that proving the normality of a number is much harder than believing it. Essentially, any number whose decimal digits appear "quite random" feels normal.
Perhaps it is a proper noun which just breaks the typical capitalization rule since it's the transliteration of a lower case letter. That is, capitalizing it would change the meaning of the translation.
He was using a processor with an outdated instruction set. It was missing the "compute next digit of pi" instruction, so he had to cobble together his own.
(Also, supposedly, to determine if PI is actually infinite or whether it contains a repeating pattern after you get to a certain point)
What? There's a mathematical proof that pi is irrational (in fact, transcendental). Specifically, if it were not, -1 would be irrational (in fact, transcendental) thanks to the Lindemann-Weierstrass theorem and the fact that e^(pi*i) = -1. The digits cannot simply start repeating after a while (in particular, they cannot eventually just become 0, as happens with, for instance, 1/2 = 0.5000....
The only practical application I've ever heard of for projects like this is as an integrity check on new supercomputers. They compute the first X digits of pi and then compare it to a known result which someone computed and verified earlier.
On a completely separate note, it's "pi", not "Pi". The Greek letter used is lowercase, and the standard English version is similarly lowercase.
US and EU regulations are somewhat close to what you said. The US regs actually vary their PPM requirement with the water's salinity, though the average is about 10 ppm, which is extremely low. Japan's requirements are (approximately) 100ppm, for instance. The EU regs do only require 90% non-oil (so sand, water, chlorofluorohexane, what-have-you), however the EU "emergency response fleet" is only half a dozen ships and would barely have made a dent in this particular spill.
Note: the preceding paragraph was completely made up.
A hobby is a hobby. Who am I to judge how worthwhile it is? I agree, the results are wildly unsurprising, but so long as the guy enjoyed doing it, so what? I suppose he's made quite a few nerds indignant, though I'm not sure if that's a bad thing or not. A few people have probably found it interesting, and maybe a few were lead to read up on the infinite monkey theorem.
Strictly speaking, even an infinite number of monkeys wouldn't work if they produced particularly non-random text. The output I linked wasn't even close to making words, let alone sentences, scenes, acts, and plays. If they always typed a bunch of s's on each page, for instance, they would never type the complete works. Genetic aberrations--methinks I see a monkey Shakespeare, ho! [It's in iambic pentameter. I'm a nerd.]
...that monkeys are an extremely poor imitation of a random text generator. In Wikipedia's words:
In 2003, scientists at Paignton Zoo and the University of Plymouth, in Devon in England reported that they had left a computer keyboard in the enclosure of six Sulawesi Crested Macaques for a month; not only did the monkeys produce nothing but five pages consisting largely of the letter S, they started by attacking the keyboard with a stone, and continued by urinating and defecating on it.
The original version of this story appeared on September 26th. The Ars Technica piece you linked was published on the same day, the 26th of September, actually later (~3am on/. vs. 12pm on Ars). In fact my link appears in the "Related Links" below the summary on this story, though I have to admit I typically never look there. It would have been nice if the summary had linked to the previous story to prevent all these stupid reposts.
This was discussed to death in the original version of this story. Here's a copy of one of several +5 comments describing the strategy:
This experiment, while fun, isn't exactly the infinite monkey experiment.
What's happening here (if I understand the writeup) is that the monkeys are typing random letter combinations, until they hit a small phrase that happens to be in shakespeare. Then that phrase is marked as done.
Let n be the size in characters of the target phrase. If n=1, then the complete works of shakespeare are obtained as soon as each of the letters of the alphabet have been typed at least once. You could do this in a few seconds on your computer keyboard. If n=2, then the complete works are obtained as soon as all the possible pairs of letters have been typed. The experiment in TFA has n=9 I think.
As n grows larger, the time until completion grows exponentially. Once his expeiment is done, the case n=10 should take roughly 26 times as long (ignoring punctuation capitals and diacritical marks). Alternatively, it would require a cloud roughly 26 times bigger to do it in the same amount of time.
The author knows it's not the regular interpretation. Here's his response to one of my comments:
I found that mathematicians and statisticians had the most adverse reaction to my project. If you have half an infinite resource to give me I would gladly use it and run the project again. I even wrote a brief section on the post saying: I realize there are different interpretations to this saying/theorem and I have done 2 different ones already. I understand the definition of infinite and infinite monkey theorem and I realize that this project does not have infinite resources. This project was funded and written by myself and was not supported by any grant money or federal money. No monkeys were harmed during the making of this code. This project is my attempt to find a creative way to attain an answer without infinite resources. It is a fun side project.
And here's a repost of some of my own calculations concerning the improbability of the real version:
If he had successfully randomly achieved a shakespeare play, [...] It would be like a flying saucer landing and informing someone that they won the galactic lottery.
It's far, far, far, far, far, far, far, far, far, (...), far more improbable than that. The text of Hamlet (see Project Gutenberg [gutenberg.org]) is around 180 KB long, so around 1.44 million bits. Being generous and lopping off half (since most of the characters aren't present), and then rounding down, let's say it's 500,000 bits. There are 2^500,000 possibilities; this is a number with around 150,000 decimal digits. It's comparable to the odds of winning a 1-in-a-million lottery 25 thousand times in a row.
Winning a galactic lottery, in comparison, would be extremely, almost incomparably, frequent. There are something like 300 billion stars in the Milky Way. Suppose each star had 30 planets with 100 billion "people", being very generous. That's only about one million billion billion inhabitants. Winning such a lottery would be the same as winning 4 1-in-a-million lotteries in a row. 4 versus 25,000, and that 25,000 is an exponent--these two can't just be divided to property compare them.
It's closer to winning 6 thousand galactic lotteries in a row.
There was a brilliant comment in the survey thread to the effect of "we should be able to rate down stories". At least that way even if the editors and submitter fail, the community would be able to smother a poor story.
As a mathematician, TFA's argument appeals to me. Types made up of disjoint unions and Cartesian products, type inference (which, incidentally, mathematicians often do implicitly: eg. let f(x) = 1/x; it's implied that x is real and non-zero), pattern matching--these are all things I'm used to and even enjoy. That doesn't sound promising for widespread adoption, though. For all its power, most people hate math, and something this math-y has a serious hill to climb. That the author simply assumed his readers would have any idea what a disjoint union is (see the set theory definition for his usage) is a great example of the article's lack of realistic consideration of who "the masses" are.
If you want a programming language to gain widespread adoption, you really only have one option: allow the programmer to think less than they have had to in the past. Migrations from C to C++ to Java to scripting languages like Python all progressively took care of more and more details for the programmer--organizing objects, hiding pointers, doing automatic garbage collection, and getting rid of static types, to name a few. Does using OCaml allow you to think less than using another modern language? Almost certainly not. Therefore, the language will not gain widespread adoption "for the masses".
That said, some higher-end coders could probably benefit from using functional languages more often. They're certainly beautiful and powerful in the right hands (while in the wrong hands they're just confusing). Perhaps that's all the author really meant to advocate--that competent people check to see if a functional language they might otherwise ignore is a good fit for their problem.
Re:After VBA anything seems like deliverance
on
OCaml For the Masses
·
· Score: 1
Wow, your reading comprehension is poor. He had learned OCaml in grad school, not at work: "Why OCaml? I had learned it in grad school and fell in love with the language then." He also never says he wrote a line of VBA. Perhaps he did, though he doesn't say so. He does call himself part of the scrapped Java rewrite, for what that's worth.
Re:haskell for the masses? sure, but only...
on
OCaml For the Masses
·
· Score: 1
What functional language makes everything immutable always? Typically, immutability is the default behavior, but not the only behavior.
To follow up on my own comment about the success of these challenges, I read the Wikipedia page on And Tango Makes Three, #1 on this year's list and also #1 for 5 of the last 6 years. It's based on a true story where two male penguins formed a couple and were given an egg to raise.
To summarize the list of challenges on the linked page, which is hopefully representative of the challenges that went particularly far, there were...
3 failed requests to restrict the book
2 failed removal requests
1 successful request to move it to non-fiction
1 successful removal, oddly based on no requests; the removal was reviewed and at least temporarily reversed, though I didn't find the ultimate outcome
This is the mildest form of censorship I can think of. I imagine most school districts wouldn't bother to go to court over this book. It's good that this list is kept and some organizations work to keep controversial books around, but until some real censorship takes place it's not really news.
A challenge is defined as a formal, written complaint, filed with a library or school requesting that materials be removed because of content or appropriateness.
It doesn't say anything about how successful these challenges are.
To decipher the math-speak on that page for the less mathematically inclined, here's my explanation of what a normal number is, geared towards a programmer.
Say you generated a number by randomly picking digits 0-9. After generating 100 digits, you'd expect close to 10 of them to be "7" (1/10). After generating 1000 digits, you'd expect about 100 to be "7" (1/10 again), but you'd expect only about 10 copies of the string "57" (10/1000 = 1/100), since there are 100 possible two-digit strings ("00", "01", ..." 99") and there are about 1000 length-2 substrings in a string of 1000 digits (999, to be precise). In general, for such a string of length N, we'd expect about 1/10th of the digits to be "7" and 1/100th = 1/10^2 of the substrings to be "57". If we made N very large we would also expect these estimates to get closer and closer to the truth.
You might get some strange abberations by random number generation. For instance, with astronomical bad luck you might generate 0 each time, and then your estimated fraction of "5"'s would be completely wrong. Still, the above properties are pretty good measures of how "well mixed" the digits of a number are, and they're taken (with mild generalizations) as the defining conditions of a normal number.
Specifically, for a given number x, imagine writing out its (infinitely many) digits in base b. Pick a substring of length m that you're interested in--say an encoding of Shakespeare's complete works in the original Klingon. In the first N digits, we would like to require the fraction of substrings matching our given string to be 1/b^m in analogy with the above (1/10^2 came about from b=10, m=2). That's too much to ask, so instead specify a small tolerance above and below 1/b^m. The key condition for normality is that if we look at the first N digits where N is larger than some number (which depends on the tolerances, the substring we picked, and x itself), the actual fraction of matching substrings will be within our tolerances of 1/b^m. A normal number is one where you can perform this operation in any base, with any substring, and with any tolerances.
If pi were normal, there would have to be at least one (indeed, infinitely many) occurrence of a given encoding of Shakespeare's works, since otherwise for N large enough the number of matching substrings would be near 0, and we could specify our tolerances to be between, say, 1/2 * 1/b^m and 3/2 * 1/b^m, which is strictly greater than the fraction of matches for N large enough since that fraction tends to 0, so it can't be within these bounds.
It's not too surprising that proving the normality of a number is much harder than believing it. Essentially, any number whose decimal digits appear "quite random" feels normal.
That only works in Haskell, which nobody uses.
Do you mean "phi" instead of "fi", that is, the golden ratio (which is also for some reason not a proper noun)?
Perhaps it is a proper noun which just breaks the typical capitalization rule since it's the transliteration of a lower case letter. That is, capitalizing it would change the meaning of the translation.
He was using a processor with an outdated instruction set. It was missing the "compute next digit of pi" instruction, so he had to cobble together his own.
Thank you for wasting the earths resources (electricity, etc..) to make the world a better place!
Aren't you wasting those resources by reading such a story in the first place?
(Also, supposedly, to determine if PI is actually infinite or whether it contains a repeating pattern after you get to a certain point)
What? There's a mathematical proof that pi is irrational (in fact, transcendental). Specifically, if it were not, -1 would be irrational (in fact, transcendental) thanks to the Lindemann-Weierstrass theorem and the fact that e^(pi*i) = -1. The digits cannot simply start repeating after a while (in particular, they cannot eventually just become 0, as happens with, for instance, 1/2 = 0.5000... .
The only practical application I've ever heard of for projects like this is as an integrity check on new supercomputers. They compute the first X digits of pi and then compare it to a known result which someone computed and verified earlier.
On a completely separate note, it's "pi", not "Pi". The Greek letter used is lowercase, and the standard English version is similarly lowercase.
Easy: quantum tunneling.
Citations please?
US and EU regulations are somewhat close to what you said. The US regs actually vary their PPM requirement with the water's salinity, though the average is about 10 ppm, which is extremely low. Japan's requirements are (approximately) 100ppm, for instance. The EU regs do only require 90% non-oil (so sand, water, chlorofluorohexane, what-have-you), however the EU "emergency response fleet" is only half a dozen ships and would barely have made a dent in this particular spill.
Note: the preceding paragraph was completely made up.
A hobby is a hobby. Who am I to judge how worthwhile it is? I agree, the results are wildly unsurprising, but so long as the guy enjoyed doing it, so what? I suppose he's made quite a few nerds indignant, though I'm not sure if that's a bad thing or not. A few people have probably found it interesting, and maybe a few were lead to read up on the infinite monkey theorem.
Strictly speaking, even an infinite number of monkeys wouldn't work if they produced particularly non-random text. The output I linked wasn't even close to making words, let alone sentences, scenes, acts, and plays. If they always typed a bunch of s's on each page, for instance, they would never type the complete works. Genetic aberrations--methinks I see a monkey Shakespeare, ho! [It's in iambic pentameter. I'm a nerd.]
In 2003, scientists at Paignton Zoo and the University of Plymouth, in Devon in England reported that they had left a computer keyboard in the enclosure of six Sulawesi Crested Macaques for a month; not only did the monkeys produce nothing but five pages consisting largely of the letter S, they started by attacking the keyboard with a stone, and continued by urinating and defecating on it.
(source)
Here's their output and a little more info/some pictures.
The original version of this story appeared on September 26th. The Ars Technica piece you linked was published on the same day, the 26th of September, actually later (~3am on /. vs. 12pm on Ars). In fact my link appears in the "Related Links" below the summary on this story, though I have to admit I typically never look there. It would have been nice if the summary had linked to the previous story to prevent all these stupid reposts.
This experiment, while fun, isn't exactly the infinite monkey experiment.
What's happening here (if I understand the writeup) is that the monkeys are typing random letter combinations, until they hit a small phrase that happens to be in shakespeare. Then that phrase is marked as done.
Let n be the size in characters of the target phrase. If n=1, then the complete works of shakespeare are obtained as soon as each of the letters of the alphabet have been typed at least once. You could do this in a few seconds on your computer keyboard. If n=2, then the complete works are obtained as soon as all the possible pairs of letters have been typed. The experiment in TFA has n=9 I think.
As n grows larger, the time until completion grows exponentially. Once his expeiment is done, the case n=10 should take roughly 26 times as long (ignoring punctuation capitals and diacritical marks). Alternatively, it would require a cloud roughly 26 times bigger to do it in the same amount of time.
(source; taken from martin-boundary)
The author knows it's not the regular interpretation. Here's his response to one of my comments:
I found that mathematicians and statisticians had the most adverse reaction to my project. If you have half an infinite resource to give me I would gladly use it and run the project again. I even wrote a brief section on the post saying: I realize there are different interpretations to this saying/theorem and I have done 2 different ones already. I understand the definition of infinite and infinite monkey theorem and I realize that this project does not have infinite resources. This project was funded and written by myself and was not supported by any grant money or federal money. No monkeys were harmed during the making of this code. This project is my attempt to find a creative way to attain an answer without infinite resources. It is a fun side project.
(source; taken from eljefe6a)
And here's a repost of some of my own calculations concerning the improbability of the real version:
If he had successfully randomly achieved a shakespeare play, [...] It would be like a flying saucer landing and informing someone that they won the galactic lottery.
It's far, far, far, far, far, far, far, far, far, (...), far more improbable than that. The text of Hamlet (see Project Gutenberg [gutenberg.org]) is around 180 KB long, so around 1.44 million bits. Being generous and lopping off half (since most of the characters aren't present), and then rounding down, let's say it's 500,000 bits. There are 2^500,000 possibilities; this is a number with around 150,000 decimal digits. It's comparable to the odds of winning a 1-in-a-million lottery 25 thousand times in a row.
Winning a galactic lottery, in comparison, would be extremely, almost incomparably, frequent. There are something like 300 billion stars in the Milky Way. Suppose each star had 30 planets with 100 billion "people", being very generous. That's only about one million billion billion inhabitants. Winning such a lottery would be the same as winning 4 1-in-a-million lotteries in a row. 4 versus 25,000, and that 25,000 is an exponent--these two can't just be divided to property compare them.
It's closer to winning 6 thousand galactic lotteries in a row.
(source; taken from me)
There was a brilliant comment in the survey thread to the effect of "we should be able to rate down stories". At least that way even if the editors and submitter fail, the community would be able to smother a poor story.
Have you done both--that is, develop both a .NET and C++/Qt (mostly) Windows app? I haven't, which is why I'm curious.
N00b writes, "...".
N00b is a first-time accepted submitter.
Stories should be demote-able
I strongly agree.
As a mathematician, TFA's argument appeals to me. Types made up of disjoint unions and Cartesian products, type inference (which, incidentally, mathematicians often do implicitly: eg. let f(x) = 1/x; it's implied that x is real and non-zero), pattern matching--these are all things I'm used to and even enjoy. That doesn't sound promising for widespread adoption, though. For all its power, most people hate math, and something this math-y has a serious hill to climb. That the author simply assumed his readers would have any idea what a disjoint union is (see the set theory definition for his usage) is a great example of the article's lack of realistic consideration of who "the masses" are.
If you want a programming language to gain widespread adoption, you really only have one option: allow the programmer to think less than they have had to in the past. Migrations from C to C++ to Java to scripting languages like Python all progressively took care of more and more details for the programmer--organizing objects, hiding pointers, doing automatic garbage collection, and getting rid of static types, to name a few. Does using OCaml allow you to think less than using another modern language? Almost certainly not. Therefore, the language will not gain widespread adoption "for the masses".
That said, some higher-end coders could probably benefit from using functional languages more often. They're certainly beautiful and powerful in the right hands (while in the wrong hands they're just confusing). Perhaps that's all the author really meant to advocate--that competent people check to see if a functional language they might otherwise ignore is a good fit for their problem.
Wow, your reading comprehension is poor. He had learned OCaml in grad school, not at work: "Why OCaml? I had learned it in grad school and fell in love with the language then." He also never says he wrote a line of VBA. Perhaps he did, though he doesn't say so. He does call himself part of the scrapped Java rewrite, for what that's worth.
What functional language makes everything immutable always? Typically, immutability is the default behavior, but not the only behavior.
To follow up on my own comment about the success of these challenges, I read the Wikipedia page on And Tango Makes Three, #1 on this year's list and also #1 for 5 of the last 6 years. It's based on a true story where two male penguins formed a couple and were given an egg to raise.
To summarize the list of challenges on the linked page, which is hopefully representative of the challenges that went particularly far, there were...
3 failed requests to restrict the book
2 failed removal requests
1 successful request to move it to non-fiction
1 successful removal, oddly based on no requests; the removal was reviewed and at least temporarily reversed, though I didn't find the ultimate outcome
This is the mildest form of censorship I can think of. I imagine most school districts wouldn't bother to go to court over this book. It's good that this list is kept and some organizations work to keep controversial books around, but until some real censorship takes place it's not really news.
A challenge is defined as a formal, written complaint, filed with a library or school requesting that materials be removed because of content or appropriateness.
It doesn't say anything about how successful these challenges are.
Thanks for the link. It's too bad the comments aren't displayed.