A Few Million Virtual Monkeys Randomly Recreate Shakespeare
First time accepted submitter eljefe6a writes "On September 23 at 2:30 PST the A Million Amazonian Monkeys project successfully recreated A Lover's Complaint. This is the first time a work of Shakespeare has actually been randomly reproduced. It is one small step for a monkey, one giant leap for virtual primates everywhere. From the article: 'For this project, I used Hadoop, Amazon EC2, and Ubuntu Linux. Since I don’t have real monkeys, I have to create fake Amazonian Map Monkeys. The Map Monkeys create random data in ASCII between a and z. It uses Sean Luke’s Mersenne Twister to make sure I have fast, random, well behaved monkeys. Once the monkey’s output is mapped, it is passed to the reducer which runs the characters through a Bloom Field membership test. If the monkey output passes the membership test, the Shakespearean works are checked using a string comparison. If that passes, a genius monkey has written 9 characters of Shakespeare. The source material is all of Shakespeare’s works as taken from Project Gutenberg.'"
I wish I'd thought of it - and what a neat way to go about it.
I guess he had to use virtual monkeys because all real monkeys have progressed to randomly downloading things from bit-torrent.
...and wouldn't it be easier to let them evolve and then one of them can BE Shakespeare 2.0?
These posts express my own personal views, not those of my employer
I'm virtually impressed, virtually speechless even! The man is a virtual genius.
These posts express my own personal views, not those of my employer
He's working with a very loose interpretation of the thought experiment here. Also he's apparently letting these monkeys get away with multiple character overlap to successively build the text.
There was a comic strip/sketch where scientists have a roomful of monkeys and typewriters, and their latest "Work" is clutched in a researcher's hand. As they go through, it's page after page of perfect Shakespeare, and they're going through with great excitement until they get to the very last page, they look in disappointment as it degenerates into "Ook eek ook". Anyone remember it?
I always thought the idea was that the characters would be produced sequentially throughout the entire play, not just every word produced independently. Much less credit.
If I'm understanding this, this isn't as cool as it seems. It seems like his 'monkeys' are just randomly creating words, and he matches those words against any word used in Shakespeare. If he gets a match, he marks that one as done. So, at some point one monkey made the word "be" and all of a sudden green lights all over the place.
I think the original saying was how random and unique it would be for a solid set of strings to randomly create a whole piece of work _in one go_ . Not a word here, a word there, OMG 100% of Shakespeare words have been randomly created.
-Malakai
A Dragon Lives in my Garage
Does this mean we can also create the next US President using Amazon? I mean, looking at the choices so far, how hard can that be?
Do your own thing. And overdo it!
So the virtual monkeys are recreating a subset of the work of Shakespeare not an entire work. And the Hadoop instance is splicing them together?
"It was the best of times, it was the BLURST of times! Stupid monkeys!" {strikes them with script...}
srand(time(NULL));
while (1)
    if (rand() == 1234)
        puts("OMGOOSES!");
Kinda a waste of CPU cycles...
Better known as 318230.
What a colossal waste of energy and computing resources.
this only shows that the output string of a generator contains a certain string... since the output of the generator is fixed, there are strings, literary works, that is, that can not arise, while others can...
this should be done with "true" randomness generated by processes in labs, or some HQ solid state noise generator...
thoughts?
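For what it's worth, the "output of the generator is fixed" point is easy to see in a few lines. This is only a sketch: it uses Python's built-in generator (a Mersenne Twister in CPython) as a stand-in for the Java implementation mentioned in the summary, and `monkey_output` is a made-up helper name, not anything from the project.

```python
import random
import string

def monkey_output(seed: int, length: int) -> str:
    """Return `length` pseudo-random lowercase letters from a seeded generator."""
    rng = random.Random(seed)  # CPython's Random is a Mersenne Twister (MT19937)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

first = monkey_output(seed=42, length=20)
second = monkey_output(seed=42, length=20)

# Same seed, same "monkey": the typing is fully determined by the seed.
print(first == second)
```

So whatever text comes out was already implied by the seed and the algorithm; a hardware noise source would not have that property.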
and that he missed the point of the expression?
Of course it will work: the Mersenne twister will eventually cover the entire 9-letter space, and then he can search through for the parts that match (yes, he is doing it concurrently, but that's just an inefficient way of doing it). If he had the RAM and time he could eventually recreate every book possible.
The Wikipedia page explains it better: an infinite random string is bound to contain something that is perceived as useful. Of course, the literal take on the expression is the funniest.
What's happening here (if I understand the writeup) is that the monkeys are typing random letter combinations, until they hit a small phrase that happens to be in Shakespeare. Then that phrase is marked as done.
Let n be the size in characters of the target phrase. If n=1, then the complete works of Shakespeare are obtained as soon as each of the letters of the alphabet has been typed at least once. You could do this in a few seconds on your computer keyboard. If n=2, then the complete works are obtained as soon as all the possible pairs of letters have been typed. The experiment in TFA has n=9 I think.
As n grows larger, the time until completion grows exponentially. Once his experiment is done, the case n=10 should take roughly 26 times as long (ignoring punctuation, capitals and diacritical marks). Alternatively, it would require a cloud roughly 26 times bigger to do it in the same amount of time.
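The growth described above can be tried at toy scale. The following is a rough sketch, not the project's code: `draws_until_covered` is a hypothetical helper that strips everything except a-z (as the comment assumes) and counts how many random n-letter guesses it takes before every n-letter window of a sample line has shown up.

```python
import random
import string

def draws_until_covered(text: str, n: int, rng: random.Random) -> int:
    """Count random n-letter draws until every distinct n-gram of `text` has appeared."""
    stream = "".join(c for c in text.lower() if c in string.ascii_lowercase)
    wanted = {stream[i:i + n] for i in range(len(stream) - n + 1)}
    draws = 0
    while wanted:
        guess = "".join(rng.choice(string.ascii_lowercase) for _ in range(n))
        wanted.discard(guess)  # mark the phrase as done if it was still missing
        draws += 1
    return draws

rng = random.Random(2011)  # fixed seed so the run is repeatable
sample = "to be or not to be that is the question"
for n in (1, 2, 3):
    # Third column is the size of the n-letter space, 26**n.
    print(n, draws_until_covered(sample, n, rng), 26 ** n)
```

Each extra letter multiplies the search space by 26, which is why going past n=3 on a laptop gets slow quickly.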
I did think of it. I even registered a domain (see my URL and e-mail address). Planned on making a screensaver that would randomly generate stuff, and convince people to run it, ala SETI@Home. Then college happened, then graduate school happened, then marriage happened, then baby happened... And then (once again), I read on SlashDot that someone else has done one of my ideas again and made the front page.
But then again, literally as I'm reading this, my daughter is singing the Blue's Clues theme song next to me while my wife and I get ready to queue up for our nightly game of League of Legends... Sitting in the downstairs den/office that's full of years of gamer stuff that all represents the happy memories of those several years of college. That guy can have my monkeys. Good for him. I found something better. :)
Maxim: People cannot follow directions.
Increases in truth directly with the length of time spent explaining them
As a programmer of several "stupid computer tricks" myself (like a filesystem driver for mounting IRC!), I am very appreciative for the fast computers that let us simulate very complex systems very quickly. I understand that it is my responsibility, as a software engineer, to use that speed and memory efficiently to optimize the results of the simulation.
This project has generated better illustrative proof than ever before that randomness will eventually produce everything. This is often a difficult concept for non-mathematical people to accept, so a nice example is always welcome among those who seek to educate. It is also worth noting that this project is running on Hadoop, which is not yet considered stable. While monkeys type Shakespeare, they also find bugs, stress-test releases, and educate at least one programmer. After such a test, Hadoop is much more favorable as a platform for more "real computing work" projects, like processing medical records looking for previously-unknown medication side effects.
While on the subject of "real computing work", please note that all nontrivial computation is done by software, and that all software can run on a Turing machine as designed in 1937. Those hardware engineers are doing real electrical engineering work, making circuits run with less power and smaller size. Those chemical engineers are doing real chemistry work, making semiconductors that can switch faster and at lower voltage. The software engineers are doing real computing work, finding fast algorithms and optimizing processes.
You do not have a moral or legal right to do absolutely anything you want.
...using a random generator like Mersenne twister wouldn't work...
Now, for the mathematicians in the room: What is the probability that the particular string of length 11621 that corresponds to a particular work of Shakespeare is one of the ~2^20000 possible sequences from the Mersenne twister?
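A back-of-the-envelope way to approach that question, offered as a heuristic rather than a proof: count the bits needed to name one text of that length and compare with the generator's cycle length. The sketch assumes a 26-letter alphabet, one letter per output, and MT19937's period of 2^19937 - 1 (which the comment rounds to ~2^20000).

```python
import math

ALPHABET_SIZE = 26        # lowercase a-z, as in the project description
TEXT_LENGTH = 11621       # length quoted in the comment above
MT_STATE_BITS = 19937     # MT19937 has period 2**19937 - 1

# Bits needed to single out one text among all 26**11621 candidates.
text_bits = TEXT_LENGTH * math.log2(ALPHABET_SIZE)

# The cycle has at most about 2**19937 starting points, so at most that many
# distinct windows of this length can ever be emitted.
log2_fraction = MT_STATE_BITS - text_bits

print(f"text needs about {text_bits:.0f} bits")
print(f"reachable fraction is at most about 2**{log2_fraction:.0f}")
```

Under those assumptions the text needs well over 50,000 bits, so only a vanishing fraction of all possible texts of that length can appear as a single contiguous run.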
You do not have a moral or legal right to do absolutely anything you want.
I think that the goal is that one of the many monkeys types an entire work of Shakespeare, not that many monkeys each type a very small segment of Shakespeare mixed in with gibberish, and then the many very small segments of Shakespeare are cut from the surrounding gibberish and combined by a person of intelligence into a work of Shakespeare.
what is the point? this is so stupid. you can recreate the entire internet given a big enough computer power but how cool is that? not cool, just a big waste of computer power.
I think it started about the time they introduced the floppy disk drive.
Certainly this story must interest some people. To you, I ask this question: what makes this story interesting? To me it's a waste of energy that doesn't produce anything unexpected or particularly interesting. Compared to this, the Minecraft Enterprise-D is useful--it's at least interesting.
(Note: I am a mathematician, so maybe I'm missing some of the novelty associated with random number generation and exponential growth.)
I am the submitter; they messed up my username in the article. Anyway, it is a Bloom Filter. IMHO, it is one of the things that sets the project apart technically. The speed gains from using the Bloom Filter membership test were significant. That is the part of the program that is used the most. http://en.wikipedia.org/wiki/Bloom_filter
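For readers who have not met one, here is a minimal sketch of the idea: a cheap Bloom filter check first, then an exact comparison only for candidates that pass. It is not the project's Hadoop code. The class, the helper names, the filter size and the hash scheme are all assumptions chosen for illustration, and a Python set stands in for the string comparison against the source text.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive k bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

def build_index(text: str, n: int = 9):
    """Return (bloom, exact) holding every n-letter window of `text` (letters only)."""
    stream = "".join(c for c in text.lower() if "a" <= c <= "z")
    exact = {stream[i:i + n] for i in range(len(stream) - n + 1)}
    bloom = BloomFilter()
    for gram in exact:
        bloom.add(gram)
    return bloom, exact

def is_shakespeare(candidate: str, bloom: BloomFilter, exact: set) -> bool:
    # Cheap probabilistic test first; exact comparison only for the survivors.
    return bloom.might_contain(candidate) and candidate in exact

bloom, exact = build_index("From off a hill whose concave womb reworded")
print(is_shakespeare("fromoffah", bloom, exact))  # True
print(is_shakespeare("zzzzzzzzz", bloom, exact))  # False
```

The point of the two-stage check is that almost every random candidate is rejected by a handful of bit lookups, so the expensive comparison runs rarely.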
This project has generated better illustrative proof than ever before that randomness will eventually produce everything.
This project proves no such thing. It has shown only that randomness can reproduce (duplicate) something that already existed. This project can never reproduce War and Peace in the original Russian, as the Cyrillic alphabet is not included. It demonstrates effectively that some people will see what they want to see.
The US government have made it clear that we have no inalienable rights; any we do not defend vigorously will be taken.
If anyone wants to see the status of the other works of Shakespeare, you can view them here (http://www.jesse-anderson.com/2011/08/a-few-more-million-amazonian-monkeys/).
Slightly off topic, but thought someone might find this interesting.
In regard to the whole monkey/typewriter thing, even if you had infinite monkey/typewriter/time resources it would still never have any useful output.
Suppose you run the experiment long enough that you are guaranteed that one monkey on a typewriter has written Romeo and Juliet.
How do you find that monkey and his document?
Well, you have to look through every document and find the one that matches Romeo and Juliet.
But you either initially require a copy of Romeo and Juliet to match with (in which case you already have the damn document)
Or worse, you need a human being who is capable of detecting such a document without seeing it beforehand (Shakespeare!), and in that case he might as well just write it himself.
Ah HA, Shakespeare used spaces, too.
Play Command HQ online
It also doesn't reproduce Shakespeare in its original Klingon...
This project has generated better illustrative proof than ever before that randomness will eventually produce everything. This is often a difficult concept for non-mathematical people to accept, so a nice example is always welcome among those who seek to educate.
Here's a simpler example:
while (1)
{
    int x = rand() % 10;
    if (x == 666) printf("Yes, everything!\n");
}
I don't care if it's 90,000 hectares. That lake was not my doing.
Here's a simpler example:
while (1)
{
    int x = rand() % 10;
    if (x == 666) printf("Yes, everything!\n");
}
That code will neither terminate nor print anything.
And that's the way I like it.
That could be used as evidence that there is a benevolent God.
The US government have made it clear that we have no inalienable rights; any we do not defend vigorously will be taken.
Exactly. Breaking down the problem of "randomly finding thousands of characters in the right order" to "randomly finding 9 characters in the right order" is bullshit, because this requires information about the order of all the 9-character-blobs you find.
In other news: I compressed a Gigabyte down to 2 bits. You just have to know the order of the bits!
Stupid article. Stupid submitter. Stupid waste of energy. That's the 21st century for you. Idiocracy at its best.
You could prove that for the length of a work of Shakespeare (N), the amount of "monkeys" required to solve the problem in the same amount of time is 26^(N-9). Or, as it relates to the proverb, the solution to the equation has the time required to create a work of Shakespeare as infinite and the number of monkeys required to solve it in that time as infinite.
Of course, that solution didn't require programming the monkeys. But it is extrapolatable out to an entire work.
The ______ Agenda
Randomness will produce everything indeed. But this experiment is not random. The monkeys are not *producing* the work of Shakespeare. They are *reproducing* it. The master program already knows the work, and has it programmed in its tests. There is a big filter here: take this random bit, and decide whether it is "part of Shakespeare's work." Not quite the same as letting the monkeys type a full page, and then having readers decide whether this is "as good as Shakespeare." Prior knowledge killed Schrödinger's Cat!
This experiment, while fun, isn't exactly the infinite monkey experiment.
Of course it is not. It is impossible to simulate an infinite amount of monkeys working for an infinite amount of time. Some concession has to be made to the fact that we have a finite amount of computing power.
can they provide the RNG and seed value that produces the poem? thought not
bite my glorious golden ass.
And now the question is: given a finite amount of time, is it likely that a team of virtual monkeys sitting at virtual typewriters will come up with working keys for the programs that I download before the downloads are finished?
Yes! I could reproduce all of the works of Shakespeare in nearly zero time on a single computer. All I would need to do is reduce the comparison from 9 characters to 1 bit. A random bit generator could randomly reproduce a bit. I would then compare that one bit to a Shakespeare work and save it if it matched. Heck, we could 'optimize' this and just flip the bit if it didn't match, being that it's the only other possible permutation. Bam, O(n). My point is, as I believe others have noted, the saying is referring to the comparison of a single monkey's output with the only validation being that the monkey reproduced said Shakespeare work in its entirety. Validating a subset of the work is essentially the same as the validator writing the work, not the monkey.
I double-dog dare this guy to do this with the latest Harry Potter and try to sell the result. It would make for an interesting court case ("Your Honor, my client simply carved his own novel out of a mound of gibberish").
Sent from the iPad I found in your car.
"This project has generated better illustrative proof than ever before that randomness will eventually produce everything."
I don't think you understand what this means. Randomness in an ideal world will produce anything. Randomness in the real world will not. There are all sorts of gotchas that make such a statement meaningless outside someone's imagination.
Well, what about a Shakespeare@Home client for that? ;-)
The Tao of math: The numbers you can count are not the real numbers.
Not really realistic, considering that a real monkey would get p-d off after a few minutes and start throwing feces at the page, and eating the pages of other monkeys. From what I saw, his virtual monkeys don't do that.
http://en.wikipedia.org/wiki/Infinite_monkey_theorem#Direct_proof
Probabilities
Even if the observable universe were filled with monkeys the size of atoms typing from now until the heat death of the universe, their total probability to produce a single instance of Hamlet would still be many orders of magnitude less than one in 10^183,800.
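The order of magnitude in that figure can be checked with one line of arithmetic. The inputs here are assumptions for illustration, not numbers from this thread: a keyboard of 26 lowercase letters and a text of roughly 130,000 letters.

```python
import math

KEYS = 26                 # assumed keyboard: lowercase letters only
HAMLET_LETTERS = 130_000  # assumed rough letter count, not a figure from this thread

# log10 of the number of equally likely texts of that length.
log10_texts = HAMLET_LETTERS * math.log10(KEYS)
print(f"one specific text in about 10**{log10_texts:.0f}")
```

With those inputs the exponent lands in the 183,900s, the same ballpark as the number quoted above; adding punctuation or capitals to the keyboard only makes it worse.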
Then they all went back to writing Perl
Table-ized A.I.
Was a five-minute ten-monkey job.
It is the universe that makes fun of us all.
Until it can analyse the random output and report artworks created. It is nothing more than a password cracker looking for the password Shakespeare created as a really long password..
If you all donate an infinite amount of cash, you can build an infinite computing platform to run the infinite monkey experiment!
- http://www.milkme.co.uk
Considering we have rational reasons showing how this is retarded, in comments right under the article, I'd say we're doing okay. It sucks that it gets buried sometimes but that's what mod points are for. Ideally the editors would update the story to say that after consideration, what this person is doing isn't as interesting as we thought. But that's putting judgments into an article and that's not good reporting.
The original context had nothing to do with infinity, and it is often quoted as a 'fact' about infinity by people who understand neither the original context nor what infinity is.
The idea was to give you some idea about the probabilities involved in thermodynamics. The probabilities involved in statistical thermodynamics mean that at a macroscopic level, certain things generally regarded as contravening the laws of physics are possible, although extremely unlikely. For example some fluid could divide itself into its constituent elements, or into a warm region and a cool region. To deal with the conceptual problems caused by believing in this, the monkeys were given as an example of an extremely unlikely event. If you sit a single monkey at a typewriter and get it to type randomly, then there is a positive probability that it will type out the complete works of Shakespeare without errors. This probability is very small. However if you give an example of a thermodynamic event at a macro level which contradicts the 'laws' of thermodynamics (which were held to be laws before the statistical approach existed), the positive probability of this will be MUCH smaller. The original paper gave some simple example which I forget, something like a small amount of water at uniform temperature, and the chance of a temperature difference developing which it was possible to detect.
I guess the idea of the monkeys typing is so compelling that this 'thought experiment' has been shared extremely widely outside of the context of thermodynamics. However infinity does not have anything to do with it really. A very large, but finite, number of monkeys would be enough that one of them would type out the complete works of Shakespeare, on the first attempt, without errors, at least with a probability negligibly close to 1.
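The "very large, but finite" claim can be made concrete with a small calculation. This is a sketch under simple assumptions: each monkey makes one independent attempt, every key is equally likely, and the keyboard has 26 keys. Because the success probability p is tiny, it uses the approximation -ln(1 - p) ≈ p and works in log10 so long texts do not underflow. `log10_monkeys_needed` is a name made up here.

```python
import math

def log10_monkeys_needed(target_length: int, confidence: float, keys: int = 26) -> float:
    """log10 of the number of one-shot monkeys so that at least one succeeds
    with probability `confidence`, using m ≈ -ln(1 - confidence) / p."""
    log10_inverse_p = target_length * math.log10(keys)   # log10(1/p), p = keys**-length
    return math.log10(-math.log1p(-confidence)) + log10_inverse_p

# A 9-letter chunk versus the 11621-letter length quoted elsewhere in the thread.
for length in (9, 11621):
    print(length, f"10**{log10_monkeys_needed(length, 0.99):.1f}")
```

Both answers are finite, which is the commenter's point; the second is just far beyond anything physically available.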
He kept a million virtual monkeys gainfully employed for X amount of time. The job field is hard out there for virtual monkeys, too, you know. Bunch of anti-virtual monkey people, you are! Hmmmph!
Vote monkeys into Congress. They are cheaper and more trustworthy.
It can't be so difficult to get it.
Experience says that once there were a bunch of monkeys and in a few million years all of Shakespeare's works were written. In order!
This sort of concession misses the point. The "infinite monkey theorem" is about how wildly unlikely things are not the same as impossible things. Therefore you cannot discount the possibility that a thing happened or will happen just because it is very improbable to happen, if it was or is going to be subjected to an arbitrarily high number of "chances" to happen.
This experiment breaks it down to brute-forcing a poor password, billions of times, instead of brute forcing a friggin' insane password, which is a substantial difference.
It seems like it could be useful as a thought experiment in a probability problem set or test, or maybe in a programming course (with n suitably reduced to run on a student's machine in reasonable time).
if it was written in 7 bit ascii it could eventually output binary data which re-encoded into 8 bit would contain cyrillic characters.
otherwise even in 8 bit with an infinite amount of time there would eventually appear bit errors throwing the data off into characters outside the listed charset.
i spent five minutes thinking and all i got was this crappy sig
I did something relatively like this (albeit smaller scale) when I was an 8th grader in the early 80's, learning BASIC.
One of the first programs I actually wrote myself instead of laboriously copying out of a book or a magazine was "Monkeys" - this was a program that randomly generated letters and added them to the right end of a string, checked the string against a target value, and if it didn't fit, deleted the leftmost character from the string.
If a$="tobeornottobethatisthequestion" then b$="The monkeys have succeeded in writing Shakespeare!"
This was on an Apple II, but it ran every time the computer wasn't being used, probably months over the course of a year. My monkeys never succeeded. Stupid monkeys.
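A rough Python rendering of that sliding-window idea, for anyone who wants to try it. It is a reconstruction from the description above, not the original BASIC; `keystrokes_until` and the `limit` cut-off are additions so the loop always stops.

```python
import random
import string
from typing import Optional

def keystrokes_until(target: str, rng: random.Random, limit: int) -> Optional[int]:
    """Type random letters, keep the last len(target) of them, stop on a match."""
    window = ""
    for typed in range(1, limit + 1):
        # Append on the right, drop from the left once the window is full.
        window = (window + rng.choice(string.ascii_lowercase))[-len(target):]
        if window == target:
            return typed
    return None  # gave up, like the Apple II run described above

rng = random.Random(1982)
print(keystrokes_until("to", rng, limit=1_000_000))

# Size of the search space for the full 30-letter target from the comment.
print(f"{26 ** len('tobeornottobethatisthequestion'):.2e}")
```

A two-letter target turns up after a few hundred keystrokes on average, while the 30-letter one needs on the order of 10^42, so months on an Apple II never stood a chance.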
-Styopa
But that defeats the point. Why 9-character segments? Why not 1-character segments? Then, when each letter has been generated once by your random number generator, you say 'done' and move on. The point of the gedankenexperiment is to show that a true random number generator will eventually produce any sequence, irrespective of whether you ascribe some meaning to that phrase or not. For example, it is just as probable that a monkey would type 'the original submitter is an idiot who misses the point of probability' as it is that they would type 'mfdag gfnaif pwrg kflgsq hmthwrhdga adsfjn fadfm asdfned qemangasd asv'. They are both 70-character strings of lowercase ASCII characters and spaces, and if you have a random number generator set up to produce these with no bias towards letter frequencies then either combination is equally probable. This 'experiment' added an extra step of determinism, which means that it is not an unbiased random number generator, it's a very badly designed program for generating a Shakespeare play.
I am TheRaven on Soylent News
9 character chunks? Should I spend a 1/2 hour writing a monkey class that emits 1 character chunks and simply monitor it for "Much Ado About Nothing"? I bet it could reproduce the work in less than a second with a single monkey.
9 characters of any particular Shakespearean work isn't slashworthy...
Now if he had a monkey emit the entire work that would be interesting in an examination of how long it took and how it occurred.
The rebuttal is that people making that first argument don't understand the replication part of natural selection. Evolution doesn't say atoms randomly come together to form each person. First they formed useful proteins, and those genes got replicated. Repeat and add one level of complexity each time, keep repeating 4 billion years... and you finally make complex organisms.
Back to the analogy of monkeys typing, the idea is that once monkeys bash out a useful combination, a word, you consider that word created (gene useful) and it will replicate. Turns out, if you apply the analogy right, monkeys can bash out Shakespeare pretty fast, so the monkey analogy is a bad argument against evolution.
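A toy version of that "keep what works" idea, assuming the simplest possible selection rule: any letter that already matches the target is kept, and only the wrong positions are retyped each round. This is an illustration of the analogy, not the project's method, and it needs the target in advance, which is exactly what other commenters object to. `tries_with_lock_in` is a hypothetical name.

```python
import random
import string

def tries_with_lock_in(target: str, rng: random.Random) -> int:
    """Retype only the wrong positions each round; correct letters are kept."""
    current = [rng.choice(string.ascii_lowercase) for _ in target]
    rounds = 0
    while "".join(current) != target:
        for i, want in enumerate(target):
            if current[i] != want:  # selection: keep what already matches
                current[i] = rng.choice(string.ascii_lowercase)
        rounds += 1
    return rounds

rng = random.Random(4)
target = "tobeornottobe"
print(tries_with_lock_in(target, rng), "rounds with lock-in")
print(26 ** len(target), "expected tries for one-shot typing")
```

With lock-in the phrase typically appears within a hundred or so rounds, against roughly 26^13 expected attempts when the whole string has to come out right in one go.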
I hear there's another Stephenie Meyer book coming out sometime.
i spent five minutes thinking and all i got was this crappy sig
yes yes they would, they'd rip the page out of the typewriter, run to the front of the class and yell "miss, miss, i'm finished! can i have a banana now" at which point the teacher would scold them for their terrible spelling. you then laugh at them and call them names for their failure. After a short review the teacher would realise that the monkey was simply dyslexic and that English is probably not the best subject for them. The monkey would drop his dreams of becoming a writer to become an expert in ballistics. then when you venture off into space living your dreams as an astronaut (cause you're oh so smart) you lose communication with earth and crash on some strange planet where apes are the dominant species. you then run into said monkey and realise you are actually on earth and the monkeys have taken over. had you not bullied him about his dyslexia he wouldn't have overthrown humans. thanks a lot asshole. (written randomly by 3rd in command monkey jgadfsasd of the planet apeville)
i spent five minutes thinking and all i got was this crappy sig
i always wondered what Jasper Fforde was on about but never bothered looking into it. and i still haven't
i spent five minutes thinking and all i got was this crappy sig
first a question. what is a Bloom Field membership test? can't google that.
Second, I think this test is cooked. One virtual monkey did not write this uncoached. No way. it's mathematically impossible.
I think what he did was just take 9 character pieces from different monkeys.
Some drink at the fountain of knowledge. Others just gargle.
Hollywood's newest screenwriter!
One of our competitors trademarked the term "hypothesis". From now on, we will call them "boneheaded ideas".
Seems like the coder used genetic/evolutionary programming to achieve the "feat". However, it is not truly random since you are comparing the end result with an absolute answer, and not some mathematical approximation. In simple words, the user wanted Shakespeare, so he hardwired the code to get Shakespeare. I don't mean comparing the end result to Shakespeare. The program effectively weeds non-Shakespeare out of the result pool at every iteration. Not random. Here is an example demonstrating how anyone could do it. http://www.generation5.org/content/2003/gahelloworld.asp
I thought the point of the original monkey idea was that they create Shakespeare *without* access to the original. This is like other experiments where genetic algorithms are used to recreate something that exists by comparing the output to the existing thing. That's just glorified curve fitting.
The real goal is to design something new that would not necessarily have been arrived at through traditional design methodology. People have "evolved" FPGA circuits by generating random bit configuration files, testing the functionality, and crossbreeding the best circuits. They don't compare it to an existing circuit. Sometimes the circuits don't even use clocks, and no one can quite explain how they do what they do. Some stop working when loaded onto a supposedly identical FPGA, as if the circuit is taking advantage of some unique variance of that particular FPGA. Another company designed an antenna with a genetic algorithm, and the end result looks like nothing you'd come up with using traditional antenna design techniques.
Is there proof you did this earlier? Did you file a patent on your Infinite Monkey Generator? If so, you can sue his sorry ass goodbye and MAKE MONEY FAST!!!!1!11
given enough time, monkeys DID write Shakespeare's works - they just had to evolve to the point that one of them, named William Shakespeare, decided that writing plays was a good thing for him to do and so he did and here we are.
---- You are fully entitled to my opinion.
Then they invented object-oriented programming, and started thinking it was a good idea to do things like call subroutines just to grab existing values out of memory. Thus sending the hardware engineers back to trying to make circuits faster.
Brett
The posting is kind but a little misleading: I didn't invent Mersenne Twister. I just wrote the fastest Java implementation, built on top of early code by Michael Lecuyer. It's fairly widely used, I suppose. Mersenne Twister proper (MT19937) was the creation of Makoto Matsumoto and Takuji Nishimura. Feel free to ask any questions. (I can't say anything to the Shakespeare project, which sounds fun).
This was a completely pointless waste of time. What on Earth is brute-forcing a 9-character quote good for, and why would it be newsworthy?
Call me when the monkeys create a work of Shakespeare that he could have written if he lived another year ..
Hey don't blame me, IANAB
Sorry, I missed the correct moderation. How can I miss the correct moderation? I need Coke (The beverage, just in case... )
That said, my comment should fix the screw up.
(Comment written by an open collaboration of 24,342,749 monkeys)
"Science can amuse and fascinate us all, but it is engineering that changes the world. " - Asimov.
Here's a simpler example:
while (1) {
    int x = rand() % 10;
    if (x == 666) printf("Yes, everything!\n");
}
You do know that x will never be greater than or equal to 10, right? Thus x can never equal 666. FAIL.
*whoosh*
But no!
Of course the reason that randomly typing text until you have re-created shakespeare is so difficult is because of the greater improbability of getting strings of characters in the correct order as the length of that string increases.
To only consider 9-character strings, and purposefully search for those random strings in a work until you find one, eventually finding them all, is a drastic and cheat-ful short cut.
I cry foul.
Object-oriented programming, when done correctly, makes finding fast algorithms easier. Object-oriented programs, when compiled correctly, are just as efficient as imperative programs, because the instructions relating to objects (such as extra jumps for simple accessors) have been optimized away by the compiler.
Of course, this is only when applied by competent programmers. OOP makes finding algorithms easier because it's more intuitive, but that also means that less skilled programmers can find inefficient algorithms more easily, as well. Then management approves those bad algorithms, and users get saddled with slow programs again.
You do not have a moral or legal right to do absolutely anything you want.
Software people are happy to use more resources to do the same work.
I suggest you look into demoscene. Given some particularly small limit (such as 64kB persistent storage), they make real-time rendered videos. Efficient use of resources means more detail in the final video, and that's a major goal.
I certainly can't deny that there are some programmers who will happily waste resources on bad algorithms. Such cases are well-documented and laughed at by those who aim to be better.
You do not have a moral or legal right to do absolutely anything you want.
I absolutely agree. From the point of view of the classic "one million monkeys with typewriters" the "result" described here is completely and utterly uninteresting. If he had had each node randomly generate data until one of them had emitted the full work in _one sequence_ he would have a story. There's just this problem that this is highly unlikely to happen even if you throw all the world's computing grids at it, given that there are ~26^n random texts of length n. Therefore, while reading the abstract I knew it had to be fake, but I hoped to be proved wrong - unfortunately I was right :(
Forgot to say that I cannot exclude the possibility that there might be some interesting things in the execution (interesting algorithms for looking up the 9-char bits etc.). I haven't read the details so I'm not able to judge. In my critical post above I was discussing the approach in terms of the classic idea about the ability of a monkey to generate the work of Shakespeare.
The naivety demonstrated in regards to the original postulate is mind-numbing. Or maybe those who buy into his implementation as being technically correct for the given postulate are the most mind-numbing of all.
... Waiting for PETA's squawks of outrage at this cruelty to monkeys
Isn't April 1st the agreed time for these pranks? after the 'Cat owners are smarter' one, which everybody swallowed up.. Has nobody learned yet? ;-)