Making The Case That Voynich Is A Hoax
DeadVulcan writes "The Voynich Manuscript, a mysterious book of uncertain age, is widely believed to be written either in an unknown language or a long-lost encryption scheme. Nature reports that computer scientist Gordon Rugg has demonstrated that it's possible to generate a text like the Voynich manuscript -- containing language-like regularities, despite being potentially meaningless -- using cryptographic techniques of the time. This lends some support to those who claim that the book is a hoax."
Somebody is laughing a lot. Remember the Salamander Papers from way back?
"Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
Abdook artelly oppetrom uplocty?! Astenboorsley... af arcoolodople!
Bli, Fal.
Gordon Rugg has demonstrated that it's possible to generate a text like the Voynich manuscript -- containing language-like regularities, despite being potentially meaningless
That's funny. I thought Darl McBride had already proven that with all those open letters he's written.
Mod me down, hippies!
The theory of relativity doesn't work right in Arkansas.
I think this report is missing the fact that if someone really wanted to make a hoax book, they could simply translate any other book (even the Bible) into a made-up language. If it's an obscure book, the likelihood that anyone would ever figure it out is slim.
-Zibi
Sounds a bit like the Beale Papers.
Dan East
Better known as 318230.
I'm sorry, but calling the Voynich Manuscript a hoax is implausible. Sure, could it in theory have been a hoax? Yes, but there would be no point to it. The "hoaxer" creates this in 3+ months, with very accurate drawings, and probably hangs on to it until he dies, so that it can be sold to a king 100 years later and eventually make it to America? Then again, maybe Nostradamus wrote it.
And why did you staple the trout to the RAM?
Translation from binary:
Ich denke sein vermutlich einen
Translation from German from binary:
I probably think its one
The technique really is interesting. We have techniques that can identify meaningful patterns (all of cryptology, most of number theory, graph theory), but this application is neat because it is an effort to prove, rigorously, that a given set of data is just total noise.
"Oh, the tragedy of math gone wrong. I can't even talk about it." -Wil Wheaton http://www.wilwheaton.net
Had Mr Rugg just used rot13 he would've cracked the code long ago. Want Crypto?
MoFscker
In case you're wondering what it looks like:
http://www.voynich.nu/
Remember, even though FLT (Fermat's Last Theorem) has been proved, we still don't have the "simple proof" that Fermat himself discovered.
That's because he almost certainly didn't discover one.
Fermat was known for making some pretty bone-headed mistakes. Also, in his later writings he posed challenges to prove FLT for the case of n=3 or n=4, but never for general n>2. If he had found a truly elegant proof of the general case, and believed it was true, why not pose the general challenge?
I've studied the Voynich manuscript before, and the possibility of a hoax seems just as unlikely as many of the theories that have been floating about. Yes, the language of the Voynich manuscript could be an elaborate hoax, but Rugg's analysis only proves what is already widely known.
The problem with creating such an elaborate hoax is that even Rugg's theory doesn't explain all the features of the Voynich manuscript. Furthermore, it seems unlikely that a sixteenth-century forger would go to the trouble of creating something that had all the qualities of a real language and that would deliberately resemble an actual document when examined with analytical techniques that wouldn't be developed until later. Occam's Razor makes it seem more likely that there is some kind of language operating in the manuscript than a random system of patterns. Then again, there's no real way of knowing.
There are some images of the text of the Voynich Manuscript available here. Analysis of the text and the illustrations supports the theory that the manuscript has defined sections on astrology, herbal medicine, and other subjects. There have been some serious and some ridiculous theories about the manuscript, ranging from the intriguing notion that the Voynich text is mathematically similar to East Asian languages like Chinese or Vietnamese, to the claim that the Voynich manuscript is written in an ancient form of Ukrainian. (I've read the supposed translation from the Ukrainian, and it hardly makes sense, given that the manuscript's illustrations don't match the text of the supposed translation.)
In the meantime, this site offers more information on modern translation efforts including a font for the Voynich script. (Which would make a lovely way of annoying co-workers by switching their default system font to Voynich text...)
Prof. Rugg has a website about his methods and results, which may be of interest.
Champollion cracked the Rosetta Stone with much, much less.
The 'true' examples of lost written languages/cyphers (do a Google search) are mysteries because only a few examples of each exist, usually brief and bereft of context (grammar, history, linguistic evolution, etc.).
The sheer volume of the Voynich manuscript, plus its origin in relatively modern Europe is what makes it so interesting to amateur cryptographers.
The Nature paper is too brief to judge how good Rugg's analysis is (and the Cryptologia site has been slashdotted), but if it holds up it is an interesting result, even if it is a conclusion that many "very smart cryptographers"(TM) have suspected for a long time.
No. It is the job of those who claim the book is genuine to prove that it is. One doesn't need to prove that something is a hoax if it is; Occam's Razor does that job. Which explanation contains the fewest unsubstantiated assumptions: that something was written in a language nobody knows, containing valuable information nobody has any idea about, or that it was produced using a simple encryption technique to fool somebody into paying loads of shiny ducats?
I find it amazing that some people still hold this myth as true! What kind of history education have you had!?!
Look, no scientist has ever claimed the earth was flat. For one thing, in every culture other than the Western one, it has never been claimed otherwise ("they even knew the earth was spherical"), but some people have got the weird notion that Columbus had to argue that the earth wasn't flat.
He didn't. The moron had the wrong numbers, and would have gotten killed if America didn't happen to be there.
Already the pupils of Thales claimed their master knew the earth was round. Eratosthenes measured the circumference of the earth with an error of 3%! The true circumference of the earth was known to the Greeks in antiquity! Plato and his pupil Aristotle knew many arguments for the spherical shape of the earth. And why is this important? Because though some Christian scholars around 300 AD didn't like the idea of a spherical earth, St. Augustine adopted much of Plato's philosophy and made it an important part of Christianity in the same century, and with it the idea of a spherical earth. Through Augustine, every leading authority accepted the idea of a spherical earth.
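The arithmetic behind Eratosthenes' estimate is a simple proportion: the angle of the noon shadow at Alexandria, when the sun was overhead at Syene, is the fraction of a full circle separating the two cities, so the circumference is the distance divided by that fraction. A minimal sketch using the commonly cited figures (about 7.2 degrees and 5000 stadia). Converting to kilometres needs a length for the stadion, which is disputed, so that constant is a labeled assumption and the accuracy you get depends on it.

```python
from fractions import Fraction


def eratosthenes_circumference(shadow_angle_deg, distance):
    """Circumference from the shadow angle between two sites and the distance between them.

    Pass the angle as a Fraction or a string (e.g. "7.2") to keep the arithmetic exact.
    """
    fraction_of_circle = Fraction(shadow_angle_deg) / 360
    return distance / fraction_of_circle


stadia = eratosthenes_circumference("7.2", 5000)
print(stadia)  # 250000

# ASSUMPTION: the stadion's length in km is disputed; 0.1575 is just one proposed value.
ASSUMED_STADION_KM = 0.1575
print(float(stadia) * ASSUMED_STADION_KM)
```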
Eventually, Eratosthenes' numbers were also accepted, but Columbus didn't like them, because they meant that going the other way to India was infeasible. So he used some other numbers, along with Marco Polo's exaggerated estimates of the distance he had travelled, and thereby made the trip look quite feasible. But it wasn't; he was wrong.
Columbus thought the distance to Asia was 4000 km; his contemporary scientists put it at 16000 km; the real distance is 23000 km. Columbus eventually travelled 6500 km.
So, why is this important? Because people who hold this belief often have many other misunderstandings about science. Indeed, you can't prove that the book is a hoax, but for that reason, the burden of proof rests with those who claim it is genuine, who, of course, might cling to that idea long after the world has moved on to greener pastures. That's how it usually works anyway.
Employee of Inrupt, Project Release Manager and Community Manager for Solid
Those who read the article can take note of an interesting challenge: though Rugg has shown that it is possible to generate a high quality hoax using a Cardan grille, proving it to be a hoax may require producing a character grid that will actually generate large portions of the text. My question is, could that be done with a genetic algorithm, and are any Slashdotters up to the task?
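A minimal sketch of the genetic-algorithm idea, in Python. It assumes a much-simplified model of Rugg's method: a table of ROWS rows and three columns (prefix, middle, suffix), and a grille whose three holes sit at fixed row offsets (OFFSETS) in those columns, so each grille position yields one word. The syllable pools, OFFSETS, and target words are hypothetical, made up for illustration (EVA-style strings, not a real transcription sample). Fitness is simply how many distinct target words the table can produce; a real attempt would score against a full transcription.

```python
import random

# Hypothetical syllable pools for the three table columns (prefix, middle, suffix).
COLUMNS = [
    ["qo", "ch", "sh", "o", "d", ""],
    ["k", "t", "e", ""],
    ["edy", "dy", "aiin", "ol", "y"],
]
ROWS = 8
OFFSETS = (0, 1, 3)  # ASSUMPTION: row offset of the grille hole in each column


def generate_words(table):
    """Slide the grille down the table (wrapping around), reading one word per position."""
    return {
        "".join(table[(r + off) % ROWS][col] for col, off in enumerate(OFFSETS))
        for r in range(ROWS)
    }


def fitness(table, target):
    """Number of distinct target words that this table plus grille can produce."""
    return len(generate_words(table) & target)


def random_table(rng):
    return [[rng.choice(COLUMNS[c]) for c in range(3)] for _ in range(ROWS)]


def mutate(table, rng, rate=0.1):
    """Replace each cell with a random syllable from its column with probability `rate`."""
    return [
        [rng.choice(COLUMNS[c]) if rng.random() < rate else cell for c, cell in enumerate(row)]
        for row in table
    ]


def crossover(a, b, rng):
    """One-point crossover on rows: top of parent a, bottom of parent b."""
    cut = rng.randrange(1, ROWS)
    return [row[:] for row in a[:cut]] + [row[:] for row in b[cut:]]


def evolve(target, generations=200, pop_size=40, seed=0):
    rng = random.Random(seed)
    population = [random_table(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, target), reverse=True)
        elite = population[: pop_size // 4]  # truncation selection with elitism
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return max(population, key=lambda t: fitness(t, target))


# Hypothetical EVA-style target words, chosen so that each is reachable from the pools.
target = {"qokedy", "chedy", "shedy", "daiin", "chol"}
best = evolve(target)
print(fitness(best, target), "of", len(target))
for row in best:
    print(row)
```

Because the top quarter of each generation is carried over unchanged, the best fitness never decreases; whether it reaches a full match depends on the seed, the table size, and the number of generations. Scaling this to "large portions of the text" would mean a much bigger table, a real grille geometry, and a fitness function over the whole transcription.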
Also, a few comments about formal analysis. Notice that if you took some arbitrary text, typeset it in a fixed-width font to force the characters into columns, and then skimmed it with a grille in order to generate a new text, you would automatically preserve such basic statistics as character frequency, including spaces and also punctuation if you used them in your grid. (Depending on how you applied the grille, you could actually be generating a simple permutation of the original text.) However, you would disrupt all the within-word correlations.
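The permutation point can be checked directly. A minimal sketch, assuming the simplest possible "grille": lay the source text out in fixed-width rows and read every cell exactly once along the anti-diagonals. The helpers `layout` and `read_diagonals` are hypothetical, and the Latin-ish source string is made up; a real Cardan grille with a limited number of windows would only sample some cells per placement.

```python
from collections import Counter


def layout(text, width):
    """Typeset text in a fixed-width 'font': rows of `width` characters, last row space-padded."""
    padded = text + " " * (-len(text) % width)
    return [padded[i:i + width] for i in range(0, len(padded), width)]


def read_diagonals(grid):
    """Toy grille: visit every cell exactly once, one anti-diagonal at a time."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for d in range(rows + cols - 1):
        for r in range(rows):
            c = d - r
            if 0 <= c < cols:
                out.append(grid[r][c])
    return "".join(out)


source = "ad ferre ad currere ad tendere"
grid = layout(source, 7)
scrambled = read_diagonals(grid)
print(repr(scrambled))
# Same multiset of characters (spaces included), so single-character frequencies survive.
print(Counter(scrambled) == Counter("".join(grid)))  # True
```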
For example, in compound words derived from Latin there is a familiar pattern where ad C* ==> aCC* (where C is some arbitrary consonant), but that pattern would be completely obscured if the characters were read off a diagonal grille as shown in the photograph. You would still get the increased frequency for C, but not the common aCC pattern.
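A rough way to test this, sketched in Python: count the aCC pattern in a source text and in a scrambled copy. The regular expression, the toy Latin source, and the use of a random shuffle in place of an actual grille are all assumptions for illustration.

```python
import random
import re

# 'a' followed by the same consonant twice, as in afferre (ad + ferre) or accurrere (ad + currere).
A_DOUBLE_CONSONANT = re.compile(r"a([b-df-hj-np-tv-z])\1")


def count_acc(text):
    """Count non-overlapping occurrences of the aCC pattern."""
    return len(A_DOUBLE_CONSONANT.findall(text.lower()))


source = "afferre accurrere attendere " * 20
scrambled = "".join(random.Random(0).sample(source, len(source)))  # crude stand-in for a grille
print(count_acc(source))     # 60
print(count_acc(scrambled))  # typically far fewer; depends on the shuffle
```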
More subtly, there are some well known universals of syllable structure in natural languages, but those would be scrambled just as the aCC would be. You would have the right proportions of consonants and vowels, but not a realistic distribution within words.
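The same idea extends to syllable structure: reduce each word to its consonant/vowel skeleton and compare the pattern profile of a source text with that of a grille-scrambled copy. A minimal sketch; the vowel set (including y) is an assumption that would need adjusting per language, and the functions are hypothetical helpers.

```python
from collections import Counter

VOWELS = set("aeiouy")  # ASSUMPTION: y counted as a vowel; adjust for the language at hand


def cv_skeleton(word):
    """Map each letter to C or V, e.g. 'afferre' -> 'VCCVCCV'."""
    return "".join("V" if ch in VOWELS else "C" for ch in word.lower() if ch.isalpha())


def skeleton_profile(text):
    """How often each consonant/vowel pattern occurs among the words of a text."""
    return Counter(cv_skeleton(w) for w in text.split() if any(ch.isalpha() for ch in w))


print(cv_skeleton("afferre"))                          # VCCVCCV
print(skeleton_profile("ad ad ferre").most_common())   # [('VC', 2), ('CVCCV', 1)]
```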
Likewise, prefixes and suffixes would be scrambled. If it is a hoax generated by a Cardan grille, it should not have prefix/suffix patterns that occur commonly in many languages. (Ditto for suffixal inflections.) In fact, the letters appearing at the beginnings and ends of words should be a random sampling from the frequency distribution of letters in the whole text; this may be the easiest metric to check.
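The suggested metric is easy to compute. A minimal sketch, assuming whitespace-separated "words" and using total variation distance (an arbitrary choice here; any distance between distributions would do) to compare word-initial and word-final letters with the letter distribution of the whole text. Values near 0 are what the grille hypothesis predicts; how near counts as "near" depends on text length, so in practice you would compare against scrambled control texts.

```python
from collections import Counter


def distribution(chars):
    """Relative frequency of each character in an iterable of characters."""
    counts = Counter(chars)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def total_variation(p, q):
    """Half the L1 distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def edge_letter_bias(text):
    """Distance of word-initial and word-final letter distributions from the overall one."""
    words = text.split()  # assumes a non-empty text
    overall = distribution(ch for w in words for ch in w)
    initial = distribution(w[0] for w in words)
    final = distribution(w[-1] for w in words)
    return total_variation(initial, overall), total_variation(final, overall)


print(edge_letter_bias("ab ab ab"))  # (0.5, 0.5)
```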
Also, by using spaces as characters in your grid you'd get the right proportion of spaces, and therefore the right average word length, but you would obscure any patterns in word length. Someone has already linked to studies of the word lengths in the manuscript, but those assumed that the distribution of Latin word lengths would be preserved. However, only the average would be preserved. I suspect the distribution would be converted to a Gaussian. Anyone got time for the experiment? (Notice that you may generate extra spaces with the grille, depending on how you use it. For example, what do you do when your grille starts running off the bottom of the page in your source text? Or, if your grille has 10 windows, do you transcribe up to the first space and then move the grille, or do you transcribe everything in the grille and insert a "virtual" space for position 11? It looks to me like you might be able to generate the document's actual "word" lengths from Latin, given only some very basic assumptions.)
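For anyone with time for the experiment, here is a minimal starting point. It assumes the crudest possible stand-in for a grille, a random shuffle of all characters with spaces included, and simply tabulates word-length histograms before and after; the helper names and sample text are hypothetical. The shuffle also shows the "extra spaces" effect: adjacent or leading spaces merge into one separator, so the word count, and with it the average length, can drift slightly.

```python
import random
from collections import Counter


def word_lengths(text):
    """Histogram of word lengths, treating any run of whitespace as one separator."""
    return Counter(len(w) for w in text.split())


def shuffle_with_spaces(text, seed=0):
    """Crude grille stand-in: a random permutation of every character, spaces included."""
    chars = list(text)
    random.Random(seed).shuffle(chars)
    return "".join(chars)


source = "ad ferre ad currere ad tendere"
scrambled = shuffle_with_spaces(source)
print(sorted(word_lengths(source).items()))     # [(2, 3), (5, 1), (7, 2)]
print(sorted(word_lengths(scrambled).items()))
```

To test the Gaussian guess, run it over a long Latin text rather than a single line and plot both histograms.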
Sheesh, evil *and* a jerk. -- Jade
One definition of randomness, and one that seems quite reasonable, is that a string is "random" if it cannot be compressed to smaller than it is, i.e. listing its characters is itself the most compact possible description. Formally, a string is random if there exists no algorithm generating the string whose description on some universal Turing machine is smaller than the string itself (this is the definition used in the field of Kolmogorov complexity). A string of a billion digits making up Pi, for example, is not random by this definition, as one can easily write a short program, whose length would certainly be less than one billion characters, whose output is the digits of Pi. Think of it this way: the most general form of pattern-matching device that we know of is a Turing machine, and if the best device you can construct to match that pattern is as complex as or more complex than the pattern itself, then, well, you have total randomness. Unfortunately, rigorously proving that a particular string is random by this very strong definition is extremely difficult, as you run into undecidability everywhere you turn.
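Kolmogorov complexity itself is uncomputable, but any real compressor gives an upper bound on it, which makes for a cheap one-sided test. A minimal sketch using Python's zlib: a small compressed-to-original ratio proves the data has structure, while a ratio near 1 proves nothing (zlib cannot get the digits of Pi anywhere near the size of a short program that generates them). The sample inputs are made up.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size, using zlib at maximum compression.

    A ratio well below 1 proves the data is not random in the Kolmogorov sense.
    A ratio near (or above) 1 proves nothing: zlib only gives an upper bound.
    """
    return len(zlib.compress(data, 9)) / len(data)


rng = random.Random(0)
patterned = b"0123456789" * 1000                            # obviously compressible
noise = bytes(rng.getrandbits(8) for _ in range(10_000))  # pseudo-random bytes

print(round(compression_ratio(patterned), 3))
print(round(compression_ratio(noise), 3))
```

Note the irony: the "noise" comes from a short seeded program, so it is not Kolmogorov-random either; zlib simply cannot see that. This is the undecidability problem in miniature.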
This is the sort of stuff that real theoretical computer science is made of. For a very good overview of the theory of Kolmogorov complexity and algorithmic information theory, Gregory Chaitin's home page is a good starting point.
To go back to the Voynich manuscript, if there is some sort of regularity that can be discerned from it, then perhaps a context-free or context-sensitive (or something in between) language may be found to characterize it. Once you have such a syntactic characterization, perhaps it might be possible to divine the semantics from context. The shape of the grammar that results may well prove whether the Manuscript is in fact a real language, a fabrication, an elaborate cipher, or just total gibberish.
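Inferring a context-free grammar is a research problem in itself, so as a much simpler first look, here is a sketch of a character-bigram (first-order Markov) statistic instead: the conditional entropy of the next character given the previous one. This only captures regular, local structure, not the grammar described above; the function name and inputs are hypothetical. Lower values mean the next character is more predictable from the current one.

```python
import math
from collections import Counter


def conditional_entropy(text):
    """H(next char | previous char) in bits, estimated from bigram counts (needs len >= 2)."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])  # how often each character starts a bigram
    total = sum(pairs.values())
    h = 0.0
    for (a, _b), n in pairs.items():
        p_pair = n / total
        p_next_given_a = n / firsts[a]
        h -= p_pair * math.log2(p_next_given_a)
    return h


print(conditional_entropy("abababab"))  # 0.0 (next char fully determined)
print(conditional_entropy("aab"))       # 1.0 (after 'a', two equally likely options)
```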
Give me six lines written by the hand of the most honest of men, and I will find something in them to hang him.
Have they tried casting "Read Magic" on it?
Manipulate the moderator system! Mod someone as "overrated" today.
Some other good links for Voynich information:
Elonka :)
There's no direct evidence that the document is forged. There's also no direct evidence that it's genuine, or even what "genuine" would mean. There are stories vaguely associating it with various interesting people, such as John Dee and Roger Bacon, but they're all pretty vague.
People have been studying this document for the better part of a century, because it's fascinating, enigmatic, and beautiful. (You can find some pictures of it at www.voynichinfo.com) We know a bit more than we did about what kinds of hypotheses are plausible and what kinds are not. For example: we can be pretty sure that it is not written in any natural language. We can also be pretty sure that it isn't just a simple substitution cipher. Finally, we can be pretty sure that it isn't a 20th century forgery: it has been given a rough date, it really does look like a manuscript from the 15th or 16th century, and it probably was once owned by Rudolf II. The Roger Bacon rumors are almost certainly false, because the manuscript doesn't appear to be that old. The John Dee rumors may be true.
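One reason the simple-substitution hypothesis can be tested at all is that such a cipher only relabels letters: the sorted letter-frequency profile of the ciphertext is identical to that of the plaintext, so it can be compared against the profiles of candidate languages. A minimal sketch; the key generation and sample sentence are made up for illustration, and a real analysis would work on a transcription alphabet such as EVA rather than a-z.

```python
import random
import string
from collections import Counter


def random_substitution_key(seed=0):
    """Build a translation table mapping a-z to a random permutation of a-z."""
    letters = string.ascii_lowercase
    shuffled = list(letters)
    random.Random(seed).shuffle(shuffled)
    return str.maketrans(letters, "".join(shuffled))


def frequency_profile(text):
    """Letter counts sorted from most to least common, with the letters themselves dropped."""
    return sorted(Counter(ch for ch in text if ch.isalpha()).values(), reverse=True)


plain = "the quick brown fox jumps over the lazy dog and then the dog sleeps"
cipher = plain.translate(random_substitution_key())
print(cipher)
print(frequency_profile(cipher) == frequency_profile(plain))  # True
```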
At present the two most plausible guesses are that it is a real 15th or 16th century treatise on an occult subject, written in a code that has yet to be broken, or that it's a good imitation of an encoded occult text. If the latter, it was probably written specifically for the purpose of fooling Rudolf. It is known that he was fascinated by the occult (there's even an opera where that's a crucial plot point), and it is known that many of the astrologers and alchemists he patronized were quacks and that many of the texts he bought were forgeries.
What's interesting about this research isn't that it's a new argument against the possibility that the manuscript is genuine, but that it's a good counterargument to one of the arguments for authenticity. Until now, many people argued that the manuscript wasn't likely to be a forgery because the text followed a certain statistical property of natural languages (Zipf's law) that wasn't known until the 20th century. Thus, the argument goes, it's unlikely to be a 16th century fake because a 16th century forger, inventing a fake code or a fake language, wouldn't have known to match this statistical distribution.
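For reference, checking Zipf's law amounts to ranking word types by frequency and fitting a line on a log-log plot; a slope near -1 is the classic Zipfian signature. A minimal least-squares sketch; the synthetic corpus below is constructed to be Zipfian and is not Voynich data.

```python
import math
from collections import Counter


def zipf_slope(words):
    """Least-squares slope of log(frequency) versus log(rank) over word types."""
    freqs = sorted(Counter(words).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)  # needs at least two distinct word types
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic corpus: the word of rank i appears about 1000 / i times.
corpus = [f"w{i}" for i in range(1, 51) for _ in range(round(1000 / i))]
print(round(zipf_slope(corpus), 3))
```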
The reason this work is interesting is that it shows that this argument is invalid: there is a plausible method that a 16th century forger might have used that might have produced such a document. This doesn't show that it really is a 16th century forgery, it only shows that there's one fewer argument against that possibility than we once believed.
In the end, of course, we're unlikely to ever have decisive evidence that the manuscript is fake. Either someone will come up with a believable decryption (several people claim to have done it already; none of their claims have stood up), or people will keep trying and failing. The longer scholars bang their heads against the wall trying to get a translation, the fewer people will believe that there really is one. Messy, but that's the way the world works. Sometimes you don't get to learn for sure whose guess is right.