Slashdot Mirror


Making The Case That Voynich Is A Hoax

DeadVulcan writes "The Voynich Manuscript, a mysterious book of uncertain age, is widely believed to be written either in an unknown language or a long-lost encryption scheme. Nature reports that computer scientist Gordon Rugg has demonstrated that it's possible to generate a text like the Voynich manuscript -- containing language-like regularities, despite being potentially meaningless -- using cryptographic techniques of the time. This lends some support to those who claim that the book is a hoax."

28 of 382 comments

  1. The Salamander Papers by Saeed al-Sahaf · · Score: 5, Interesting

    Somebody is laughing a lot.. Remember way back the Salamander Papers?

    --
    "Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
  2. Ershlap? by paul248 · · Score: 4, Funny

    Abdook artelly oppetrom uplocty?! Astenboorsley... af arcoolodople!

    Bli, Fal.

    1. Re:Ershlap? by decipher_saint · · Score: 5, Funny

      Are you my boss, 'cause you sound like him...

      Am I fired yet?

      --
      crazy dynamite monkey
  3. Been there, done that by User 956 · · Score: 4, Funny

    Gordon Rugg has demonstrated that it's possible to generate a text like the Voynich manuscript -- containing language-like regularities, despite being potentially meaningless

    That's funny. I thought Darl McBride had already proven that with all those open letters he's written.

    Mod me down, hippies!

    --
    The theory of relativity doesn't work right in Arkansas.
  4. Library of Babel by Mrs. Grundy · · Score: 5, Interesting
    This reminds me of a passage from Jorge Luis Borges' Library of Babel. In fact a lot reminds me of that story these days.

    Five hundred years ago, the chief of an upper hexagon (2) came upon a book as confusing as the others, but which had nearly two pages of homogeneous lines. He showed his find to a wandering decoder who told him the lines were written in Portuguese; others said they were Yiddish. Within a century, the language was established: a Samoyedic Lithuanian dialect of Guarani, with classical Arabian inflections. The content was also deciphered: some notions of combinative analysis, illustrated with examples of variations with unlimited repetition.
  5. Missing the fact.... by Zibi · · Score: 4, Interesting

    I think this report is missing the fact that if someone really wanted to make a hoax book, they could simply translate any other book (even the Bible) into a made-up language. If it's an obscure book, the likelihood that anyone would ever figure it out is slim.

    --
    -Zibi
    1. Re:Missing the fact.... by Anonymous Coward · · Score: 5, Insightful

      Actually, very few people could write on any known topic (such as a topic for which we have a contemporaneous book in a known language) in a consistent but made-up language without it being easily decipherable. We couldn't figure out ancient Egyptian because we had no idea what topic they were even talking about. ALL it took to figure out ancient Egyptian was being told (in ancient Greek, which we knew) what topic a couple of sentences of Egyptian were talking about. Before that, we had almost NO idea what the various examples of the writing could POSSIBLY have stood for.

    2. Re:Missing the fact.... by 1u3hr · · Score: 5, Insightful
      if someone really wanted to make a hoax book, they could simply translate any other book (even the bible) into a made up language.

      Making up a language, that isn't just a scrambled version of an existing one, is very, very hard. It takes someone like Tolkien (a professor of Old English who could translate Norse on the fly) to do that convincingly, and I doubt that anyone in the period could have done it in a way that would still defy detection.

  6. Beale Papers by Dan East · · Score: 4, Interesting

    Sounds a bit like the Beale Papers.

    Dan East

    --
    Better known as 318230.
  7. Ridiculous by SargeZT · · Score: 4, Interesting

    I'm sorry, but calling the Voynich Manuscript a hoax is unfeasible. Sure, could it have in theory been a hoax? Yes, but there is no point to this. The "hoaxer" creates this in 3+ months, with very accurate drawings, and probably hangs on to it till he dies, so that it can be sold to a king 100 years later and eventually make it to America? Then again, maybe Nostradamus wrote it.

    --
    And why did you staple the trout to the RAM?
    1. Re:Ridiculous by Seth Morabito · · Score: 5, Interesting

      The point of a hoax, in my opinion, would most likely have been financial gain.

      There is no clear evidence pointing to an exact date that the manuscript was written, and the only firm circumstantial evidence we have to go on is Marcus Marci's letter to Athanasius Kircher, which mentions that the manuscript was sold to King Rudolph for 600 ducats. That is a heck of a lot of money. It seems perfectly reasonable to me that someone manufactured the manuscript to extract 600 ducats from the emperor.

      This assumes a lot. It assumes that the letter is genuine, and it assumes that the facts mentioned in the letter are true, and it assumes that Rudolph was the first buyer, so it is by no means a sure thing. But a lot of us who lean (gingerly) toward the hoax theory stand by Occam's Razor, which points to a hoax being at least a feasible, and probably even likely, solution. Rugg's analysis is just more circumstantial evidence, not proof, but every little bit adds weight to the scale.

    2. Re:Ridiculous by shaitand · · Score: 4, Insightful

      No, actually, "evidence" THIS broad lends no weight whatsoever. I say this wholeheartedly as someone who has never even heard of the particular manuscript in question.

      Here is what I know, partly assuming what you've said is accurate. Nobody knows when the manuscript was produced, and the only evidence that indicates its existence at a particular point may be suspect (although this is the case with many of the dates we've fixed for events in history, and even the basis for several things we believe happened to the degree that we call them facts and teach them as such). Yet this discovery claims that at the time the manuscript was produced it was possible to produce fake, meaningless gibberish that appears to have meaning.

      Am I the only one who finds a problem with that in itself? How can you claim something was possible at the creation date when you don't know the creation date?

      Next, suppose that, magically, the date looked into did happen to coincide with the creation date that nobody knows. How exactly does a process being theoretically possible at a given date count as evidence that it is what was done in a particular instance?

      Example, my house catches fire. Firefighters are unable to determine the source. The insurance company denies my claim on the grounds that the technology existed to rub two sticks together to generate heat and produce fire.

      I wouldn't even call that circumstantial evidence. That isn't EVIDENCE at all. Hell if there were two sticks in the lawn right under the tree, then it would become the most ridiculous circumstantial evidence that should obviously be tossed aside. But it would be the sticks that are the evidence there, not the fact that it's possible to create fire by rubbing two sticks together and the technology existed at the time. However there isn't even that much here.

  8. Re:My 2 cents by Anonymous Coward · · Score: 5, Informative

    Translation from binary:
    Ich denke sein vermutlich einen

    Translation from German from binary:
    I probably think its one

  9. The pattern of nonsense by the end of britain · · Score: 4, Interesting

    The technique really is interesting. We have techniques that can identify patterns that are meaningful (all of cryptology, most of number theory, graph theory) but this application is neat because it is an effort to prove--rigorously--that a given set of data is just total noise.

    --
    "Oh, the tragedy of math gone wrong. I can't even talk about it." -Wil Wheaton http://www.wilwheaton.net
  10. so obvious by segment · · Score: 4, Funny
    Gordon Rugg has used the techniques of Elizabethan espionage to recreate the Voynich manuscript, which has stumped code-breakers and linguists for nearly a century

    Had Mr Rugg just used rot13 he would've cracked the code long ago. Want Crypto?

  11. Google found me this by ElDuque · · Score: 5, Informative


    In case you're wondering what it looks like

    http://www.voynich.nu/

  12. Re:It's one thing to say something is a hoax... by cpeikert · · Score: 4, Insightful

    Remember, even though FLT has been proved, we still don't have the "simple proof" that Fermat himself discovered.

    That's because he almost certainly didn't discover one.

    Fermat was known for making some pretty bone-headed mistakes. Also, in his later writings he posed challenges to prove FLT for the case of n=3 or n=4, but never for general n>2. If he had found a truly elegant proof of the general case, and believed it was correct, why not pose the general challenge?

  13. A Hoax? To What End? by WombatControl · · Score: 4, Informative

    I've studied the Voynich manuscript before, and the possibility of a hoax seems just as unlikely as many of the theories that have been floating about. Yes, the language of the Voynich manuscript could be an elaborate hoax, but Rugg's analysis only proves what is already widely known.

    The problem of creating such an elaborate hoax is that even Rugg's theory doesn't explain all the features of the Voynich manuscript. Furthermore, it seems unlikely that a sixteenth-century forger would go to the trouble of creating something that would have all the qualities of a real language and would include techniques that would deliberately resemble an actual document when viewed with analytical techniques that wouldn't be developed until centuries later. Occam's Razor makes it seem more likely that there is some kind of language operating in the manuscript than a random system of patterns. Then again, there's no real way of knowing.

    There are some images of the text of the Voynich Manuscript available here. Analysis of the text and the illustrations supports the theory that the manuscript has defined sections on astrology, herbal medicine, and other subjects. There have been some serious and some ridiculous theories about the manuscript, from the intriguing notion that the Voynich text is mathematically similar to East Asian languages like Chinese or Vietnamese to the claim that the Voynich manuscript is written in an ancient form of Ukrainian. (I've read the supposed translation of it from the Ukrainian, and it hardly makes sense given that the manuscript's illustrations don't match the text of the supposed translation.)

    In the meantime, this site offers more information on modern translation efforts including a font for the Voynich script. (Which would make a lovely way of annoying co-workers by switching their default system font to Voynich text...)

  14. Author's Page by mlc · · Score: 4, Informative

    Prof. Rugg has a website about his methods and results, which may be of interest.

  15. Missing a (cryptographic) clue ... by Professor D · · Score: 5, Insightful
    But, a volume of self consistent language (even a made up one) of over a hundred pages of text with accompanying pictures should fall to statistical and linguistic analysis.

    Champollion cracked the Rosetta stone with much much less.

    The 'true' examples of lost written languages/cyphers (do a Google search) are mysteries because there exist few examples, of brief length, usually bereft of context (of grammar, history, linguistic evolution etc.).

    The sheer volume of the Voynich manuscript, plus its origin in relatively modern Europe is what makes it so interesting to amateur cryptographers.

    The Nature paper is too brief to know how good Rugg's analysis is (and the Cryptologia site has been slashdotted), but if it holds up it is an interesting result, even if it is a conclusion that many "very smart cryptographers"(TM) have suspected for a long time.

    1. Re: Missing a (cryptographic) clue ... by Black Parrot · · Score: 5, Insightful


      > But, a volume of self consistent language (even a made up one) of over a hundred pages of text with accompanying pictures should fall to statistical and linguistic analysis.

      I doubt it. How many possible mappings are there between strings of characters and meanings? And even with plausible interpretations of the pictures (e.g., a herbarium), the number of things that might be said in that context is for all purposes unbounded:

      xyz =?= "this soothes the throbbing toe"
      xyz =?= "this is very poisonous"
      xyz =?= "this grows only in Ys"
      xyz =?= "I learned this from my grandmother" ...
      Surely it will never be deciphered if it is in an unknown language.

      > Champollion cracked the Rosetta stone with much much less.

      Actually, he had the benefit of a parallel text.

      In the absence of a parallel text, this will only be deciphered the way Linear B was: after a rigorous analysis of the patterns in the text, and a much tighter context (essentially lists of <picture,name,number> tuples), it was noticed that some very obvious translations ("man" and "woman", or such) fit the inflectional pattern of a language historically spoken in the region where the texts were found, and that simple mapping could be extended to other obvious <picture,name> pairs without introducing inconsistencies.

      I suppose it's possible that something similar could be done with the manuscript, but IMO only if there are some clearly labeled images that give tight enough a context to guess the specific word being used. And then some luck, because somebody has to recognize some language-specific patterns (such as the Greek masculine/feminine inflectional suffixes). And of course, more luck in what language it happens to be: Linear B might never have been deciphered if Greek didn't use gender-based patterns in its noun declensions.

      If it happens to be written in some unknown language, IMO it will never be deciphered.

      --
      Sheesh, evil *and* a jerk. -- Jade
  16. repeats by 1u3hr · · Score: 5, Insightful
    The Nature story says:
    The text contains some features that are not seen in any language. The most common words are often repeated two or three times, for example - the equivalent of English using 'and and and' - giving weight to the hoax theory.
    Indonesian pluralises words by duplicating them (anak = child, anak anak = children). And many languages, including English ("he was really, really stupid") intensify by repetition, so this point is not at all conclusive.
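
    A quick way to see how common immediate repetition is in any transcription is simply to count it. The sketch below is only an illustration: the function name and the sample sentences are made up here, and it assumes words are written in Latin letters and separated by spaces or punctuation.

```python
import re
from collections import Counter

def adjacent_repeats(text):
    """Count how often each word is immediately followed by itself."""
    # Lower-case and keep only runs of letters/apostrophes as words
    words = re.findall(r"[a-z']+", text.lower())
    repeats = Counter()
    for prev, cur in zip(words, words[1:]):
        if prev == cur:
            repeats[cur] += 1
    return repeats

# Hypothetical samples echoing the examples in the comment above
print(adjacent_repeats("anak anak"))
print(adjacent_repeats("he was really, really stupid"))
```

    Running the same count over a transcription and over ordinary texts in a few languages would show whether the manuscript's repetition rate is actually out of the ordinary.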
  17. Re:It's one thing to say something is a hoax... by KjetilK · · Score: 4, Informative

    Anyone can say anything is a hoax but it takes scientific evidence - actual empirical data - to prove such a claim.

    No. It is the job of those who propose that the book is genuine to prove that it is. One doesn't need to prove that something is a hoax if it is; Occam's Razor does that job. Which explanation contains the fewest unsubstantiated assumptions: that something was written in a language nobody knows, containing valuable information nobody has any idea about, or that it was produced using a simple encryption technique to fool somebody into paying loads of shiny ducats?

    For example, people once believed that the Earth was flat (some people still do) but the circumnavigation of the globe by explorers such as Magellan, lunar eclipses, etc provide evidence to the contrary.

    I find it amazing that some people still hold this myth as true! What kind of history education have you had!?!

    Look, no scientist has ever claimed the earth was flat. For one thing, in every culture other than the Western, it has never been claimed otherwise ("they even knew the earth was spherical"), but some have got the weird notion that Columbus had to argue that the earth wasn't flat.

    He didn't. The moron had the wrong numbers, and would have gotten killed if America didn't happen to be there.

    Already the pupils of Thales claimed their master knew the earth was round. Eratosthenes measured the circumference of the earth with an error of 3%! The true circumference of the earth was known to the Greeks in antiquity! Plato and his pupil Aristotle himself knew many arguments for the spherical shape of the earth, and why is this important? Because though some Christian scholars around 300 AD didn't like the idea of a spherical earth, St. Augustine adopted much of Plato's philosophy and made it an important part of Christianity in the same century, and they adopted the ideas of a spherical earth as well. Through Augustine, every leading authority accepted the idea of a spherical earth.

    Eventually, Eratosthenes' numbers were also accepted, but Columbus didn't like them, because they meant that going the other way to India was infeasible. So, he used some other numbers, and he used Marco Polo's exaggerated estimates of the distance he had travelled, and so he made it quite feasible. But it wasn't, he was wrong.

    Columbus thought the distance to Asia was 4000 km, his contemporary scientists 16000 km, the real distance is 23000 km, while Columbus eventually travelled 6500 km.

    So, why is this important? Because people who hold this belief often have many other misunderstandings about science. Indeed, you can't prove that the book is a hoax, but for that reason, the burden of proof rests with the proponents of the idea that it is genuine. Who, of course, might cling to the idea that it is, long after the world has moved on to greener pastures. That's how it usually works anyway.

    --
    Employee of Inrupt, Project Release Manager and Community Manager for Solid
  18. Interesting problem. by Black Parrot · · Score: 4, Interesting


    Those who read the article can take note of an interesting challenge: though Rugg has shown that it is possible to generate a high quality hoax using a Cardan grille, proving it to be a hoax may require producing a character grid that will actually generate large portions of the text. My question is, could that be done with a genetic algorithm, and are any Slashdotters up to the task?

    Also, a few comments about formal analysis. Notice that if you took some arbitrary text, typeset it in a fixed-width font to force the characters into columns, and then skimmed it with a grille in order to generate a new text, you would automatically preserve such basic statistics as character frequency, including spaces and also punctuation if you used them in your grid. (Depending on how you applied the grille, you could actually be generating a simple permutation of the original text.) However, you would disrupt all the within-word correlations.

    For example, in compound words derived from Latin there is a familiar pattern where ad C* ==> aCC* (where C is some arbitrary consonant), but that pattern would be completely obscured if the characters were read off a diagonal grille as shown in the photograph. You would still get the increased frequency for C, but not the common aCC pattern.

    More subtly, there are some well known universals of syllable structure in natural languages, but those would be scrambled just as the aCC would be. You would have the right proportions of consonants and vowels, but not a realistic distribution within words.

    Likewise, prefixes and suffixes would be scrambled. If it is a hoax generated by a Cardan grille, it should not have prefix/suffix patterns that occur commonly in many languages. (Ditto for suffixal inflections.) In fact, the letters appearing at the beginnings and ends of words should be a random sampling from the frequency distribution of letters in the whole text; this may be the easiest metric to check.
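
    The word-edge check is simple enough to sketch. The code below is an illustration only, with made-up function names: it assumes a non-empty transcription in which words are separated by whitespace, and it uses total variation distance as one possible way of comparing the distributions (the choice of distance is an assumption, not something from the article).

```python
from collections import Counter

def letter_distribution(letters):
    """Relative frequency of each character in an iterable of characters."""
    counts = Counter(letters)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def total_variation(p, q):
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def edge_letter_gap(text):
    """Distance of word-initial and word-final letters from the overall letter mix."""
    words = text.split()
    overall = letter_distribution("".join(words))
    initials = letter_distribution(w[0] for w in words)
    finals = letter_distribution(w[-1] for w in words)
    return total_variation(initials, overall), total_variation(finals, overall)

# Toy input: every word starts with "a" and ends with "b"
print(edge_letter_gap("ab ab ab"))
```

    If the text were a random sampling as described, both distances should be close to zero for a long enough sample; natural languages with real prefixes and suffixes would be expected to show a clear gap.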

    Also, by using spaces as characters in your grid you'd get the right proportion of spaces, and therefore the right average word length, but you would obscure any patterns in word length. Someone has already linked to studies of the word lengths in the manuscripts, but those assumed that the distribution of Latin word lengths would be preserved. However, only the average would be preserved. I suspect the distribution would be converted to a Gaussian. Anyone got time for the experiment? (Notice that you may generate extra spaces with the grille, depending on how you use it. For example, what do you do when your grille starts running off the bottom of the page in your source text? Or, if your grille has 10 windows, do you transcribe to the first space and then move the grille, or do you transcribe everything in the grille and insert a "virtual" space for position 11? It looks to me like you might be able to generate the document's actual "word" lengths from Latin, given only some very basic assumptions.)
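
    A rough version of that experiment might look like the following. This is a sketch of the thought experiment in this comment, not of Rugg's actual procedure: the diagonal read-off rule, the function names, and the grid width are all assumptions, and the ragged last row of the source is simply dropped.

```python
from collections import Counter

def diagonal_read(text, width):
    """Lay text out in rows of `width` characters and read it off along diagonals."""
    usable = len(text) - len(text) % width  # drop the ragged last row
    rows = [text[i:i + width] for i in range(0, usable, width)]
    out = []
    for start in range(width):
        for r, row in enumerate(rows):
            # Each (start, row) pair picks a distinct cell, so this is a permutation
            out.append(row[(start + r) % width])
    return "".join(out)

def word_length_histogram(text):
    """Map word length -> number of words of that length."""
    return Counter(len(w) for w in text.split())

# Short Latin sample (opening of the Aeneid), used only as toy input
source = "arma virumque cano troiae qui primus ab oris italiam fato profugus"
scrambled = diagonal_read(source, 8)
print(sorted(word_length_histogram(source).items()))
print(sorted(word_length_histogram(scrambled).items()))
```

    Note that `split()` merges runs of adjacent spaces, which is one of the choices about "extra" spaces raised above; a different convention would change the resulting histogram, so a real test would need a much longer source and a stated rule for spaces.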

    --
    Sheesh, evil *and* a jerk. -- Jade
  19. Can you say "Kolmogorov complexity"? by dido · · Score: 5, Interesting

    One definition of randomness, and one that seems quite reasonable is that a string is "random" if it cannot be compressed to smaller than it is, i.e. listing its characters itself is the most compact possible description. Formally, a string is random if there exists no algorithm generating the string whose description on some universal Turing machine is smaller than the string itself (this is the definition used in the field of Kolmogorov complexity). A string of a billion digits making up Pi, for example, is not random by this definition, as one can easily write a short program, whose length would certainly be less than one billion characters, whose output is the digits of Pi. Think of it this way: the most general form of pattern matching device that we know of is a Turing machine, and if the best device you can construct to match that pattern is as complex or more complex than the pattern itself, then well, you have total randomness. Unfortunately, rigorously proving that a particular string is random by this very strong definition is extremely difficult, as you run into undecidability everywhere you turn.
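
    Kolmogorov complexity itself is uncomputable, but an off-the-shelf compressor gives a crude upper bound on it. The sketch below uses Python's zlib purely as an illustration; the function name and the test inputs are made up. A low ratio demonstrates structure, while a ratio near 1 only means that this particular compressor found none, which is not a proof of randomness.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size, at zlib's highest level."""
    return len(zlib.compress(data, 9)) / len(data)

patterned = b"ab" * 5000              # highly regular input
noise = os.urandom(len(patterned))    # bytes from the OS random source

print("patterned:", compression_ratio(patterned))
print("noise:    ", compression_ratio(noise))
```

    Applied to a transcription of the manuscript, the same ratio could be compared against natural-language texts and against known gibberish of similar length.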

    This is the sort of stuff that real theoretical computer science is made of. For a very good overview of the theory of Kolmogorov Complexity and algorithmic information theory, Gregory Chaitin's home page is a good starting point.

    To go back to the Voynich manuscript, if there is some sort of regularity that can be discerned from it, then perhaps a context-free or context-sensitive (or something in between) language may be found to characterize it. Once you have such a syntactic characterization, perhaps it might be possible to divine the semantics from context. The shape of the grammar that results may well prove whether the Manuscript is in fact a real language, a fabrication, an elaborate cipher, or just total gibberish.

    --
    Qu'on me donne six lignes écrites de la main du plus honnête homme, j'y trouverai de quoi le faire pendre.
  20. The Voynich manuscript by t0ny · · Score: 4, Funny

    Have they tried casting "Read Magic" on it?

    --

    Manipulate the moderator system! Mod someone as "overrated" today.

  21. More information about Voynich by Elonka · · Score: 4, Informative
    On my own list of Famous Unsolved Codes, the Voynich Manuscript is right up there at #2, just under the Beale Ciphers (which also have some pretty compelling arguments that they're a hoax).

    Some other good links for Voynich information:

    • An excellent viewer which lets you quickly see thumbnails of all of the pages at once.
    • A good overview page
    • The Voynich Mailing List - a site maintained by Jim Gillogly (famous for cracking the first few parts of Kryptos).

    Elonka :)

  22. Re:Your analogy is incomplete by Matthew Austern · · Score: 4, Informative

    There's no direct evidence that the document is forged. There's also no direct evidence that it's genuine, or even what "genuine" would mean. There are stories vaguely associating it with various interesting people, such as John Dee and Roger Bacon, but they're all pretty vague.

    People have been studying this document for the better part of a century, because it's fascinating, enigmatic, and beautiful. (You can find some pictures of it at www.voynichinfo.com) We know a bit more than we did about what kinds of hypotheses are plausible and what kinds are not. For example: we can be pretty sure that it is not written in any natural language. We can also be pretty sure that it isn't just a simple substitution cipher. Finally, we can be pretty sure that it isn't a 20th century forgery: it has been given a rough date, it really does look like a manuscript from the 15th or 16th century, and it probably was once owned by Rudolf II. The Roger Bacon rumors are almost certainly false, because the manuscript doesn't appear to be that old. The John Dee rumors may be true.

    At present the two most plausible guesses are that it is a real 15th or 16th century treatise on an occult subject, written in a code that has yet to be broken, or that it's a good imitation of an encoded occult text. If the latter, it was probably written specifically for the purpose of fooling Rudolf. It is known that he was fascinated by the occult (there's even an opera where that's a crucial plot point), and it is known that many of the astrologers and alchemists he patronized were quacks and that many of the texts he bought were forgeries.

    What's interesting about this research isn't that it's a new argument against the possibility that the manuscript is genuine, but that it's a good counterargument. Until now, many people argued that the manuscript wasn't likely to be a forgery because the text followed a certain statistical property of natural languages (Zipf's law) that wasn't known until the 20th century. Thus, the argument goes, it's unlikely to be a 16th century fake because a 16th century forger, inventing a fake code or a fake language, wouldn't have known to match this statistical distribution.

    The reason this work is interesting is that it shows that this argument is invalid: there is a plausible method that a 16th century forger might have used that might have produced such a document. This doesn't show that it really is a 16th century forgery, it only shows that there's one fewer argument against that possibility than we once believed.
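
    For anyone who wants to see what the Zipf's law property looks like in practice: in natural-language text, a word's frequency is roughly proportional to 1/rank, so a log-log plot of frequency against rank has a slope near -1. The sketch below estimates that slope with a plain least-squares fit. The function names and the toy input are made up for illustration, and it assumes Latin-letter words and at least two distinct words in the text.

```python
import math
import re
from collections import Counter

def rank_frequency(text):
    """Return (rank, count) pairs, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = sorted(Counter(words).values(), reverse=True)
    return list(enumerate(counts, start=1))

def zipf_slope(text):
    """Least-squares slope of log(count) against log(rank)."""
    pts = [(math.log(r), math.log(f)) for r, f in rank_frequency(text)]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx

# Toy text built so that count = 12 / rank exactly
toy = " ".join(["aa"] * 12 + ["bb"] * 6 + ["cc"] * 4 + ["dd"] * 3)
print(rank_frequency(toy))
print(zipf_slope(toy))
```

    The point of the research is that a slope like this, measured on the manuscript, no longer rules out a mechanical generation method.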

    In the end, of course, we're unlikely to ever have decisive evidence that the manuscript is fake. Either someone will come up with a believable decryption (several people claim to have done it already; none of their claims have stood up), or people will keep trying and failing. The longer scholars bang their heads against the wall trying to get a translation, the less likely people will think it is that there really is one. Messy, but that's the way the world works. Sometimes you don't get to learn for sure whose guess is right.