Slashdot Mirror


Automated Language Deciphering By Computer AI

eldavojohn writes "Ugaritic has been deciphered by an unaided computer program that relied only on four basic assumptions present in many languages. The paper (PDF) may aid researchers in deciphering eight undecipherable languages (Ugaritic had already been deciphered, which let the researchers prove their system worked) as well as increase the number of languages automated translation sites offer. The researchers claim 'orders of magnitude' speedups in deciphering languages with their new system."

13 of 109 comments

  1. Re:Sweet by Fluffeh · · Score: 3, Funny

    But will it go into your ear, or will it be injected via a syringe and live in your gut is the question?

    --
    Moved to http://soylentnews.org/. You are invited to join us too!
  2. Answers to all TFA questions by cappp · · Score: 5, Informative
    Just so we can keep the “didn’t read TFA” comments to a minimum: The four assumptions as laid out in the article are:

    - The language being deciphered is closely related to some other language: In the case of Ugaritic, the researchers chose Hebrew.

    - There’s a systematic way to map the alphabet of one language on to the alphabet of the other, and correlated symbols will occur with similar frequencies in the two languages.

    - The system makes a similar assumption at the level of the word: The languages should have at least some cognates, or words with shared roots, like main and mano in French and Spanish, or homme and hombre.

    - The system assumes a similar mapping for parts of words. A word like “overloading,” for instance, has both a prefix — “over” — and a suffix — “ing.” The system would anticipate that other words in the language will feature the prefix “over” or the suffix “ing” or both, and that a cognate of “overloading” in another language — say, “surchargeant” in French — would have a similar three-part structure.
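
    A rough illustration of the alphabet-frequency assumption listed above, in Python. This is not the researchers' actual system; it is a naive sketch that simply pairs symbols by frequency rank. The function names and the toy strings are made up for illustration.

```python
from collections import Counter


def rank_by_frequency(text):
    """Return the alphabetic symbols in text, most frequent first.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(ch for ch in text if ch.isalpha())
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [symbol for symbol, _ in ordered]


def naive_alphabet_mapping(unknown_text, known_text):
    """Pair each symbol of the unknown script with the symbol of the
    known script that holds the same frequency rank.

    Hypothetical helper: assumes the two languages are closely related
    and that corresponding letters occur about equally often.
    """
    unknown_ranked = rank_by_frequency(unknown_text)
    known_ranked = rank_by_frequency(known_text)
    return dict(zip(unknown_ranked, known_ranked))


# Toy data: "x" is the most common unknown symbol, "a" the most common known one.
mapping = naive_alphabet_mapping("xxxyyz", "aaabbc")
print(mapping)  # {'x': 'a', 'y': 'b', 'z': 'c'}
```

    Real texts rarely line up this neatly, which is presumably why the system also leans on the word-level and word-part assumptions rather than letter counts alone.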

    The article also notes the success rates, where it states that

    Ugaritic has already been deciphered: Otherwise, the researchers would have had no way to gauge their system’s performance. The Ugaritic alphabet has 30 letters, and the system correctly mapped 29 of them to their Hebrew counterparts. Roughly one-third of the words in Ugaritic have Hebrew cognates, and of those, the system correctly identified 60 percent. “Of those that are incorrect, often they’re incorrect only by a single letter, so they’re often very good guesses,” Snyder says.

    Critics noted that

    The researchers’ approach, he says, presupposes that the language to be deciphered has an alphabet that can be mapped onto the alphabet of a known language — “which is almost certainly not the case with any of the important remaining undeciphered scripts.” It also assumes, he argues, that it’s clear where one character or word ends and another begins, which is not the case with many deciphered and undeciphered scripts. The decipherment of Ugaritic took years and relied on some happy coincidences — such as the discovery of an axe that had the word “axe” written on it in Ugaritic.

    1. Re:Answers to all TFA questions by MichaelSmith · · Score: 4, Insightful

      The decipherment of Ugaritic took years and relied on some happy coincidences — such as the discovery of an axe that had the word “axe” written on it in Ugaritic.

      Maybe I should go around and write "computer" in English on all my computers, as a service to future language researchers.

    2. Re:Answers to all TFA questions by vlueboy · · Score: 3, Interesting

      The decipherment of Ugaritic took years and relied on some happy coincidences — such as the discovery of an axe that had the word “axe” written on it in Ugaritic.

      Maybe I should go around and write "computer" in English on all my computers, as a service to future language researchers.

      Extinct language researchers examining English would fail at this same task 3000 years from now. English has no nouns -- it has brand names: today's "computers" have big "Dell" logos but not "Computer."

      Also, how would researchers realize that [Apple Mac Glyph] isn't an integral part of our "ancient moon runes" if seen from their era? :)

    3. Re:Answers to all TFA questions by mrsurb · · Score: 4, Funny

      Also, how would researchers realize that [Apple Mac Glyph] isn't an integral part of our "ancient moon runes" if seen from their era? :)

      They'd probably see it as having some sort of religious significance. And they'd be correct.

  3. Re:Sweet by doishmere · · Score: 3, Informative

    Their method relies heavily on the unknown language being related to a known language to some degree. At the heart of their technique is Bayesian statistics applied to lexical and frequency analysis; for this approach to work, there must be some basis for comparison.

  4. Re:Sweet by Anonymous Coward · · Score: 5, Funny

    Good news, it's a suppository.

  5. Pfft, why? by mdenham · · Score: 5, Funny

    Label at least one computer "ham sandwich" to confuse future language researchers.

    Alternatively, label each computer with a character's name from (insert show of your choice here).

  6. Linear A Implications by DowdyGoat · · Score: 5, Interesting

    This is very cool for us undeciphered language fans.

    In the article, the language author Andrew Robinson correctly points out that this computer program won't work for languages that don't have a known language that is close to them, such as Linear A, found on Crete, which is definitely not Greek like Linear B turned out to be. There is a lot of speculation that Linear A is a native Minoan (Cretan) script, largely unrelated to any other known script.

    However, parallel with Linear A on Crete was a Cretan pictographic script, which may or may not be related to Egyptian hieroglyphics. The Minoans had known trading ties to Egypt, which had written language long before them. If a relationship could be found (via this computer program) between the Minoan pictographic script and Egyptian hieroglyphs, then that might give insights into how the Linear A script was set up (which is a syllabary script).

    The only difficulty is that there may not be enough of the pictographic script to work--I'd imagine you'd need a fair number of examples to really allow the computer to compare and contrast.

    1. Re:Linear A Implications by KritonK · · Score: 3, Informative

      Actually, the program might be able to help: From what I understand, the Linear A alphabet is related to the Linear B alphabet, which has been deciphered, even though the languages may be different. We know a bit about context (what we have are mostly inventories), and we even know the meaning of one word: the one next to the total of the amounts in the inventory probably means "total". Furthermore, that word, ku-ro, is similar to a form of a Greek word for "total" ("houlon"), so it is very likely that the language is at least Indo-European in origin. One could try using various Indo-European languages as candidates for the related language, until the program comes up with something meaningful.

      Now, if only we had a larger sample of the language of the disk of Phaestos...

  7. Re:Sweet by grcumb · · Score: 4, Funny

    Universal translator, here we come!

    Cool! Can I bring it into my next marketing meeting?

    --
    Crumb's Corollary: Never bring a knife to a bun fight.
  8. Re:Sweet by Walt Dismal · · Score: 4, Funny

    Only if the gross gains in closing juncture exceed the long-term sustainability goals of the viability imperative for all mass interoperability. We at Mega Industries believe this will move us forward to our cloud-based monetization of the human-media dynamic which is strategically important in an ever-evolving mobile continuum. We have directed our customer experience champions to ensure consumers realize this when they call in with emphatic expressions of dissatisfaction.

  9. You want to impress me... by ngc5194 · · Score: 3, Funny

    ... see if it can decipher some of the Perl code I've had to take over.