Automated Language Deciphering By Computer AI
eldavojohn writes "Ugaritic has been deciphered by an unaided computer program that relied only on four basic assumptions present in many languages. The paper (PDF) may aid researchers in deciphering eight undecipherable languages (Ugaritic has already been deciphered and proved their system worked) as well as increase the number of languages automated translation sites offer. The researchers claim 'orders of magnitude' speedups in deciphering languages with their new system."
But will it go into your ear, or will it be injected via a syringe and live in your gut is the question?
The article also notes the success rates where it states that
Critics noted that
Their method relies heavily on the unknown language being related to a known language to some degree. At the heart of their technique is Bayesian statistics applied to lexical and frequency analysis; for this approach to work, there must be some basis for comparison.
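To make the "basis for comparison" point concrete, here is a toy sketch of plain rank-frequency matching. This is not the paper's Bayesian model; it is a much cruder stand-in, and the function names (rank_by_frequency, guess_mapping, decode) are made up for illustration. It only shows why a related, known language is needed: without one there is nothing to line the symbol counts up against.

```python
from collections import Counter

def rank_by_frequency(text):
    """Return symbols from most to least frequent (ties broken alphabetically)."""
    counts = Counter(ch for ch in text if not ch.isspace())
    return [sym for sym, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

def guess_mapping(unknown_text, known_text):
    """Pair the i-th most frequent unknown symbol with the i-th most frequent known one."""
    return dict(zip(rank_by_frequency(unknown_text), rank_by_frequency(known_text)))

def decode(unknown_text, mapping):
    """Rewrite the unknown text with the guessed symbols; unmapped symbols pass through."""
    return "".join(mapping.get(ch, ch) for ch in unknown_text)

# Hypothetical toy data: both "languages" share the same frequency profile.
mapping = guess_mapping("xxxxyyyzz w", "aaaabbbcc d")
decoded = decode("xyz w", mapping)
```

On real inscriptions the frequency profiles never line up this neatly, which is presumably why the researchers reach for a probabilistic model instead of a one-shot ranking.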
Good news, it's a suppository.
Label at least one computer "ham sandwich" to confuse future language researchers.
Alternatively, label each computer with a character's name from (insert show of your choice here).
This is very cool for us undeciphered language fans.
In the article, the language author Andrew Robinson correctly points out that this computer program won't work for languages that don't have a known language that is close to them, say like for Linear A found on Crete, which is definitely not Greek like Linear B turned out to be. There is a lot of speculation that Linear A is a native Minoan (Cretan) script, largely unrelated to any other known script.
However, parallel with Linear A on Crete was a Cretan pictographic script, which may or may not be related to Egyptian hieroglyphics. The Minoans had known trading ties to Egypt, which had written language long before them. If a relationship could be found (via this computer program) between the Minoan pictographic script and Egyptian hieroglyphs, then that might give insights into how the Linear A script was set up (which is a syllabary script).
The only difficulty is that there may not be enough of the pictographic script to work--I'd imagine you'd need a fair number of examples to really allow the computer to compare and contrast.
Voynich manuscript!
If only we could find a language that is similar enough...
Universal translator, here we come!
Cool! Can I bring it into my next marketing meeting?
Crumb's Corollary: Never bring a knife to a bun fight.
Only if the gross gains in closing juncture exceed the long-term sustainability goals of the viability imperative for all mass interoperability. We at Mega Industries believe this will move us forward to our cloud-based monetization of the human-media dynamic which is strategically important in an ever-evolving mobile continuum. We have directed our customer experience champions to ensure consumers realize this when they call in with emphatic expressions of dissatisfaction.
IBM, as one example, has been on this hard since 2002 ( http://news.cnet.com/2100-1008-998264.html ) when the prize was first announced. Stop going all Lady Gaga over stuff that is so old it can't even be recycled properly.
Well, Old Norse is technically based on Old Germanic rather than the other way round, and Old English not only had Old Germanic input but Old Norse input as well. Along with an uncertain amount of Anglic (amazingly little is known about the Angles), possibly some Jute. English uses Norman French, plus modern French (which itself is derived from Norman French). Norman French survives in the modern world in Guernsey, Jersey and maybe some other Channel Islands but became extinct on Alderney.
To bring this Back On Topic, if English were lost, it would be almost impossible to use this program to recover it. English has input from too many sources, resulting in way too many loan-words of incompatible structure and too much incompatible grammar. However, one very interesting test of the program would be to map each of the derived phonemes in Proto-Indo-European to a character, then compare this derived PIE script with each Indo-European language in turn. If the derivation is correct, the number of correct guesses for translations of PIE words into each known IE language ought to be above what would be expected by chance alone AND the translations should remain compatible with the derivations the PIE engineers used in the first place. By comparing across the translations for all languages, the program may discover other word-parts that had not been noticed before.
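For the "above what would be expected by chance alone" part, one rough way to put a number on it is a binomial tail: if each guess had probability p of being right by luck, how likely is it to get k or more right out of n? The sketch below assumes the guesses are independent, which real cognate guesses are not, and the function name chance_tail and the example figures are invented for illustration.

```python
from math import comb

def chance_tail(n, k, p):
    """Probability of at least k successes in n independent tries, each with chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical figures: 100 guessed translations, 30 judged correct,
# with an assumed 10% chance of a lucky hit on any single word.
p_value = chance_tail(100, 30, 0.10)
```

A very small value would suggest the matches are not luck, though it says nothing about whether they agree with the existing reconstructions.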
It may be possible to determine if a language is truly an isolate or not, by analyzing against a language multiple times using slightly different data sets and seeing if the results remain about the same. If this test works, then languages of uncertain/unknown ancestry (such as Basque and Etruscan*) can be tested against all 7,200 known languages to see if any of them produce a moderately stable match. No match means no connection with any other existent linguistic family tree.
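The "slightly different data sets" idea could be sketched as a resampling check: rerun the decipherment on random subsets of the unknown word list and measure how often each symbol still gets the same assignment. Everything named here is an assumption for illustration. frequency_mapping is a crude placeholder, not the researchers' program, and mapping_stability and its parameters are hypothetical.

```python
import random
from collections import Counter

def frequency_mapping(unknown_words, known_words):
    """Placeholder decipherment: align symbols by frequency rank."""
    def ranked(words):
        counts = Counter("".join(words))
        return [s for s, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
    return dict(zip(ranked(unknown_words), ranked(known_words)))

def mapping_stability(unknown_words, known_words, decipher=frequency_mapping,
                      runs=20, keep=0.8, seed=0):
    """Average share of full-data symbol assignments reproduced on random subsets."""
    rng = random.Random(seed)
    baseline = decipher(unknown_words, known_words)
    if not baseline:
        return 0.0
    size = max(1, int(len(unknown_words) * keep))
    total = 0.0
    for _ in range(runs):
        sample = rng.sample(unknown_words, size)  # subset drawn without replacement
        trial = decipher(sample, known_words)
        agree = sum(1 for sym, guess in baseline.items() if trial.get(sym) == guess)
        total += agree / len(baseline)
    return total / runs

# Hypothetical toy word lists.
score = mapping_stability(["xxy", "xz", "xy"], ["aab", "ac", "ab"], keep=0.5)
```

A score near 1.0 against one candidate language and low scores against all the others would be the "moderately stable match" described above; low scores everywhere would be the no-match case.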
*Etruscan is a bugbear. There is one book that is completely intact and undamaged. It's made of gold leaf. The academic who currently owns it has not published so much as a single line of the text, merely two of the illustrations. All other Etruscan texts are fragmentary (so you've very little context to work with and not many words that are definitely complete) or too short to be useful. We don't know what Etruscan is related to, but if the above hypothesis is correct, we could find out and then translate the book. But the damaged texts, such as a linen book used to wrap a mummy, are way too fragmentary. You'd never be sure if such a translation was correct. A complete book, on the other hand, would offer no possibility for mistake. It would work or it wouldn't.
... see if it can decipher some of the Perl code I've had to take over.