A Universal Networking Language for the Internet?
Anonymous Coward writes: "The United Nations University is developing a
Universal Networking Language for the Internet, which is designed to allow effective communication between people writing in their native languages, with automatic conversion through an intermediate Meta-language (perhaps a precursor to Star Trek's Universal Translator.)
They will be holding a symposium on the technology on 18 November in Brussels, Belgium, where they will publicly announce their achievement. They claim that the initial stage of UNL will support 16 languages: Arabic, Chinese, English, French, Russian,
Spanish, German, Hindi, Italian, Indonesian, Japanese, Latvian, Mongol, Portuguese, Swahili and Thai." An interesting idea, but this is one of those "the devil is in the details" things. It'll be interesting to see how/if this can work.
I'm betting it will be fish-shaped, and will fit into the inner ear.
I'd think it would be difficult to make an abstracted meta-language out of human languages. There are lots of grammatical issues which would be particularly difficult to deal with well.
For example, in the case of inflected languages, how do you get the declensional case information into the metalanguage? In many languages, there are grammatical cases that have overlapping declensions, so there's ambiguity about the intended meaning. And mapping between languages would be really tough.
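To make the overlapping-declension problem concrete, here is a minimal Python sketch. The table is hand-written for a single Latin noun and the function names are invented for illustration; it is not a real morphological analyzer, just a picture of why one written form can carry several case readings.

```python
# Toy lookup table for two forms of the Latin noun "rosa" (rose).
# Hand-written for illustration only; a real analyzer would be far larger.
FORM_TO_READINGS = {
    "rosam": [("accusative", "singular")],
    "rosae": [
        ("genitive", "singular"),
        ("dative", "singular"),
        ("nominative", "plural"),
        ("vocative", "plural"),
    ],
}


def possible_readings(form):
    """Return every (case, number) reading the toy table knows for a form."""
    return FORM_TO_READINGS.get(form, [])


def is_ambiguous(form):
    """A form is ambiguous when it has more than one reading."""
    return len(possible_readings(form)) > 1


if __name__ == "__main__":
    for form in ("rosam", "rosae"):
        print(form, is_ambiguous(form), possible_readings(form))
```

A converter into the metalanguage would have to pick one of those readings from context before it could write anything down, and that is where the hard part lives.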
Verbs would be really tough. Like in Russian, you have three tenses (past, present, and future) as well as two verb aspects (perfective and imperfective). So you have pairs of verbs, one expressing action that occurs once, the other expressing ongoing or habitual activity.
Sounds like the project would be lots of fun to work on, though. It's a really neat idea, linguistically.
I'm a little confused ... does "Universal Networking Language" mean Esperanto or TCP/IP?
--
"I find your lack of faith disturbing." -- Darth Vader
OK, here are some more answers.
... UNTIL" loop is the perfect solution for certain problems, and "dinero" is the perfect translation of "money" into Spanish. A TCP/IP stack, no matter which OS it is running on, will always have some sort of ACK/NACK test. But these are all very limited examples.
/. At the very heart of the cutting edge. (some text removed) I wouldn't expect your friends to be out of work any time soon. But isn't the job of a professional translator radically different now than it would have been 100 yrs ago? Political change was not the only thing that caused this change... communication technology has had a big role.
Watch out this is very, very long...
Don't think about it as "automatic" translation, it's much more likely to work out as semi-automatic. I expect that the process would be something like this:
1. Run automatic converter from natural language to intermediate.
2. Have an expert in the intermediate language review the translation.
3. Run automatic converters to the target natural languages.
4. Have linguists review the output.
Compare and contrast with a "traditional" translation process:
1. Ask a translator to translate from language "A" to target "B". Ideally, the person in charge of the translation should be fluent in language A, a native speaker of B and have at least basic knowledge of the subject at hand (for instance: Open Source).
2. Ask a linguist (ideally fluent in language A, native speaker of B, etc.) to review the translation produced at step 1.
The point is that the intermediate language should be designed to be free of the ambiguities that plague language translation.
And how exactly can you do this? Either your intermediate language is "limited" (that is to say: misses many of the subtleties of the original language), which eases step #1 but certainly introduces many errors down the line. Or, it is an "advanced" language, that is able to translate many of the finer points of your "start" language -- but then, the interesting thing is the translation engine itself. Not the intermediate language. If your translation engine is good enough to translate, say, Spanish into UNL with little/no loss of meaning, it is also good enough to translate Spanish to English with no intermediate step!! If this is true, what's the point of UNL?
Another point is, how can you be an expert in an "intermediate language"? Either the language is "human-readable", but probably produces an output comparable to sludge, and correcting this sludge may introduce additional errors. Not to mention the pain it represents to check on something that borders on the unreadable. Or it is machine-readable -- but in that case, who is going to read it?
Final point is productivity: using UNL, computers and machine translation may take longer than a simple translation "by hand" with human grey matter. A Windoze95 machine with MS Word and some good "paper" or digital dictionaries is, in many cases, more efficient and cheaper than going through the pain of machine translation.
The hope is to minimize or eliminate step (4).
Good luck! Frankly, this has been the "Holy Grail" of machine translation ever since it started. And I do not think we are any closer. So far, every large international institution that I am aware of (UN, UNESCO, EU Commission, EU Parliament, NATO, IMF, etc.) either uses tons of translators or has standardized on a couple of languages at most (English being, of course, the "Lingua Franca"). All the large international institutions mentioned above that use machine translation have discovered that, even on simple subjects, the 4th step you describe above is the one that consumes the most time.
It would be a big win if you could get to the point where all the hard stuff is done just *once* instead of repeated over and over again for all of your target languages.
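To put rough numbers on the "just once" argument: with direct translation, every ordered pair of languages needs its own converter, while a pivot language needs one converter in and one converter out per language. A small Python sketch of that arithmetic follows; the function names are made up for illustration, and the figure of 16 is the language count from the announcement.

```python
def direct_converters(n_languages: int) -> int:
    """One converter per ordered (source, target) pair of languages."""
    return n_languages * (n_languages - 1)


def pivot_converters(n_languages: int) -> int:
    """One converter into the intermediate language and one out, per language."""
    return 2 * n_languages


if __name__ == "__main__":
    n = 16  # languages announced for the initial stage of UNL
    print(direct_converters(n))  # 240
    print(pivot_converters(n))   # 32
```

This only counts converters. It says nothing about how good each one is, or how much human review each one still needs.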
Again, this is the "Holy Grail" of machine translation. I don't believe that we are any closer to it than we were, say, 30 years ago. At least not judging from the output of some of the software available out there...
And no, this will not work for poetry or humor, but there's no good way to translate poetry and humor in any case. The idea would be to get it to work with technical, legal, and business language.
Sorry to say this, but this does not work very well either for legal or technical language. It may work with Business, since PHBs are so limited intellectually =). Legal translation can be horrendous: I have translated many legal documents in the past and I can tell you there is nothing worse than that, because legal terms are incredibly complicated and old-fashioned and also since legal trivia has to be rendered in a very exact manner. Legal terminology (in almost every language) is among the most confusing and complicated there is. Plus, lawyers and legal people are a major pain in the neck when it comes to Once you get the terminology right, I agree the rest of a legal document is usually a matter of "filling the blanks". But getting the legal terms right is enough to drive you nuts.
Technical translation is another problem: I think some technical areas may be the best bet for machine translation yet. The problem, as far as the technical field is concerned, is that in fast-moving areas (computer science is one) the technical vocabulary is changing and evolving so fast it's hard to keep pace. I read up to 5 computer magazines a week (not to mention a daily dose of Slashdot =) just to keep up-to-date with the latest evolution in language and technology. Keeping a UNL database of terms and translations could prove to be a daunting task...
>What's so special about UNL? Theoretical translation of language A into a universal language and from there to language B is almost as old as "machine" translation itself.
The fundamental argument that it hasn't worked before, so it isn't going to work now, is stupid. It has been demonstrated how difficult it is to do this, but not that it is impossible.
Please note that I never said (in the sentence you quote above) that this is not going to work. I just said that, as far as I am concerned, using an "intermediate" language is old news. This may be a new and interesting idea to you, but, frankly, for someone who has worked in translation, you could very well trace this concept all the way back to Volapük and Esperanto. And these two were invented in the 19th century.
As far as I am concerned, I think you could prove that correct translation is impossible. All you would have to prove is that a "human" language is a chaotic complex system, which usually follows unpredictable rules and has several strange attractors, inducing a runaway complexity.
Case in point: English. Roots: Saxon dialects, Norman dialects, Old English and Old French. Latin. A little bit of Greek. Maybe German and Old Dutch. Evolution influenced by French and a myriad of other languages. Now divided into several branches (US English, British English, Irish English, Australian English, Indian English, International English), all of them influencing each other and countless other languages. Reducing the English language to a set of neat little equations and computer routines is left as an exercise to the reader... =)
Please understand me: computer translation of "basic" English into UNL and from there into Chinese, French, Spanish, Italian, Japanese, etc... is no big deal. Computer translation of highly technical/scientific papers may be achieved. But even then, due to the inherent complexity of English (or any other human language), a human will have to review the machine translation and correct it.
I therefore suppose that perfect translation does not exist (or is impossible). Translation (like programming) is an art, not a science. You can have a certain number of "artistic" rules, but you cannot have a "perfect", scientifically proven, solution.
Example: give a problem to be solved to two good programmers, and they'll probably come up with two different and equally valid solutions. Which solution you pick has to be determined by other factors (speed of implementation, maintenance and evolution of the system, optimization, resources used, etc).
Give a translation to be done to two good translators and they will probably come up with two rather different and equally valid translations. Which one you pick is then determined by other factors (length of translation, speed of said translators, price of translation, style, etc). Complex systems, like languages, cannot be reduced or predicted. They can be analyzed and more or less "solved" -- the quality of the solution being dependent on many factors, such as the experience of the specialist, his choice of tools, etc. This is true even in reductive or limited systems, where, for instance, the vocabulary to be used is small (see technical translation above).
Remember the butterfly in Brazil that creates a storm at the other end of the world? I suspect translation (especially multiple language translation) may well be the kind of complex system that is so hard to solve using computers.
Perfect translation, like perfect programming, is only possible in a very limited scope. A "DO
>For a good example of the total and dismal failure of machine translation,
>try translating this text into French (or Spanish, or Italian, or whatever)
>with Babelfish and back to English. Then do it a few times. Then try
>English to Chinese and back a few times. Case closed.
Hardly. Here's why that is not a valid test:
1. Babelfish doesn't use an intermediate language.
2. Babelfish doesn't even achieve lossless translation from
language A to B and back to A. This is the simplest case and
one which can be improved the most with a good definition for UNL.
Answers:
1. An intermediate language should introduce even more bugs into Babelfish translation. See above.
2. "Lossless" translation is impossible. See above. Complex systems, such as human languages, cannot be reduced easily to a set of equations.
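One small way to see why a round trip loses information: if two words in language A map to the same word in language B, nothing on the way back can tell them apart. The Python sketch below uses invented word tables and function names. It is a toy, it is not how Babelfish works, and it only shows the many-to-one effect.

```python
# Toy word tables, invented for illustration only.
# French "tu" (familiar) and "vous" (formal) both become English "you".
FR_TO_EN = {"tu": "you", "vous": "you", "bonjour": "hello"}
# Going back, English "you" must pick ONE French word; this table picks "vous".
EN_TO_FR = {"you": "vous", "hello": "bonjour"}


def translate(words, table):
    """Word-by-word lookup; unknown words pass through unchanged."""
    return [table.get(word, word) for word in words]


def round_trip(words):
    """French -> English -> French using the toy tables above."""
    return translate(translate(words, FR_TO_EN), EN_TO_FR)


if __name__ == "__main__":
    print(round_trip(["bonjour", "vous"]))  # ['bonjour', 'vous']
    print(round_trip(["bonjour", "tu"]))    # ['bonjour', 'vous']
```

The familiar "tu" comes back as the formal "vous": the distinction was dropped on the way out and no table on the way back can restore it without extra context.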
>It is, in fact, an even better AI test than the Turing test.
They do not claim perfect translation, but yes, a computer which could translate between languages and do it perfectly would pass the test. Do you really argue that it is impossible for computer programs to ever pass the Turing test? It is only a matter of time till this happens. The only way to stop it is to stop making computers.
Actually, I thought a computer had recently managed to pass the Turing Test, or some limited version of it. Could anyone out there supply information on this one?
But: I don't think the Turing test is actually a very good AI test. There is a huge difference between a program that is able to "talk" to you (parrot back what you said) and one which is able to understand you. A computer able to understand human language would probably be the first real AI on this planet. Most Turing test software is based on some variation of Eliza, and this has been around for ages.
Here we are reading
Well, this may be surprising to you, but the work of a professional translator has not evolved very much. Computer and communication technologies have eased their task a lot. Like many other professions, translators are now able to work from home, access the Internet and its wealth of information, send documents to clients by e-mail, and even use some very clever software that eases the translation process (TM/2, Trados, etc).
Word processing, in particular, certainly is the best thing to happen to translators since sliced bread =). Also, I agree that many new translation fields have been added in the past century: biology, computer science, aerospace, etc.
But the central fact remains this: to be a translator you have to be fluent in (at least) one language, a native speaker of another, and have good expertise in one or more fields of human activity. That's it. Oh, and you have to have a certain "talent" with languages, just like you need to have a certain "talent" for programming. It's an art, remember? Even the best-trained translator is worth 0 if he/she does not have that special "talent". Exactly like a lot of people work on Linux -- but there is only one Linus Torvalds. =)
We may translate faster, have more tools and information at our disposal, and produce better-looking documents -- but the core skills remain the same and the work process is exactly the same. You could train a translator today in the exact same way they were trained 100 years ago: with a pen and a piece of paper. Sorry to disappoint you, but computer technology is not always the perfect solution it prides itself on being...
That's All Folks!
The right to offend is far more important than the right not to be offended. (Rowan Atkinson)
For a project that's supposed to allow effective communication, they could at least have designed a web site that works well in all browsers. No alt attributes for images... Sigh. Those of us using lynx just have to guess, based on the image names :-(
"The invisible and the non-existent look very much alike." -- Delos B. McKown
Though at first sight the idea of translating to an intermediate language seems interesting, I can't help but note that similar projects in Europe have all failed so far.
Automatic translation between languages in the EU is something that could save a lot of money. So there have been a lot of research projects funded with loads of EU money to accomplish this. All of these projects have failed (as far as I know).
This seems to be a similar effort, this time by the UN, which is an equally bureaucratic organization. I think the goal of this project is probably too ambitious to work. Even translations between two related languages (English and German) are troublesome (Babelfish for example is not exactly perfect), so I can't see why translations to an intermediate language would change things (ever tried to do that in Babelfish? the result is not pretty).
So, it will probably fail and loads of money will be wasted on it.
Jilles
Several points -- for full disclosure, let me just state that I am a localization engineer, with 5+ years of experience in software localization (read: adaptation into different languages) and 7+ years of experience in translation. If that does not make me qualified to comment on this, I don't know what does.
Of course, I may be completely wrong and UNL may be the next best thing since sliced bread. But I doubt it.
The right to offend is far more important than the right not to be offended. (Rowan Atkinson)
For those of you who think this is impossible because of the variations between languages, Noam Chomsky has something to say to you. I was exposed to his idea back in formal languages and automata class. Basically, his argument is that we have a universal grammar (UG) parser built into us when we are born. We 'hardened' the parameters of the UG to conform to our preferred language. Sort of like Guile and Perl, where Guile is a very expressive language but Perl, while less expressive, can express the same thing in a more concise manner.
Universal grammar is defined by Chomsky as "the system of principles, conditions, and rules that are elements or properties of all human languages... the essence of human language" [Chomsky, 1978].
Thus, all languages that we are accustomed to, such as English, Arabic, Malay, Japanese, and Chinese, are special cases of a universal grammar. Chomsky and subsequent linguists are looking for those common elements of all languages.
Universal grammar and the innateness hypothesis
Universal Grammar in Prolog
There is lots of discussion about this... see Google.
Hasdi
It's not going to work very well. The problem is that each language has its own nuances, and in many cases these don't translate very well into other languages. I'll use Japanese honorifics as an example. The list of them is relatively long (-san, -sama, -kun, -chan, -sensei, and others). Simply by attaching one to the end of a person's name, I can make the same sentence express immoderate flattery or extreme derision. This can be translated in an extremely limited fashion into Romance languages such as Spanish or French (by using the familiar vs. formal form of address, but it's still limited). It doesn't translate into English at all (this is why I prefer subtitled anime; get the general meaning from the subtitles, and actually listen to the Japanese for the nuances). And, of course, you still have the problem of inflection not translating very well into written words. This makes English particularly unsuitable for network communications, actually, since so much meaning is left to inflection. What's the solution? I don't know. There probably isn't one. Even Esperanto isn't immune to this problem of losing meanings in translation. I don't think a "universal meta-language" is going to work, though.