A Universal Networking Language for the Internet?
Anonymous Coward writes: "The United Nations University is developing a
Universal Networking Language for the Internet, which is designed to allow effective communication between people writing in their native languages, with automatic conversion through an intermediate Meta-language (perhaps a precursor to Star Trek's Universal Translator.)
They will be holding a symposium on the technology on 18 November in Brussels, Belgium, where they will publicly announce their achievement. They claim that the initial stage of UNL will support 16 languages: Arabic, Chinese, English, French, Russian,
Spanish, German, Hindi, Italian, Indonesian, Japanese, Latvian, Mongol, Portuguese, Swahili and Thai." An interesting idea, but this is one of those "the devil is in the details" things. It'll be interesting to see how/if this can work.
Of course both of you are forgetting an important thing. The example you are using is from a song, and lyrics aren't exactly the best thing to use for examples. In this case the English translation is: My hat, it has 3 corners. Because that's the way the song goes.
Actually, as long as the final interpretation is done by a human rather than by a computer, some parts of the understanding can be let fly.
OTOH, a transformational grammar has not yet been shown to be powerful enough (at least I haven't heard that it has). I think that one would require a complete ATN network with recursion. Bounded recursion would probably be sufficient, as I don't feel that folk understand more than about three layers. Certainly it only goes deeper as a stylistic perversion of normal syntax (but fashion can do strange things).
A worse problem is divergent mappings. No language uses an atomic view of the world, so each concept in each language is a set of items selected from the universe of possible concepts. This can be noticed even within a single language when moving from one dialect to another. It is most easily noticed when discussing things that map readily onto sensory images, e.g., "What is your name for the color of the object?", but it exists in all aspects of language. (What is the difference between "dog" and "hound"?) When one translates the term "black hole" into Russian, I am told, one must use a different term, because in Russian a "black hole" is something specific which is not astronomical (not sure what, but it was taboo).
Now this is mainly something that can be handled by a lot of detail work. But I mean a lot of detail work. To get a very mild idea of a part of what I am talking about, pull out an unabridged dictionary and open it to a random page of definitions. Each meaning listed will probably need to be a separate term in the meta language. And that's just the distinctions that an English speaker would notice.
I think we've pushed this "anyone can grow up to be president" thing too far.
Let me add to this.
The vodka is strong, but the meat is raw.
Ok. In Russia, Russian is used for:
1) Financial Markets
2) Aviation
3) Scientific Publication
4) Popular Culture
5) The computer industry ( along with English)
6) Everything else that matters
:)
There's no real technical information on the website, and no evidence at all that a linguist is actually participating in this project. It sounds like a bunch of computer scientists who think they understand language.
Actually, the only real data they offer suggests that they are recreating the work Anna Wierzbicka was doing in the 80's with her ad-hoc theory of semantics. She ultimately showed why it wouldn't work, and now criticises the idea of using controlled language at all for machine understanding.
No, these people don't seem to have any idea what they've gotten themselves into. This kind of thing was what I did graduate work on. Controlled language is a useful idea, but a very limited one, and using pivot languages for translation will only take you about as far as Systran's system (the one used in Babelfish).
There are much more sophisticated efforts going on elsewhere, and even those are getting bogged down in the ugly reality of natural language. This will languish and go nowhere. With some luck, some more realistic project, like some of the automatic text summary projects and natural language to knowledge base projects, will eventually produce a usable product, but this UN university effort sounds like a waste of time.
What's evil about these projects, of course, is that they don't let people just talk to one another. It would be neat to be able to have access to the literature of other countries, but that pales in comparison to having access to the people in other countries. If you just learn Esperanto you can really converse with people without needing technology or anything. It just works.
Uh, does that mean the end of the world? Didn't the creators of the tower of babel get smitten or something? I remember something about god not being happy so he did something and destroyed the tower...
Why would anyone want their web page to read as if it's been run through Babelfish? A translation from netspeak into, say, English is always going to suffer some mangling, and most likely is not going to allow idiom, metaphor, etc.
Machine translation will improve, but the best organization is still going to be browser or proxy based translation. If that translation package internally uses an intermediate semantic representation, then fine, but the day /. reads like Babelfish crap is the day I find myself an English web site.
You have to admire the democratic thinking though (NOT!) - rather than just foreigners seeing your web page as crap, you can (must) see it that way too! Designed by politicians, no doubt.
Ideally the system would allow finely crafted (hopefully even poetic, which often *requires* ambiguity) sentences in the author's native tongue. The UNL meta language could have drop down lists under each ambiguous word/phrase prompting the author to further clarify exactly what they meant, so it could be translated into all the other languages with the meaning intact.
Obviously all subtlety, poetry, alliteration, etc. will be lost.
While I don't believe myths and such, it is rather scary how it matches the story of babel. Since we can't be scattered to the corners of the earth, what'll happen?
http://logos.uoregon.edu/polyphonia/babel.html
Don't you know who the real semites are?
Jack Vance wrote a book, The Languages of Pao, which explores the idea of using created languages for social engineering-i.e., create a language with few words for compassion and many for violence, take some children, put them in an isolated environment, encourage certain behaviors like competitiveness, and have them grow up to be warriors. It's an interesting concept. I wonder how the UN's universal language will address cultural nuances (don't include capability to translate violent concepts).
I don't know German, but just for grins I ran "Gemütlichkeit" through Babelfish and came up with "cosiness".
Is this a cozy approximation?
Hey, moron! You misspelled endlessly. If you can't type a word as simple as that, head back to segfault or alt.moron.mindless.rambling!
BTW, whatever window manager you use blows, unless of course you're not on Linux, in which case you blow!
Esperanto wasn't aborted, it's alive & well today. Estimates of the number of speakers range widely, from around 100,000 to 20 million; the most realistic estimate is around 2 million speakers today. Most people think a common & neutral second language is a good idea, it's just that most haven't heard of Esperanto. I hadn't until about a year ago, and I suspect many many more will hear about it because of the internet. It's a very easy language to learn, much easier than Russian, which I'm also learning at present.
IDNS.org has a spec for non-ASCII domain names. They have a modified version of Bind available for download.
Getting this adopted universally is nontrivial.
> Didn't the creators of the tower of babel
> get smitten or something?
Well, "smited", perhaps.
I think "smitten" has a _slightly_ different meaning there...
Cheers,
If the nth-generation of babelfish can get it mostly right, then an intermediate language of some sort is a must. They want to support all 185 member languages of the UN and allow others to be supported as well. That's a pretty large matrix! It's unlikely that resources would be dedicated to a straight y Gymraeg-to-Euskara translator, much less Tagalog-to-Inuktitut.
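To put rough numbers on that matrix, here is a minimal sketch. It assumes one one-way translator per ordered language pair in the direct case, and one translator into plus one out of the intermediate language per natural language in the pivot case; the function names are made up for illustration.

```python
def direct_translators(n: int) -> int:
    """One-way translators needed to cover every ordered pair of n languages."""
    return n * (n - 1)


def pivot_translators(n: int) -> int:
    """One-way translators needed when every language goes through one pivot:
    one into the pivot and one out of it per language."""
    return 2 * n


# 16 is the announced first stage, 185 the full set of UN member languages.
for n in (16, 185):
    print(n, direct_translators(n), pivot_translators(n))
```

For 185 languages that is 34,040 direct translators against 370 with an intermediate language, which is the whole argument for having one.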
But that intermediate step means the process had better be good, as can be seen by using Babelfish to translate from language A to B and back to A again. I presume that UNL, to fill the role of language B, will be designed to facilitate getting it right.
Forget support for Esperanto -- just use Esperanto as the intermediary language it was designed to be. Somehow I don't think encouraging people to include support for ISO 8859-3 in operating systems, browsers, etc. is going to be any less difficult than making allowances for bi-directional text in any of a number of character sets, to say nothing of language nuances (quick, how would you translate "Gemütlichkeit" into anything but German?). Esperanto is not that hard to learn, even for non-Indo-European-language speakers (there have been, and presumably still are, significant Esperanto movements in Japan and China, for example). The grammar can be grasped in about 30 minutes and you can carry the essential vocabulary around in your wallet.
I know, I know, people are going to come up with reasons not to use Esperanto. But it seems like if a solution that will work exists, why not use it?
(Note: Even though I like and occasionally use Esperanto, I would welcome use of a similar language like Interlingua or Latino sine Flexione that would be equally easy to learn and do the job just as well.)
--
Iun vi konfidas, kun ni li alig^as.
--
Someone you trust is one of us.
Chomsky revises everything he thinks every 10 years or so. The existence of a universal grammar of the type Chomsky currently advocates (and it is by no means clear that this is true) still doesn't necessarily mean that we can construct a common, useable language for everyone. Remember, every language used in the world is one of those "special cases."
Chomsky claims (despite evidence to the contrary) that syntax can be analysed apart from semantics, implying that if we could agree to a universal word list and definitions, it might be possible to devise an equally neutral grammar to use for machine translation. However, it is quite clear that words, even pseudosynonyms, don't mean the same thing in different languages.
My inclination is that Chomsky is just plain wrong about it in the first place: that there is no universal underlying order of constituents, but rather that human language structures are constrained to a subset of all valid ways of organising information linearly, and that those constraints are biological.
This means that any real machine translation requires us first to make real progress in understanding how humans process and store linguistic information. This field is in its infancy.
This is all well and good until someone in the UN declares that isn't a word anymore...
I'm surprised nobody's mentioned Hofstadter yet; he had a pretty good translation of Jabberwocky into German and French. Should you translate "Campbell's Soup" to "Borscht"? "Jakobstrasse" to "Jacob Street"? Why bother translating Dickens; just read Dostoyevsky!
Not only is point one completely and utterly impossible for reasons well discussed here already (slang, local expressions, evolution of languages etc.), point two actually contradicts point one! They want UNL to be an exact representation of the meaning expressed in the native language, while simultaneously having it be generic enough that everybody (or at least all "enconverter" developers) can understand what is being said. Assuming the average "enconverter" developer will be as technically (il)literate as the authors of this document, there's no way they are going to understand what technical people are talking about even in their own native language. No way is UNL going to help with that. So how, then, are they going to understand that very same conversation translated from a language they don't understand in the first place? Forget it!
Nice idea. Store it in the bin with all the other equally nice ideas: "Health and food for all" and "Can't we all just get along?".
Akatosh dun said:
It's very interesting that you bring that up. Idioms can be a bear to translate at times, much less cultural references (even from English to Spanish and back--in many fansubbed animes, the fansubbers have to include a section at the beginning for cultural references and idioms that Americans wouldn't necessarily get but Japanese audiences would). Not only that, some concepts do not translate clearly across languages (I actually find it easier to think of the Japanese concept of honour in terms of the Tao or the Dine' {Navaho} concept of the Path of Beauty than in English!).
A really good look at how translation can require translating idioms and noting cultural references is the discussion of the upcoming American release of "Mononoke Hime"/"Princess Mononoke". Neil Gaiman is translating for the dub, and apparently there were multiple major issues in translating it including:
The fact the entire dialogue in the movie is not in modern Japanese but in an archaic form (roughly akin to Middle English or the old form of English used in the King James Bible)
A mess of cultural references that Americans would not be aware of (such as one of the main characters cutting his hair--in Japan this is recognised that a warrior is leaving forever and to be among the dead)
A number of idiomatic phrases that had to be translated into American idioms (such as a comment that a character's soup tasted like water--which is about as low as one can go to insult one's cooking...this ended up being retranslated into "Your soup tastes like piss" which is more understandable to silly gaijin :).
Needless to say, it was quite illuminating...especially since some cultural references were noted that I didn't pick up on the first time I saw it (I've seen the fansubbed version) and I'm an otaku. Apparently Gaiman has rewritten the script explaining some stuff that American audiences wouldn't catch, either...and to be honest (IMHO) Gaiman is probably one of the few people who could've pulled it off.
Another really good example of this is the first tape of the anime "Compiler"--which was dubbed, but they STILL had to explain at the end why a giant Colonel Sanders turned into a Japanese baseball player and defeated a mad statue :) (Basically...Randy Bass won the Japanese equivalent of the World Series for the Tigers...the celebrating fans grabbed a statue of Colonel Sanders from a KFC, it being the only Anglo-looking statue that could be found, and threw it into the sea...they have not won the pennant since, and legend has it that the town will not win the pennant until the statue of Colonel Sanders is retrieved because the sea gods are pissed. :) Neat story, but not one most Americans would get...then again, the Japanese wouldn't get why octopi are often thrown at Detroit games if they get in the Stanley Cup :)
-Windigo The Feral (NYAR!)
Really? Bummer. Shaka, when the walls fell.
Of all the languages of the world there are three that clearly have great bodies of literature - Sanskrit, Greek, and yes, English.
Hmm, I assume that there was an implied 'only' in there. I have read a few Chinese authors and poets who would very strongly disagree with you, my Eurocentric friend. In fact I would venture to suggest that the body of literature in Chinese is substantially greater than in either Sanskrit or Greek, although I freely admit that I have absolutely no facts whatsoever.
Anyone got any figures?
- "I never could learn to drink that blood and call it wine" - Bob Dylan (Tight Connection to my Heart)
The first phase is to support the handful of official languages of the U.N., so it would be whichever Chinese is in that group.
And it will probably take the UN 42 years to provide the first draft specs.
In any case, sounds like a worthy effort.
It's definitely complex and inconsistent, but the point was that it's explicit--eg, instead of inflection you use prepositions and auxiliary verbs--and to that extent superior to many languages for scientific purposes. (Incidentally, Whitehead was a logician/mathematician, arguably the greatest of this century; I feel inclined to believe him when he says English offers an advantage in his own field.) You're absolutely correct, but I don't think that invalidates his statement.
(And many people would argue that the body of English literature is no greater than that of, for example, German or Japanese.)
(I'm just thinking online here. I don't even know many spoken languages, but many of my Asian friends have spent long hours telling me how terrible English is.)
I'm not sure English is the proper starting point for this type of a machine-read hyper-language. English is primarily a spoken language, with all the fuzziness that implies.
What may be more appropriate would be to start with written Chinese. From what I understand, "Chinese" is already something of a hyper-language, with one written language expressing several spoken languages. Modify the set of ideograms to include some phonetic symbols (to properly represent the many names that are best represented as sounds). Ideally the syntax would allow for defining custom linguistic symbols, much like XML's ability to define custom tags. Tweak the hell out of this until you have a machine readable language (do fewer than 2^16 standard "words" seem adequate? Should this blow Unicode out of the water and use 32-bit "words"?)
I'm hedging my bets it will be fish shaped, and will fit into the inner ear.
-=-=-=-=-
My mom's going to kick you in the face!
Esperanto was not invented for that purpose. Esperanto's purpose is to be the one foreign language everybody in the world would study. That way any two people would have a spoken and written language they could use to communicate.
Esperanto is alive and well on the Net. Use your favorite search engine to find links. Here are some:
Marko
We've already got English.
Now, the Académie française may not like it, but English is already the language of:
1) Financial Markets
2) Aviation
3) Scientific Publication
4) Popular culture
5) The computer industry
6) Everything else that matters.
English is the new Latin: Deal with it.
-jcr
The only title of honor that a tyrant can grant is "Enemy of the State."
Well, part of the problem is that Lojban really doesn't resemble anything so much as an explosion in a type factory. Esperanto and Interlingua at least have the occasional Latin or Greek root that's worked itself into worldwide usage.
I guess the problem is that it's difficult to adapt a computer-friendly language to humans, or a human-friendly language to computers. But like teaching a computer to play chess, that doesn't mean it isn't worthwhile.
--
Someone you trust is one of us.
Is this what happens inside the head of a bi-lingual person? (This is posed as a question to any readers who might be)
Excellent post. Wish I had moderator points today, so I could move it up! Esperanto just works.
I have come to believe that, in the human brain, the language center is tied somehow to the emotions, because people start acting irrationally whenever you start suggesting language alternatives. It's like asking them to change sexual orientation or something--their language is too strongly tied into their concept of personal identity to permit approach. So in an open forum, I seldom see anyone who is not already an Esperantist discuss the language objectively. Sad, really.
But hope springs eternal. I post this URL every time, in hopes that it may someday be of interest to someone: If you are interested in Esperanto, the world's most popular constructed language, try the Esperanto.net web site for starters.
As for the UNL, most Esperantists have been aware of it for some time. We wish them well, most of us, really we do. But most people who know more than one human language hold limited hope for such a project's success.
Trust me, no linguist will use this. It would be like getting a perl user to switch to TCL - they would carp for years about all the things they can't do the way they want to, assuming they can even do all the things they want.
Other types of tech will probably steer just as clear of it when they realise how frustrating it is to compose for an artificial semantically unambiguous language.
It's called English! It has been for a while now. Why are we pretending that English isn't quickly becoming the world's universal language?
I think I would be more inclined to agree with you if you were arguing that we should use English because other languages have a large number of words borrowed from it.
Hamish
"Wise men talk because they have something to say; fools, because they have to say something" - Plato
Now we can start work on that Tower of Babel again. :-)
human://billy.j.mabray/
"Every good system has a backup." -- Dale Hanchey
In fact, I speak it myself.
BTW, your linux distribution probably contains an Esperanto-HOWTO. And the GNU translation project has an Esperanto team. Plus KDE has Esperanto as one of the out-of-the-box languages.
Marko
It will allow more and longer flamewars than anything else since the invention of NNTP!
(with a nod to Douglas Adams)
But still a very cool idea!
-- IANAEG - I am not an elder god.
I agree, but I'd like to toss this goody in: A. N. Whitehead's Science and the Modern World included a section about language (as an analogy for mathematics, IIRC). One of the points he made was that while English is a shallow language, even compared to other Germanic languages, it makes up for it, in some ways, by being utterly explicit. Nothing is implied or masked by, e.g., inflection; the entire language is open, simple, and, to some extent, precise. (That's a bit of an exaggeration, of course.) I believe his argument was that this made English a superior language for science, where ambiguity is a Bad Thing, but I can see how it could be extended to this, in the form of using English as the lowest level of the metalanguage, then building protocols for the other languages on top, in a hierarchy of language features.
Of course, this would probably ruin the entire project, but I'm not very confident that it will succeed anyway.
The concept is nice, but you're still stuck with the problem that most languages are based on anecdotal references as well as actual words. You can translate the words, but the concepts will still frequently be lost.
Two mistakes in the above:
(1) Not every language has every tense. German has fewer tenses than English, and another poster said that Chinese has none.
(2) Language can't be described in a BNF grammar: it isn't sophisticated enough to capture singular vs. plural, gender, case, verb conjugations etc. Phrase structure grammars extend BNF grammars with parameters to capture these, and Chomsky showed that these are sufficient to capture all of natural language.
I would guess that the meta-language design is based upon transformational grammar, which exposes the essential similarities between sentences like `The door is closed' and `Close the door!'. This would allow it to express subtleties like different ways of representing the same sentence.
We already have a universal networking language:
Gimme warez d00d, I am 31337!
At one level the real barrier to universal language translation is machine recognition of human languages. By this I mean the comprehension of what is being said. In order for linguistic comprehension to take place, general comprehension must first take place (thus the earlier post about the creation of a HAL-9000 like computer is not too far off base). Otherwise, your universal translator will choke on statements like: "He saw that gas can explode". This could mean that gas has the ability to explode, or that an object (a can of gas) exploded. In other languages, this double meaning doesn't apply and you have to use one of two possible sentences, depending upon your meaning. Since the translator wouldn't "intuitively" know your meaning, it would have to figure it out from context, which would ultimately require general comprehension.
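The "gas can" example can be made concrete with a toy sketch. Everything in it is an assumption for illustration: a four-word lexicon, hand-picked part-of-speech labels, and a hard-coded table of the two acceptable readings. The point is only that word-level lookup yields both readings and nothing in the sentence itself picks one.

```python
from itertools import product

# Toy lexicon (assumed): each word maps to the parts of speech it may take.
LEXICON = {
    "that": {"COMPLEMENTIZER", "DETERMINER"},
    "gas": {"NOUN"},
    "can": {"MODAL", "NOUN"},
    "explode": {"VERB"},
}

# Tag sequences this sketch accepts, each with a gloss of the reading.
READINGS = {
    ("COMPLEMENTIZER", "NOUN", "MODAL", "VERB"): "gas is able to explode",
    ("DETERMINER", "NOUN", "NOUN", "VERB"): "a particular gas can exploded",
}


def readings(words):
    """Return the gloss of every accepted tag sequence for the given words."""
    options = [sorted(LEXICON[w]) for w in words]
    return [READINGS[tags] for tags in product(*options) if tags in READINGS]


# The clause after "He saw ..." has two valid readings.
print(readings(["that", "gas", "can", "explode"]))
```

Choosing between the two glosses needs context from outside the sentence, which is where the general comprehension requirement comes in.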
:^)
I suppose the UN could construct a meta-language that is free from all such idiosyncrasies (maybe based off of Esperanto?). If you assume that is possible, then the translation from the universal language to any of the local languages would be a direct map. This would require the universal language to restrict all words to a single meaning; otherwise it's possible you'll end up with context-based problems again. Also, such a restriction could resolve issues such as inflection: for example, the universal language would treat the Japanese syllable 'ka', which changes meaning depending on inflection, as two or more separate "words". While such a language would be extremely large, the translation from the universal language to the local languages would always work correctly. But, no matter how easy it is to translate from, I still think the translation to will require significant advances in artificial intelligence. It would be easier for us to all learn the universal language, but if we all did, then there wouldn't be a need to translate to the local ones, now would there??
BTW: In Russian "black hole" - "chernaya dyra" means astronomical "black hole".
Yeah, but then how are we supposed to access these sites with a 'western' keyboard? It's not that I've got anything against the idea. If it cost no extra to, say, register a domain using roman/asian/russian characters, then sure, no problem.
I don't suffer from insanity. I *enjoy* it!
to quote George Carlin :)
:)
If I remember my German, you're talking about something like
Mein Hut, der hat drei Ecken
where "der" refers to "Hut". That's something that will have to be covered in the rules both for translation into and translation out of German, no matter what language you're using to go into or out of German. Otherwise you end up with the English translation being
My hat, the has three corners
where a proper English translation would of course be
My hat has three corners
Of course this is a very simplified example, but I think you get the idea.
I just think that for the foreseeable future (and, since this is computing, that could be, oh, say, six months) the best computers on the planet are the ones we carry around in our skulls. To me it would make more sense to have a single language that everyone would agree on, but then the problem is to agree on the language. All of the "evolved" languages carry their own cultural baggage, and few people seem to think that a "constructed" language is up to the task, even though certainly Esperanto and possibly Interlingua and a couple of others have proven that hypothesis wrong.
Of course just outside the foreseeable future everybody will be speaking Bocci anyway, so what the heck.
--
Someone you trust is one of us.
ok god knows why im typing this, there are 300 comments already. thanks to 'selling out' slashdot is a meaningless cacophony of garbage. well im very happy to add my little shit ball to the heaping steamy pile. they have included only certain languages. this is crap. it should be designed generically so that it can support any language. languages change. anyways, they probably left out braille. who cares about those stupid blind people anyways. euler was blind but he wasnt too important. when was the last time u used e? gimme a break! they should all be killed, when they are babies, viva la eugenics, social darwinism is your pal.
Belgium is fast becoming the Mecca for speech and language technology with players like Lernout & Hauspie and projects like Flanders Language Valley.
All of europe really needs these kinds of technologies, but Belgium is one of the more multilingual countries within Europe.
Stop worrying about the risks of nuclear power and start worrying about the risks of not using nuclear power.
Since it seems related, I've had a dream open source project in mind for some time. Not so much along the lines of UNL as Babelfish. I think this is the perfect project for the open source model because people from around the world could contribute work relevant to their own languages. A proprietary project would have to employ many specialists.
If anybody is interested in starting such a project, please reply in this thread.
dos/tres equis
Douglas Hofstadter has suggested that the problem of language translation is the only real difficulty in producing an AI. =) For a number of years I have been working on a universal language project. It can be found at users.erols.com/alangrimes/ =) I hope you find that link interesting if not particularly useful. =\ Please flame me by mail for taking up your time. I am Alonzo The Great ( alangrimes@starpower.net ) my old account got hosed somehow and I'm far too lazy to bother to fix it.
"First of all, I do not really believe the UN can produce anything remotely interesting, technically speaking. I like the IETF motto: "we believe in rough consensus, and working code". Show me the money^H^H^H^H^Hcode first, please. What's so special about UNL? Theoretical translation of language A into a universal language and from there to language B is almost as old as "machine" translation itself. As far as I remember, early EU research into machine translation were based on a similar idea -- and they were dismissed as a failure. For a good example of the total and dismal failure of machine translation, try translating this text into"
I did.
English -> French
French -> English
English -> German
German -> English
English -> Italian
Italian -> English
English -> Spanish
Spanish -> English
English -> Portuguese
Portuguese -> English
The end result???
In the first place of all, crío to really distant distant of interest who the O.N.U all can produce and not point out technician. I have the taste of the modernity of the IETF:" we create in the agreement approached and the operation bases it ". the champions money^H^H^H^H^Hcode in the beginning, please. Which is therefore extreme special UNL? The theoretical translation of the language to the inside to a universal language and with of the language B is nearly therefore old here how much the translation " of the machine ". Like the memory, IT CREDITS the first jambs of capturing in the automatic translation it has been based on a similar idea -- and has been isolated like the landslide. For entire a landslide and it good slaughter houses of the example that the automatic translation, manages, in
censorship is a form of noise, which actively seeks to drown out content with silence - Crash Culligan
You're right, but any decent "universal translator" will not stop at translating individual words. Its dictionary would extend to phrases of the sort you mention. Perhaps it would define "stand at window" as "stand .1m - 1m away from window, while normal vector from plane of body intersects window." Regardless, it would be quite a chore to accomplish this. Context is everything. The more subtle the meaning, the more context you need. For example, if I said "That's really smart", you don't know if I'm being complimentary, self-deprecating, ironic, or insulting.
French is a dead language, one whose speakers stubbornly refuse to admit it, and one whose primary nation tightly controls it.
Plus, all those frogs like those nifty blue berets.
So... ribbit?
--Corey
Not only will they not deserve liberty or safety, Mr. Franklin, they will be DENIED both!
Here is an example so you can have a better feeling of what it's like:
So, this is a two-sentence, one-paragraph text.
The first sentence has an agent (the team) who won something in the past, and an object (the match) which was won: "The team won the match".
The second sentence has an agent (the player, who is male) who broke something, an object (the leg) which was broken, and modifiers which specify that this leg is that player's own left leg: "The player broke his left leg."
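To make the description above a bit more concrete, here is a rough sketch of the two sentences as labelled relations. This is NOT actual UNL notation; the `Relation` class, the role names and the `dependents` helper are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Relation:
    """One labelled link between two concepts (hypothetical, not UNL syntax)."""
    role: str       # "agent", "object" or "modifier"
    head: str       # the concept the relation hangs off
    dependent: str  # the concept filling the role


# "The team won the match."
sentence_1 = [
    Relation("agent", "win", "team"),
    Relation("object", "win", "match"),
]

# "The player broke his left leg."
sentence_2 = [
    Relation("agent", "break", "player"),
    Relation("object", "break", "leg"),
    Relation("modifier", "leg", "left"),
    Relation("modifier", "leg", "player"),  # the leg belongs to the player
]


def dependents(sentence: List[Relation], head: str) -> List[Tuple[str, str]]:
    """List (role, dependent) pairs attached to one head concept."""
    return [(r.role, r.dependent) for r in sentence if r.head == head]
```

Asking for `dependents(sentence_2, "leg")` gives back the two modifiers, which is the information a generator for another language would need to pick the right possessive and adjective forms.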
--
"Show me the code" -- Linus.
This problem would map onto a modern compiler architecture. The compiler architecture has multiple front ends (languages) and multiple back ends (machine architectures), bound in the middle by an intermediate, but heavily simplified, language. The idea is that a front end parses and type-checks the input and then outputs intermediate language. This can then be fed into any back end built for a particular architecture.
For example, if you have front ends for C and Fortran and back ends for PPC and i386, then you can compile Fortran programs for PPC or i386 and also C programs for PPC or i386. Any combination. Add another back end, say MIPS, and with no extra work C and Fortran compiling are possible for it too.
When dealing with natural languages, you would need a front-end and a back-end for each language.
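A toy sketch of that front-end/back-end split, assuming a made-up intermediate form (a tuple of concept tags) and two hard-coded languages. Every name here (`FRONT_ENDS`, `BACK_ENDS`, `translate`, `converters_needed`) is hypothetical; it only shows why the pivot design needs 2n converters instead of n(n-1).

```python
from typing import Callable, Dict, Tuple

# Toy intermediate representation: (agent, action, object) concept tags.
IR = Tuple[str, str, str]

_WON: IR = ("team", "win.past", "match")

# One front end per language: surface text -> IR (lookup tables stand in for parsers).
FRONT_ENDS: Dict[str, Callable[[str], IR]] = {
    "en": lambda text: {"the team won the match": _WON}[text.lower()],
    "de": lambda text: {"die mannschaft gewann das spiel": _WON}[text.lower()],
}

# One back end per language: IR -> surface text.
BACK_ENDS: Dict[str, Callable[[IR], str]] = {
    "en": lambda ir: {_WON: "The team won the match"}[ir],
    "de": lambda ir: {_WON: "Die Mannschaft gewann das Spiel"}[ir],
}


def translate(text: str, src: str, dst: str) -> str:
    """Any front end can feed any back end through the shared IR."""
    return BACK_ENDS[dst](FRONT_ENDS[src](text))


def converters_needed(n: int) -> Tuple[int, int]:
    """Return (pivot design, direct pairwise design) converter counts for n languages."""
    return 2 * n, n * (n - 1)
```

For the 16 languages in the announcement, `converters_needed(16)` works out to 32 converters with a pivot versus 240 direct translators. The lookup tables are where all the real difficulty hides, of course.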
There are a number of catches, here are a few:
Bottom line, of course a universal translator is possible, but until we discover the Babel fish or the brainwave-reading equivalent (would reading brainwaves be enough? would all species "think" alike?), there will be plenty of input restrictions. After all, some things just don't translate. Because of these restrictions, it will be infuriating and impractical to use.
Well, how do they connive to impress their girlfriends then? Or should I say girlfriend-girlfriend?
Hates people who have stupid little sigs
--Seen
"I used to be a dilettante. Then I thought I'd try something else for a while."
Most likely the Meta-Language would only be able to handle a small subset of the meanings available in any of its natural-language counterparts. The subset of "Meta-meanings" would be the set of all meanings common to all the languages.
// Zarf //
Ideas like run, walk, buy, sell, etc. would easily translate... however, things like "glark", "glob", "grep" may not translate accurately. That is, the 6 Russian verb forms you mentioned may all be mapped directly to only 3 verbal meanings.
i.e.: grep to look, glob to list, and glark to understand... and so on.
The resulting word elements could then be arranged by a simple pattern-matching AI into an acceptable form. The result is a valid natural-language sentence which has some shadow of the original meaning. In practice this could allow for useful business communication but prevent discussions of abstract ideas.
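A small sketch of that many-to-few mapping, assuming a hand-written table. The `TO_META` table and the `collisions` helper are invented for illustration; the only mappings taken from the post are grep/look, glob/list and glark/understand.

```python
from typing import Dict, List

# Hypothetical table: source word -> meta-meaning. Jargon words collapse
# onto the same meta-meaning as their plain neighbours.
TO_META: Dict[str, str] = {
    "grep": "look",
    "glob": "list",
    "glark": "understand",
    "look": "look",
    "list": "list",
    "understand": "understand",
    "run": "run",
}


def to_meta(words: List[str]) -> List[str]:
    """Map each word to its meta-meaning; unknown words are flagged, not guessed."""
    return [TO_META.get(w, "<unknown>") for w in words]


def collisions(table: Dict[str, str]) -> Dict[str, List[str]]:
    """Group source words that share a meta-meaning, i.e. where nuance is lost."""
    groups: Dict[str, List[str]] = {}
    for word, meta in table.items():
        groups.setdefault(meta, []).append(word)
    return {meta: words for meta, words in groups.items() if len(words) > 1}
```

Every entry in `collisions(TO_META)` is a place where the round trip cannot give you your original word back, which is the "shadow of the original meaning" effect.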
Yet another fine example of how problem-domain-"scoping" affects over-all software functionality.
If you run Universal Networking Language through their coder-decoder thing, it comes out as "Colossal waste of resources and money". In other words, Microsoft.
Hates people who have stupid little sigs
A friend of mine worked on this project. He told me that the system has a lot of limits. But producing good, relatively simple docs (like tech docs or business docs) is so useful, isn't it?
Who needs UNL when we have babelfish? ;)
-Dave
Dear friend,
Chinese is derived from Sanskrit.
Thank you
And people wonder why the US doesn't like to pay its dues to the UN.... Also, UNL is probably not in a similar vein to Esperanto; in fact, it might not even be a 'real' language at all. It could just be a series of symbols with no phonetic representation. It will only exist as an intermediary language that never actually gets surfaced to the user. And one other thing: didn't that wingnut Noam Chomsky try and fail to come up with a universal language?
Here's how it might work:
This has the disadvantage that you lose some flexibility, subtlety and art in your writing, but you decided to give that up when you decided to go multilingual, right?
The point is that if you write text specifically so that it can go to one foreign language and back smoothly, it's probably pretty translatable to many languages, I'm guessing.
You can try this now with Babelfish. Take a passage of text you wrote in English, convert it to something (e.g. French) and back. Then edit the original until the English that comes back is decent. This will force you to remove colloquialisms and force you to work around deficiencies in the translation program, but isn't this worth it for a good translatable piece of text?
Final note: We have all seen Babelfish make funny translations. There will always be some words/phrases that software cannot translate perfectly without AI. But certainly, we are all smart enough to craft text that software can translate well! As the software gets better, we can put less and less effort into this.
I'm not sure the internet is going to do Esperanto any favours, considering that anyone who uses the Net on a regular basis is likely to already speak English (which, to all intents and purposes, is the Lingua Franca for at least the next century).
I'm all for the idea of composite metalanguages, for computers, but I don't see why anyone should cripple a metalanguage so people can use it too.
"Be nice, veer left, and never stop thinking" Iain Banks - Walking On Glass
OK, according to the web site, the UNL, aka "the meta language", will be based on English, with a means for defining new words as long as you can provide a word in your original language AND locate it in the conceptual hierarchy. The most effective step they could take at this point to increase the propagation would be to come up with an XML DTD for UNL dictionary entries and conversion/deconversion mappings. BTW, it looks like the web site was produced using UNL technology, and it's not too bad: not as good as a native speaker with strong rhetorical skills, but sufficient to carry technical and commercial traffic. The one thing it probably won't be very good at is translating persuasive text meant to convince people. Not such a great loss.
Is that the same tribe whose numbering system consists of "one", "two" and "many"? :)
"Be nice, veer left, and never stop thinking" Iain Banks - Walking On Glass
Since when did Mongol become one of the world's major languages? Half the people in Mongolia are nomads, besides! That's like Al Gore's suggestion to bring the internet to Africa, to help the people who don't have electricity & running water. Weird.
Check them out: http://www.rebol.com
Nah, English is piss-easy for Spanish speakers too, if they put half a brain cell to work on it.
All it takes is learning a lot of words and working on your pronunciation, but English grammar is absurdly easy compared to that of most Romance languages, and can be learnt in an afternoon.
I'd say French is harder for Spanish speakers than English, precisely because it has a complex Romance grammar, and a fucked up pronunciation to boot.
"Be nice, veer left, and never stop thinking" Iain Banks - Walking On Glass
This project is doomed to hell. I will tell you why if you listen intently.
The Boob Factor.
That's right. The Boob Factor hasn't been addressed. None of these meta-languages or intermediate languages have addressed this important topic.
What is the Boob Factor, you ask? Quite simply, the Boob Factor is you, it is us. We are the Boobs.
Meta-languages or intermediate languages, we will assume, work on well known grammatical and linguistic rules. In order to function correctly, these rules must be adhered to flawlessly.
Let us examine the following statement:
I like red meet.
You the reader have been blessed to have a couple of ounces of grey matter resting on your more than likely underdeveloped shoulders. You have the ability to infer the offending Boob's meaning in this sentence. Do you place faith in a meta-language or intermediate language to do the same? I don't think so. The Boob Factor has reared its ugly little head.
We should all sit back and wait until God has reversed his Babel of Confusion mayhem that he inflicted upon us in a drunken stupor. We can then all go back to speaking tongues in the master language of Sumerian. Oh, the joy for that day.
Hates people who have stupid little sigs
My first reaction on seeing this story was, "Wow, what a cool idea!" I'd love to work on designing this language. (And it is possible. All of the objections that I have seen in these threads can be resolved if you understand modern linguistics. Take a course, it's worth it.)
However... Do we really need a new metalanguage? Couldn't we just as easily use an existing language as the intermediate form? It would be just as easy to translate, and you wouldn't have to learn a new language to understand the system.
There's an idea in linguistics which is similar to the Church-Turing thesis in computability theory, although it's not as well established: Every modern language is assumed to have equivalent expressivity. If you wanted, you could translate from English to Chinese using an Aborigine language as your intermediate without any problems. (Except deficiencies in vocabulary, but it's easy to make up new words.)
I suspect the real need for this meta-language has to do with this project's association with the UN: They don't want to offend any ethnic group by choosing an existing language as the standard.
MSK
They should translate from the source language, through a mediator, into a middle language representation, and prove that anything in the middle language representation can be translated into the destination language.
The middle representation language would have temporal, spatial, and objective information, as well as references, and changes in these "dimensions". I'm sure there's a more formal spec for language in general.. I have no experience in this area.
The mediator would detect concepts that could be vague (i.e., they have multiple reference levels), and provide alternatives that the client would pick from (that are already represented in the middle language). The mediator could then learn context cues from this, but it probably wouldn't be a good idea to use them until after a ton of experience is accrued in this respect.
The provided alternatives would be created by "programmers" in this metalanguage who also speak the source language themselves. This is a HUGE programming project..
When converting from the metalanguage into the destination language, the metalanguage would be compacted into phrases in the dest. language that semantically match what the metalanguage is saying. The annoying thing (and the reason it works) here is that if a figure of speech in the source language is really a large branch of references to other things that are completely foreign to a speaker in the destination language, the simple source phrase could blow up to a large dissertation on the concept being described in the destination language.
source->mediator->metalanguage->compactor->dest
Would it work? I don't know. But it sounds like it could..
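Here is a very rough sketch of the source->mediator->metalanguage->compactor->dest chain, just to show the shape of it. The sense table, the phrase table and the function names are all invented; the "client picks an alternative" step is modelled as a callback.

```python
from typing import Callable, Dict, List

# Hypothetical sense inventory written by the metalanguage "programmers":
# ambiguous source word -> candidate metalanguage concepts.
SENSES: Dict[str, List[str]] = {
    "bank": ["bank(financial)", "bank(river)"],
}

# Hypothetical destination phrase table (French): concept -> phrase.
DEST_PHRASES: Dict[str, str] = {
    "bank(financial)": "banque",
    "bank(river)": "rive",
}

Chooser = Callable[[str, List[str]], str]


def mediate(words: List[str], choose: Chooser) -> List[str]:
    """Source -> metalanguage: ask the client whenever a word is vague."""
    concepts = []
    for word in words:
        options = SENSES.get(word)
        if options is None:
            concepts.append(word)  # assumed unambiguous in this toy
        else:
            concepts.append(choose(word, options))
    return concepts


def compact(concepts: List[str], phrases: Dict[str, str]) -> str:
    """Metalanguage -> destination: use a matching phrase, else spell the concept out."""
    return " ".join(phrases.get(c, f"[{c}]") for c in concepts)
```

A concept with no matching destination phrase comes out bracketed here; in a real system that is the point where the short source phrase would blow up into a long explanation.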
--
sean_dunn1@yahoo.com
i have wanted to make a "universal language" for as long as i can remember, but i still have not found the time ...
anyway, the real reason that esperanto is not successful is that it still has stupid rules -- for example, nouns still have gender which means that there are still too many pronouns and you still can not complete a sentence without knowing the gender of the subject.
not to say that esperanto is bad, but we all know that esperanto is just spanish V2.0 and no one will admit it.
a truly universal language must be written from scratch with all of the "fluff" removed. people say that you will lose the poetic qualities and you will lose the innuendo and colloquialism -- i say that these people are pathetic whiners who are trained to be cynical of anything which could be considered progress. "poetic quality" and innuendo have _nothing_ to do with the language in which they are written. you might prefer german opera to italian, but it does not make one any "better" than the other. either way, you could still write your poetry in the language of your choice -- and, thanks to the UNL, people will still know what you're talking about.
the fact of the matter is, effective worldwide communication is a much more serious matter than an old-fashioned idea of what is "good" poetry. poetry will persist so long as there are good poets; we do not need to accommodate them with prissy, "romantic" languages. this just makes it easier for unskilled drunks to make more sappy, bad poetry.
as they say, you have to crack some eggs to make an omelette. i say thanks to UNU for cracking some eggs, and to everyone that thinks they can improve upon any of the current languages, please stop picking eggs out of the trash.
-abf.
Babelfish isn't great. We all know this. However babelfish doesn't use an intermediate language, and no, French is not an intermediate language. The idea behind an intermediate language is that you can have groups working very hard to get their language to translate into the intermediate language. Then, by doing that, their language can be translated into every other language that the intermediate language supports. If you worked for as long as it would take to translate your language into 10 different languages on only one language, you'd come out with a pretty good translation. An intermediate language would also have the advantage of being able to be optimized to be translated into and especially translated from.
1. No articles? You gotta be kidding!
2. One tense is not necessarily true. We have other ways to explain the tense in our sentences.
I don't want to explain much about Bahasa Indonesia, since this is not a linguistics site.
But, as a person who has been involved in several computational linguistics projects in Bahasa Indonesia.. we do have our own difficulties to deal with.
One of them is verb formation, which is very flexible. This makes the stemming algorithm work harder for Bahasa Indonesia than for other languages.
regards,
The Doc
I, for one, find I can get my idea across in Spanish a lot more often than I can understand what a native Spanish speaker is trying to tell me.
dos/tres equis
This is a bizarre urban myth in some circles. Sanskrit is absolutely not a good computer language. It is wildly irregular, has a great many verb forms and numerous noun declensions.
Further, in English, when you say "and Bob" quickly you often get something like "am Bob." Sanskrit did the same. The awful part is that they wrote down these word boundary changes (called sandhi). This is very hard for people... I can't imagine computers will find it any less ambiguous.
Sanskrit is about as logical/regular as Latin or Greek. Which is to say, as much so as any natural language.
It appears that the people at UNU who discussed this idea of UNL didn't bother to talk to anyone trained in modern linguistic theory. (I don't claim to be trained, but I have more than a passing familiarity with linguistics, *and* I read /., so...)
The first major problem that they will have is defining a syntax for the language. That's not so tough if you just define an arbitrary syntax and leave it at that. But I suspect that they will try hard to design a syntax that distills the most popular aspects of each of the languages that they're translating from, thus getting stuck in a linguistic tar-pit from which they will never escape. I hope.
You see, there have been many attempts at discussing the "universal syntax", that is the base syntax for the language that the brain uses. In most flavors of Chomskyan syntax theory this is termed something like "deep structure" (lately it's been "D-Structure" to avoid any implied but inaccurate meanings of the word 'deep'). DStruct is in essence the most general syntax needed for accurate expression of any sentence structure in any human language. It's supposed to be general, not differentiating between different languages on the syntax level (notice that I haven't mentioned meaning yet -- that's something completely different, the /semantics/ of a language). This unfortunately isn't usually the case, since many languages like to put different sentence constituents in different locations within a typical sentence structure. English, f'rinstance, is said to have an SVO (Subject Verb Object) word order. That is, the Subject of a sentence will be the first major constituent in a sentence, followed by the Verb, then the Object constituent. This isn't general, however. In the case of a question, the word order of an English sentence often (but not always!) changes to a VSO (Verb Subject Object) form. Other languages use completely different word orders. Japanese, IIRC, has a word order approximating SOV (could be wrong). That doesn't even consider the lower levels of syntax, where one discusses what's known as "X-bar theory". X-bar theory uses representations of constituent phrases connected in various manners to develop a phrase structure tree that represents the syntax of a particular sentence or phrase. Thus a noun phrase (NP) has two branches from it, one being a specifier (Spec), the other being the intermediate projection of the NP (N', read "n-bar" for hysterical raisins). N' in turn projects a complement (Comp) and a noun (N). Syntax in one language, say English, will project the Spec to the left of NP, and the Comp to the right of N'. Thus the noun N is in the middle of the NP. This isn't true for all languages; they are free to choose whatever branching order they wish to have (dependent on certain Parameters which define particular instances of Principles, which I won't get into).
Another theory, Head-driven Phrase Structure Grammar, says that every word projects its own dependent structure, and that the structure projected from a word in the lexicon must adjoin properly to other words projected from the lexicon to form grammatical sentences. This theory also takes into account some semantics issues, and is very popular amongst the Computational Linguistics and Natural Language Processing crowds, but isn't too popular amongst the older ranks of theoretical linguists. It too is language dependent in its structure of syntax, although very comprehensive syntaxes of certain languages have been developed with some success.
That's just syntax. It's not easy. It's not very regular. It's very context sensitive. If anyone has written a compiler for any programming language they know how complex a language will get if you allow it to be context sensitive (instead of context free).
Semantics, the meaning behind a particular word or phrase, is a ridiculously complicated problem in linguistic research. People have spent their entire lives researching it with little success, and at various times in the history of linguistics certain well-known demagogues have denounced the study of semantics in its entirety because it appeared to them to be too unfounded or scientifically unreasonable. Chomsky to this day makes nasty comments about semanticians and is well-known for denouncing research into semantics because most work is not provably consistent in even restricted domains.
Semantics is gnarly. It's weird. Researchers who work in semantics are said to get their more successful ideas from hallucinogenic chemicals. Semantics is a subdiscipline in which any random researcher can overturn the field with one paper, tossing out all of the research done previously -- and get away with it successfully. I don't mean to degrade the work of semanticians, and I'd love to join their ranks some day, but it must be admitted that much of semantic research and theory has a hard time standing up because it's in its infancy.
Look carefully at the construction of a programming language compiler. It deals with what's known as a 'regular language'. This is a language that is known to follow certain rules consistently, and all special cases are well-defined (for most languages anyway ;-). The syntax of the language (the part you wrote in lex) is usually a bit simpler than the semantics (the part you wrote in yacc), if you examine the respective sources for complexity. Now consider the fact that for *any* human language the complexity of both of these tasks is exponentially (perhaps even factorially) more difficult. Since semantics is at least an order of magnitude more complex than syntax with respect to computer languages, one could imagine how bloody awful complex this is for a human language. Now consider that to make a translation requires *complete* semantic comprehension of both the source and target languages -- translation is not a simple word-for-word lookup table (and I'm glad it isn't -- we wouldn't have much expressibility and I wouldn't be able to write this if it were).
To put all of this into perspective, consider a universal translator for computer languages -- what's it called? It's called a computer. So what do we call a universal translator for human languages? Surprise -- a human.
- Babelfish doesn't use an intermediate language.
- Babelfish doesn't even achieve lossless translation from language A to B and back to A. This is the simplest case and one which can be improved the most with a good definition for UNL.
They do not claim perfect translation, but yes, a computer which could translate between languages and do it perfectly would pass the test. Do you really argue that it is impossible for computer programs to ever pass the Turing test? It is only a matter of time till this happens. The only way to stop it is to stop making computers. This could be a real concern. You have to hope that once the UNL is defined you could extend it for your own purposes and still have everything work. Here we are reading
Be insightful. If you can't be insightful, be informative.
If you can't be informative, use my name
Or maybe SNMP or SMTP? I think there is a protocol for every 4 letter acronym ending in 'P'
When you look at existing technology, like Babelfish at Altavista, you see that the 'devil in the details' might be more of a 'great satan' than one might think. I'm not sure you can have any kind of accurate translation without a human acting as a filter for meaning. Its easy to apply some rules to a metta language interpreter, but using it in discourse would probably create quite a bit of ambiguity. Just look at this translation if you don't believe me.
English to German and Back
If you regard available technology, like Babelfish with Alta Vista, you see you that the ' devil in the power of the details could think much more from a large satan than one. I am not safe you can type exact translation without human serve as a filter for meaning to have. Its easy, some guidelines to more mettasprachinterpreter to apply but at using it in the statement would probably create much ambiguity. Straight lines view of this translation, if you do not believe me.
English to French and Back
When you look at existing technology, like Babelfish at Altavista, you see that the ' devil in the force of the details much more than one great Satan which one A could think. I am not sure you then not to have any kind of precise translation without acting human as a filter for the significance. Its easy to apply some rules to an interpreter of language of metta, but to use it in the speech would probably create ambiguity much. Glance right with this translation if you do not believe me.
and my personal favorite....
English to Portuguese and Back
When you look at it existing technology, as Babelfish in Altavista, sees that ' the devil in the power of the one details much more satan great of that one could think. I am not certain you I can have no type of the accurate translation without acting human as a filter for meaning. Its easy one to apply some rulers to an interpreter of the language of metta, but to use it in the speech would create probably the ambiguity sufficient. To look at just in this translation if you not to believe me.
Need I say more?
The UNL will be inconsistent, as a few messages have already pointed out.
Moreover, is this supposed to be the project of some freshman? The web page is messed up; there are lots of errors. One of the lines says "How to joint the UNL Community" on page http://www.unl.ias.unu.edu/eng/unlhp-e.html. I found a few just by looking at it. I think the people who are responsible for this do not even care. The pages are poorly coded (made by some win9x program) and the pictures look distorted. They did not even give an explanation of how it will be done.
So what? They're not trying to translate television. They're *trying* to translate "Legal papers [and] UN treaties".
So it only works for boring documents. They're plenty happy with that.
What's the difference?
My apologies to all you fifth-graders out there, sorry.
The question is simple -- will it work better than babel? Babel sucks, sure, but it's sure useful when I know something has the information I want, but don't speak the language. For that sort of thing, I suspect that this would be very useful.
And if/when its use becomes widespread people might start writing to the meta-language. Not writing in it, necessarily, but, for example, being explicit on things that would confuse it. If that happens, then it really would work.
On the overall topic of a metalanguage: I'm familiar with Chomsky's theory of a universal grammar and if we could reconstruct that grammar, a metalanguage would be possible to create and implement. However, without a good understanding of the universal grammar, it would seem to be nearly impossible. We're too set in our individual grammatical tracks to fully understand those of other languages. That's one reason why teaching small children multiple languages is so successful: their internal grammar is not yet completely rigid.
Incidentally, all the Lojban (and Loglan) aficionados cite this as a reason why people whose native language is a more expressive, less biased one (such as Lojban) would have freer and more powerful minds. (I'm simplifying this a bit.) I don't recall the name of this hypothesis, but it's attributed to someone or other.
For those who don't know, Loglan is a conversational language, the grammar for which is based upon predicate calculus. It's nothing like any spoken language, and has some fairly rigid rules for everything from generating/importing new words to constructing unambiguous sentences. Lojban is an updated version of Loglan. I've looked at the primary Lojban book, and it seems pretty awkward, though that could just be that I'm not used to it.
Trees can't go dancing
So do them a big favor
Pretend dancing stinks!
/. took out my g. I meant that English thing as a joke....
Seriously though, having a universal translator might help curb the americanization of the world. Even though I strongly suspect it won't work very well.....
-- Moondog
Hey, :)
Why are they presenting their work in Brussels ?
Let me ask you one question : What language(s) do people speak in Belgium ?
Answer : I'm from Belgium and I speak Dutch, and that language is not included in their (little) list.
Too bad
The other part of Belgium will be pleased (they speak French, bah!), but why in Brussels ?
Maybe because Belgium is in the centre of Europe, but why in Europe ? Anybody got an explanation ?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Belgium HyperBanner
http://belgium.hyperbanner.net
Linux hosting for $2.50/mo
The beautiful thing about English is that it's shameless. If it finds itself lacking the ability to express something, it is more than willing to steal a word from someone else, and make it its own.
Would you rather use French, which, last I checked, is still in the 18th century?
(and 10% the size of English)
Hope they can do better than BabelFish.
I'd think it would be difficult to make an abstracted meta-language out of human languages. There's lots of grammatical issues which would be particularly difficult to deal with well.
For example, in the case of inflected languages, how do you get the declensional case information into the metalanguage? In many languages, there are grammatical cases that have overlapping declensions, so there's ambiguity about what meaning is intended. And mapping between languages would be really tough.
Verbs would be really tough. Like in Russian, you have three tenses (past, present, and future) as well as two verb aspects. So you have pairs of verbs, one expressing action that occurs once, the other expressing habitual activity.
Sounds like the project would be lots of fun to work on, though. It's a really neat idea, linguistically.
This thing should have a Monty-Python-Foot icon for it... Real or not!
A language may not "have" a specific tense as a part of its grammar, but the tense can be expressed. If not by word permutation, then by context or additional words.
/ per
Hmm... Hope you'll be in charge of the "translator", otherwise Echelon is no longer an issue. The UN (and therefore, our own governments) will automatically be monitoring anything which is translated...
Add to this typos and word choice errors (transmission noise?), slang, jargon, and all the other ways we distort language. How can any reliable translation be made?
Disclaimer time: i am not any sort of language lawyer, so if my post contains inaccuracies don't flame, just correct.
-----
--
perl -e'$_=shift;die eval' '"$^X $0\047\$_=shift;die eval\047 \047$_\047"' at -e line 1.
A fine idea in theory, but the big question is "How?" sounds to me like some folks who don't really understand what computers are capable of are trying to sound important. Those who do not understand Babelfish are condemned to repeat it.
Overall, I agree strongly with the idea. From a testing standpoint, with the development of an effective meta-language, all one would need to do to test the translation for the most part is go from language x->meta language->language x. If that works, then presumably the meta language did not slaughter language x.
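A minimal sketch of that x->meta->x check, assuming you can plug in some encoder and decoder. The `toy_to_meta` / `toy_from_meta` pair below is a made-up stand-in with a four-word vocabulary, not anything UNL actually provides.

```python
from typing import Callable, Iterable, List

Meta = List[str]


def round_trip_failures(
    sentences: Iterable[str],
    to_meta: Callable[[str], Meta],
    from_meta: Callable[[Meta], str],
) -> List[str]:
    """Return the sentences that do not survive x -> meta -> x unchanged."""
    return [s for s in sentences if from_meta(to_meta(s)) != s]


# Toy stand-ins: a "meta-language" that only knows a handful of words.
KNOWN = {"the", "team", "won", "match"}


def toy_to_meta(sentence: str) -> Meta:
    """Replace every word outside the known vocabulary with a placeholder."""
    return [w if w in KNOWN else "?" for w in sentence.split()]


def toy_from_meta(tokens: Meta) -> str:
    """Turn meta tokens straight back into text."""
    return " ".join(tokens)
```

Running `round_trip_failures` over a test corpus gives a list of sentences to inspect by hand. Note that a clean round trip only shows nothing was lost on the way back to x; it says nothing about whether the output in some other language y is any good.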
One question I have is how the language engine will handle words it does not know--or, more likely, abbreviations, misspellings, and slang. From what I've gathered, this is where other translators fail. If the translator doesn't understand half the sentence, then it generally has too much trouble finding context for the rest for anything to make sense. Just a thought.
-Keelor
I don't think so - at least French (and Italian/Portuguese and possibly Romanian) all have similar structure and rules - it's more a question of learning the vocab.
Learning Italian for me was relatively easy because I already knew French. English structure and word order is different, more akin to the Germanic languages.
One advantage English does have over some other languages is that you can really bastardise it, but still make yourself understood. Plus, there are so many variants.
There are some difficulties in Chomsky's work that arise from extrapolating general rules from "basic" cases, rather than unusual, aberrant cases, and research suggests that other cognitive models shape the way we speak. Many linguists are still very into generative grammar, but it is far from a unanimous conclusion.
There is more to a language than just translating words, as any Babelfish user will tell you. My first problem is that it is so hard to get anybody to actually use it as a standard. It is very easy to come up with a standard. Even M$ with all their power have trouble trying to execute standards.
Teach'em all English. That's my solution.
-- Moondog
So why not establish a universal SUBSET of languages that can express as much of as many languages as possible? For example, I'd be willing to wager that there's a way of expressing the phrase "The red ball is resting atop the green book" in every language in the world. At least, every language used on the net. Relatively simple subsets of communication would at least allow for SOME measure of success for this project, but I don't know how useful it could be. But it's certainly worth a try.
To recap, obviously you aren't going to be able to translate haiku with this thing, but you could translate Linux installation instructions.
--- Dirtside
"Destroy science and religion. Science would re-emerge exactly the same; but not religion." - Penn Jillette, paraphrased
Sounds like someone's finally found a practical, widespread use for the OSI Presentation layer...
It seems to me the approach is to dump the nuances of individual languages. The website describes a process whereby you type a message and watch a realtime native->universal->native translation. If the result doesn't match the input, you've used something that doesn't translate and you need to replace it with something that does.
It's far from a true universal translation system, but I think it could be very useful in conveying simple information. I wouldn't attempt to distribute poetry this way but for lots of documents it would probably be understandable.
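The round-trip check described above can be sketched in a few lines. This is only a toy, assuming a word-for-word pivot vocabulary; the tables `TO_PIVOT` and `FROM_PIVOT` and both function names are invented for illustration and have nothing to do with how UNL itself is implemented.

```python
# Toy vocabularies; every entry here is invented for illustration.
TO_PIVOT = {"en": {"red": "RED", "ball": "BALL", "book": "BOOK", "crimson": "RED"}}
FROM_PIVOT = {"en": {"RED": "red", "BALL": "ball", "BOOK": "book"}}


def round_trip(words, lang):
    """Return (word, came_back) pairs after a native -> pivot -> native pass."""
    result = []
    for word in words:
        concept = TO_PIVOT[lang].get(word)  # None if the word is unknown
        back = FROM_PIVOT[lang].get(concept) if concept else None
        result.append((word, back))
    return result


def needs_rewording(words, lang):
    """List the words that did not survive the round trip unchanged."""
    return [w for w, back in round_trip(words, lang) if back != w]


print(needs_rewording(["red", "ball", "crimson", "wicket"], "en"))
```

Here "crimson" comes back as "red" (a nuance was dropped) and "wicket" does not come back at all, so both get flagged for the author to replace.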
/* The beatings will continue until morale improves. */
What will be interesting is how they will handle the changing and sliding concepts/phraseology between languages. Language is more than a verbal construct, it's a representation of a culture.
Classic cases of well intentioned translations that lost a little something:
Come alive with the Pepsi Generation was released in China as 'Pepsi Brings Back Your Dead Ancestors!'
Chevy Novas didn't do too well in Mexico either..
no va... no go
Check out Magic Firesheep!
In a pre-XML world, some of us encountered the UN's efforts to make EDI (Electronic Data Interchange) a world standard.
It was not a pretty sight. I doubt very much if a UN-sponsored human-readable language effort will fare any better.
I don't know what others do, but I used to switch back and forth quite easily depending on to whom I was speaking and the situation. I was in a French Immersion elementary school in B.C. (before moving to the US, where no such program exists). In the classroom we all spoke French constantly but as soon as recess started we would speak English with the exact same people (except the teacher). However, if one of the English speaking administrators came into the classroom, we would be in English "mode" again.
:P
Sometimes, even now nine years later, I still can think of a word in French before I can think of it in English.
On the overall topic of a metalanguage: I'm familiar with Chomsky's theory of a universal grammar and if we could reconstruct that grammar, a metalanguage would be possible to create and implement. However, without a good understanding of the universal grammar, it would seem to be nearly impossible. We're too set in our individual grammatical tracks to fully understand those of other languages. That's one reason why teaching small children multiple languages is so successful: their internal grammar is not yet completely rigid.
Of course, the obvious extension of this argument is that only a small child could invent a true metalanguage. Imagine the possibilities...
Maybe they can use Esperanto as the basis for the intermediate translation language. IIRC that's what Esperanto was invented for in the first place (not to mention it's supposed to be logically consistent in grammar and pronunciation, unlike "natural" languages - surely a great boon to implementation).
#include "disclaim.h"
"All the best people in life seem to like LINUX." - Steve Wozniak
NLT via a "generic" metalanguage is one of the Big Goals of computer linguistics after Speech recognition and production.
It's a fascinating field, and any Joe Q Programmer can write a "reasonable" interpretation of the idea. Nobody has written a really good version just yet, and I don't think we're likely to for a while.
...is dialect and `slang' support. If I'm in the southern US and I say `yeah, I'm fixing to go do that', how will that be interpreted?
Worse still will be Chinese support. They have, what, 2000 dialects of Mandarin alone? Will the UN force everyone to use the same language in a particular region? Or will this Meta-language understand that?
Doable, but not very well. Wait until they start including slang and jargon. And we all thought Babelfish was bad...
censorship is a form of noise, which actively seeks to drown out content with silence - Crash Culligan
We already have a "Universal Language." It's called English.
I'm not trying to be facetious; I'm not saying English is better than other languages; and I'm not saying that English will serve you best, or even tolerably well in all places; but it is an inevitable conclusion you must come to after spending any reasonable length of time abroad: if there is anything resembling a universal language in this world, it's English.
English is already a lingua franca in technical and many academic fields. Many universities in non-English-speaking countries actually demand that graduate students write their theses in English, because that is the best way to ensure its diffusion. Some such schools even conduct their classes themselves in English.
The Hollywood movie industry has also no doubt played a large part in helping to make English (not to mention Western culture) palatable and popular the world over. Dubbed versions of films are hardly ever as popular as subtitled ones (exception: kiddie films).
Is English the best choice for a universal language? Definitely not from the point of being easy to learn. Esperanto would be much better. But realistically Esperanto doesn't have a chance. If English ever encounters a contender, it will probably be Chinese, if only because 1/5 of the planet speaks the language.
BH
I'm a little confused ... does "Universal Networking Language" mean Esperanto or TCP/IP?
--
"I find your lack of faith disturbing." -- Darth Vader
prior art should stop this - it's been done (or at least attempted) before, using a de-ambiguised esperanto as the bridge language, by (among others) klaus schubert in the netherlands. searches for "distributed language translation" or "DLT" (possibly intersecting with "esperanto") should turn up several references (some probably *in* esperanto) from the mid-late eighties. -duncan
It's so crazy - it just might work!
a previous project, the Distributed Language Translation (DLT) project, based in the netherlands if i remember right, used a similar idea, using a de-ambiguised esperanto as the bridge language. as i remember, as a text was typed in a source language, it was translated into and stored as this bridge language. when ambiguities arose in the interpretation of the source, a query was sent to the typer as to which meaning was meant (eg: "i love her more than you"), and this distinction was preserved in the bridge language. when the text was required in a different language, this bridge language was translated into the destination language. since the bridge language was intentionally chosen and further designed to be more easily machine parseable and less ambiguous than the original, the translation work was made easier. searches for DLT and esperanto should turn up some references to the project, although a brief summary may clarify further. as far as i remember this was in the mid/late eighties. -duncan
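The ask-the-author step described above might look something like this. It is a rough sketch, not a description of how DLT actually worked: the `AMBIGUOUS` table, the `to_bridge` function and the record layout are all made up for the example, and a real system would find the competing readings with a parser rather than a lookup table.

```python
# Hypothetical table of known ambiguities; a real system would derive
# these from a parser rather than a hand-written lookup.
AMBIGUOUS = {
    "i love her more than you": [
        "i love her more than i love you",
        "i love her more than you love her",
    ],
}


def to_bridge(sentence, ask):
    """Return a bridge-language record, asking the author when readings differ.

    `ask` is a callback that receives the candidate readings and returns
    the index of the one the author meant.
    """
    readings = AMBIGUOUS.get(sentence.lower())
    if readings is None:
        # Nothing ambiguous on record: store the sentence as it stands.
        return {"source": sentence, "meaning": sentence.lower()}
    choice = ask(readings)
    return {"source": sentence, "meaning": readings[choice]}


# The lambda stands in for the person at the keyboard picking reading 1.
record = to_bridge("I love her more than you", ask=lambda readings: 1)
print(record["meaning"])
```

The point is that the choice is made once, at typing time, and stored, so no later translation step has to guess.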
It's a big world, pal. Just because your map has your country in the centre/center of it doesn't mean that it's actually in the center/centre. =)
OK, here are some more answers.
... UNTIL" loop is the perfect solution for certain problems, and "dinero" is the perfect translation of "money" into Spanish. A TCP/IP stack, no matter which OS it is running on, will always have some sort of ACK/NACK test. But these are all very limited examples.
/. At the very heart of the cutting edge. (some text removed) I wouldn't expect your friends to be out of work any time soon. But isn't the job of a professional translator radically different now than it would have been 100 yrs ago? Political change was not the only thing that caused this change... communication technology has had a big role.
Watch out this is very, very long...
Don't think about it as "automatic" translation, it's much more likely to work out as semi-automatic. I expect that the process would be something like this:
1.Run automatic converter from natural language to intermediate.
2.Have an expert in the intermediate language review the translation.
3.Run automatic converters to the target natural languages.
4.Have linguists review the output.
Compare and contrast with a "traditional" translation process:
1. Ask a translator to translate from language "A" to target "B". Ideally, the person in charge of the translation should be fluent in language A, a native speaker of B and have at least basic knowledge of the subject at hand (for instance: Open Source).
2. Ask a linguist, (ideally fluent in language A, native speaker of B, etc.) to review the translation produced at step 1.
The point is that the intermediate language should be designed to be free of the ambiguities that plague language translation.
And how exactly can you do this? Either your intermediate language is "limited" (that is to say: misses many of the subtleties of the original language), which eases step #1 but certainly introduces many errors down the line. Or, it is an "advanced" language, that is able to translate many of the finer points of your "start" language -- but then, the interesting thing is the translation engine itself. Not the intermediate language. If your translation engine is good enough to translate, say, Spanish into UNL with little/no loss of meaning, it is also good enough to translate Spanish to English with no intermediate step!! If this is true, what's the point of UNL?
Another point is, how can you be an expert in an "intermediate language"? Either the language is "human-readable", but probably produces an output compared to sludge and correcting this sludge may introduce additional errors. Not to mention the pain it represents to check on something that borders on the unreadable. Or it is machine readable -- but in that case, who is going to read it?
Final point is productivity: using UNL, computers and machine translation may take longer than a simple translation "by hand" with human grey matter. A Windoze95 machine with MS Word and some good "paper" or digital dictionaries is, in many cases, more efficient and cheaper than going through the pain of machine translation.
The hope is to minimize or eliminate step (4).
Good luck! Frankly, this has been the "Holy Grail" of machine translation ever since it started. And I do not think we are any closer. So far, every large, international institution that I am aware of (UN, UNESCO, EU Commission, EU Parliament, NATO, IMF, etc) either uses tons of translators or has standardized on a couple of languages at most (English being, of course, the "Lingua Franca"). All the large international institutions mentioned above that use machine translations have discovered that, even on simple subjects, the 4th step you describe above is the one that consumes the most time.
It would be a big win if you could get to the point where all the hard stuff is done just *once* instead of repeated over and over again for all of your target languages.
Again, this is the "Holy Grail" of machine translation. I don't believe that we are any closer to it than we were, say, 30 years ago. At least not judging from the output of some of the software available out there...
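For what it's worth, the "done just once" argument quoted above is really a counting argument, and the arithmetic is easy to check. A quick sketch (function names are mine; it counts converters only and says nothing about how good any of them would be):

```python
def direct_translators(n):
    """One translator per ordered language pair (A->B and B->A count separately)."""
    return n * (n - 1)


def pivot_converters(n):
    """One converter into the intermediate language and one out of it, per language."""
    return 2 * n


for n in (2, 3, 16):
    print(n, direct_translators(n), pivot_converters(n))
```

For the 16 languages announced, that is 240 direct translators against 32 converters. For two languages the pivot is actually more work (4 against 2), and it only breaks even at three.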
And no, this will not work for poetry or humor, but there's no good way to translate poetry and humor in any case. The idea would be to get it to work with technical, legal, and business language.
Sorry to say this, but this does not work very well either for legal or technical language. It may work with Business, since PHBs are so limited intellectually =). Legal translation can be horrendous: I have translated many legal documents in the past and I can tell you there is nothing worse than that, because legal terms are incredibly complicated and old-fashioned and also since legal trivia has to be rendered in a very exact manner. Legal terminology (in almost every language) is one of the most confusing and complicated ones. Plus, lawyers and legal people are a major pain in the neck when it comes to Once you get the terminology right, I agree the rest of a legal document is usually a matter of "filling the blanks". But getting the legal terms right is enough to drive you nuts.
Technical translation is another problem: I think some technical areas may be the best bet for machine translation yet. The problem, as far as the technical field is concerned, is that in fast-moving areas (computer science is one) the technical vocabulary is changing and evolving so fast it's hard to keep pace. I read up to 5 computer magazines a week (not to mention a daily dose of Slashdot =) just to keep up-to-date with the latest evolution in language and technology. Keeping a UNL database of terms and translation could prove to be a daunting task...
>What's so special about UNL? Theoretical translation of language A into a universal language and from there to language B is almost as old as "machine" translation itself.
The fundamental argument that it hasn't worked before so it isn't going to work now is stupid. It has been demonstrated how difficult it is to do this, but not that it is impossible.
Please note that I never said (in the sentence you quote above) that this is not going to work. I just said that, as far as I am concerned, using an "intermediate" language is old news. This may be a new and interesting idea to you, but, frankly, for someone who has worked in translation, you could very well trace back this concept all the way to Volapük and Esperanto. And these two were invented in the 19th century.
As far as I am concerned, I think you could prove that correct translation is impossible. All you would have to prove is that a "human" language is a chaotic complex system, which usually follows unpredictable rules and has several strange attractors, inducing a runaway complexity.
Case in point: English. Roots: Saxon dialects, Norman dialects, Old English and Old French. Latin. A little bit of Greek. Maybe German and Old Dutch. Evolution influenced by French and a myriad of other languages. Now divided into several branches (US English, British English, Irish English, Australian English, Indian English, International English), all of them influencing each other and countless other languages. Reducing the English language to a set of neat little equations and computer routines is left as an exercise to the reader... =)
Please understand me: computer translation of "basic" English into UNL and from there into Chinese, French, Spanish, Italian, Japanese, etc... is no big deal. Computer translation of highly technical/scientific papers may be achieved. But even then, due to the inherent complexity of English (or any other human language), a human will have to review the machine translation and correct it.
I therefore suppose that perfect translation does not exist (or is impossible). Translation (like programming) is an art, not a science. You can have a certain number of "artistic" rules, but you cannot have a "perfect", scientifically proven, solution.
Example: give a problem to be solved to two good programmers, and they'll probably come up with two different and equally valid solutions. Which solution you pick has to be determined by other factors (speed of implementation, maintenance and evolution of the system, optimization, resources used, etc).
Give a translation to be done to two good translators and they will probably come up with two rather different and equally valid translations. Which one you pick is then determined by other factors (length of translation, speed of said translators, price of translation, style, etc). Complex systems, like languages, cannot be reduced or predicted. They can be analyzed and more or less "solved" -- the quality of the solution being dependent on many factors, such as the experience of the specialist, his choice of tools, etc. This is true even in reductive or limited systems, where, for instance, the vocabulary to be used is small (see technical translation above).
Remember the butterfly in Brazil that creates a storm at the other end of the world? I suspect translation (especially multiple language translation) may well be the kind of complex system that is so hard to solve using computers.
Perfect translation, like perfect programming, is only possible in a very limited scope. A "DO
>For a good example of the total and dismal failure of machine translation,
>try translating this text into French (or Spanish, or Italian, or whatever)
>with Babelfish and back to English. Then do it a few times. Then try
>English to Chinese and back a few times. Case closed.
Hardly, Here's why that is not a valid test
1.Babelfish doesn't use an intermediate language.
2.Babelfish doesn't even achieve lossless translation from language A to B and back to A. This is the simplest case and one which can be improved the most with a good definition for UNL.
Answers:
1. An intermediate language should introduce even more bugs into Babelfish translation. See above.
2. "Lossless" translation is impossible. See above. Complex systems, such as human languages, cannot be reduced easily to a set of equations.
>It is, in fact, an even better AI test than the Turing test.
They do not claim perfect translation, but yes, a computer which could translate between languages and do it perfectly would pass the test. Do you really argue that it is impossible for computer programs to ever pass the Turing test? It is only a matter of time till this happens. The only way to stop it is to stop making computers.
Actually, I thought a computer had recently managed to pass the Turing Test, or some limited version of it. Could anyone out there supply information on this one?
But: I don't think the Turing test is actually a very good AI test. There is a huge difference between a program that is able to "talk" to you (parrot back what you said) and one which is able to understand you. A computer able to understand human language would probably be the first real AI on this planet. Most Turing test software is based on some variation of Eliza, and this has been around for ages.
Here we are reading
Well, this may be surprising to you, but the work of a professional translator has not evolved very much. Computer and communication technologies have eased their task a lot. Like many other professions, translators are now able to work from home, access the Internet and its wealth of information, send documents to clients by e-mail, and even use some very clever software that eases the translation process (TM/2, Trados, etc).
Word processing, in particular, certainly is the best thing to happen to translators since sliced bread =). Also, I agree that many new translation fields have been added in the past century: biology, computer science, aerospace, etc.
But the central fact remains this: to be a translator you have to be fluent in (at least) one language, a native speaker of another, and have good expertise in one or more fields of human activity. That's it. Oh, and you have to have a certain "talent" with languages, just like you need to have a certain "talent" for programming. It's an art, remember? Even the best-trained translator is worth 0 if he/she does not have that special "talent". Exactly like a lot of people work on Linux -- but there is only one Linus Torvalds. =)
We may translate faster, have more tools and information at our disposal, and produce better-looking documents -- but the core skills remain the same and the work process is exactly the same. You could train a translator today in the exact same way they were trained 100 years ago: with a pen and a piece of paper. Sorry to disappoint you, but Computer technology is not always the perfect solution it prides itself to be...
That's All Folks!
The right to offend is far more important than the right not to be offended. (Rowan Atkinson)
I mean absolutely no offense here, but the end result is gonna suck. First of all, by the time the first draft comes along, there will probably be ten thousand other proposed standards, written by people with knowledge of the subject, that whip the hell out of the UN standard.
Second of all, too many cooks spoil the soup. Take a look at Ada. It was designed to be an everything-in-one, do-everything, and do-it-well type of language. It's mediocre at some tasks, horrible with most, and good at nothing. It's shit.
This is what generally happens with stuff like this. Don't get me wrong-- ANSI C is a good standard, but it was based on an already existing de facto standard. Same with SCSI.
I get the feeling that the end result isn't going to work. It'll translate real well between all languages until you use a pesky word like "the" or "and".
Maybe we could all map our written languages to written Chinese. This already works for a number of languages in eastern Asia. Europeans might have to adjust their grammar to meet the needs of written Chinese, but the lesson I take from Babelfish and other automatic translators is that grammar is so trivial that translators can just ignore it.
Jambo !
IIRC Swahili started as a universal (trading) language made up from various East African tribal languages, Arabic from traders coming to East Africa with an Indian influence and probably many others too. The word Swahili is from the Arabic word for the coast as the language sprung up first along the coast. It is now used as an intermediate by many peoples with different mother tongues far into Africa. In the short time I was in East Africa I found it incredibly easy to learn functional Swahili and I am a language dummy. Interesting then how this original 'universal' language has now moved downstream and is one to translate to.
I suppose many other modern languages have grown from a mix of others (eg English, from Low German, Frisian, French, Celtic, now including Indian (eg juggernaut), Polynesian (eg taboo, tattoo) and many other words).
Perhaps in the end we will just end up with another new language that started by the need of people to trade.
Perhaps they should base it on Swahili or even Esperanto, please no more new languages !
Maybe you live in interesting times
Looking in a Japanese-Japanese dictionary (Kokugojiten for all you Japanese literate folks out there), speaking to someone in Japanese about the word or phrase I don't get, speaking to another Japanese fluent English speaker in a combo of Japanese/English, and finally waiting a while for the meaning to sink in from use.
Based on the list of culturally incompatible (that is where a lot of the translation problems stem from) languages on the UN site, I would wager that this is going to produce at best a very very demented version of Babelfish. Any linguistics folks wanna tell me why I am wrong?
English, for those who did not learn it as kids, is a notoriously difficult language to learn. When you think of the fact that every single alleged "rule" in English has at least one (and usually several) exceptions, you begin to see the problem.
At least, this is what many French-Acadian friends have told me. English is my mother tongue, so I wouldn't know personally.
The statement made by me above was sorta cut and pasted after hitting preview and it not coming out right. You are right that Klingon doesn't borrow, but any real artificial language does. Trek borrows for names a lot from other languages. "Odo" "Nerys" and "Jadzia" are all real words from other languages. I can't remember at the moment what Odo means, but Nerys=nose in Spanish, and Jadzia="old man" in Polish.
Lowmag.net
I remember something about an aborted project of 'universal language', was it Esperanto ? Never heard about it since ...
New slant on Turing: What about a prize for the machine that produces a translation that you can't decide is machine-translated or human-translated?
For a project that's supposed to allow effective communication, they could at least have designed a web site that works well in all browsers. No alt attributes for images... Sigh. Those of us using lynx just have to guess, based on the image names :-(
"The invisible and the non-existent look very much alike." -- Delos B. McKown
The concept is called "deep contextual dependency."
It works all right for extracting meaning and for translating documents without nuance or unidiomatically (See National Weather Service of Canada automatic weather advisory translation.)
We can translate WORDS without difficulty. We can parse and deconstruct sometimes surprisingly complex sentences. The problems come when we try to deal with the fact that we don't speak in words.
We USE words, sometimes the wrong ones, though malapropisms are less of a problem nowadays with mass media spreading too few languages like manure. There is pretty good consensual agreement on the meaning of any word.
But, unless we're really anal-retentive or WASP, but I repeat myself, we usually back up or demonstrate our meaning with gestural cues.
We SPEAK in sentence fragments and often, like cats purring, we aren't really communicating a damn thing, except "Hello I'm here. Don't kill me." Most of what passes for communication is just interpersonal noise.
The problems of automatic translation arise because we don't speak or even think in words. We think in sentence fragments.
To complicate things, the words we use in constructing those fragments are often not words that bear any relationship with the thought being expressed by the sentence fragment.
"The spirit is willing but the flesh is weak" is a great example of a consensually determined, historically derived idiomatic fragment sequence used to express a single thought. It is directly untranslatable.
Don't know what it means? Not sure of what it means? Then you don't come from a Judeo-Christian, Anglo-Saxon family that at least paid lip-service to the church and to Shakespeare. There IS a concept and a context expressed with the phrase but most of us would dance around trying to express it to a foreigner (or a computer.)
To understand it, you need to be a part of the consensus that was the socio-cultural cauldron that cooked up the expression in the first place. (Did you spot the idiomatic expression in that last sentence?)
Not to disagree with Suzanne Vega, but Language IS liquid. It may not be able to rush in but it is constantly deforming to fill the ill-defined vessels that speak it.
Machine translation will require that our machines not only become more like us but join us in constructing language.
MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
I will admit to not having read all of the UN documentation, but from what I have read, they are attempting to create an abstraction of language in general.
Although this is an interesting idea, it makes an assumption that all language is based off of one abstract "map". IMHO different languages have different maps. Having spent a fair amount of time learning ancient Greek in high school and college, I can say that the map for that language is quite different from English, and those are both Indo-European languages.
The concepts that exist in one language may not exist in many other languages, which is often very problematic. Eventually, to learn any language, you must actually just start thinking in it, and stop translating into your native language. Contemplating the 3 voices in Greek (active, passive, and middle) is something I rather enjoy doing, as it is very foreign to English.
I am just afraid that they will have to produce a Least Common Denominator language which won't be useful for anything beyond technical specifications and instructions. I will have to admit that that would be useful on many fronts, but may not be the dream that we were all hoping for.
There is no silver bullet. Plus, werewolves make better neighbors than zombies or vampires anyway.
This would make the grammar and spelling flame obsolete, since the grammar would be a product of the translator. Without the most basic of flames, how could a flame war ever compare to those now?
Of course, errors in translation would let people who completely agree argue endlessly without knowing it!
It certainly seems that this would be possible at an elementary level. It shouldn't be a problem developing a language that would enable users to communicate basic messages to each other. However, communicating the subtleties inherent in each language would prove to be difficult. For instance, certain concepts for which there are terms in Chinese or Japanese are almost impossible to represent in the English language. Tackling these abstract and subtle differences would prove to be the biggest challenge for any 'Universal Language'.
Latin's not as dead as you think. It's still used for papal encyclicals and the like, and as a consequence they maintain a Latin dictionary which every so often is updated with neologisms. This made the news about 6 months back - they included new words for things like 'lapdancer' in the latest edition.
Of course not to mention the typos and poor spelling in the site.
Makes you wonder if they're going to include Klingon?? Or perhaps Sanskrit?? They can have a choose-the-next-translated-language contest! That would make for an interesting program.
Sanskrit would be the ideal meta language.
Everything has to be clearly specified. No irregular verbs, tenses, etc.
The grammar (or syntax, take your pick) has been very well defined.
Simple! No scope for error *and* no scope for any proprietary extensions.
My $0.02
I can throw myself at the ground, and miss.
On the other end it would be simple to computer-translate the result into the next language.
Maybe we can't fully automate the process yet, but we can go a fair way.
I think this is a very neat idea. My worry is, who will patent the technology first and screw the world.
Amazon does it with ecommerce 1-click. Microsoft does it with style sheets. Hell, if it's a good, interesting technology, why not? Let's take it and patent it to death! Then we can charge everyone for it and make a zillion-and-one dollars. Perhaps I should send in my application today!
There needs to be limits on patents. Yes, I believe they do foster invention, but they also can stop community work on a really-good-thing.
Perhaps a community-patent-agency and an easy, low cost effort to set up patents that are held by some sort of group for the explicit reason of keeping some basic ideas *free* for us geeks and the rest of the world.
Really, it shouldn't have to come down to this tho. But someone will patent the implementation of this and we will all be screwed.
My $0.02
-- dieman - Scott Dier
All unicode characters should be allowed in domain names, email addresses, URLs, etc. The net attitude that 26 Roman letters can somehow serve all Earth residents is PATHETIC. And what better way to open up the namespace too?
Yorkshire, UK, for example, still uses "thee" and "thou". If you translate this into some kind of meta-language, it's either going to barf, or lose details. Those details may be important to meaning. God only knows how it'd cope with Cockney slang, or even common phrases (eg: "from the horse's mouth", "a sticky wicket")
As I see it, this can ONLY work for formalised documents, using a formalised subset of the various languages. eg: Legal papers, UN treaties, etc. It'll NEVER work with informal, written language.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Which Chinese language will they be using? The website doesn't say. It'll most likely be Mandarin or Cantonese, I think.
The website also doesn't address conceptual issues with translation - a language isn't just about syntax, it's about semantics. As a simple example, a German asking "Wie spaet ist es?" would have their sentence translated as "How late is it?" - it'll take a few seconds for an English speaker to work out they're asking the time. I'm sure there are a vast number of such colloquialisms in each language, and finding a common way of representing all the concepts people wish to communicate is a very hard task. Would it not be simpler for the people themselves to learn a language, so that everybody knows what colloquialisms and concepts the listener will be using?
I know, Esperanto. Or Loglan. *sigh* These projects haven't worked before, so past experience predicts a poor performance of this one. It's good to see they're making the effort, though.
S.
Actually, it seems the problem with machine translation isn't so much that human brains are more intelligent, but that humans have more environmental context from which to get the subtleties of our evolving languages.
Universal Network Pidgin?
The in-converter would make an effort to identify idioms which are ambiguous or don't translate well and either have the author remove them or encode them in such a way that the core meaning isn't lost when they cannot be present.
The out-converter would convert what it could, then if more information remained present it in another form like footnotes.
Still not an easy problem by any means. But I think if we could find a way to write which could be reliably translated by machines it would be worth the investment.
From what I understand, computers still aren't terribly good at translation tasks, and it seems as though this could be even worse. Wouldn't you lose even more context/subtle-language-aspect by translating from, say, English to UNL and then UNL to, say, Japanese? UNL will probably be rigidly defined (context-free grammar?), so you will have to twist and bend English to get it into UNL.
It would be neat to see some amazing developments in natural language processing, but I won't hold my breath.
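A rough sketch of the in-converter/out-converter idea above, in Python. Everything here is an assumption made for illustration: the idiom table, the function names, and the footnote format are invented, and a real system would need far more than string matching.

```python
# Hypothetical idiom table: idiom -> plain core meaning.
IDIOMS = {
    "from the horse's mouth": "directly from the source",
    "fat chance": "it has a low probability",
}

def in_convert(text):
    """Replace known idioms with their core meaning; keep the originals as notes."""
    notes = []
    for idiom, meaning in IDIOMS.items():
        if idiom in text:
            text = text.replace(idiom, meaning)
            notes.append(f'original idiom: "{idiom}"')
    return text, notes

def out_convert(text, notes):
    """Emit the plain text first, then any leftover information as footnotes."""
    lines = [text]
    lines += [f"[{i}] {note}" for i, note in enumerate(notes, 1)]
    return "\n".join(lines)

plain, notes = in_convert("I heard it from the horse's mouth.")
print(out_convert(plain, notes))
```

The core meaning survives even if the reader's language has no matching idiom, and the footnote carries what was dropped.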
BTW - if you ask Babelfish to translate 'I won't hold my breath' to German and back, you get:
I do not hold mean breath on
:)
Dana
This could possibly be a great tool for communications, but what about cultural differences between nations? Are there going to be PC (Politically Correct) filter plugins to make us aware of when we are about to say something that would be offensive to another culture?
Before you post your web page, you hit the World PC button on your UNL Translator Plugin.
*WARNING* you are about to offend people in the following countries...
China, Saudi Arabia, etc..
Without it, you could offend billions of people, and not even know it. What an opportunity!
It also raises the question of "What am I saying to people in Zimbabwe?" You have to have a great deal of trust in the algorithms to put anything with delicate subject matter through the system. This becomes even more important if there was some kind of filter like I described above.
I wonder if this UNL is something that is human readable on its own. I think it would be much safer if we created a UNL that was human readable/speakable. Then you could go to any part of the world, and stand a chance of communicating without having to use your UNL enabled cell phone or palm device.
It will be difficult, but I don't think for those reasons:
Verb tenses are not the problem. Every language can express every tense, just in a different way. Hard yes, impossible no.
Additionally, approximations work well enough. Ex. Most English readers couldn't tell you the difference between past tense and preterite(sp?) tense.
Grammar is easily defined. 90% of language could be described in a BNF. adv-adj-noun in one, noun-adj-adv in another. So what. That is probably the simplest part.
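The word-order point can be sketched in a few lines of Python. This is only a toy under stated assumptions: the two "languages" and their ordering rules are hypothetical, and real grammars need much more than one fixed order per phrase.

```python
# Hypothetical per-language ordering rules for a simple phrase.
ORDER = {
    "lang_a": ("adv", "adj", "noun"),
    "lang_b": ("noun", "adj", "adv"),
}

def linearize(phrase, lang):
    """Emit the words of a role-tagged phrase in the order `lang` expects."""
    return " ".join(phrase[role] for role in ORDER[lang])

phrase = {"adv": "very", "adj": "tall", "noun": "tree"}
print(linearize(phrase, "lang_a"))
print(linearize(phrase, "lang_b"))
```

The phrase itself is stored once, by role; only the ordering table differs per language.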
My interest would be in the meta-language design. Words by number? string? Grammar by parsing into a std format, or classifying each word? Are there multiple ways to organize a statement? What about this "word hierarchy" they talk about. Quite cool there.
It's horrible. What I don't understand, or rather what people don't seem to understand about graphic design is that ... if you don't know how to do it right, don't do it!
I'd rather see a plain TEXT HTML2.0 page on a gray background than this kind of ugly logos. At least it would be lynx-friendly.
This allows the semantic extraction to be MUCH more computationally intensive than systems like babelfish can afford. When you make a document, it's okay to spend an extra 15 seconds to extract a pretty good representation of the gist of it, so long as it doesn't need to happen every time the page is viewed. (babelfish doesn't even cache translations, does it?)
Okay, so some of the idioms and convoluted sentences will be improperly converted, and will need some manual tweaking. Hopefully this system will allow this tweaking to take place. By providing multiple different conversions back into the author's native tongue, they may be able to see some of the translational oversights, and fix them.
This won't be good for poetry, but will allow people who only know one language (English speakers seem more likely to fit this category than other people) to publish documents readable by people who do not speak English - that's a substantial breakthrough.
It would be nice if this standard would allow segments to be set to fixed translations, so that if I really wanted the English to read a particular way, I could enforce that particular idiom, without loss of generality. ("Normally translate 'it has a low probability' but if you ARE translating to english, substitute the literal string 'fat chance'")
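A minimal sketch of that per-language override, in Python. The `Segment` class, the `render` function, and the stand-in generator are all hypothetical; nothing here reflects how UNL actually stores text.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    concept: str                                   # stand-in for the encoded meaning
    overrides: dict = field(default_factory=dict)  # language code -> literal text

def render(segment, lang, generate):
    """Use the author's literal text for `lang` if one is set, else generate."""
    if lang in segment.overrides:
        return segment.overrides[lang]
    return generate(segment.concept, lang)

def toy_generate(concept, lang):
    """Placeholder for a real deconverter: just tags the concept with the language."""
    return f"[{lang}] {concept}"

seg = Segment("it has a low probability", {"en": "fat chance"})
print(render(seg, "en", toy_generate))
print(render(seg, "de", toy_generate))
```

Readers in every other language still get the generated text, so pinning the English wording costs nothing in generality.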
Trees can't go dancing
So do them a big favor
Pretend dancing stinks!
The UNL is simpler than the natural languages.
During the translation from the natural languages to UNL, some unimportant details must be dropped.
During the translation from the UNL back to natural languages, the meaning of the sentences remains, however the atmosphere or the mood might be lost - it is like a lossy compression...
It's obvious they've put at least some thought into it. Translating to some universal middle-language is clearly the only sensible way to handle translating between many different languages - you only need N two-way translators instead of N^2.
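To put rough numbers on that for the 16 launch languages, here is a small Python sketch. It counts each direction separately (an assumption on my part), so direct translation needs N(N-1) one-way translators while a pivot language needs 2N converters; the growth is the same N^2 versus N. The function names are made up.

```python
def direct_translators(n: int) -> int:
    """One one-way translator per ordered pair of distinct languages."""
    return n * (n - 1)

def pivot_converters(n: int) -> int:
    """One converter into the pivot language and one out of it, per language."""
    return 2 * n

n = 16
print(direct_translators(n), pivot_converters(n))  # 240 32
```

Adding a 17th language costs 2 more converters with a pivot, against 32 more one-way translators without one.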
They describe a UNL editor which translates your text into UNL and back again, so that you can check that the translation will be ok as you write it. This sounds like an excellent way to check the results and minimise the errors/inaccuracy.
Actually coming up with a representation which copes with the meaning in all of the other languages is surely a massively difficult task. *If* they manage to solve that, then I'd think that computer (written) natural language recognition is all but solved. (I don't know what the current state-of-the-art is like)
The one thing which annoys me is their insistence on using the terms "enconverter" and "deconverter". I mean the latter sounds almost ok, but "enconverter"? From a group who need to be expert in languages? Yuck!
Though at first sight the idea of translating to an intermediate language seems interesting, I can't help but note that similar projects in Europe have all failed so far.
Automatic translation between languages in the EU is something that could save a lot of money. So there have been a lot of research projects funded with loads of EU money to accomplish this. All of these projects have failed (as far as I know).
This seems to be a similar effort, this time by the UN, which is an equally bureaucratic organization. I think the goal of this project is probably too ambitious to work. Even translations between two related languages (English and German) are troublesome (babelfish for example is not exactly perfect), so I can't see why translations to an intermediate language would change things (ever tried to do that in babelfish? the result is not pretty).
So, it will probably fail and loads of money will be wasted on it.
Jilles
A holy war could be started because of the sentence:
If it's spelled incorrectly.
Use your imagination.
A very similar story was posted on slashdot last year ---> Here
While this system would reduce the number of translators significantly, with the UN's record of fast action (NOT!) and bureaucracy I think this is headed down to the Great Bit Bucket. I'd be much more interested in what some of the major research centres in computational linguistics and language recognition are up to. (Links, anyone?)
==================================
neophase
==================================
they can't even write web pages correctly (the english pages are using the Shift_JIS - Japanese - char encoding...fscks up 1/2 the text on the page)
I think the UN should just push Esperanto as a 2nd language (as it's intended) so ppl can communicate that way (i mean come on, it's NOT hard to learn)
"There is no spoon" - Neo, The Matrix
Ilmoita se Ahtisaarelle heti !!
The various Chinese languages are all mapped (with a reasonably good degree of confidence) to the same written language. Translating written languages is easier than translating spoken languages.
w3m has pretty much obsoleted Lynx. It's a text mode browser that supports tables, frames, and generally renders much better than Lynx does. Akinori Ito is the man.
After all, encode and decode have been used for years and years and years, so all of the translations of the UNL web pages would be correct.
Frankly, the web site reads like it was originally written in another language, and then translated to English. Which, to be blunt, makes me doubt that they have the skill to actually implement UNL. I mean, if they can't do a decent job translating a lousy "brochureware" web site, how can they do a creditable job with UNL?
Having said that, I'd like UNL to work. It might simplify my travels somewhat.
It doesn't help with the yanks misspelling words, i.e. color, sulfur (and driving on the wrong side of the road for that matter :-) )
Esperanto wouldn't work: the point about Esperanto is that it is a pared down language without the anomalies that most natural languages have.
Unfortunately one *needs* these anomalies in order to translate back into the natural languages: one needs to know all about the 3 genders and 4 cases to translate automatically into German, for example.
Several points -- for full disclosure, let me just state that I am a localization engineer with 5+ years of experience in software localization (read: adaptation into different languages) and 7+ years of experience in translation. If that does not make me qualified to comment on this, I don't know what does.
Of course, I may be completely wrong and UNL may be the next best thing since sliced bread. But I doubt it.
The right to offend is far more important than the right not to be offended. (Rowan Atkinson)
No, every language cannot express every tense. Read Cassirer (his philosophy of symbolic something or other); he has lots of source information about how things are expressed/experienced in different languages.
In particular, there's an Indian tribe in South America he cites which has a single verb tense, and expressions of the past and present are identical, and are apparently understood in a temporally undistinguished way by the tribe.
MONGOL? surely there are many many languages with more users than Mongol? Does Genghis Khan use the internet? Your tax dollars at work.
A few thoughts on "automatic translation" of web pages.
1. Web pages could store more than one translation of their text. You could store the default version in your native language and a translated version for someone else. Envision BODY LANGUAGE=ENGLISH-AMERICAN or LANGUAGE=ESPANOL. This would be even easier if you were generating pages out of a back-end database.
2. You could embed a "meta-language" version of each of your web pages. This would allow you to tune the meta-language page for better translation. BODY LANGUAGE=UN-META-LANGUAGE
3. This process would be facilitated if the meta-language was something that people actually spoke, and was easy to learn (eg, Esperanto, or some other designed language).
4. Where the meta-language was not sufficiently specific (implied meaning, context), you could add markup tags around words to indicate meaning. This could extend Esperanto to have useful features for computer translation.
5. You could even mark up English text to indicate meaning.
6. Failing all the above, if web pages consistently had "summary" or "abstract" sections you could at least focus your translation efforts on that chunk of the page.
Ah, for the return of HTML to content markup and not display.
James Cook
James@CookMD.com
See http://www.useit.com/ about the bolding.
this is one of those things my wonderful senior year high school teacher used to talk about. we would talk about things like if all words are really "images" in the brain, like as if there's a mental stamp we have inside that we "think" of every time we say "run", that could really be translated into any language... very interesting, but it's really hard to out-think your own brain and try to figure out what it's doing.
there was an exchange student from germany in that class and she said that when she first came here, she would "think in german", but eventually she found it easier to "think in english" when she was going to speak english.
i'm a very poor spanish student (which i have to take at college due to general requirements blahblahblah) and i'm not at the point where i can really "think in spanish", so i "think in english" and of course i speak about ten times slower than good spanish speakers (those people they put on the listening tapes are so damn fast!).
if there is some kind of "universal" language that your brain thinks in, it must be really hidden from your conscious self, because i find it very difficult to think without words. but, of course, as my teacher would say, if you can't think without words, how could the first languages have been developed?
so of course we must be _able_ to think without words. i guess we just make up our own internal mental representations for things or concepts, but once we learn language, this is probably not used.
as far as the project itself is concerned... this is going to be _necessary_ at some point for language translation (though of course i think everyone should speak english, like almost "everyone" does already). if you just have italian-english and german-french, etc, etc. translation algorithms, it's going to get ridiculous. n^2, where n is the number of languages you want to translate between.
i really just hate languages.
Thought not.
By the way, this was already tried once - it was called "Esperanto". Ever meet anyone that speaks it?
Thought not.
My mother is German my father American. My German family speaks no English and my American family speaks no German.
When I have had to translate I used to try to do a word for word sentence for sentence translation. It never worked. I can not explain why, it just sounded very wrong and sometimes even gave the wrong meaning. Then when I got over having to do sentence for sentence translation and began paraphrasing everyone would understand. I don't see a computer doing any accurate paraphrasing anytime soon.
Why is the UN reinventing the wheel? There is a universal language on the internet: it's called English (or american, as the brits would call it).
Only half in jest on this one...
We already have realtime audio-textual language translators that are surprisingly accurate.
I think what they are talking about is a textual intermediate language to bridge other languages. I think this is how the audio translators work anyway so we may be half way there. I think an XML language would do perfectly, like:
(sentence)
(clause)
(subject name="boy")
(verb name="walk" tense="past simple" adverb="slowly" adverb="lazily")
(predicate)
(preposition name="to")
(noun name="school")
(/predicate)
(/clause)
(/sentence)
Something like that (hopefully it wasn't munged). Then I'd say "The boy walked slowly and lazily to school", and it would be converted to the meta language, then into the destination language.
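For what it's worth, here is a hedged Python sketch that builds the same structure as real XML with the standard library's `xml.etree.ElementTree`. The element and attribute names are taken from the example above and are not any official UNL schema. One change is forced: XML does not allow the same attribute twice on one element, so the two adverbs become child elements of the verb here.

```python
import xml.etree.ElementTree as ET

# Build the tree for "The boy walked slowly and lazily to school".
sentence = ET.Element("sentence")
clause = ET.SubElement(sentence, "clause")
ET.SubElement(clause, "subject", name="boy")
verb = ET.SubElement(clause, "verb", name="walk", tense="past simple")
for adv in ("slowly", "lazily"):
    # One child element per adverb, since attributes cannot repeat.
    ET.SubElement(verb, "adverb", name=adv)
predicate = ET.SubElement(clause, "predicate")
ET.SubElement(predicate, "preposition", name="to")
ET.SubElement(predicate, "noun", name="school")

xml_text = ET.tostring(sentence, encoding="unicode")
print(xml_text)

# Round trip: parse the text back and pull the adverbs out again.
parsed = ET.fromstring(xml_text)
adverbs = [a.get("name") for a in parsed.iter("adverb")]
print(adverbs)
```

A converter for the destination language would then walk this tree rather than the original English text.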
It's 10 PM. Do you know if you're un-American?
Just imagine what the poor linguists who have to find the "common constructs" in the various languages will have to go through...
I wonder if eventually, people will just "skip" the language translations & learn the meta-language directly - will they teach it in school?
Of course, this could all go the way of Esperanto...
This discusses a similar project...
Wow. I've been on /. longer than I thought, to have remembered to look for that...
"enconverter software"? "Inter-Net"?
Are there any actual computer scientists or linguists involved with this project? Their web site looks like it's either a team of bureaucrats or fifth-graders.
The idea of using an intermediate language (often called an "interlingua") to translate text is not new. PANGLOSS is (was?) one such project, there is also at least one Japanese interlingua project (http://www.cicc.or.jp/homepage/english/about/act/mt/mt.htm). I don't think these projects led directly to any practical application. It's a tough problem!
I don't expect UNL to succeed -- I am skeptical of any organization that has only a flowchart and a tag on their home page to show for a year of work.
What I'd like to see is an open-source project to develop an interlingua for a very specific domain (say, computer user interfaces?) that will be immediately useful. Start with just the interlingua -> human language engine, since it's a bit easier. Use it to make text for dialog boxes, menus, and help files. It won't translate Shakespeare, but it'll be useful!
Also consider this example (oversimplified):
A cat is a pet in western culture, but a dog is more like a chicken (i.e. something to eat) in some eastern cultures (China?).
So, how should an eastern person writing "He went out to kill the dog" be interpreted? Was he crazy, or just a hungry man? Probably, he was a hungry man.
The intermediate language may be as precise as possible, but when translating, either you have to ignore the meaning ("He went out to kill the dog"), or you have to ignore the fact ("He went out to kill the chicken").
Well, let's choose the former: a Babelfish translation does almost this today, so what's the point in doing that? Improving. Well. It's a good point.
So, let's choose the latter: suddenly, in the text, you find something like "...and still there were some tracks of a four-legged animal on the snow...". Obviously, it can't be a chicken, so the translator should also change all the indirect references to the dog to indirect references to a chicken. "A tricky job", as Deep Thought would say.
That said, I hope they improve the quality of automated translations. But I don't expect too much soon.
My 0.02 Euro.
Philosophy of Symbolic Forms. I think you mean volume 1. Is this where I remark on my dislike for the deification of Chomsky and his theories?
Ages ago I thought of the problem of automatic language translation. It is best done by translating to an intermediate language then translating to the language of choice. DO NOT use a language that is still in use. The reason is "living" (still being used) languages change too rapidly (the same words change meaning with time). Use a "dead" language (no longer used). LATIN is probably best since it is no longer used and there are Latin to "every other language" translations available.
I am a true bilingual, and my wife also studied this in college. Here is the gist of it: bilingual brains are wired differently (makes me a mutant). The brain forms word pairs of equivalent words, so if you say "house", in my brain house-haus or haus-house is accessed. Based on the aural input I then strip off the other language construct. Since I parse English/German it is actually easier, since they are linguistically close. I don't envy people who have, say, English/Japanese or Chinese. And yes, I can generally think in both languages, but mostly when I am alone. The best way to describe that is that when I am alone I can hear the words of the language(s) in my head, but not when I am around other people. I suspect that I use the internal symbolic language when around other people in order for it to be my translator. Oh yeah, until I met my wife I didn't know any of this; I just did it. Until then I didn't realize what a fun toy I had between my ears.
First Pole! Initial posting! Top Rank! Station #1!
Let me add something to the above, since I didn't make myself clear.
In German it is possible to use the definite article to refer back to something used in the previous sentence, rather like `it' in English, but with the crucial distinction that what we refer back to must be of matching gender. So if a masculine, feminine and neuter word occur in the sentence it is possible to refer to any of them with the `it'. This ability to refer on the basis of gender must be captured in our syntactic model. Similarly the case system allows one to have multiple indirect objects (one accusative, one dative and one genitive, for example) directly attached to a verb, where in English one would use a preposition.
On the other hand, they just might be able to come up with a way to map a small subset of natural language, computer-speak for example, for the purpose of easing the creation of internationalized apps and making web sites more navigable. But I don't see how this could be successful in a general case.
Ita erat quando hic adveni.
I recall when I had to write a graphics format conversion utility for a little-known application. This was not a simple task, having to learn each file format and then how to translate it to this new format. Luckily, I had no need to translate between any other formats or translate to any other formats (I had to avoid copylefted code, so pbm was out :( ). But it would have been impossible to maintain if the file formats changed constantly. It is bad enough that I had to account for multiple versions of a particular proprietary graphics format.
Now, suppose they overcome the problem of keeping up with language changes. What about time? I am supposing this is talking about written language. I write something today using current local language terminology that changes over the next few years or I use some terminology that went out in the 60's or any combination (what a gnarly bad chuck of freakin' hackish). 10 years later it is translated into the meta-language. How the heck would it translate this and maintain integrity?
NT is based on the premise that anyone who can manipulate a mouse can administer a system. Huh?!?
That's why your South American language isn't on the list of languages to be supported. =) Guess these people are out of luck, which is really a shame, but that's what happens when the industrialized world gets to run things. The weirdos.
Yeah, this guy shouldn't have generalized like he did, but I've done work in MT as well, and he's right for the most part. Babelfish is not exactly the epitome of Machine Translators, either. Most good Machine Translators cost money to get to use, and they're used primarily as an aid to human translators, who look over the result. Cuts down on costs. =)
Anyway. This post is buried so deep, it makes my eyes water.
For those of you who think this is impossible because of the variations between languages, Noam Chomsky has something to say to you. I was exposed to his idea back in formal languages and automata class. Basically, his argument is that we have a universal grammar (UG) parser built into us when we are born. We 'harden' the parameters of the UG to conform to our preferred language. Sort of like guile and perl, where guile is a very expressive language but perl, while expressing less, can express the same thing in a more concise manner.
Universal grammar is defined by Chomsky as ``the system of principles, conditions, and rules that are elements or properties of all human languages... the essence of human language'' [Chomsky, 1978].
Thus, all languages that we are accustomed to (English, Arabic, Malay, Japanese, and Chinese) are special cases of a universal grammar. Chomsky and subsequent linguists are looking for those common elements of all languages.
Universal grammar and the innateness hypothesis
Universal Grammar in Prolog
There is lots of discussion about this... see google.
Hasdi
I will be curious to see/hear how they intend to handle the use of slang, or Ebonics, which is so commonly (over?) used, particularly in the US. Any ideas on this? Maybe Americans will have to learn English now so this will work. Wouldn't that just make things simple?
As far as I can tell (just guessing, though) there are 2 key differences between UNL and Esperanto:
1) It's not Romance-based and thus won't be as Euro-centric and will thus probably translate Eastern languages better.
2) It's designed as an intermediate language and not as a final end-user language. As far as I can tell, it could even be machine-readable and not speakable. In any case, it will not have as many constraints as a language like Esperanto that is designed for human speech.
These are just my guesses. I don't know what kind of language they're actually trying to implement. (The website is skimpy on those details.)
Latvian and Mongol but not Hebrew? Does this make sense to anybody? Israel is practically a second Silicon Valley (home of ICQ, Gooey, Ricotche, and more; the entire country is covered by a digital cellular network, cell phones outnumber ground lines, etc.)
I will not buy this record -- it is scratched!
I want to fondle your bum!
Mein Luftkissenfahrzeug ist voll von den Aalen.
Je veux au fondle votre sans valeur!
(courtesy of babelfish, obviously)
It's not going to work very well. The problem is that each language has its own nuances, and in many cases these don't translate very well into other languages. I'll use Japanese honorifics as an example. The list of them is relatively long ( -san, -sama, -kun, -chan, -sensei, -wa, and others). Simply by attaching one to the end of a person's name, I can make the same sentence express immoderate flattery or extreme derision. This can be translated in an extremely limited fashion to romance languages such as Spanish or French (by using familiar vs. formal form of address, but it's still limited). It doesn't translate into English at all (this is why I prefer subtitled anime; get the general meaning from the subtitles, and actually listen to the Japanese for the nuances). And, of course, you still have the problem of inflection not translating very well into written words. This makes English particularly unsuitable for network communications, actually, since so much meaning is left to inflection. What's the solution? I don't know. There probably isn't one. Even Esperanto isn't immune to this problem of losing meanings in translation. I don't think a "universal meta-language" is going to work, though.
Nope, the idea of this project, if I understood correctly, is not to create a new language people around the world can use to talk to each other; it's meant more for machines. English doesn't really qualify, as it's a real language and thus doesn't have the required properties: it is ambiguous in many ways. I'm not familiar with Esperanto, but I would assume it doesn't contain all the necessary information to be useful as an intermediate language either, as it was designed to be a human language. Take for example the word "uncle". Is that on your mother's or father's side? In English it's impossible to say, but in some languages the difference is very important. If your translator program is smart it might guess correctly from the context (or at least state that it doesn't know, so a human operator can then choose one). OK, so maybe things would be much easier if everyone spoke English, but that is not the case and won't be in the foreseeable future. In the meantime automatic translation is badly needed, and this sounds like a good approach to the problem.
This may sound snide, but if their translation tools work so well, why haven't they been able to translate their Web site into all 16 supported languages? The site only has English and Japanese versions at the moment.
Pinker's "The Language Instinct" is an outstanding introduction to some of the problems of linguistics. It also presents some theories as to the biological basis for language. One of his most interesting points explained the incredible difficulty in learning a new language after the age of nine or so.
Pinker feels that there is a portion of the brain that instinctively understands the possibility of grammar, and when learning a language (when young) the brain adapts to the particular grammar of a language. Children who have not been exposed to a language and yet must interact with others in the same situation will develop their own grammar and invent a vocabulary.
In other words, there isn't a "universal" grammar as such, but more of a "meta" grammar--rules for creating grammatical rules. Of course, the human mind doesn't follow these rules verbatim. Any language has a main set of grammatical rules, but there are numerous exceptions within the language that follow different rules. A universal translator might know the "meta" grammar, but it would still have to figure out the grammar for each particular language plus recognize the exceptions.
Something of a daunting task in and of itself, and we haven't even begun to talk about the problems of non-corresponding vocabulary, idioms, slang, jargon, language drift, etc., etc. I am not optimistic about the chances of a good universal translator coming around until we get a good AI that mimics the processes of the human brain. Unfortunately, such an AI would probably want coffee breaks.
Pinker has several other books out that I haven't had a chance to read yet, but they have gotten good reviews and I am sure that they would make excellent reads as well.
If the project organizers try to write good translators first, they will be putting the cart before the horse, and the project is likely to go badly. They should put their effort into the design of UNL, coming up with a good extensible machine-readable language that conveys human semantics, and write only prototype translators. UNL must be an open standard, like TCP/IP or HTML, and once publicly released, it should not drift too much. The writing of the real translators should be left to enthusiastic open-source developers, who will have the time and the motivation to do a much better job on translators.
Inevitably there will be trade-offs. In most languages, translations from other languages will seem like a pidgin. Fine linguistic nuances will not survive the translation process, and regular users will learn not to depend on them. If it's mostly comprehensible, it will still facilitate communication where none would have been possible previously.
The first design for UNL should probably be considered provisional, and ultimately a throw-away to be replaced in the future. But we can't replace it until we've learned its lessons. This still seems to me to be a very worthwhile thing to attempt.
WWJD for a Klondike Bar?
Current understanding does not allow us to translate arbitrary subject matter between languages very well. (translations of language coming from a subdomain where meanings are non-arbitrary is currently possible).
Given UNL, it might be possible to generate natural language from it, but not vice versa. UNL may provide for language meaning to go to natural language, but does not provide a way to get from natural language to meaning, something computational linguistics has been struggling with for decades, and precisely the reason why translation is currently impossible.
In short, the problem isn't a universal representation of meaning, it is getting natural language to automagically convert to such a structure.
So, UNL will only be useful once the problem it "solves" is already solved.
-k
Just to be a nitpick, UNL would not be a "meta-language" because it would not be a language about languages. It would just be an intermediate language.
The enconverters and deconverters would be more like the "meta-languages", sort of...
Universal languages for any medium will never be completely accepted. To have the UN try and do this is a waste of time and money. The US should get out of the UN, and kick them out of the country, before they try and impose their new language as part of their New World Order. Oceania had Newspeak, the UN has this.
Isn't this the same as Interlac?
Sorry, couldn't resist, but it sounds like the codex of language starting with numbers and working up. Maybe it should be taken that way.
Un-natural languages don't sit well with me. Klingon, Esperanto, etc. just seem silly. All they can do is borrow words from each language. This would at least guarantee certain words translate exactly to certain languages.
I think it would be interesting to see everything translated through Chinese, since that is one of the more stateless languages. Of course it would be difficult to understand when translated to English (just go to Chinatown in cisco), but you'd get the gist of the conversation.
Japanese would be interesting as well. Like Chinese, it conveys thought and emotion more efficiently than English. That, and its grammar rules don't swing violently on a whim.
Plurality and conjugation could simply be translated by computer. We just need a universal rule set, not a universal language.
Lowmag.net
Depends on your starting point. If you are from a Nordic country for example, or Germany or the Netherlands for that matter, English is a lot easier to learn than French is. For Spanish people on the other hand, the opposite is true.
A lot of these goals have already been met by Lojban. It has a precise, unambiguous grammar (Yacc-able, in fact), is speakable by humans, can even be parsed from speech unambiguously, and has none of the cultural baggage that clutters most natural languages (and even Esperanto). Of course, for the project it would be necessary to codify a much bigger vocabulary, but that's not too hard.
I agree that Mongol isn't as widely used as, say, Swedish, but then again, how many Swedes don't speak, read, or write another language? I believe they chose it because Mongolia is rapidly adopting "free trade" and many American companies have business there. BTW, Napoleon didn't use the Internet either, but people still speak French.
What about using mathematics as an international language? People would read and write using numbers. We could use numbers to represent the sounds of words. Or we could use them just like an alphabet. So instead of "Hello", we would have 28766. And that would be readable by every person.
Linkwa, pink dama, arf muzheek. Rintintinambulation. Alla da peepholes enda voold, enda looniverse, cargo a schlong ender hertz. Epp, dat schlog arf Unamunda.
-Chris
Speaking of being more open, given that this project is supposed to help international communication, I'm surprised it gives so incredibly few details about their language. If you look at their project info page, this has been in development for a FEW YEARS already. Yet, their website only contains information on the software, not on the language itself, which would be the hard part.
I wish they'd give us more information on what UNL itself is like.
Is Babelfish gonna be a plugin for my Netscape or what?
Problems with Chomsky aside, it's still quite tough even with innate logical structures. Humans have a hard time using their own native grammars correctly; making a machine do something that complex is going to be super-tough.
...according to foxtrot... http://www.foxtrot.com/comics/pages/ft990831.html
I thought not. We now return you to your regularly scheduled brain-washing.
So anyway, these problems are hardly unique to European languages.
Indonesian has, to my knowledge, only one tense. And no articles. And the plural is marked by saying the noun twice ("orang" is person, "orang-orang" is people).
Needless to say, there isn't much poetry in Indonesian...
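For what it's worth, the reduplication rule mentioned above ("orang" becoming "orang-orang") fits in one line of code, while even a crude English pluralizer needs special cases. A minimal Python sketch follows. It assumes every Indonesian plural is formed by full reduplication, which is a simplification, and the function names and the tiny irregular table are made up for the example.

```python
# Simplified sketch: assumes plain full reduplication for every Indonesian
# noun, as in the "orang" -> "orang-orang" example. Real usage is more varied.
def pluralize_id(noun):
    """Return the reduplicated plural of an Indonesian noun."""
    return f"{noun}-{noun}"

# Hypothetical, deliberately tiny table of English irregular plurals.
IRREGULAR_EN = {"person": "people", "child": "children"}

def pluralize_en(noun):
    """Very rough English plural: irregular table, then -es or -s."""
    if noun in IRREGULAR_EN:
        return IRREGULAR_EN[noun]
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    return noun + "s"

print(pluralize_id("orang"))
print(pluralize_en("person"))
```

The English function is still wrong for plenty of words (it would happily produce "mouses"), which is rather the point of the comparison.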
-