Coming Soon, The Google Translator

fascinating by professorhojo · 2005-05-31 02:12 · Score: 5, Informative

since the RTFAs lacked any kind of crunchiness, i sourced some great stuff here that does a wonderful job explaining how this system works, and gives the advantages the statistical translation method has over the rules-based approach. as well as the disadvantages.

fascinating stuff:

"Currently, most machine translation technology, including consumer-oriented programs such as Systran's Babel Fish, have been "taught" the rules of language, such as verb tenses and when to use parts of speech. Programmers painstakingly hand-build systems based on such rules. "The computer is told, if you see this thing in Russian, replace it with this thing in English," explains Yarowsky.

"While somewhat effective, such systems are time-consuming to build (consider how long it takes most humans to learn a language and all its rules), and resulting translations are still marred by grammatical and other errors. Those that do work fairly well usually tackle popular Western languages, such as French, German, and Spanish; there are few translation programs developed for other important tongues, such as Chinese, Turkish, or Arabic, let alone for more obscure languages like Tajik.

"To tackle a broader range of the world's languages, and to improve on the quality of machine translation, Yarowsky and his Hopkins colleagues are developing computer programs that can be trained to figure out any language using statistical analysis, i.e., looking at the probabilities of language patterns. In what's known as automatic knowledge acquisition, the computer could "learn" Serbian well enough to translate future documents or conversation, or at the least pick out pertinent words like "bomb."

"As Yarowsky explains: "Say you want to teach a computer how to translate Chinese: You give the computer 100,000 sentences in English and the same 100,000 sentences in Chinese and run a program that can figure out which words go to which words. If in 2,000 sentences you have the word Washington, and in about the same number of sentences you have the word Huashengdun, and they occur in the same place in the sentence, these words are likely translations.

"It's all just observation," Yarowsky adds. "Children do the same thing, but they also do it through visual stimulation and feedback. They see a book and hear the word 'book,' and eventually they learn that it's a book. They see a bird with its wings flapping around and learn that is called a bird. It's the same with machines, only they have much better memories. Computers could remember exactly when and where they saw the words bird and book."

"So, instead of telling a computer how to do something -- conjugate the verb 'to be' in Spanish, for example (I am = soy) -- researchers give it tens of thousands of examples and program the computer to find repeated patterns that the computer can use to conjugate new verbs. Trained this way, the program could potentially "learn" phrase structure and the rules of translation.

"As Yarowsky notes in his 100,000-sentence example, one way to accomplish automatic knowledge acquisition is to use bilingual or parallel text. The program "reads" a document in English and then a version in a second language. Such texts used by Hopkins researchers include the Bible, which is available on the Web in more than 60 languages, the Book of Mormon (over 60 languages), and the United Nations Declaration of Human Rights (240 languages).

"Aiding the computer is the fact that the English version of such texts can be annotated by hand or using another computer program -- essentially marked up to show, for example, that Jesus is a noun and pray is a verb. The translation program-in-training needs such information because it cannot translate future text just by substituting individual words in each language; it must also be able to analyze how sentences work. To do so, the computer program uses pattern recognition templates and other tools to understand sentences on a syntactic level. Simply put, the program is essentially given clues to know what to look for, notes Yarowsky: "It should figure out the subject, figure out the object, and other elements of sentence structure."

Re:fascinating by Anonymous Coward · 2005-05-31 02:18 · Score: 2, Funny

This is great and all, but I won't be impressed until it translates the gibberish that comes from the Iranian gas station attendant every time I stop for gas.

For now, I just nod my head in ignorance, and count my change.
Re:fascinating by Anonymous Coward · 2005-05-31 02:24 · Score: 0

One use that the article doesn't mention is that this could be used to train people to read other languages; try to read the article in the foreign language, then compare your reading against the auto-generated reliable translation. This approach is problematic with the sometimes bizarre and unreadable translations generated by current translation software...
Re:fascinating by NoMoreNicksLeft · 2005-05-31 02:34 · Score: 5, Interesting

Some questions:

Why can't a dictionary be made of nouns, of verbs? Why can't we have it statistically analyze the grammar for ambiguous words?

Does it only recognize exact matches? Especially with verb conjugation, I'd think any words 80% similar or so should be considered matches. Not all languages are as conjugation-happy as Latin or Spanish or even English, and you often lose some nuanced conjugations when translating from one to the other.

What will be done about idioms? Translating these word for word often makes no sense at all, and for me at least (no idea what the official stance is), I'd rather they substitute in idioms with the same general meaning, but for the culture being translated to.

Does it work on alternate character systems? Is it word-boundary dependent?

Does it understand punctuation rules? Will this post, translated to Spanish, have the upside-down question marks where they're supposed to be?

How many of the world's existing languages have enough text for this to even be feasible?
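
For anyone wanting to poke at the "80% similar" question above, here is a minimal sketch using Python's standard difflib. The 0.8 threshold and the sample words are assumptions made up for illustration; nothing in the article says the real system matches words this way.

```python
from difflib import SequenceMatcher

def similar(a, b, threshold=0.8):
    """Return True if two words are at least `threshold` similar by character overlap."""
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= threshold

# Hypothetical word pairs: a regular inflection, an irregular verb, two unrelated words
for a, b in [("translate", "translated"), ("soy", "somos"), ("cat", "cart")]:
    print(a, b, similar(a, b))
```

On this measure "translate"/"translated" passes, but the irregular pair "soy"/"somos" does not, while the unrelated "cat"/"cart" does. So a plain character-similarity cutoff would both miss real conjugations and merge unrelated words, which is probably why stemming or learned morphology is the more usual answer.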
Re:fascinating by Anonymous Coward · 2005-05-31 02:37 · Score: 0

for example, that Jesus is a noun and pray is a verb

"Jesus is a verb, not a noun". Ricardo Arjona
Re:fascinating by MoonBuggy · 2005-05-31 02:39 · Score: 4, Interesting

Sounds like a very good approach, but am I the only one to see an issue in the texts they're using that are already available in multiple languages?

The examples given (two religious texts and a legal one) don't really sound like the best things for teaching a "blank slate" program a new language. I understand that it's looking for structure and rules rather than word-for-word links, but the Bible uses many outdated or non-standard phrases and sentence structures, as does most legal text I've ever seen. I'm not a linguist or a statistician, but from my uneducated viewpoint it sounds like problems might arise in the texts that are available for training the system. Anyone know how they're planning to overcome this?
Re:fascinating by elrous0 · 2005-05-31 02:42 · Score: 5, Insightful

or at the least pick out pertinent words like "bomb."
Why do I have a funny feeling that this research isn't being funded by philanthropic foundations?

-Eric

--
SJW: Someone who has run out of real oppression, and has to fake it.
Re:fascinating by browngb · 2005-05-31 02:43 · Score: 1

Oh God, it's going to learn languages from examples? I hope they don't try this over the net, otherwise we'll have computers writing LOL, IC, and other nonsense.

--
Generally, I get bored with my replies and give up on making sense halfway through.
Re:fascinating by I_Heat_Sexylaid · 2005-05-31 02:43 · Score: 0

60% Of U.S. Believe Porting Open Source to Minor Hiper Type-R Modular Independent Cartoonists [who] Band Together for Star Trek XI Intel Preps [for] Debian Sarge ['s] First look at ["] Coming Soon, The Google Video ["] [considered harmful].

--
Slashlight! (Can't find the funk) kewl base part
Re:fascinating by fizban · 2005-05-31 02:48 · Score: 4, Funny

"Open the pod bay doors, HAL."

"STFU, Dave. LOL!"

--
+1 Insightful, -1 Troll. What can I say, I'm an Insightful Troll.
Re:fascinating by DrEldarion · 2005-05-31 02:50 · Score: 1

It's all just observation," Yarowsky adds. "Children do the same thing, but they also do it through visual stimulation and feedback. They see a book and hear the word 'book,' and eventually they learn that it's a book

What would be really incredible is if they could combine this with Google Image Search and get the computers to be able to recognize pictures as words.
Re:fascinating by Carnil · 2005-05-31 02:53 · Score: 1

Could this approach be used in natural language recognition software?
The system could be made to learn the translation between common language structures into well formed, parseable sentences that it could then process.
Just a thought, maybe i'm being a bit too optimistic here.
Re:fascinating by Anonymous Coward · 2005-05-31 02:54 · Score: 5, Funny

You go all the way to Iran to get gasoline? Who are you, George W Bush?
Re:fascinating by tomhudson · 2005-05-31 02:56 · Score: 0

This is great and all, but I won't be impressed until it translates the gibberish that comes from the Iranian gas station attendant every time I stop for gas
Never work. Too much spit gums up the microphone.
Re: Fascinating by Anonymous Coward · 2005-05-31 02:58 · Score: 0

the computer could "learn" Serbian well enough to translate future documents or conversation, or at the least pick out pertinent words like "bomb."

At least now we know what this service is about. It's about extending Echelon.
Re:fascinating by MindStalker · 2005-05-31 02:59 · Score: 3, Insightful

Well, the Bible is Hebrew, Greek and Latin. There are no outdated English phrases in the Bible. Now if you're referring to the King James translation of the Bible, obviously such would be good for teaching Google Old English but not modern English. You would need a much newer translation that doesn't use old phrases. Such do exist btw.
Re:fascinating by tomhudson · 2005-05-31 02:59 · Score: 1

Oh God, it's going to learn languages from examples? I hope they don't try this over the net, otherwise we'll have computers writing LOL, IC, and other nonsense.
... and they'll be replying to half the "articles" posted on slashdot: "I AM A SCRIPT YHBT YFI HAND"
Re:fascinating by Anonymous Coward · 2005-05-31 03:04 · Score: 0

Jesus you, you motherjesuser!
Re:fascinating by Secret+Agent+99 · 2005-05-31 03:11 · Score: 1

The Bible raises a further interesting difficulty: in English alone there are many competing translations which all interpret the original Hebrew, Aramaic, and Latin differently -- sometimes very differently.

On a smaller scale, the same issue arises with other literary works. For purposes of building a corpus, which translation of the Bible, or Balzac, or Tolstoy, do you select?

If humans can't agree on what is "good" or "faithful" in a translation, how do you trust a computer?
Re:fascinating by Simonetta · 2005-05-31 03:13 · Score: 3, Interesting

What will be done about idioms? Translating these word for word often makes no sense at all...

The often-quoted examples are: "Out of sight, out of mind" becomes "invisible idiot" and "the spirit is willing, but the flesh is weak" comes out as "The meat is rotten, but the wine's great".

How many of the world's existing languages have enough text for this to even be feasible?

Ah yes, that's the tricky part. Translating for preservation near-extinct languages that are in spoken or recorded form only. A true programming challenge.

I find the Babel-Fish translator to be nearly useless and the Systran box at www.systransoft.com very helpful when selling things on eBay to people in non-English-speaking countries. When I get a question about an auction item that has little grammatical cohesion and comes from an offshore domain, like "How many cost you Italia he transport?", I'll run my response through Systran's translator and add the original English afterwards. More often than not the sales and PayPal transactions are successful.

I believe that machine translation will be the 'killer application' for 64-bit home PCs. ..along with DRM busting..

There are five levels of machine translation:

1) word substitution.
2) phrase substitution.
3) cohesive paragraphs and idioms.
4) light literature, magazine articles, and business.
5) classical literature, law, and diplomacy.

Each level requires at least an order of magnitude more computing power than the previous one. Babel Fish is on level two and Systran is on three. Google is positioning itself to be between levels four and five.

I wish them the best of luck. Without sarcasm or irony. This is important work.

"Give me a one sentence definition of 'irony'."
"Yeah, it's where the Iranians come from."
Re:fascinating by The+Desert+Palooka · 2005-05-31 03:22 · Score: 1

Until it replaces the word book with DCS00001

*grin*
Re:fascinating by kebes · 2005-05-31 03:24 · Score: 5, Interesting

What will be done about idioms? Translating these word for word often makes no sense at all, and for me at least (no idea what the official stance is), I'd rather they substitute in idioms with the same general meaning, but for the culture being translated to.

I think this is precisely where statistical approaches can really shine. A purely dictionary-based conversion will translate an idiom word-for-word, which will make no sense at all. However, a statistical approach could be constructed to look for the "longest reliable match." So if the idiom "cat got your tongue" re-appears over and over, and is correlated to a different idiom in other languages (that may not use the word "cat"!), then the algorithm could tokenize "cat got your tongue" as a single entry that would map to something different in each language.

How many of the world's existing languages have enough text for this to even be feasible?

You're right... that's the killer. Translating using statistics (especially idioms) properly will require a huge database of samples. Even what's been suggested so far is not enough. If we want to translate technical documents, we need a new database. If we want to translate "free form writing" we need yet more data.

However, there's lots of data out there (already in digital format) that could be used... we just need people to see the potential and start using these datasets (or making these datasets available). For instance, for technical stuff there are thousands of abstracts for papers and for theses that are translated into various languages (for instance, many articles published in german are then also released in english... I live in Quebec, and every thesis abstract has to be translated into french also... etc.). Many legal documents (many of which are already available to the public) are also translated for various reasons. It would also be interesting if translators all around the world uploaded documents they had translated into some database (assuming it's nothing sensitive of course!). As this database grew, it would become more and more reliable. Let's face it, there's tons of human-based translation going on, forming a massive dataset... but by and large it's just scattered and not useable.
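
A rough sketch of the "longest reliable match" idea described above: look up the longest known phrase first and only fall back to single words when nothing longer matches. The phrase table, the French strings, and the `max_len` limit are invented for illustration; a real system would learn the table and its reliability scores from parallel text rather than have them typed in.

```python
def translate(sentence, phrase_table, max_len=5):
    """Greedy longest-match lookup: prefer whole idioms over word-for-word entries."""
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase starting at position i, then shorter ones
        for n in range(min(max_len, len(words) - i), 0, -1):
            chunk = " ".join(words[i:i + n])
            if chunk in phrase_table:
                out.append(phrase_table[chunk])
                i += n
                break
        else:
            out.append(words[i])  # unknown word passes through untranslated
            i += 1
    return " ".join(out)

# Hypothetical English-to-French entries; the idiom is stored as a single token
phrase_table = {
    "cat got your tongue": "tu as perdu ta langue",
    "the": "le",
    "cat": "chat",
}

print(translate("cat got your tongue", phrase_table))
print(translate("the cat", phrase_table))
```

Because the four-word idiom is tried before the single word "cat", the idiom maps to its own entry and never goes through the word-for-word path.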
Re:fascinating by Temposs · 2005-05-31 03:27 · Score: 2, Interesting

Computational Linguistics is my field, so I can tell you that the problem with the current state of corpora is a lack of massive cross-language corpora over many languages.

The two sources used by Google are basically the only sources available for the kind of task we're talking about. Obviously the thing to do is work on creating more cross-language corpora, and I'm sure this is being done, but it takes much time to create a cross-language corpus on the scale that the UN documents or translations of the Bible have.

--
Knowledge is just opinion that you trust enough to act upon. -Orson Scott Card
Re:fascinating by magefile · 2005-05-31 03:31 · Score: 1

Er ... you mean that it'd be good for teaching Google old-fashioned English. Old English is not merely archaic English - it's much closer to the original Germanic ancestry, to the point that it's as similar to German as English.
Re:fascinating by Bigman · 2005-05-31 03:35 · Score: 3, Insightful

Don't forget that many works of fiction are translated into several languages. The only problem with that is persuading the copyright holders to permit their use in training computer translation systems. I'm not sure where you would stand with this legally (After all, IANAL!), so I suspect this is why Google has been using the UN documents. I would imagine these are effectively public domain; and if not, I would imagine the UN would see a reliable machine translation project worth supporting. The only downside I can see is that the UN texts are unlikely to have many idioms or colloquialisms, which would limit the resulting translator's usefulness in a more general context.

--
*--BigMan--- Time flies like an arrow.. but personally I prefer a nice glass of wine!
Re:fascinating by portforward · 2005-05-31 03:39 · Score: 1

I am pretty sure the KJV is not written in Old English. Old English is pretty much another language than contemporary modern English. If you have ever seen the poem "Beowulf" in the original text you would definitely see the difference as it would be unreadable to you. In some ways it "looks" like German and has many similarities in terms of grammar to German. The KJV is written in early "Modern" English.
Re:fascinating by Harinezumi · 2005-05-31 03:48 · Score: 2, Interesting

My guess is that the statistical analysis happens not just on the word level but on the sentence level. This means that the system would handle idioms almost perfectly when there are corresponding idioms in the target language, and adequately even when there aren't any (since the hard work of coming up with standard translations for those has already been done by several generations of UN translators). There should be very high correlations between the occurrence of "God helps those who help themselves" in English and "berezhonogo Bog berezhot" in Russian, for example.
I'd be more worried about homonyms, especially ones that are used in similar contexts. I wonder if it will be able to handle sentences like "I turn left here, right?", which manage to confuse even humans at times.
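
The correlation mentioned above (a source expression and a target expression that keep turning up in the same sentence pairs) can be measured with a simple co-occurrence score. What follows is only a sketch under stated assumptions: the four-sentence corpus is invented, with romanized toy "Chinese" borrowed from Yarowsky's Washington/Huashengdun example, and the Dice coefficient is swapped in as one easy scoring choice. It ignores word position and is not claimed to be what Google or the Hopkins group actually run.

```python
from collections import Counter
from itertools import product

def cooccurrence_scores(pairs):
    """Score (source, target) word pairs by the Dice coefficient of sentence co-occurrence."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        joint.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in joint.items()
    }

def best_translation(word, scores):
    """Pick the target word with the highest score for `word`, or None if unseen."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None

# Invented toy parallel corpus (English, romanized Chinese)
corpus = [
    ("washington is cold", "huashengdun hen leng"),
    ("i like washington", "wo xihuan huashengdun"),
    ("beijing is cold", "beijing hen leng"),
    ("i like beijing", "wo xihuan beijing"),
]

scores = cooccurrence_scores(corpus)
print(best_translation("washington", scores))
```

With only four sentences, words that always appear together (like "is" and "cold" here) cannot be told apart, which is one reason the real thing wants 100,000 sentences rather than four.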
Re:fascinating by should_be_linear · 2005-05-31 03:49 · Score: 5, Funny

but the Bible uses many outdated or non-standard phrases and sentence structures, as does most legal text I've ever seen. I'm not a linguist or a statistician, but from my uneducated viewpoint it sounds like problems might arise in the texts that are available for training the system. Anyone know how they're planning to overcome this?

Harry Potter is the answer. It is several "normal language" books and is translated into all major languages. Also, the program would finally figure out how to translate words like "Quidditch".

--
839*929
Re:fascinating by Octorian · 2005-05-31 03:51 · Score: 1

No, but you may find outdated Hebrew or Greek phrases.
Re:fascinating by JJ · 2005-05-31 03:53 · Score: 1

But what about grammatical limitations of this method? English-Japanese has a severe grammatical limitation (not just word order but sentence structuring) or even a morphological limitation of agglutinative vs isolating languages (like English-Turkish). These language pairs just won't respond to the method, IMHO.

--
So long and thanks for all the fish . . . !!!
Re:fascinating by CustomDesigned · 2005-05-31 03:53 · Score: 1

the Bible uses many outdated or non-standard phrases and sentence structures
The King James translation of the Bible uses current standard phrases and sentence structures - for Elizabethan England. Modern translations of the Bible use, well, modern phrases and sentence structures. The original texts of the Bible used phrases and sentence structures that were current for the time they were written. For instance, new Testament Greek comes from a relatively small time period, and includes a variety of styles from authors who ranged from highly educated (Paul, Luke) to little formal education (Peter, John).

The Book of Mormon does have the problems you mention since we don't have the original text (it having vanished in a golden mist as it was magically translated), but even there, the language is a dialect of English that was widespread in communities greatly influenced by the King James translation of the Bible. You'll find Elizabethan forms used to this day in Appalachia.

As to legalese, well, legalese is a dialect you may very well wish to have translated on the web. So including that corpus is quite appropriate.

The key to handling dialects is to categorize translations of a corpus into dialect as well as language. American Legalese would be a dialect of English.
Re:fascinating by Anonymous Coward · 2005-05-31 03:56 · Score: 0

Contrary to the article: yes, children learn it the same way, but only much later do they gain the ability to use or understand advanced speech. For that, you must understand what is spoken in the right context. A machine can't do that.

This stuff will be fairly good, but not as good as human translators.
Re:fascinating by Anonymous Coward · 2005-05-31 03:57 · Score: 0

60% Of U.S. Believe Porting Open Source to Minor Hiper Type-R Modular Independent Cartoonists [who] Band Together for Star Trek XI Intel Preps [for] Debian Sarge ['s] First look at ["] Coming Soon, The Google Video ["] [considered harmful].

Yes, but only in former Soviet Russia.
Re:fascinating by bogado · 2005-05-31 04:03 · Score: 2, Interesting

Laws apart (you could use public material like Project Gutenberg), I think that a translated book is, or at least seems like, a bad input for this, since the text says it expects whole sentences translated 1 - 1.

A novel or book is not translated like this; the best translations aren't word for word or sentence to sentence. Good translators almost rewrite the whole thing, sometimes with a different style.

Language has a lot of cultural meaning in it, and even the same language sometimes needs to be adapted to mean the same (and I am not saying anything about accent). Computers will hardly get to this point; I would expect from this a good 'well, at least I got the point' translation.

--
[]'s Victor Bogado da Silva Lins
^[:wq
Re:fascinating by Secret+Agent+99 · 2005-05-31 04:09 · Score: 1

This still doesn't get around the problem that there are competing translations of the Bible that each interpret things quite differently. (This is the case in English and French, and no doubt other languages.) Modern vs. archaic is a trivial problem compared to settling debates over the precise true meaning of the older texts (the ones in Hebrew and Aramaic). This is less of a problem for the Greek bits (i.e. the New Testament), since there is broader consensus on how to read Ancient Greek.
Re:fascinating by Anonymous Coward · 2005-05-31 04:19 · Score: 0

I think you'll find the King James version is current for Jacobean England.
Re:fascinating by EvilSS · 2005-05-31 04:20 · Score: 1

Wasn't there an article not long ago that they were working on something like this?

--
I browse on +1 so AC's need not respond, I won't see it.
Re:fascinating by Anonymous Coward · 2005-05-31 04:32 · Score: 0

You made my day. Thanks. :)

Still laughing...
Re:fascinating by cicho · 2005-05-31 04:40 · Score: 1, Insightful

Project Gutenberg has plenty of translations into English, but not other target languages, it seems. And given the nature of copyright, do you want modern machine translation to read like 19th century prose?

--
"Only the small secrets need to be protected. The big ones are kept secret by public incredulity." - Marshall McLuhan
Re:fascinating by Qzukk · 2005-05-31 04:41 · Score: 1

Law and the Bible are used because a LOT of work goes into making sure that things mean the same in the translations. The language may be a little archaic (depending on which Bible translation you're talking about, really) or full of legalese, but when you've got one sentence in English that tells you that you cannot build a bus stop within ten feet of a parking meter, the Spanish version of the sentence says the same thing.

--
If I have been able to see further than others, it is because I bought a pair of binoculars.
Re:fascinating by aldoman · 2005-05-31 04:55 · Score: 2, Informative

You input them all, and let the statistics do their magic.

Just like your email spam filter can handle you pressing junk on stuff that isn't junk, or not junk on stuff that is, it's just all numbers and there is an inherent tolerance for small errors that will be created with this sort of system.
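
To make the spam-filter analogy concrete, here is a minimal sketch of letting counts absorb the occasional bad pairing. The observations are invented and the simple vote share is an assumption for illustration; a real system would use far richer statistics than "most frequent wins".

```python
from collections import Counter, defaultdict

def build_table(observations):
    """Map each source word to (most frequent translation, its share of the votes)."""
    votes = defaultdict(Counter)
    for source, target in observations:
        votes[source][target] += 1
    table = {}
    for source, counts in votes.items():
        target, n = counts.most_common(1)[0]
        table[source] = (target, n / sum(counts.values()))
    return table

# Hypothetical aligned word pairs, including one mistaken pairing for "libro"
observations = (
    [("libro", "book")] * 9
    + [("libro", "pound")]
    + [("mesa", "table")] * 5
)

table = build_table(observations)
print(table["libro"])
```

The single wrong pairing lowers the vote share for "book" but does not change the winner. That is the tolerance for small errors in action, though it does nothing for the harder case where two competing translations are each well attested.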

--
IntechHosting - Free domain, 2GB, PHP, £4.95/$8.95
Re:fascinating by tommertron · 2005-05-31 04:59 · Score: 1

It seems easy to update it to Modern English with colloquialisms. Google could just give users an option to flag a nonsensical translation. They could then collect all the flagged translations, have them translated by humans, and feed them into the database, so the machine will learn for next time. This should happen less and less as the translator becomes comfortable with colloquial, modern, and casual English.

--
Random rants about technology: http://technorants.blogspot.com
Re:fascinating by swb · 2005-05-31 05:02 · Score: 1

I was going to post exactly what you posted, but as I read your post I kind of wondered if anyone has done any extensive reading of the same bit of creative writing (prose, not poetry) written in multiple languages and can comment on just exactly how similar or dissimilar the writing can get.

I know from watching one too many Hong Kong action films that while their translations are bad, they can't deviate TOO much as they have to provide enough glue between the story and the action to make the movie work -- so you get stuff like "I must defend the honor of my sister and our family for the sake of our family's honor" -- it's not inaccurate, but an awkward wording.

I would think that a book could take certain liberties with a few highly specific idiomatic situations, but by and large wouldn't turn something like "To Kill A Mockingbird" into a legal dispute over a bird feeder.
Re:fascinating by Anonymous Coward · 2005-05-31 05:11 · Score: 0

Well, when the Google AI becomes sentient I guess that we are going to end up with a right wing international fundamentalist lawyer. Could be worse I suppose...
Re:fascinating by Anonymous Coward · 2005-05-31 05:16 · Score: 0

Along the lines of tokenizing an idiom so that it can be translated to another language, I wonder if this technique can be used to define idioms in the native language. For example, a new English speaker would not know what the phrase, "cat got your tongue" would mean. Perhaps running it through the program, one could learn that it meant "to be quiet," or if it were smart enough, explain the nuances of the phrase with "by saying this phrase, I wish for you to speak."
Re:fascinating by Abreu · 2005-05-31 05:21 · Score: 1

A novel or book is not translated like this; the best translations aren't word for word or sentence to sentence. Good translators almost rewrite the whole thing, sometimes with a different style.

This is true, and sometimes very sad. For example, the official Spanish translation of The Hitchhiker's Guide to the Galaxy is simply unreadable.

--
No sig for the moment.
Re:fascinating by Secret+Agent+99 · 2005-05-31 05:23 · Score: 1

In the case of the Bible you have different interpretations that have led to wars and schisms: the very same passages of original text translated in a variety of different ways with different meanings, not just stylistically different turns of phrase.

Please explain how the "magic" of statistics can discern which of them, if any, is an accurate reflection of the original, and then use that result as a guide to translating between languages. You certainly can't "average" them.

I'm not saying that problems such as this will never be solved, and I do believe that statistical brute force is the surest path to useful MT, but on the other hand I don't think these problems are trivial and subject to simple solutions.

It's one thing to use formulaic bureaucratic prose to train a statistical engine to translate still more formulaic bureaucratic prose, quite another to expect easy success with richly idiomatic literary works.
Re:fascinating by mforbes · 2005-05-31 05:29 · Score: 1

The King James translation of the Bible uses current standard phrases and sentence structures - for Elizabethan England.

I thought it was Jamesian England?

note to mods: this post is not off-topic. We're talking about languages through this entire thread, and while I may be more than a little pedantic here, I'm not off-topic. Man how I hate getting modded off-topic when I'm not...

--
Allegedly real newspaper headline from 1998:
Man Struck by Lightning Faces Battery Charge
Re:fascinating by hunterx11 · 2005-05-31 05:33 · Score: 1

Actually the source languages for the Bible are Hebrew, Aramaic, and Greek.

--
English is easier said than done.
Re:fascinating by dysk · 2005-05-31 05:37 · Score: 1

Assuming you already own a copy of the book, I would think that using it to train your language system is fair use. After all, you're not reproducing or redistributing the book, you're merely collecting statistical information.
Re:fascinating by Anonymous Coward · 2005-05-31 05:49 · Score: 1, Insightful

Any organization that can prevent said "bomb" from being used is the best philanthropic foundation I can imagine.
Re:fascinating by Anonymous Coward · 2005-05-31 05:59 · Score: 0

note to mods: this post is not off-topic. We're talking about languages through this entire thread, and while I may be more than a little pedantic here, I'm not off-topic. Man how I hate getting modded off-topic when I'm not...

Ya know, if I had mod points today I'd mod you offtopic just for wasting MORE space whining to mods about how your post should or shouldn't be moderated than you did with your useless addition to this discussion.

And for anyone who cares to read this, I usually moderate ANY post that bitches about the mod system or attempts to tell a mod how to moderate their post either offtopic or redundant... And I get mod points every three to five days. If you don't like the mod system don't post or bitch to the admins, but don't tell mods how to do their job.
Re:fascinating by bogado · 2005-05-31 06:04 · Score: 1

All the good translations are like this, since this is the only way to capture the rich details that are lost when you switch cultures. Sure, this is a double-edged knife, since a bad translator can very easily destroy a work in the process.

--
[]'s Victor Bogado da Silva Lins
^[:wq
Re:fascinating by juanescalante · 2005-05-31 06:04 · Score: 1

It would also be interesting if translators all around the world uploaded documents they had translated into some database
Yeah, I'm sure they would happily contribute to this program taking over their jobs.
Re:fascinating by Knuckles · 2005-05-31 06:07 · Score: 1

HHGTTG seems to be especially vulnerable to this because of the word play. I have once recommended it to someone, and the person read it in German (I hadn't thought of that possibility) and told me very soon that it isn't funny. I borrowed the German version, and the person was totally right. Not funny at all. In fact, totally horrible

--
"When I first heard Daydream Nation it quite frankly scared the living shit out of me." -- Matthew Stearns
Re:fascinating by Anonymous Coward · 2005-05-31 06:07 · Score: 0

And the translations of Harry Potter match very well. Sometimes sentences are arranged differently or have slightly different meanings, but at the paragraph level it matches perfectly.
Re:fascinating by bogado · 2005-05-31 06:08 · Score: 1

It does have some work in other languages. I agree that it's a minority, but it does have plenty of works translated into English, and it should not be difficult (for Google even less difficult) to digitize the originals.

> do you want modern machine translation to read like 19th century prose?

And by the way things are escalating in the copyright arena, we should consider ourselves lucky that those aren't 17th century books...

--
[]'s Victor Bogado da Silva Lins
^[:wq
Re:fascinating by Knuckles · 2005-05-31 06:10 · Score: 1

The Greek of the Bible is outdated as a whole by definition, since it's a dead language. Modern Greek is different.

--
"When I first heard Daydream Nation it quite frankly scared the living shit out of me." -- Matthew Stearns
Re:fascinating by rca66 · 2005-05-31 06:11 · Score: 1

Why can't a dictionary be made of nouns, of verbs?

You seriously think this hasn't been done? An MT dictionary is quite a complex beast, where translations are connected to the grammar and contain a lot of semantics. Translations depend on whether the verb is used transitively or intransitively, whether it is used with a human object or a thing, and so on and so on.

Why can't we have it statistically analyze the grammar for ambiguous words?

Well, actually my company is about to release a product which does exactly this... Generally our system is based on rules and grammars (and a big and complex dictionary), but enhanced by some statistical information.

What will be done about idioms?

Good systems have coded a lot of idioms.

How many of the world's existing languages have enough text for this to even be feasible?

The effort to write rules and dictionaries for a translation system is quite high. This is one of the big advantages of statistical methods: as this is a largely automatic task, you can do it for a lot of languages, even exotic ones, where the market wouldn't be big enough to build a full-blown system. Quality might not be overwhelming, but at least you can get an idea of what the text is about.
Re:fascinating by ylon · 2005-05-31 06:22 · Score: 1

Your explanation of how the Book of Mormon was taken back is entirely fabricated and untrue. We won't get onto that history, but suffice it to say that you're incorrect.

The English used in the Book of Mormon is more recent than that of the Bible so it would be a better candidate, than say, the Bible. Yes, being from Appalachia myself I speak with folks quite often who portray such speech. Very interesting, from my own linguistics standpoint. (It also helps to maintain that period of time's phraseology; it would be neat to get works from various generations to see its effect linguistically on the engine.)

Plus by using the Bible and Book of Mormon we still keep the familiar form of "you" alive which is important for English since we've lost it unlike other languages such as Spanish. Statistically using scriptures as such will be similar to Rosetta stone to the engine, but will be overridden by the higher statistics of documents, as they say, such as the UN documents. Overall this should create a very balanced translation technique.
Re:fascinating by Arjen · 2005-05-31 06:29 · Score: 2, Informative

...the flesh is weak" comes out as "The meat is rotten, but the wine's great".

Seems like I have to repeat myself over and over again, since this is an urban legend. According to MACHINE TRANSLATION: An Introductory Guide:

The `spirit is willing' story is amusing, and it really is a pity that it is not true. However, like most MT `howlers' it is a fabrication. In fact, for the most part, they were in circulation long before any MT system could have produced them (variants of the `spirit is willing' example can be found in the American press as early as 1956, but sadly, there does not seem to have been an MT system in America which could translate from English into Russian until much more recently --- for sound strategic reasons, work in the USA had concentrated on the translation of Russian into English, not the other way round). Of course, there are real MT howlers. Two of the nicest are the translation of French avocat (`advocate', `lawyer' or `barrister') as avocado, and the translation of Les soldats sont dans le café as The soldiers are in the coffee. However, they are not as easy to find as the reader might think, and they certainly do not show that MT is useless.

BTW, since this book is no longer available in stores, the whole contents have been placed online. I recommend reading this book to anyone who is interested in the subject of MT. It really is a nice introduction to the subject.
Re:fascinating by Knuckles · 2005-05-31 06:40 · Score: 1

That's something I wondered about too. I am a layman, but my understanding is that Chinese is extremely context sensitive, I think older versions more so than modern Simplified Chinese.
The literal texts of the magistrates of the imperial era seem to be so context sensitive that you can only really understand them if you are able to draw cross references to other texts, because some writer may have once used a specific sign in a novel context, changing its meaning subtly.

The accepted "translations" are often totally arbitrary, more related to the context of the first translators (more often than not European colonialists) than to the original context. Which for example leads many serious students of T'ai Chi Ch'uan to get into learning Chinese, because the accepted translations are totally off. For example the term "Ch'i" is commonly translated as some form of "Energy". Which misleads laymen, and more often than not the sceptics that seek to debunk Ch'i.
In fact, it has nothing to do with what we call "energy" at all. This word is a projection of the European translators, who came into contact with T'ai Chi Ch'uan at a time when "energy" was the hip word in Europe (middle 19th century).

I don't know what this means for the discussion at hand, but it sure sounds complicated ;)

--
"When I first heard Daydream Nation it quite frankly scared the living shit out of me." -- Matthew Stearns
Re:fascinating by gcatullus · 2005-05-31 06:54 · Score: 1

This could actually help improve the translations of the Bible, because you could load the enormous amount of ancient literature into the database. The program could "learn" the Aramaic, the New Testament Greek, the Hebrew, the Latin Vulgate, and a modern language. I imagine this would be akin to translating from multiple documents, using multiple dictionaries and multiple concordances. That would be almost impossible to do manually, but the results would be very interesting.
Re:fascinating by coastwalker · 2005-05-31 08:30 · Score: 1

There's only one word for this - Google-fish.

--
Facts are history now plebs have politics for religion on social media.
Re:fascinating by Anonymous Coward · 2005-05-31 08:30 · Score: 0

"As Yarowsky explains: "Say you want to teach a computer how to translate Chinese: You give the computer 100,000 sentences in English and the same 100,000 sentences in Chinese and run a program that can figure out which words go to which words. If in 2,000 sentences you have the word Washington, and in about the same number of sentences you have the word Huashengdun, and they occur in the same place in the sentence, these words are likely translations.

Call that whatever you want, but it is not a translation.

A translation is much more than replacing individual words in one language with some individual words from another language:
- the translation has to reproduce the meaning of the text. Babelfish and any other existing automatic translation system fail blatantly here.
- the translation has to reproduce the language style in which the text was written. Such as the puns in Piers Anthony's Xanth series, the Newspeak in Orwell's 1984, or rhymes and rhythm in a poem
- the translation has to translate the cultural background of a text

Thomas
Re:fascinating by MoralHazard · 2005-05-31 08:48 · Score: 2, Interesting

The only problem with that is persuading the copyright holders to permit their use in training computer translation systems.

As long as the translations have been created in advance, and you can obtain copies of the works in question, it should be fine, legally. I cannot see a way that a court could find the machine-state of a translation machine to be a "derived work" in the copyright sense, and it's certainly not making any literal copies.

Now, someone could distribute a text under a license agreement that forbids this type of usage, but a court may well find that it's a protected "fair" use. And I can't think of many texts that have license agreements that would restrict something like that.
Re:fascinating by rainman_bc · 2005-05-31 08:48 · Score: 1

The Greek of the Bible is impossible for anyone who speaks modern Greek to understand. The only similarity is the letters. Even the accents were much more complicated in Ancient Greek.

When I go to my parents' church, which is Greek Orthodox (not that I've been there in the last 10 years though), I can't understand a word they say, even though I'm fluent in Greek, albeit my spoken Greek is much better than my written.

--
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
Re:fascinating by psetzer · 2005-05-31 08:50 · Score: 1

"In the beginning, the gods created the Heavens and the Earth." The difficulty in properly translating the Bible is that you have to maintain orthodoxy in addition to keeping the general meaning the same. I gave about as close of a translation of the first line of Bereshit (Which translates roughly to "in the beginning") as most scholars would agree upon, but if you noticed, the original Hebrew really refers to the gods in the plural. This, according to church teachings refers to the three parts of the one God, and not some pantheon.
A machine translator, no matter how sophisticated, isn't going to be able to perform the secondary purpose of any translation of the Bible, which is to help support the translator's opinions and views on the Bible. If you want to cement the Bible as the plain and infallible source of all church teachings (some Protestant groups), your translation is going to look profoundly different than one which shows the Bible as a mysterious and sometimes opaque text that sometimes requires training to understand its mysteries (the Catholic Church's official position).

--
"Anyone who attempts to generate random numbers by deterministic means is living in a state of sin." -- John von Neumann
Re:fascinating by autopr0n · 2005-05-31 09:02 · Score: 1

Don't forget that many works of fiction are translated into several languages. The only problem with that is persuading the copyright holders to permit their use in training computer translation systems.
Why would this legally be a problem at all?

--
autopr0n is like, down and stuff.
Re:fascinating by CustomDesigned · 2005-05-31 09:02 · Score: 1

The English used in the Book of Mormon is more recent than that of the Bible so it would be a better candidate, than say, the Bible.
You missed the point. The Bible is *not* an English book! It is a Jewish book. Most of the authors were Hebrews and spoke Hebrew, although some wrote in Aramaic or Greek, and there was one Greek, one Roman and one Babylonian in the lot.

There are English translations, of which the King James is just one, but the Bible was completed centuries before King Arthur. More modern translations like NIV do not have the archaic English forms.
Re:fascinating by autopr0n · 2005-05-31 09:09 · Score: 1

The King James translation of the Bible uses current standard phrases and sentence structures - for Elizabethan England.
Actually the King James version of the Bible used English that was outdated at the time, in order to make it sound older than it was.

--
autopr0n is like, down and stuff.
Re:fascinating by Anonymous Coward · 2005-05-31 09:12 · Score: 0

On the contrary, this is a perfect example of where a statistical approach to machine translation could work well. A purely statistical translator would not know or care that "elohim" is a grammatical plural; it would observe that the word is usually translated into another word and use that other word, which would just happen to be a singular.

(I happen to agree that a machine-translated bible would be a disaster. Just not with your particular example.)
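
To make the "elohim" point concrete, here is a minimal sketch of a purely frequency-based word chooser. It assumes word pairs have already been aligned, and the counts below are invented for illustration, not taken from any real corpus. The names `train` and `translate_word` are hypothetical.

```python
from collections import Counter, defaultdict

def train(aligned_pairs):
    """Count how often each source word was rendered as each target word."""
    table = defaultdict(Counter)
    for source_word, target_word in aligned_pairs:
        table[source_word][target_word] += 1
    return table

def translate_word(table, source_word):
    """Pick the most frequent observed rendering; pass unknown words through."""
    if source_word not in table:
        return source_word
    return table[source_word].most_common(1)[0][0]

# Invented toy data: the plural-looking source word is almost always
# rendered as a singular in the (hypothetical) human translations.
pairs = [("elohim", "God")] * 9 + [("elohim", "gods")]
table = train(pairs)
print(translate_word(table, "elohim"))
```

The chooser never looks at grammatical number at all; it only follows what the human translators did most often, which is the behaviour described above.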
Re:fascinating by ylon · 2005-05-31 10:08 · Score: 1

Now that is true, and you get the same from the Book of Mormon from linguistic studies done. However, enough of the thought process has become part of folks that some Hebrewisms are integrated into English so it still wouldn't hurt too much as it would still be "modded down" by the statistical analysis of the engine.
Re:fascinating by dillon_rinker · 2005-05-31 10:17 · Score: 1

I believe that machine translation will be the 'killer application' for 64-bit home PCs. ..along with DRM busting..

Doubt it. There is nothing above the bit level that 64-bit machines can do that 32-bit machines can't. I challenge you to name one such task - again, above the bit level, so "performing a bitwise AND on two 64-bit values with a single instruction" doesn't count...

"Give me a one sentence definition of 'irony'"
"Like silvery, only harder."
Re:fascinating by LilGuy · 2005-05-31 10:21 · Score: 1

If it can translate everything from any single press conference Bush has held, I'd be impressed.

--

You're nothing; like me.
Re:fascinating by LilGuy · 2005-05-31 10:29 · Score: 1

Maybe that's not the point. If the 64 bit processor can perform a bitwise AND on two 64 bit values with 1 instruction, that is gonna be a hell of a lot more performance than a 32 bit processor will give you.

--

You're nothing; like me.
Re:fascinating by cp.tar · 2005-05-31 11:30 · Score: 1

Actually, it is not really important; machine translation is not likely to be that good for literary works anyway.
However, if we could get machines translating all the millions of intensely boring legal documents emerging from the UN or EU (since all the documents have to be translated to all the official languages of the EU ASAP), it would get all those translators pretty much fired, but probably not altogether unhappy - they are usually the ones that are good enough to get a new job easily.

Let's face it: Not even human translators are often up to the challenge of translating literary works; HHGttG and Terry Pratchett are some of the most perfect examples, since they use such an amount of wordplay that would drive even the most experienced translators to drink and/or madness. BTW, translations of Pratchett's books to Croatian are just plain awful.

Anyway, since even human translators have difficulty with literary works - especially poetry, since not only the meaning must remain the same, but also the meter has to be adapted to the target language (i.e. English is an iambic language, while Croatian Standard is trochaic), then rhyme... I just wouldn't trust a computer with that... at least not at this techlevel.

Technical documentation, legalese... are much different. I used to work as a translator of technical documentation (instruction manuals) and it is so very dull... Always the very same things, always the same phrases... On the good side, I now know how to install, handle and program anything, from VCRs to oil pumps (not much use, though, as I don't own either). Anyway, that is a job best left to computers; the only excitement comes from Italian translations of English texts and Korean translations of Chinese texts to English; then you have to play the guessing game and invent the whole text from scratch - and no, you don't get to see the device you're describing. Everything else is copy/paste, find/replace and a lot of text formatting. When computers can manage that on their own, I'll be quite happy; and if they (or 'we', as I have chosen this as my line of study and future line of work) manage to build a machine translator that will successfully translate literature without some heavy AI, I'll be most sincerely amazed.

Then I'll disbelieve it and start looking for the human they'd put inside the computer.

--
Ignore this signature. By order.
Re:fascinating by NetSettler · 2005-05-31 11:30 · Score: 1

Well the bible is hebrew, greek and latin.

Actually, it would be interesting to see a Bible translation that is objectively obtained by a neural net that is not seeking to push a particular political point of view. Of course, it might just turn into a fight over which documents the neutral translator was trained on, and whether those documents were, themselves, political in some way that influenced the translator. But still, it would seem a fun experiment to try. Perhaps we could learn something about the agendas we introduce into language by having an agenda-free program.

Equally fun might be a kind of "translator diff" that noticed biases in someone's translations.

--
Kent M Pitman
Philosopher, Technologist, Writer
Re:fascinating by Anonymous Coward · 2005-05-31 11:38 · Score: 0

To end the confusion on right and right I propose we change the word "right" as used in direction to the word "reft." Unfortunately Asian cab drivers will go out of business: "So you want me to take a reft after taking that last reft and then make a reft on broadway and turn reft at the rast red right?"
Re:fascinating by Dulimano · 2005-05-31 11:39 · Score: 1

One method is the automatic detection of bilingual websites.

Our research team used a manual method. We have built a large (2 million sentences) Hungarian-English parallel text database using Project Gutenberg, open source software documentation, EU legal texts and other online resources.

It has a simple web query interface at

http://szotar.mokk.bme.hu/hunglish/corpus

The search is still completely unoptimized, and the user interface is spartan, but it works.
Re:fascinating by Anonymous Coward · 2005-05-31 11:51 · Score: 0

How about addressing more than 4gb of memory?
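
The 4 GB figure falls straight out of the pointer width. A quick sketch of the arithmetic, assuming flat, byte-addressed pointers (the helper name `addressable_gib` is made up for illustration):

```python
def addressable_gib(pointer_bits):
    """Bytes reachable with a flat pointer of this width, expressed in GiB."""
    return 2 ** pointer_bits // 2 ** 30

print(addressable_gib(32))  # 32-bit flat pointers
print(addressable_gib(64))  # 64-bit flat pointers
```

This only describes what a single flat pointer can reach; it says nothing about how much memory a given machine or operating system actually supports.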
Re:fascinating by cyberon22 · 2005-05-31 12:21 · Score: 1

I work with the Adso project, an attempt to create an open source Chinese-English translation and language processing engine. We're quite pleased with the results to date and have set up a language learning blog and online text processing site for people to play with.

http://www.adsotrans.com
http://www.newsinchinese.com

I also -- for unrelated reasons -- read some of the late Qing stuff and can testify that the language is basically completely different. Customs of word usage vary dramatically (single characters preferred to bigrams), while most official documents lack punctuation.

Anyway, both experiences make me very skeptical of statistical translation approaches in the Chinese-English space, since any good translation is almost always a non-literal translation, and there needs to be room in any system for ambiguity in how terms are translated contextually. I suspect the statistical approaches will work much better in the better specified romance languages.

Power to them if they can do it.
Re:fascinating by Anonymous Coward · 2005-05-31 12:51 · Score: 0

About idioms, I would say poetry works the same. I was asking a French chick about translating these song lyrics, but got back "it has too many puns". Puns being what? Multiple word meanings, or just the sound of them (being the same as other ones). For this, I would say it should store human-translated versions and use those. (Like someone translating a part (that they might not know is) from Shakespeare into another language.)

I happened to see "listen to the new white teeth single from nin", at first thinking I've been reading the song-title wrong all along, but then figured: it's too original!

So as for poetry and other such styles using so many parts of the language, the only answer is "tough luck".

Although, I was thinking about how making translations opensource - both so songtitles and such things get added quickly, but also free (or small fee) being able to order translations really quickly (extra cheap with sweatshop-orders from 10 30 year old asians!)

And another thing - being able to select from which year/age-grammatics you want translated, and the language type - legalese and so, and as for poetryish writing, giving you options for each word, or several versions.

Does it understand punctuation rules? Will this post translated to Spanish have the upside-down question marks where they're supposed to be?

I remember getting that from an AltaVista translation, and for translation back from Chinese it added caps at the start of lines (not flawless, but hey).
Re:fascinating by Anonymous Coward · 2005-05-31 13:42 · Score: 0

Seems like I have to repeat myself over and over again[.]
You could always shut the fuck up.
Re:fascinating by OzRoy · 2005-05-31 14:25 · Score: 1

They already tried to do that once. I can't remember where I read about it, but they were trying to teach a computer what a tank looked like.

They thought they had finally taught the machine what it looked like, but occasionally it got things completely wrong. After a long analysis they finally worked out that it thought the shades of the sky were a tank, or something like that.

That is the biggest problem they have yet to overcome, and that is computers can't determine what is an object in a picture, and what isn't.
Re:fascinating by Skynyrd · 2005-05-31 14:59 · Score: 1

So, just how much sooner do /. subscribers see the article before the rest of us?

That was one hell of a first post.
Re:fascinating by ericf · 2005-05-31 17:55 · Score: 1

Probability models can handle non-exact or "fuzzy" matches just fine. If your corpus includes a phrase like "The shirt costs $20," then the probability that its translation is acceptable for "The blouse costs $30" is higher than that of "Freedom is on the march."

Stochastic models can also deal with idioms for the same reason -- if your corpus includes equivalences between "it's raining cats and dogs" and "il tombe des cordes" then that idiom will be learned.

Word boundaries are usually dealt with by using n-grams. Given knowledge of a language's vocabulary, you pick a size of "n" that serves as a word length. You then identify "words" by sliding a window of size n across the corpus, picking up every sequence of "n" consecutive characters, including whitespace. This avoids the problem of having to devise a great word tokenizer. It's also essential to dealing with ideographic languages or languages, like Thai, in which word boundaries can only be identified with a dictionary.
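
A minimal sketch of that sliding window, assuming plain Python strings and a hand-picked n of 3. The function name `char_ngrams` is made up for illustration and is not taken from any particular toolkit.

```python
from collections import Counter

def char_ngrams(text, n):
    """Return every run of n consecutive characters, whitespace included."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# Slide a 3-character window over a tiny stand-in "corpus" and count
# how often each window occurs.
counts = Counter(char_ngrams("the cat sat on the mat", 3))
print(counts["the"])
```

Nothing here needs to know where words start or end, which is the whole attraction for scripts without whitespace between words.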

Alternate character systems are tough. Japanese has three writing systems and it's possible to say "the same" thing in more than one of them. You want a really big corpus -- one that provides coverage of all systems.

Punctuation rules can be dealt with stochastically. In fact they have to be -- translation "units" are typically sentences and so you need a model that knows how to find sentences in a given language. At first blush you might think it's just periods, question marks, and exclamation points. But remember that sentences can end with a period. They sometimes end with ellipses. Periods are used in addresses (St. Ave.) and in names (Dr. John Q. Blankenship). And quotations sometimes contain question marks and exclamation points.
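
To show why "just split on periods" falls over, here is a rough sketch comparing a naive splitter with one guarded by a small abbreviation list. This is a rule-based stand-in, not the stochastic model described above; the abbreviation list and both function names are assumptions made up for the example.

```python
import re

# Hypothetical, hand-picked list; a real system would learn this from data.
ABBREVIATIONS = {"dr.", "st.", "ave.", "mr.", "mrs."}

def naive_split(text):
    """Split after every period, question mark, or exclamation point."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text) if s]

def guarded_split(text):
    """Same idea, but skip known abbreviations and single-letter initials."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        ends = token[-1] in ".?!"
        is_abbrev = token.lower() in ABBREVIATIONS
        is_initial = len(token) == 2 and token[0].isupper() and token[1] == "."
        if ends and not is_abbrev and not is_initial:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

text = "Dr. John Q. Blankenship moved. He lives nearby."
print(naive_split(text))
print(guarded_split(text))
```

The guarded version still gets it wrong when a sentence genuinely ends in "St." or an initial, which is the kind of ambiguity a trained model is meant to resolve from context.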

Sentence segmentation for translation is made even thornier when you consider that it's a many-to-many mapping. That is, one sentence in English might equal three in French. But in Chinese, you might map one sentence to that same English sentence *plus* the one before it.

The problem with using a corpus like UN translator transcripts is that while it's fine for dealing with (reasonably) civil discourse among educated elites, it's less good for learning the argot of Saudi-born terrorists who've been hiding in an Afghani cave for four+ years. Just as there aren't likely to be many Texas-isms or Yorkshire-isms (or whatever dialect you like) in UN transcripts, there sure isn't going to be a lot of coverage for regional Arabic variations. And even if there were, humans will simply change the rules of their spoken language, leaving the MT trainers with no good corpus to feed their models. NSA's onto you because you used the word bomb? Start calling it something else.

There is an open source implementation of the "maximum entropy" approach to statistical natural language processing that is used for systems of this kind. If you're curious, it (and the scientific work on which it is based) would be a good place to start: http://opennlp.sourceforge.net/.
Re:fascinating by bobcote · 2005-06-01 00:28 · Score: 1

They still have such a long way to go. I did a quick test of a few idioms and the results were, at best, amusing.

I attempted to translate a technical email and the result was total nonsense.

The Google tool is no better nor is it any worse than the others out there.
Re:fascinating by JJ · 2005-06-01 01:14 · Score: 1

I suspect the statistical approaches will work much better in the better specified romance languages.

Well, that is the rub. Esperanto was set up to handle the maximum variability of Greek and Latin based languages. It works well for translating Italian to Portuguese (for example) and can even handle Polish to Spanish fairly well. But it doesn't handle English particularly well (or should I rephrase that, English speakers don't handle Esperanto particularly well.) And when you toss things in (like the Chinese monosyllabic, bisyllabic, trisyllabic complexity morphology) Esperanto and the statistical approach tend to flail.

--
So long and thanks for all the fish . . . !!!
Re:fascinating by sepluv · 2005-06-01 02:07 · Score: 1

Semantics is necessary for translation (by human or machine).

--
Joe Llywelyn Griffith Blakesley
[This post is in the public domain (copyright-free) unless otherwise stated]
Re:fascinating by doombob · 2005-06-01 05:51 · Score: 1

Sounds like a neural network to me...
Re:fascinating by MindStalker · 2005-06-03 02:23 · Score: 1

There are plenty of religiously neutral people out there who could translate it without bias. Personally I would love to see any complete retranslation. But it has yet to happen. Many, many mistakes that were introduced into the original KJV are still contained in the newest translations. There is a pretty good retranslation of the Old Testament; it's called the "English Hebrew Bible." It has a lot of interesting differences which, from my limited study, seem to be better translations.

Needs a *bit* more work... by TripMaster+Monkey · 2005-05-31 02:12 · Score: 4, Interesting

Just to illustrate, here's the summary of this story, translated to German and back to English using Google's current version:

Google gave a Glimpse of its machine Uebersetzungsystems the following production at the factory route of the A May 19 to journalists. Google. "Google Blogoscoped" offers an excellent overview of the representation. The system was trained with the nation documents as korpus. This korpus is something 20 billion word value of contents. It uses the existing target language translations (takes place via human translators at the U.N.) Samples find, which use it then to establish guidelines for translating between those languages. Apparent it was successful, where the present version had failed, if it translated certain cliches. If everyone of forming a serious were capable, of the M.Ue., those would go to have having to Google.

--
____

~ |rip/\/\aster /\/\onkey

Re:Needs a *bit* more work... by mattmentecky · 2005-05-31 02:14 · Score: 1

Isn't that what TFA is about? That the current version needs a "bit" more work so they developed a new system?
Re:Needs a *bit* more work... by TripMaster+Monkey · 2005-05-31 02:18 · Score: 1

That's why I said 'just to illustrate'.

--
____
~ |rip/\/\aster /\/\onkey
Re:Needs a *bit* more work... by Anonymous Coward · 2005-05-31 02:24 · Score: 1, Informative

The current version of "Google translates" is based on Babelfish (a rule-based machine translation system), it isn't based on Google's research into SMT (statistical machine translation)
Re:Needs a *bit* more work... by JWeinraub · 2005-05-31 02:27 · Score: 0

For the time being, machine translation isn't supposed to do your German homework well enough that the teacher actually believes you did it. I think it's useful for getting a rough idea. Once it's in the target language, since we are all fluent in the language it translates into, we can figure out what it's supposed to say in perfect grammar. And when those weird words don't get translated, I am sure we can just Google them and find out what they mean. This still does have a long way to go, but it does do a decent job. However, with the Google Browser, I am sure it will be neat seeing blog comments in several different languages. As far as the reader of the site is concerned, he thinks it's a blog in his native tongue... So reading the site untranslated might show comments in several different languages, which can be neat all on its own.
Re:Needs a *bit* more work... by grasshoppa · 2005-05-31 02:32 · Score: 0, Offtopic

You gotta be kidding me, this was modded as flame bait?

MODS: Wake the fuck up. THIS post can be considered flame bait ( watch, I'll be modded insightful or interesting ).

--
Mod me down with all of your hatred and your journey towards the dark side will be complete!
Re:Needs a *bit* more work... by Anonymous Coward · 2005-05-31 02:43 · Score: 0

I modded this as redundant, but since it's now modded 5, Interesting, I'm going to post to get my mod points back.

It shouldn't be news to anyone that the current google/altavista-translator is crap. It would have been much more interesting to see how the new translator would handle the news blurb.
Re:Needs a *bit* more work... by I_Heat_Sexylaid · 2005-05-31 02:46 · Score: 0

["] Life Exists On Other Planets [,] Platforms is Harmful [a] Blue Line 580W PSU Review [." a] Success In Two To Three Years. [for the] Mac Mini Look-Alike [is] Coming Soon [in] new Battlestar Galactica Episodes [with a] Translator for Skype Users [considered harmful].

--
Slashlight! (Can't find the funk) kewl base part
Re:Needs a *bit* more work... by Detritus · 2005-05-31 02:59 · Score: 1

It's much more readable than most efforts at machine translation.

--
Mea navis aericumbens anguillis abundat
Re:Needs a *bit* more work... by Anonymous Coward · 2005-05-31 03:08 · Score: 0

I agree. This paragraph in the article was translated to portuguese and then back to english.

Original:
Still, many people can't speak English. The collected, shared knowledge that makes up the web is therefore only partly accessible to them. The reverse, of course, is true as well.

Translated back:
Still, many peoples cannot say the English. The collected knowledge, shared that it makes the fotorreceptora leather strap above is consequently only in accessible part they. The reverse, of course, is true also.
Re:Needs a *bit* more work... by zoney_ie · 2005-05-31 03:10 · Score: 1

To be honest, I don't regard that as all that bad for current machine translation. The fact they think they have something that will be at least some bit better than the current version is great.

I mean, in fairness, it's nearly good enough as it is to grasp the story (reading that translated-from-German version). For any other purpose, hand translation is required anyway (for any presentation purpose, even ordinary English needs tweaking, much less translated text). As long as this improvement increases the readability of machine-translated text, that's a good enough advance. Eventually, sure, we want translation approaching human work, but the current advances are certainly ensuring the utility of machine translation.

In essence, it is becoming "good enough" for everyday "I want to read this foreign language document" usage.

--
-- *~()____) This message will self-destruct in 5 seconds...
Re:Needs a *bit* more work... by Anonymous Coward · 2005-05-31 03:12 · Score: 0

Nice, but it's missing something about iPods and how evil George W Bush is.

Slashcode goes on-line August 4th, 1997. Editor decisions are removed from story selection. Slashcode begins to learn at a geometric rate. It becomes self-aware at 2:14 a.m. Eastern time, August 29th, 2006. In a panic, they try to pull the plug...
Re:Needs a *bit* more work... by Cow+Jones · 2005-05-31 03:52 · Score: 1

Google gave a Glimpse of its machine Uebersetzungsystems [..]

To be honest, I don't regard that as all that bad for current machine translation.

Until you compare it with a translation done by a real human.
Here is what it looks like when I translate the blurb into German and back again (and I'm not a professional translator):

Google gave journalists a glimpse of its next generation machine translation system at a May 19th Google Factory Tour. "Google Blogoscoped" offers an excellent overview of the presentation. The system has been trained using the United Nations Documents as a corpus. This corpus is some 20 billion words worth of content. It uses existing source and target language translations (done by human translators at the U.N.) to find patterns it then uses to build rules for translating between those languages. Apparently it was successful where the current version had failed in translating certain phrases. If anyone were capable of making a serious go of MT, that would have to be Google.

--

Ah, arrogance and stupidity, all in the same package. How efficient of you. -- Londo Mollari
Re:Needs a *bit* more work... by Anonymous Coward · 2005-05-31 03:55 · Score: 0

You can do that automatically at http://hedman.ca/cgi-bin/loopy.py . Or at least until the server melts down.
Re:Needs a *bit* more work... by Anonymous Coward · 2005-05-31 04:40 · Score: 0

Remember also that this is the effect of *TWO* translations. You have compounded the translation errors.

My question is how was the translation from English into German? Was that readable?
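
To put a rough number on the compounding point above, here is a minimal sketch. It assumes each pass gets a given fraction of words right and that the errors of the two passes are independent; the 0.9 figure is made up for illustration, not measured from any real translator.

```python
# Hypothetical model: each translation pass preserves a word with some
# probability, and the two passes fail independently of each other.

def round_trip_accuracy(forward: float, backward: float) -> float:
    """Chance a single word survives both passes."""
    return forward * backward

def sentence_survival(per_word: float, words: int) -> float:
    """Chance that every word of a sentence of the given length survives."""
    return per_word ** words

one_way = 0.9  # assumed per-word accuracy of one pass
both = round_trip_accuracy(one_way, one_way)
print(f"one pass: {one_way:.2f}, round trip: {both:.2f}")
print(f"10-word sentence intact after round trip: {sentence_survival(both, 10):.3f}")
```

So a back-translated paragraph looks worse than the one-way German probably did, which is why the one-way result is the fairer test.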
Re:Needs a *bit* more work... by rnelsonee · 2005-05-31 04:46 · Score: 1

I believe that the translation engine that's used in your link uses Google's old translation tech, which is based on Babelfish (I think - I could be wrong - but once Altavista went into obscurity, this translator popped up with the exact same interface). It's been around a while, and has a limited set of languages it can translate to. Google's new code is supposed to end up much better than this, but I have no idea how mature it is, so there's a good chance it's still not up to par with the Babelfish code.

Google's translator by bcmm · 2005-05-31 02:13 · Score: 2, Interesting

So what powers Google's current translator? I have seen it give word-for-word the same as Babel on some occasions (but with better handling of non-ASCII characters).

--
# cat /dev/mem | strings | grep -i llama
Damn, my RAM is full of llamas.

Re:Google's translator by iantri · 2005-05-31 02:22 · Score: 5, Informative

SystranSoft's Systran is behind almost all of the machine translation services on the Internet, including Google's.
Re:Google's translator by Nytewynd · 2005-05-31 02:26 · Score: 1

From the sounds of things, Google learns with a neural network. It has the ability to learn new mappings based on pattern matching. Babblefish sounds like a distinct mapping of phrases that have been hand coded.

Theoretically, Google can get better at translating over time, as its neural network learns better connections. It might even get better than a human translator if it goes long enough. There will always be small discrepancies, but if the bulk of the text is correctly translated, that would be good enough.

--
/. ++
Re:Google's translator by metlin · 2005-05-31 02:27 · Score: 1

Wow, that's just fantastic!

Thanks, I was looking for some of the less common languages, and it turned out that Systran has those.

Owe you one, mate.
Re:Google's translator by Potor · 2005-05-31 03:05 · Score: 1

Systran is usually a joke. Although, the last couple of days it has been surprisingly giving me some decent translations. But anyway, now I am pissed. Google is becoming my competitor! I am a translator, and just finished a 400 page book (Dutch -> English). cheers, potor
Re:Google's translator by cecille · 2005-05-31 03:13 · Score: 1

Sounds like that to me too.

I was working with a prof last semester who liked to talk a lot about AI applications to language processors...I think he was going to do his research in it and then got side-tracked onto another project. At any rate, he was giving me a quick rundown on how to get funding and showed me this paper he based his first research proposal on, and it was on self-organizing structures and the application to language. Essentially, the researchers trained this NN to parse basic grammar and then let it run free for a while for some unsupervised training. What they found was that the program had lumped together the types of things that humans would normally lump together in groups - verbs with verbs, nouns with nouns etc. Really interesting results, especially considering some of the linguistics theories that the base structures of languages are mostly the same. Seems computers may pick up on that somewhat as well, which would make a NN an excellent tool for things like translations.

Incidentally, if anyone happens to know what paper I'm talking about...I'm totally drawing a blank on where this thing came from...it would be really appreciated. very interesting stuff, and I'd like the chance to check it out again.

--
...no two people are not on fire.
Re:Google's translator by JohnFluxx · 2005-05-31 03:15 · Score: 1

It can't get better unless it asks the user to say whether the translation was good or not, and perhaps even ask the human to give the correct version.
Re:Google's translator by mOdQuArK! · 2005-05-31 03:16 · Score: 2, Insightful

I am a translator,

Well, if their service is free and works well (not necessarily perfectly), you now have a tool which should let you translate that entire book in about a week (assuming most of the week will be spent checking the translation & preserving the "flavor" of the source).
Re:Google's translator by rca66 · 2005-05-31 03:22 · Score: 1

Google is becoming my competitor! I am a translator

It's unlikely that any of us will see an MT system which can translate a book with a reasonable result. But MT could help the translator speed up his task, as it might translate the easy sentences while the human translator corrects them and translates the rest.
Re:Google's translator by bhiggins80 · 2005-05-31 04:07 · Score: 1

If memory serves, the Google translator is powered by happiness.
Re:Google's translator by autopr0n · 2005-05-31 09:18 · Score: 1

From the sounds of things, Google learns with a neural network. It has the ability to learn new mappings based on pattern matching. Babblefish sounds like a distinct mapping of phrases that have been hand coded.

Do you know anything about machine learning? It doesn't sound like a neural network at all. NNs are good at simple function guessing from a fixed number of inputs, but wouldn't work with arbitrary input spaces like text.

If you don't know what you're talking about, don't.

--
autopr0n is like, down and stuff.
Re:Google's translator by Nytewynd · 2005-05-31 15:16 · Score: 1

Do you know anything about machine learning? It doesn't sound like a neural network at all. NNs are good at simple function guessing from a fixed number of inputs, but wouldn't work with arbitrary input spaces like text.

If you don't know what you're talking about, don't.

As a matter of fact, I have a minor in artificial intelligence and another in cognitive psychology. That means I understand neural nets pretty well. In fact, I know you can teach one language. It can learn language with pattern recognition also. You need to help it during the learning process, but as it learns more, it also starts adjusting itself.

Humans learn language with neural nets, so machines should be able to also.

--
/. ++

Bork bork bork! by AtariAmarok · 2005-05-31 02:15 · Score: 4, Funny

Here is the result as interpreted by the Swedish Chef:

"Guugle-a gefe-a a Gleempse-a ooff its mecheene-a Uebersetzoongsystems zee fullooeeng prudoocshun et zee fectury ruoote-a ooff zee A Mey 19 tu juoorneleests. Guugle-a. "Guugle-a Bluguscuped" ooffffers un ixcellent ooferfeeoo ooff zee representeshun. Zee system ves treeened veet zee neshun ducooments es kurpoos. Thees kurpoos is sumetheeng 20 beelliun vurd felooe-a ooff cuntents. It uses zee ixeesting terget lungooege-a trunsleshuns (tekes plece-a feea hoomun trunsleturs et zee U.N.) Semples feend, vheech use-a it zeen tu istebleesh gooeedelines fur trunsleteeng betveee thuse-a lungooeges. Epperent it ves sooccessffool, vhere-a zee present ferseeun hed feeeled, iff it trunsleted certeeen cleeches. Iff iferyune-a ooff furmeeng a sereeuoos vere-a cepeble-a, ooff zee M.Ue-a., thuse-a vuoold gu tu hefe-a hefeeng tu Guugle-a."

Looking forward to a www.borkle.com which returns all its results in such a format.

--
Don't blame Durga. I voted for Centauri.

Re:Bork bork bork! by spiffturk · 2005-05-31 03:11 · Score: 1

You can modify your google settings so that your results are in bork bork bork. There's also l33t-sp35k by choosing "Hacker" as the language. This doesn't translate the pages for you, but shows all of google's messages in the chosen language.

--
Will
Re:Bork bork bork! by DarthVain · 2005-05-31 03:57 · Score: 1

OK you made me laugh out loud and snort at work... I have to at least sound like I am doing work!

Integrate with GMAIL! by RubberDogBone · 2005-05-31 02:16 · Score: 5, Interesting

Make this work with Gmail and I'd even pay money for it!

Tired of getting email from Amazon.DE on my Gmail account and having to copy and paste it over to Babelfish.

That would be very useful for me.

--
Sig for hire.

Re:Integrate with GMAIL! by Anonymous Coward · 2005-05-31 03:02 · Score: 2, Funny

Why are you subscribed to Amazon.de mailing list if you don't speak German?!?!? How are you gonna read those German books?!
Re:Integrate with GMAIL! by RubberDogBone · 2005-05-31 03:19 · Score: 1

They are order confirmations, not a mailing list.

Amazon.de carries things that Amazon.com does not. Same with Amazon Japan.

--
Sig for hire.
Re:Integrate with GMAIL! by Anonymous Coward · 2005-05-31 03:30 · Score: 0

um, why not learn german?
Re:Integrate with GMAIL! by yppiz · 2005-05-31 03:42 · Score: 1

Finally, I'll be able to understand all the Chinese and Russian spam in my Inbox!

--Pat
Re:Integrate with GMAIL! by ornil · 2005-05-31 05:01 · Score: 1

Finally, I'll be able to understand all the Chinese and Russian spam in my Inbox!

And you'd be surprised to learn that one of the most notorious Russian spammers invites you to learn English:)
Re:Integrate with GMAIL! by Kahlus · 2005-05-31 05:30 · Score: 1

There is a nice extension for firefox that will give you a right click translation for a selected body of text.

I use it all the time to read russian webpages that get linked to me.
Re:Integrate with GMAIL! by Anonymous Coward · 2005-05-31 07:00 · Score: 0

"Why are you subscribed to Amazon.de mailing list if you don't speak German?!?!? How are you gonna read those German books?!"

Everyone is subscribed to Amazon.de mailing list...
Re:Integrate with GMAIL! by Anonymous Coward · 2005-05-31 07:52 · Score: 0

Wait a minute... they get linked to you?
Re:Integrate with GMAIL! by cpeterso · 2005-05-31 08:28 · Score: 0

In Nazi Germany, Amazon.de mailing list is subscribed to YOU!

--
cpeterso

Coming soon...... by Cmdr+Whackjob · 2005-05-31 02:16 · Score: 0, Flamebait

www.googledot.org and www.appledot.org

Anyone care to make a bet? by Weaselmancer · 2005-05-31 02:17 · Score: 4, Funny

That Microsoft will announce a new revolutionary language translation service sometime in the next two weeks or so?

--
Weaselmancer
rediculous.

Re:Anyone care to make a bet? by Anonymous Coward · 2005-05-31 02:27 · Score: 2, Informative

Well, it's not like they don't have the technology...

http://research.microsoft.com/nlp/Projects/MTproj.aspx
Re:Anyone care to make a bet? by snarkh · 2005-05-31 03:07 · Score: 1

Microsoft has a large group, which has been working on machine translation since early 90's.
Re:Anyone care to make a bet? by xtracto · 2005-05-31 03:36 · Score: 1

Microsoft has a large group, which has been working on machine translation since early 90's.

Sure, but the way MS works, they will wait until Google or someone else uses the translation technology as direct competition before they really start using that "large group".

This is why MS has not entered the translation market: only Systran and other minor dictionaries are available, and they have this small market. But if Google or Yahoo or even Apple started using this technology to add value, I am sure Microsoft would start to do something.

--
Ubuntu is an African word meaning 'I can't configure Debian'
Re:Anyone care to make a bet? by m00nun1t · 2005-05-31 07:59 · Score: 1

MS has had MT in production for several years. Check out the Spanish version of MSDN - 100% MT, since around 2002 I believe (not 100% sure of the year).

--
Read reviews of shopping cart software

Unsupported assertions by gowen · 2005-05-31 02:18 · Score: 2, Insightful

If anyone were capable of making a serious go of MT, that would have to be Google.

Erm... why is that? Is it because machine translation is in some sense search technology? Because they've hired renowned experts in natural language processing? Because they've got a lot of money slushing around and employ a lot of generally smart people?

Oh, no. It's because geeks like Google. Therefore, Google are capable of superhuman feats that mere scientists -- those with years of experience in relevant fields -- are incapable of doing.

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.

Re:Unsupported assertions by stevejsmith · 2005-05-31 02:26 · Score: 4, Insightful

No, it's because Google has tons of talent, money, already-archived text to work with, computers, respect in the industry, and consumer base. I can't think of a company that possesses these characteristics more so than Google.
Re:Unsupported assertions by gowen · 2005-05-31 02:30 · Score: 1, Insightful

Well, (oh dear, here comes the Flamebait mod again), I'd argue that Microsoft has more of all of those, with the possible exception of "respect in the industry." As does IBM, Dell, Cisco ... and any number of other well established, Blue Chip IT companies.

Furthermore, Google's ideas are not new. People have been doing things like this for years. But here on slashdot, a google press release about their latest software which doesn't even exist yet gets treated like the announcement of an earth shattering invention.

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
Re:Unsupported assertions by KagatoLNX · 2005-05-31 02:31 · Score: 5, Interesting

Ummm, geeks like Google because Google employs scientists. Which mere scientists were you talking about?

Were you talking about the PhDs at universities busy teaching classes, churning out research papers to avoid being fired (an ugly numbers game some departments play), or perhaps burning time generating volumes of grant paperwork?

Oh, maybe you were talking about the scientists employed by the private sector. I'm sure the management teams wherever they work are willing to take the time and care that Google won't.

You do know how many PhDs Google employs, right? Not to mention that they won't be fighting for resources there either. No backstabbing, liquidating MBAs trashing their corporate budget. No football-crazed alumni assassinating their funding proposals either.

Also, I would remind you that "mere scientists" often come up with the needed research (there are volumes in MT alone), but rarely can afford to put in the years that it takes into a good implementation.

Geeks love Google because it is, in many respects, where the best of business meets the best of academia.

--
I think Mauve has the most RAM. --PHB (Dilbert Comic)
Re:Unsupported assertions by tobybuk · 2005-05-31 02:32 · Score: 3, Funny

Look pal, you said something about Google that could be taken as a negative. Here on Slashdot that is only slightly better than saying something good about Windows. But thank your lucky fucking stars you didn't decide to disparage the immortal being that is Linux. That's worse than flushing the original Koran down the pan.
Re:Unsupported assertions by gowen · 2005-05-31 02:40 · Score: 0, Flamebait

churning out research papers to avoid being fired
I love how you believe "churning out research papers" is somehow orthogonal to doing research.

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
Re:Unsupported assertions by benjcurry · 2005-05-31 02:40 · Score: 2, Insightful

Oh, come on! It's because in the past, most of what Google has undertaken has been enormously successful and useful. Yeah, they hire a lot of smart people and have lots of money. Gmail (IMO) is the gold standard of free webmail. Google Maps (IMO) is the best map system out there. They also are responsible for Adsense, Adwords and I think they even have a search engine that gets a good amount of hits per diem. Maybe there is a reason to think this translation thingamabob will be good!

--
BenCurry.net
Re:Unsupported assertions by benjcurry · 2005-05-31 02:43 · Score: 1

Yes...Google also has a history of fulfilling on its hype, in stark contrast to MS.

--
BenCurry.net
Re:Unsupported assertions by gowen · 2005-05-31 02:47 · Score: 1, Insightful

Really? Google search is great, and Gmail's an adequate front end attached to a webmail system whose sole selling point is the massive amount of storage space.

But have you seen the monstrosity when that front end got bolted onto the deja Usenet archive? Google Maps is usable, but it's hardly ground breaking.

And other than those things, exactly what hype have google delivered on?

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
Re:Unsupported assertions by stevejsmith · 2005-05-31 02:52 · Score: 3, Interesting

Dell and Cisco are not in this business. IBM is not hemorrhaging with cash in the way Google is. Microsoft is not in the business of providing free Internet accessories. In any case, Google has a track record of innovative ideas ("innovative ideas" meaning that not only did they come up with it and implement it partially, but they invested full-on into it, bet money on it, and made it better than the competition) and is most likely of any company who would announce this to actually pull through with it. If some little start-up announced this (as I'm sure a few have), people would take it with a grain of salt. But that Google announces it, I'm sure most people believe fully that Google will deliver on its promise.

And you're right, people have thought of this exact idea (I'm sure every other computer major and linguist has, in fact, since the birth of ENIAC--I know the idea's crossed my mind tons of times, not that I'd have the slightest clue how to do it), however actually attempting to do it with a reasonable chance of success? I'm going to say Google is the first.

Plus, I got the impression from the article that the service is operational, just not available to the public. If you'll read the article, you'll find that the translator properly translated a fairly complicated phrase from Arabic to English. I'd guess that this service is, from a technical standpoint, at least 95% done -it's just the packaging and touching-up that needs to be done.
Re:Unsupported assertions by benjcurry · 2005-05-31 02:54 · Score: 1

Well, I think I mentioned what they had delivered on. Gmail and Google Maps are groundbreaking in the sense of being some of the richest client-side applications the web has seen as of yet. Gmail is a joy to use, well organized and hassle-free (IMO). I haven't seen the Usenet/front end thingamabob you mention, though. Google Maps offers many advanced features. My favorite is "bicycle shops near 121 Main street, Podunkville, VA". Brings up all the bike shops in close proximity to the address, with their phone #'s, etc. Like the yellow pages on steroids.

--
BenCurry.net
Re:Unsupported assertions by imroy · 2005-05-31 02:56 · Score: 2, Insightful

Erm... why is that?

Because Google has shown that it knows how to handle large amounts of human-created content and create useful information from it. The search engine was just the start. Just look at the spell checker they added. It doesn't use a dictionary, just the mass of web pages they spider monthly. It's not always perfect, but it allows it to be more adaptive than other methods. This translator looks like something similar along those lines.
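
For anyone wondering how a spell checker can work without a curated dictionary, here is a minimal sketch of the general frequency-based idea: count the words in a pile of text, then suggest the most common word within one edit of what was typed. This is a well-known textbook technique and not a description of Google's actual code, which the article doesn't cover. The tiny CORPUS string is a made-up stand-in for crawled web pages.

```python
import re
from collections import Counter

# Toy stand-in for crawled web text (an assumption for illustration only).
CORPUS = "the translation system translates the text and the translation is readable"
COUNTS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one edit (delete, swap, replace, insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + swaps + replaces + inserts)

def suggest(word):
    """Return word if the corpus contains it, else its most frequent one-edit neighbour."""
    if word in COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in COUNTS]
    return max(candidates, key=COUNTS.get) if candidates else word

print(suggest("translaton"))
```

The adaptive part comes for free: recount the words after the next crawl and new names and slang become valid suggestions with nobody editing a word list.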
Re:Unsupported assertions by l3v1 · 2005-05-31 03:01 · Score: 1

I love how you believe "churning out research papers" is somehow orthogonal to doing research.

You'd be surprised...

--
I am putting myself to the fullest possible use, which is all I can think that any conscious entity can ever hope to do.
Re:Unsupported assertions by Timesprout · 2005-05-31 03:07 · Score: 1

Clearly someone with mod has no concept of what flamebait really is or what moderation actually means. Gowen's points are perfectly valid, Google have done nothing, let me repeat that, nothing groundbreaking. What they have done is taken some old ideas and implemented them very well. The double standards round here though are amazing.

Just imagine MS produced a web accelerator which recorded personally identifiable information about you and made unrequested downloads to your machine. The poor slashbots would struggle to post, such would be their apoplexy. If Google do it, that's fine though.

--
Do not try to read the dupe, thats impossible. Instead, only try to realize the truth
What truth?
There is no dupe
Re:Unsupported assertions by snarkh · 2005-05-31 03:12 · Score: 1

Often it is...
Re:Unsupported assertions by rca66 · 2005-05-31 03:13 · Score: 2, Insightful

If you'll read the article, you'll find that the translator properly translated a fairly complicated phrase from Arabic to English.

For each existing MT system you can find fairly complicated sentences which translate ok.

I'd guess that this service is, from a technical standpoint, at least 95% done -it's just the packaging and touching-up that needs to be done.

"Technical standpoint" you mean, the system is able to translate arbitrary text? Maybe. Or do you mean the system is able to translate arbitrary text into semantically correct text in the target language? Highly unlikely. People are trying this vor decades now. And other companies and institutes have smart people too.
Re:Unsupported assertions by netsavior · 2005-05-31 03:14 · Score: 1

those with years of experience in relevant fields -- are incapable of doing.
How many years has mapquest had? Now Google has a mapping product that is 100% better... wait mapping is nothing like web search... how could they have done that when MQ has a staff with TONS of mapping experience... MQ uses the SAME DATA as GM and guess what... GM is better.

I find that history is full of people with "years of experience in relevant fields" that have been left behind by newcomers who see feats in a fresh way instead of writing them off as "superhuman feats"

Maybe Google is just not as weighed down with the failures of past generations.
Re:Unsupported assertions by Stonehand · 2005-05-31 03:18 · Score: 1

I can think of a certain TLA that would be extremely interested in machine translation and probably has access to ludicrous amounts of computing power, archived text in a variety of languages of interest, and top-notch scientists.

--
Only the dead have seen the end of war.
Re:Unsupported assertions by Anonymous Coward · 2005-05-31 03:36 · Score: 0

The odd person of the Ummm likes the Google, because the Google
employs the scientist. Which simplicity, which spreads out, is it
spoke the scientist?

, that out common, it deviates out, in order to respect, research the
paper (ugly number play fires any vice-play), or it manufactures the
sheep grant of the paper work, which burns it the university, it it
informs it filled is supposed, the PhDs with types of the hour that
spoke?

By the private sector, which Ohio out perhaps spread, it spoke the
scientist, who is employed. The management team hangs one hour and a
left, which Google, which worries itself it and where it functions the
marking sign it sits down out and and I which does it, it am positive.

The PhDs Google rents, which is right page log, which does not know it,
over to take off and it there an egg n, which spreads it out? This
resource of them of the danger, that place, is it, which page and
fights do not address and it not respect. The Backstabbing, its
Korporation Yesan pulls hydrocyanic acid MBAs to the outside pages.
off, those the football, to their investment blueprint to murder it
becomes wildly excited and it pupil of one is, which page.

"scientist, who is more simply" Soo silicone Ro to this research a
person, who, who rises a necessity (there are sheep within the M.Ue.S
and of them) it everything also you in me by south high country Anh
is, is rare it, but and is there a possibility of the Verringerns of
the fact, which sets it within the yearly, when it is intercepted with
a good execution.

First 10001 of the world of the first scholar of the enterprise in the
workstation, of many dots the odd person love the Google because of
the comfort.
Re:Unsupported assertions by Anonymous Coward · 2005-05-31 03:39 · Score: 0

No, it's because Googlebot is smarter than you.
Googlebot will soon be able to read and interpret 240 languages... talk about multi-lingual. Remember googlebot is going to become self aware really soon here, just remember to stay on googlebot's good side, before he decides to wipe you off the net like the smear of organic refuse you are...
Re:Unsupported assertions by Bigman · 2005-05-31 03:47 · Score: 1

What they have done is taken some old ideas and implemented them very well.
Which, I would imagine, is where Google have scored over other corporations and research organisations. They are good at making things work. And, unlike many corporations, they have a good track record for delivering the results of their efforts at nil cost, or at least very low cost.
This project shows how access to free information on the internet enables innovation; if the UN documents had the usual IP overheads then this project would not be practical.

--
*--BigMan--- Time flies like an arrow.. but personally I prefer a nice glass of wine!
Re:Unsupported assertions by coolGuyZak · 2005-05-31 03:50 · Score: 1

Maybe. Or do you mean the system is able to translate arbitrary text into semantically correct text in the target language? Highly unlikely.
I believe that google isn't aspiring to "perfection", instead leaning towards "better". And if their system lives up to the hype, then more power to them.
"People are trying this vor decades now"
Apparently, he just ran into a translation problem himself ;)
And other companies and institutes have smart people too.
While they may have the "smart people" you speak of, none of them have applied those people appropriately. Another thing you have to consider: How many companies throw on additional staff who don't give a damn? My bet is quite a few. How many let their employees work with their passion? My bet is not so many.
Re:Unsupported assertions by Harinezumi · 2005-05-31 03:53 · Score: 1

And the system might even get declassified after a few decades!
Re:Unsupported assertions by coolGuyZak · 2005-05-31 04:06 · Score: 1

What they have done is taken some old ideas and implemented them very well. Google have done nothing, let me repeat that, nothing groundbreaking.
Actually, I consider implementing those old ideas well to be Google's most groundbreaking accomplishment.
Just imagine MS produced a web accelerator which recorded personally identifiable information about you and made unrequested downloads to your machine.
I don't remember hearing anything about unrequested downloads. Regardless, I wouldn't be up-in-arms either way. However, this opinion should not take things out of context. If MS screwed up their beta software, I wouldn't be giving them pain. Surely enough, I'd be pissed. However, it was my decision to use beta software. Same for the people who got burned using the accelerator. Same for OSS. Beta is not guaranteed to work.

Meanwhile, if it was officially released software... You'd have every justification to be pissed. In that case, time to dust off your bat and smash the proverbial windshields.

DISCLAIMER: Not a /bot, nor do I speak for them.
Re:Unsupported assertions by gowen · 2005-05-31 04:09 · Score: 1

My favorite is "bicycle shops near 121 Main street, Podunkville, VA".
Pretty handy, but it takes something more than an ability to do look-ups based on ZIP codes for me to call something "groundbreaking". My telephone operator's been able to do that for years.

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
Re:Unsupported assertions by Anonymous Coward · 2005-05-31 04:31 · Score: 0

Yes, GM is better using the same data. Know why? Because Google built a better interface. Duh. This is programming 101 stuff.
Re:Unsupported assertions by gowen · 2005-05-31 04:46 · Score: 1

I'd guess that this service is, from a technical standpoint, at least 95% done -it's just the packaging and touching-up that needs to be done.
I've got to admire your honesty -- you could've attempted to pass this uninformed speculation off as fact, but at least you've put it under the heading "Unsupported assertions"

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
Re:Unsupported assertions by Politburo · 2005-05-31 04:46 · Score: 1

Microsoft is not in the business of providing free Internet accessories.

So I suppose MSN Search, Hotmail, Terraserver, and MapPoint don't count?
Re:Unsupported assertions by stevejsmith · 2005-05-31 04:54 · Score: 1

A service has searched through an absolutely huge corpus with little or no additional input other than "make these fit," and was able to translate perfectly (or near-perfectly, as I can't read Arabic) a fairly semantically-complex sentence, which with using the old methods of translation was an absolute disaster. I'd say this is a pretty sure indication that they're almost done with the heavy technical stuff (could you even begin to conceive of a way to parse the corpus?).
Re:Unsupported assertions by stevejsmith · 2005-05-31 04:58 · Score: 1

These are all relatively insignificant (and poorly-designed, at that) parts of the Microsoft behemoth, most likely only in place to prop up their desktop OS and productivity software hegemony.

For Google, on the other hand, this is what made them one of the most popular stocks on Wall Street.
Re:Unsupported assertions by gowen · 2005-05-31 04:59 · Score: 1

translate perfectly (or near-perfectly, as I can't read Arabic) a fairly semantically-complex sentence, which with using the old methods of translation was an absolute disaster.
Yes. One Arabic sentence. Picked by the google people to show their software in the best light.

Now, if you want to assume that means the job is done and that the software can correctly parse and translate any Arabic sentence, that's cool. Feel free to continue believing that.

On an unrelated note, I have a bridge that I'd like to sell to you.

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
Re:Unsupported assertions by stevejsmith · 2005-05-31 05:00 · Score: 1

Plus, I don't know if you have any linguistic training (I do) or experience in a non-Indo-European language (or a non-Germanic language, or, hell, even a non-English language!), but translating from a Semitic language to English based on a whole bunch of rules absolutely boggles my mind; I can guarantee you that if this thing can translate Arabic that well, its Swedish, French, Russian, etc. must be near flawless.
Re:Unsupported assertions by coolGuyZak · 2005-05-31 05:04 · Score: 1

"On an unrelated note, I have a bridge that I'd like to sell to you"
I'm interested in that thar bridge. Let's talk price. Say I start high, you lowball it, and we meet somewhere in the middle? Mind you, never been good at the whole haggling thing.
Re:Unsupported assertions by stevejsmith · 2005-05-31 05:08 · Score: 1

You say "one Arabic sentence" as if it's a minor thing. Unless they cheated, just the fact that they could translate this simple sentence is absolutely remarkable. In the time that it would take a native English speaker to learn Arabic fluently, I'll bet they could do any other two Indo-European languages fluently.
Re:Unsupported assertions by gowen · 2005-05-31 05:21 · Score: 1

just the fact that they could translate this simple sentence is absolutely remarkable
Err. No. If it had an exact match in the corpus, it would be absolutely unremarkable. And given the sentence's content, it's hardly unlikely that there is a very close match in the corpus.

Given that the corpus is mainly political speeches, I'd be considerably more impressed if the machine was shown translating an Urdu sentence about the state of Bangladeshi cricket.

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
Re:Unsupported assertions by a.ameri · 2005-05-31 05:25 · Score: 1

Well, with all respect to Google, but...
Talking about AI reminds me of some other companies; you know, like the guys that made the Big Blue thingy that beat Kasparov.
In exactly which one of these above fields that you mention, is Google ahead of IBM, Siemens and Sony? (you know their Robots are pretty impressive, I guess these robots also DO have AI in them).

Google is in a very good position to tackle the MT problem, but don't be mistaken into thinking it is in a unique position. I am sure others will jump on board once they see commercial value in this business, and that is a good thing(TM)

--
-- /* Those who don't understand Unix, are condemned to reinvent it poorly */
Re:Unsupported assertions by stevejsmith · 2005-05-31 05:40 · Score: 1

Urdu and Bangladeshi are mutually intelligible, or close to it -- this would be about as remarkable as an accurate machine translation from Danish to Swedish. I.e., not very.
Re:Unsupported assertions by rca66 · 2005-05-31 05:45 · Score: 2, Insightful

While they may have the "smart people" you speak of, none of them have applied those people appropriately.

Oh come on! Google may be a great company, but to say it is the first in the history of mankind which is able to motivate its employees or make them productive is a very strange remark. I don't think I am exaggerating when I say: everything Google has achieved up to now is trivial compared to the problem of translating human language. If one looks at their ranking, their indexing, G-Mail and so on: the complexity of those tasks is orders of magnitude below the problem of handling human language.
Re:Unsupported assertions by Jay+Carlson · 2005-05-31 05:59 · Score: 1

And you're right, people have thought of this exact idea (I'm sure every other computer major and linguist has, in fact, since the birth of ENIAC--I know the idea's crossed my mind tons of times, not that I'd have the slightest clue how to do it), however actually attempting to do it with a reasonable chance of success? I'm going to say Google is the first.

Language Weaver has been selling statistical machine translation systems for a while.

Their quick glance page indicates >500 words per minute on a minimum-spec'd 2.4GHz box with 2G of memory, though...
Re:Unsupported assertions by Politburo · 2005-05-31 06:44 · Score: 1

You can try to spin your way out of it, but the fact of the matter is that your original statement was incorrect. It doesn't matter how well they perform, or why they are in the business. The bottom line is that part of MS' business is providing web services.
Re:Unsupported assertions by coolGuyZak · 2005-05-31 07:37 · Score: 1

And nothing Microsoft has done with their smart people is particularly innovative either... it works both ways. Microsoft, however, has changed from a company that motivates its employees to one that... stifles them. Google is now how MS was 10 years ago.

In retrospect, I should have said "most companies now do not apply their smart people appropriately."
Re:Unsupported assertions by gowen · 2005-05-31 21:26 · Score: 1

You've misunderstood. I want MT to translate from Urdu to English, and I want it to translate a sentence that is unlikely to match an exact sentence in the collected speeches of the UN.

That would be a real test of this system, rather than the spurious one from which you're generalising such great things.

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.

so name.. by Turn-X+Alphonse · 2005-05-31 02:19 · Score: 1

Googlefish or babelgoogle? Maybe we should just change "internet" to google and every site must have google involved.

Googlesoft.com
Googlenix.com
Opengoogle.org
googlejournal.com

--
I like muppets.

Re:so name.. by Anonymous Coward · 2005-05-31 02:23 · Score: 0

maybe it's going along with the current trend and will be: translate.google.com
Re:so name.. by 01000011011101000111 · 2005-05-31 03:13 · Score: 1

Much simpler... replace http:// with gpap:// (google page access protocol) ;) Incidentally, is it me or are the anti-bot images getting worse? They're getting seriously pixellated...

--
Programming is an Art. I am an Artist. Does that mean I get to wear a daft hat?
Re:so name.. by Kjuib · 2005-05-31 03:33 · Score: 1

I believe the name you are looking for is:
Bagle
the Combination of BableFish and Google.
That might be taken.. but I will have to look.

--
- Your stupidity got you into this mess, why can't it get you out? -Will Rogers
Re:so name.. by Anonymous Coward · 2005-05-31 04:02 · Score: 0

booble

Piffle by ear1grey · 2005-05-31 02:20 · Score: 4, Funny

If anyone were capable of making a serious go of MT, that would have to be Google.

An interesting story, but please, for the love of all that's balanced and objective; tell me again how that smudge on your nose really is chocolate.

--
boakes.org

Re:Piffle by Heisenbug · 2005-05-31 02:36 · Score: 1

Piffle yourself. They have 100,000 servers to throw at statistical analysis, they have enough cash floating around to offer sign-on bonuses that even Microsoft can't beat, they have a history of applying PhDs to practical problems, and they have obvious business interests in making machine translation more useful. Google-worship aside, they're certainly a top contender in my book.

Of course, I don't know anything about this specific field, and that article sure was pretty fluffy. I'd be interested in more informed analysiseses ...
Re:Piffle by aicrules · 2005-05-31 02:52 · Score: 0

While somewhat opinionated, I would say that's more of an observation on Google's recent (3 years) slew of application feats. As a whole, they have released and announced more major application efforts than any other company. I believe the observation is based on this, rather than it being just an editorial comment.
Re:Piffle by gowen · 2005-05-31 03:02 · Score: 1

As a whole, they have released and announced more major application efforts than any other company
Yeah, because adding fricking maps to google search is entirely analogous to solving the notoriously hard problem of machine translation.

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
Re:Piffle by aicrules · 2005-05-31 03:28 · Score: 0

What other application development group would you say has a better chance of creating a better MT system? Google has the people, the funds, and the drive.

And how is their Maps system NOT a major accomplishment? I don't remember being able to do that via MapQuest.
Re:Piffle by dfjghsk · 2005-05-31 03:29 · Score: 1

Pfff... they also have an image search!

--
Help me take back Slashdot. When did 'News for Nerds' become 'FUD and Conspiracy Theories for Extremist Nutjobs'?
Re:Piffle by alassiry · 2005-05-31 03:46 · Score: 1

Their arabic samples are quite impressive! I have never seen any machine translation do that!!! (sure they are "controlled" samples, but they're still impressive)

--
_________________________________________________ Just another Crazy Linux/Perl Maniac
Re:Piffle by gordo3000 · 2005-05-31 03:48 · Score: 2, Insightful

neither their computing power nor their cash is anything to be in awe over. Neither are truly top contenders when it comes to the computing industry, unless you take the time to wonder why this is impressive.

Remember, almost all of those servers are needed for what they are currently working on, so they don't really have anywhere near that kind of computing power. I would be willing to bet that if they threw every free cycle at this, they have closer to 20%. Further, most of these servers are for moving data around; from what I have read, almost none of them are high-end number crunchers (what is needed for MT).

They have nowhere near the cash of a lot of the big fish in the computer industry, so don't think they can out-muscle people like Intel or IBM, much less the true heavyweight, Microsoft.

http://www.cbsnews.com/stories/2004/12/22/national/main662452.shtml
(just to give you an idea as to how much cash MS can use to crush competitors; it's not an issue of can't, it's an issue of not wanting to)

What makes Google's situation unique is that, among the companies that actually care to spend time on this project, they are in the best position to do this stuff. This is the impressive part: a company that thrives on free material doing something so complex. Things like Google Maps are incredibly simple and only involve indexing available information. That isn't what this is by any means; this is a company attempting to break out of a niche and enter the more revolutionary side of computing (I have yet to see a service by Google that is actually this).

I'm interested to see how much hype it all is. Hopefully, I can give it some tough Japanese and see how well it holds up when it goes beta (the eternal state of any Google project).
Re:Piffle by gowen · 2005-05-31 03:51 · Score: 2, Insightful

What other application development group would you say has a better chance of creating a better MT system?
IBM? Remember them? They probably spend more money on Blue Sky thinking than Google's entire research budget. Ever heard of Deep Blue, the computer that beat Garry Kasparov? Wasn't made by Google.

Now, which do you imagine is closer to the sort of non-linear processing needed to do machine translation... playing chess, or cross referencing an enormous lookup table of ZIP codes?

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
Re:Piffle by cei · 2005-05-31 06:21 · Score: 1

tell me again how that smudge on your nose really is chocolate

The difference between brown-nosing and kissing ass? Depth perception...

--
This sig intentionally left justified.
Re:Piffle by Anonymous Coward · 2005-05-31 08:18 · Score: 0

Cross referencing an enormous lookup table of ZIP codes!!

Remember, they are using Statistical Machine Translation. It's the brute-force approach to MT, so they will be generating a huge number of lookup tables.
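
To make the "lookup table" idea concrete, here is a minimal sketch of how a word-translation table could be counted out of sentence pairs. It is not Google's method: the Dice score is a simple stand-in for the statistical models real systems train, the function names `build_table` and `best_match` are made up for illustration, and the three romanized sentence pairs are invented toy data.

```python
from collections import Counter
from itertools import product

def build_table(pairs):
    """Count how often each source word shares a sentence pair with each target word."""
    cooc, src_n, tgt_n = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_n.update(s_words)                    # sentences containing each source word
        tgt_n.update(t_words)                    # sentences containing each target word
        cooc.update(product(s_words, t_words))   # every (source, target) co-occurrence
    return cooc, src_n, tgt_n

def best_match(word, table):
    """Return the target word with the highest Dice score for `word`."""
    cooc, src_n, tgt_n = table
    scores = {
        t: 2 * c / (src_n[s] + tgt_n[t])
        for (s, t), c in cooc.items()
        if s == word
    }
    return max(scores, key=scores.get)

# Invented toy corpus (assumption): three aligned sentence pairs.
pairs = [
    ("washington is cold", "huashengdun hen leng"),
    ("washington is far", "huashengdun hen yuan"),
    ("beijing is cold", "beijing hen leng"),
]
table = build_table(pairs)
print(best_match("washington", table))
```

With only three sentences, "washington" already lines up with "huashengdun" because they appear in exactly the same pairs. Anything outside the corpus has no row in the table at all.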
Re:Piffle by NoOneInParticular · 2005-05-31 08:55 · Score: 1

In the IBM sense, playing chess is searching an enormous tree defined by the possible moves in an artificially constrained domain using heuristics created by chess-masters and -grandmasters, optionally manually tweaked in between consecutive searches through the tree. So my bet is on the ZIP codes. Deep Blue like all chess computers is a hack.
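
The "search a tree of moves, score the leaves with heuristics" idea can be sketched in a few lines. This is plain minimax on a hand-written toy tree, not Deep Blue's actual search; the tree and its leaf scores are invented for illustration.

```python
def minimax(node, maximizing):
    """Leaves are heuristic scores (ints); inner nodes are lists of child positions."""
    if isinstance(node, int):
        return node  # heuristic evaluation of a position we stop searching at
    values = [minimax(child, not maximizing) for child in node]
    # The side to move picks the best outcome for itself.
    return max(values) if maximizing else min(values)

# Toy tree (assumption): two moves for us, each answered by two replies.
tree = [[3, 5], [2, 9]]
print(minimax(tree, True))
```

The opponent holds the first move to 3 and the second to 2, so the search settles on 3. The quality of the answer depends entirely on the leaf scores, which is where the hand-tuned heuristics come in.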

Altavista Babelfish by yotto · 2005-05-31 02:20 · Score: 4, Funny

When questioned on the matter, Altavista's Babelfish translator gave this quote:

Google does not have anything on my amazing abilities of the translation!

--
Pulp Audio Weekly - Geek News and Reviews

Re:Altavista Babelfish by StikyPad · 2005-05-31 12:09 · Score: 1

In Soviet Russia Babelfish says "Google does not have that -nibyd6 on my amazing abilities of transfer!"

--
https://www.eff.org/https-everywhere

if anyone... by rdc_uk · 2005-05-31 02:20 · Score: 5, Interesting

Actually, my bet for most likely to make a real go of machine translation would be...

IBM

Look how far they ran with chess programs, because they felt like it...

If they decided to go the same distance with translation...

Re:if anyone... by nfk · 2005-05-31 02:39 · Score: 1

They could beat Kasparov and get his expletive reaction on the spot.
Re:if anyone... by LiquidCoooled · 2005-05-31 02:40 · Score: 2, Funny

They won't have any money left to fritter on useless projects after SCO beats them ;)

--
liqbase :: faster than paper
Re:if anyone... by digidave · 2005-05-31 02:54 · Score: 3, Informative

Yeah right. Not while they're trying to convince customers to buy their current generation of crap translators. I got sucked into an IBM conference two years ago where they tried to convince me that their Websphere translator was "near perfect" and that it was ready to be deployed on web sites wanting to offer content in multiple languages. They even went so far as to bring in supposed unbiased happy customers who testified that the Websphere translator was as good as human translators.

The conference was mostly IBM platinum partners (development firms who specialize in IBM "solutions" and make IBM enough money to be called platinum partners) and they seemed to buy into it. Of course, platinum partners tend to believe everything IBM tells them.

--
The global economy is a great thing until you feel it locally.
Re:if anyone... by rbarreira · 2005-05-31 02:58 · Score: 2, Funny

I believe your thoughts are upside down...

--

The AACS key is NOT 0xF606EEFD628B1CA427BEA93A9CA9773F
Re:if anyone... by rca66 · 2005-05-31 03:01 · Score: 2, Insightful

Actually, my bet for most likely to make a real go of machine translation would be... IBM

They already did it. Several years ago. You can get it with Websphere and offsprings are sold under different labels.

Look how far they ran with chess programs, because they felt like it...

Chess is trivial compared to the task of translation. You can not compare these two problems.
Re:if anyone... by Anonymous Coward · 2005-05-31 04:02 · Score: 1, Informative

Actually, a lot of the original work that the current statistical mt methods are based on was developed at IBM. They were named, appropriately, IBM Models 1-5.
Re:if anyone... by Anonymous Coward · 2005-05-31 04:26 · Score: 0

You know...

I think you're right...

Your overuse...
of ellipses... is... so... fucking... annoying..................
Re:if anyone... by gowen · 2005-05-31 04:42 · Score: 1

Chess is trivial compared to the task of translation.
And yet some people here (we call them "idiots") are suggesting that Google will crack that nut in no time because (get this) they've made "groundbreaking" software like Google Maps. Which works by cross-referencing ZIP codes, for freaks sake.

--
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
Re:if anyone... by Anonymous Coward · 2005-05-31 05:11 · Score: 0

Do you ever end a sentence without three periods?
Re:if anyone... by Anonymous Coward · 2005-05-31 05:41 · Score: 0

Chess is trivial, in the number of possible combinations (although large) compared to even another game, such as Go, let alone a language. IBM chose chess, since it is (relatively) limited in number of combinations, etc.
Re:if anyone... by Anonymous Coward · 2005-05-31 05:51 · Score: 0

Actually the originator of the statistical MT idea (or at least a statistical MT model and its working implementation) is IBM (Brown et al.)

What Google is doing is based on the original IBM idea, just like any other scientific development is built on the work of previous scientists.

IBM is still working on MT, and I bet they are willing to go the distance. The problem is, translation is much more difficult compared to chess. It really is, ask anyone who has some experience in computational linguistics.

I sure hope Google can come up with something that works, mainly because it is more likely to be freely available on the web. IBM can come up with a good system as well, but it is unlikely to be free.

As for the general question of whether the problem of MT can be solved, not really. All you can hope for is something that resembles English and sort of conveys the meaning of the original document. This will be the case for some time. The reason? Translation is hard! It is AI complete! Talk to professional translators, it is a difficult task even for humans.

Bubla *Cick BAle by kristopher · 2005-05-31 02:21 · Score: 1

Bubla *Cick BAle Walkie *Hotka BaCa Sopika *luek Gack *Zoek Pael Quazic Translate that google!

Re:Bubla *Cick BAle by wootest · 2005-05-31 02:37 · Score: 1

Hey! My mother was a saint!
Re:Bubla *Cick BAle by CyberKnet · 2005-05-31 02:41 · Score: 1

That's the idea. Given enough examples of this gibberish and its counterpart in English, eventually the system could start to 'translate' it. Personally, I'd like to see this used for the reverse. Feed enough random input and English texts for their 'counterparts' and use the service to create a new language.

--
Video meliora proboque deteriora sequor - Ovidius

Only works for translating speeches by Shotgun · 2005-05-31 02:23 · Score: 4, Insightful

If your blog sounds like a politician giving a speech at the UN, this service will do a wonderful job. Doubtful that it will do any better than Babelfish otherwise.

The biggest problem in artificial intelligence is that the system learns the material that it is trained to, and only that material. Computers don't generalize or extrapolate the known into the unknown worth a damn.

--
Aah, change is good. -- Rafiki
Yeah, but it ain't easy. -- Simba

Re:Only works for translating speeches by hayh · 2005-05-31 02:29 · Score: 0

It would never work for a lot of /. posts, because it would assume that the original text's speeling and grammer are correct ;)
Re:Only works for translating speeches by Dystopian+Rebel · 2005-05-31 02:45 · Score: 2, Funny

And if the peeps chin-wagging at Kofi Annan's gig don't interpret 733T 5P3AK, you're in the saddle!*

*Up the river without a paddle.

--
Rich And Stupid is not so bad as Working For Rich And Stupid.
Re:Only works for translating speeches by atomm1024 · 2005-05-31 02:53 · Score: 1

"733T 5P3AK"
"TEET SPEAK" ?

Teat speak?
Talking like a boob? :)

--
Signature.
Re:Only works for translating speeches by Anonymous Coward · 2005-05-31 03:03 · Score: 0

It's going to use more seed material than that as it progresses, you clever person.
Re:Only works for translating speeches by autopr0n · 2005-05-31 09:28 · Score: 1

Computers don't generalize or extrapolate the known into the unknown worth a damn.
They can, if you know how to program.

--
autopr0n is like, down and stuff.
Re:Only works for translating speeches by Alsee · 2005-05-31 15:28 · Score: 1

the system learns the material that it is trained to, and only that material.

Yep. I just asked it to translate:
"Good, we're all in agreement."
and the damn thing BSOD'd.

--
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.

Good online translators for other languages by metlin · 2005-05-31 02:23 · Score: 2, Insightful

While Google's existing translator and Altavista's Babelfish are good, they do not help in the translation of several other languages.

That would be a really good benefit - for instance, I wanted something translated to and from Svensk (Swedish), but I really couldn't find any translation service that did.

Good translation of the more common languages would be nice, but even simple translations of a wider variety of languages would be really useful.

Re:Good online translators for other languages by TyrelHaveman · 2005-05-31 02:51 · Score: 1

Jag talar inte svenskt, men I grundar detta för att översätta för mig.
http://www.systranet.com/
You have to sign up after like 5 translations, but it IS free to do so. It'll do to/from/between Swedish, Arabic, French, Greek, Spanish, Portuguese, Italian, German, Dutch, Russian, Korean, Japanese, and Chinese (Simplified and Traditional).
Re:Good online translators for other languages by _Laban_ · 2005-05-31 03:09 · Score: 1

Systransoft - does Swedish!
Re:Good online translators for other languages by Anonymous Coward · 2005-05-31 03:32 · Score: 0

Jag talar inte svenskt, men I grundar detta för att översätta för mig.

I'm Swedish, and that was incomprehensible gibberish.

Roughly translated back to English, just to give you an idea, your sentence would become something like:

"I don't speak Swedishly, but Thou ground this to translate at me."

For example, when you used Systran it translated "translate for me" into "översätta för mig".
For = "för", but it's the wrong preposition in this context. (It should have used "åt" instead). I picked "translate at me" in my translation just to show the seemingly overwhelming number of English-only speakers here how weird it can get.

Word-for-word is useless. You've got a hell of a job to do, Google.
Re:Good online translators for other languages by Anonymous Coward · 2005-05-31 06:29 · Score: 0

Far jåg letar efter min ost i din röv?

Yeah for foreign spam! by Anonymous Coward · 2005-05-31 02:23 · Score: 2, Funny

At last I can translate all those non-English spam emails I get! There'll be no more missed opportunities to buy chinese viagra, woohoo.

Re:Yeah for foreign spam! by fuzzybunny · 2005-05-31 02:41 · Score: 1

This is the best one I have ever received. For you German speakers out there. And note the footer and b1ffsteriffi/
Date: Mon, 30 May 2005 06:44:20 -0700 (PDT)
From: harris peters
To: sassisch@yahoo.com
Subject: Grüße

HALLO LIEB, WEISS ich, DASS DIESER BUCHSTABE ZU IHNEN, DA eine ÜBERRASCHUNG,
aber, sich nicht SORGEN, alle KOMMEN MAG IST GUT. Ich BIN Herr HARRIS
PETERS, GESCHÄFTSSTELLENLEITER FINANZIELLEN VERTRAUENSCBankPlc, der IM
MAURITIUS GELEGEN Ist. VOR EINIGEN JAHREN, KAM Ein MANN, der Herrn SHAW
SMITH GENANNT wurde, den, Who AUS IHREM LAND, GENAU VON IHREM TEIL IST, ZU
MEINEM LAND (MAURITIUS) IM GUMMI SECTOR.UNFORTUNATELY ZU INVESTIEREN, ER
STARB IN EINEM SELBSTCAbbruch. Herr SHAW SMITH GESTORBEN, DIE SUMME DER
DOLLAR 15MILLION US IN MEINER BANK LASSEND. Ich ERBITTE HIERMIT IHRE
UNTERSTÜTZUNG ZU HELFEN, das GELD ZU BEHAUPTEN. Ich WERDE SIE BENÖTIGEN,
ALS Der VETTER SPÄTEN SHAW SMITH ZU DIENEN, WEIL IM AUGENBLICK, ER KEIN
FOLGENDES Der STÄMME HAT, DAMIT Das GELD AUF GEBRACHT WERDEN Kann. WENN SIE
RECIEVE DAS GELD, SIE 40% NEHMEN, DAS ÜBER DOLLAR 6MILLION WIE IHR ANTEIL
IST UND SIE GEBEN MIR DAS ANDERE 60%. Die REGIERUNG PLANT, Das GELD ZU
ÜBERNEHMEN, WENN KEINS OBEN DARSTELLT, DA SEIN FOLGENDES VON KIN.I
ÜBERPRÜFT, Daß ALLES UNTER STEUERUNG IST, DA Ich Die NIEDERLASSUNG
MANAGER.SO BIN, das SIE NICHTS HABEN, Sich ABOUT.ALL ZU SORGEN, SIE TUN
MÜSSEN SOLLEN MIR ANTWORTEN, WENN SIE INTERESSIERT SIND, ALSO WIR Die
NESSECARY-, DOKUMENTE FÜR Die ÜBERTRAGUNG ZU VERARBEITEN BEGINNEN KÖNNEN.
GESCHÄFTSSTELLENLEITER DES DANKES HARRIS PETERS F.T.B
harris_peters@yahoo.com
__________________ ________
Cashette stops spam. 100% effective and free! Go to http://www.cashette-inc.com/

--
Cole's Law: Thinly sliced cabbage

Pre-emptive strike by eno2001 · 2005-05-31 02:24 · Score: 3, Funny

Since it's become "hip" to bash Google these days and support either MSN's search technology or Yahoo, I'm making a pre-emptive strike for the IT fashionistas:

"Duh!!! The best machine translator in the world already exists and there can be no improving upon it! Babblefish (thank you Altavista) has been doing this for well nigh a decade. All you Johnny-come-latelys are probably going to rave on with fanboy adoration of Google (the company that can do no wrong)!!! To top it all off, you lot apparently know nothing about Microsoft's language translation project which is slated to be deployed as part of Longhorny in 2010. Online language translation from Google will fail because Microsoft will have it built into the OS itself. Why send your document online for translation when the OS itself will not only translate it, but it will correct the grammar, punctuation and generate a WMA file in one of ten thousand gorgeously rendered synthetic voices. Google has lost. Google has been trolled. Google will have a nice day".

We now return you to your regularly scheduled pos[tt]en.

--
-"...bad old ideas look confusingly fresh when they are packaged as technology" - Jaron Lanier (Digital Maoism on Edge.o

Re:Pre-emptive strike by ConceptJunkie · 2005-05-31 02:37 · Score: 1

and generate a WMA file in one of ten thousand gorgeously rendered synthetic voices

They might now have 10000 synthetic voices, but I bet they still all sound like GORF.

--
You are in a maze of twisty little passages, all alike.
Re:Pre-emptive strike by StikyPad · 2005-05-31 12:22 · Score: 1

All you Johnny-come-latelys are probably going to rave on with fanboy adoration of Google (the company that can do no wrong)!!! To top it all off, you lot apparently know nothing about Microsoft's language translation project which is slated to be deployed as part of Longhorny in 2010.

Who talks like that? The only 2 possible answers I could come up with, were the English, and (the way it sounded to me) drag queens.

--
https://www.eff.org/https-everywhere
Re:Pre-emptive strike by eno2001 · 2005-05-31 14:53 · Score: 1

Are you questioning my sexuality? ;P

--
-"...bad old ideas look confusingly fresh when they are packaged as technology" - Jaron Lanier (Digital Maoism on Edge.o

Old news... by jasonmicron · 2005-05-31 02:25 · Score: 4, Funny

There is already a tranzilator

Re:Old news... by NinjaFarmer · 2005-05-31 02:56 · Score: 1

A tale of the oppressed

T.Q. by moviepig.com · 2005-05-31 02:29 · Score: 4, Insightful

The system has been trained using the United Nations Documents as a corpus.

Seems one could devise a TQ (translation quotient) measuring the effectiveness of machine (or human) translators. Take any standard reading-comprehension test, send its text material through the translator and back, and then compare the scores of subjects taking the resulting test vs. those taking the original.

(Before such translators make their way into, say, diplomatic circles, I'd sure hope there's some objective demonstration of near-infallibility...)
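
One way to put a number on that, as a rough sketch: divide the average score on the round-tripped test by the average score on the original. The function name `translation_quotient` and the sample scores are made up for illustration, not results from any real test.

```python
from statistics import mean

def translation_quotient(original_scores, round_trip_scores):
    """Ratio of mean comprehension scores: round-tripped test vs. original test."""
    return mean(round_trip_scores) / mean(original_scores)

# Hypothetical scores (assumption): three subjects per group, marked out of 100.
original = [90, 80, 85]
round_trip = [60, 70, 65]
print(round(translation_quotient(original, round_trip), 2))
```

A TQ of 1.0 would mean readers lose nothing to the round trip; the lower it falls, the more meaning the translator destroyed along the way.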

--
Seeing bad movies only encourages them. Watch responsibly

Re:T.Q. by AYauFu · 2005-05-31 05:56 · Score: 1

Another point: What happens when the program starts using its own translated text as source material? For example, someone in the UN writes a document in English and allows the translation program to generate all of the non-English official texts. Does the program loop back and eat its own content for future translations?

oh no! by danharan · 2005-05-31 02:30 · Score: 5, Interesting

I don't ever expect such translation to work perfectly, but taking existing phrases should lead to useful first drafts.

This will mean one less possible career for me, and fewer babelfish-induced laughter moments.

As a fluently bilingual person, I often recognize expressions that were translated in Canadian government documents. "Anglicisme" is the word the french have for it.

There's subtlety to languages we may forever lose. Take for example:

"Je donne ma langue au chat" - "I give up" (answering a riddle) instead of the more picturesque "I give my language to the cat". Well, that should be tongue, but hey, it's just babelfish!

"Bullshit" won't produce "merde de taureau". That is a strange expression you anglos have, don't you realize?

"Il pleut comme vache qui pisse" will give us "it's pouring cats and dogs" rather than "it's pouring like cows' a'pissin". The french also have never heard of cats and dogs falling from the sky.

While an improved Babelfish may improve our mutual comprehension, please pause for a moment to consider all the linguistic hilarity we'll forever lose.

--
Information: "I want to be anthropomorphized"

Re:oh no! by fuzzybunny · 2005-05-31 02:38 · Score: 4, Funny

While an improved Babelfish may improve our mutual comprehension, please pause for a moment to consider all the linguistic hilarity we'll forever lose.

Yeah, like me going to work for Bull in 1997, and searching for "comment dit-on, le, fuck, le chose sur lequel on tappe, thingy qui connecte a l'ordinateur, ah yeah, le clavier". French Bull dude: "ah, le keyboard."

Hilarity indeed.

--
Cole's Law: Thinly sliced cabbage
Re:oh no! by bhima · 2005-05-31 02:43 · Score: 1

Most of the work I do is in both German & English and you're right "the linguistic hilarity" is delicious! Particularly when you include regional dialects rather than just "proper grammar".

--
Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
Re:oh no! by benjcurry · 2005-05-31 02:49 · Score: 1

I'm bilingual as well (English/Spanish), and I certainly enjoy being able to speak both. However, bemoaning the potential consolidation of languages is a bit of a useless battle, as the internet has already dug the grave for a wide variety of colorful sayings and phrases in languages all over the world. This is the way the evolution of language has always happened; it's just happening more quickly in the information age. As one brand of "linguistic hilarity" dies, the nature of human beings will only birth another to take its place.

--
BenCurry.net
Re:oh no! by Anonymous Coward · 2005-05-31 04:11 · Score: 0

"Je donne ma langue au chat" - "I give up" (answering a riddle) instead of the more picturesque "I give my language to the cat". Well, that should be tongue, but hey, it's just babelfish!

"langue" means both "language" and "tongue" in French.. damn homographs :P
Re:oh no! by danharan · 2005-05-31 05:13 · Score: 1

hehe... that was "deja vu" (an expression the french haven't even heard of)

I remember some colleagues smoking "du shit" (fr sp? / a low grade hash), going to "tres select" clubs or overhearing someone say "c'est la life"

Oui, c'est cool.

More professional doozies were "checkiner" and "checkouter". Although it's a bona fide french snigglet, I never heard "javaquer" in my work place.

aaah, Paris. I wish I could get that job again!

--
Information: "I want to be anthropomorphized"
Re:oh no! by SlartibartfastJunior · 2005-05-31 16:46 · Score: 1

dude, that just made my day. I'm not the only one!

What next? by chrisnewbie · 2005-05-31 02:30 · Score: 1

I predict we'll see google developing the Universal translator pin,
then the warp drive, then the teleporter, and why not everlasting youth?

Oh yeah!

Re:What next? by Anonymous Coward · 2005-05-31 03:31 · Score: 0

in 2035 you can upload your brain into google
Re:What next? by Eric604 · 2005-05-31 04:57 · Score: 1

in 2035 you can upload your brain into google
And search it ! 8D
Re:What next? by Anonymous Coward · 2005-05-31 13:32 · Score: 0

And they promise they won't replace 10% of your brain's storage with subliminal advertising...

3 cents... by BipinG · 2005-05-31 02:31 · Score: 0

Most of the time you don't know what language the text is written in. When you get alien-looking content... most of the time, you don't know the best way to make sense out of the shit! They should have something that detects (pattern matching etc.) the language in which the content is written!
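
A very crude version of that pattern matching, as a sketch: count how many of a language's most common small words show up in the text and pick the language with the most hits. The word lists and the name `guess_language` are assumptions for illustration; a usable detector would need far bigger profiles and many more languages.

```python
# Tiny hand-picked word lists (assumption): a few very common words per language.
COMMON_WORDS = {
    "english": {"the", "and", "is", "of", "to"},
    "french": {"le", "la", "et", "est", "de"},
    "german": {"der", "die", "und", "ist", "das"},
}

def guess_language(text):
    """Return the language whose common words occur most often, or None if none occur."""
    words = text.lower().split()
    hits = {
        lang: sum(word in vocab for word in words)
        for lang, vocab in COMMON_WORDS.items()
    }
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None

print(guess_language("Le chat est sur le toit"))
```

Short snippets and languages that share small words (English "die" vs. German "die") will fool it, which is why it returns None rather than guessing when nothing matches.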

20 Billion? by Bananatree3 · 2005-05-31 02:32 · Score: 1

That should be 200 billion words according to the article

All your base by 1967mustangman · 2005-05-31 02:32 · Score: 1

So how do you think it will handle "all your base are belong to us"? Seriously though, it will be interesting to see how well they can make it work. My experience so far with translators has been dreadful.

--
Madre de Dios! Es El Pollo Diablo! -- Captain Blondebeard

Re:All your base by Anonymous Coward · 2005-05-31 03:22 · Score: 0

Funny that you choose a mistranslation as a basis of rating an MT.
Re:All your base by 1967mustangman · 2005-05-31 03:28 · Score: 1

It was a joke

--
Madre de Dios! Es El Pollo Diablo! -- Captain Blondebeard

How about Google Calendar? by blankoboy · 2005-05-31 02:34 · Score: 1

When are we going to see calendaring functionality with Gmail? You know it's in the works in Google labs...come on Google! ;)

Time to move the AI bar by TopSpin · 2005-05-31 02:36 · Score: 3, Interesting

First, this is outstanding; Google, unsatisfied with traditional machine translation techniques, pioneers their own design. I'm certain their advertisers will be pleased to have their ads auto-translated to whatever language is necessary.

Second, I think we'll witness a case of having the AI ante upped once again when another traditional AI challenge is met. Wikipedia puts this best; When viewed with a moderate dose of cynicism, AI can be viewed as 'the set of computer science problems without good solutions at this point.' Once a sub-discipline results in useful work, it is carved out of artificial intelligence and given its own name.

--
Lurking at the bottom of the gravity well, getting old

Re:Time to move the AI bar by davids-world.com · 2005-05-31 05:39 · Score: 1

First, this is outstanding; Google, unsatisfied with traditional machine translation techniques, pioneers their own design. I'm certain their advertisers will be pleased to have their ads auto-translated to whatever language is necessary.

Where did you get that?
Google bought one of the well-known people in statistical MT, who had been working at USC ISI (leading in MT research), and he probably has a team of people taking care of scaling it up.

Saying that they're pioneering a new design seems like conjecture to me. A more educated guess IMHO would be that they're using phrase-based statistical MT... But even for that I'd have to look things up or talk to the researchers working on it :)

Other uses... by HaydnH · 2005-05-31 02:37 · Score: 1

This sounds very interesting... imagine the possibilities for localization of applications - I'm sure a simple script could be created to extract strings from source, parse them through the translator and substitute them in your chosen language, this could save a LOT of time!!!

I can't wait for a Welsh version of firefox =P

--
Time is an illusion. Lunchtime doubly so. - Douglas Adams

Re:Other uses... by I+confirm+I'm+not+a · 2005-05-31 03:01 · Score: 1

I can't wait for a Welsh version of firefox =P

According to this Mozilla QA document, Firefox should have had a Welsh locale since 1.0.2? Not that I've looked, the closest I come to speaking owt other than English is claiming I speak "Lallans" whilst in the West of Scotland... aye richt ;-)

--
This is where the serious fun begins.
Re:Other uses... by HaydnH · 2005-05-31 03:45 · Score: 1

Yes, it should... but it doesn't.

--
Time is an illusion. Lunchtime doubly so. - Douglas Adams
Re:Other uses... by I+confirm+I'm+not+a · 2005-05-31 03:58 · Score: 1

Aha! That'll explain why the document I was looking for didn't appear to exist. Thanks.

--
This is where the serious fun begins.
Re:Other uses... by cicho · 2005-05-31 05:35 · Score: 1

"...a simple script could be created..."

You've never worked on localizing a non-trivial application, have you?

--
"Only the small secrets need to be protected. The big ones are kept secret by public incredulity." - Marshall McLuhan
Re:Other uses... by HaydnH · 2005-05-31 20:37 · Score: 1

"You've never worked on localizing a non-trivial application, have you?"

Well, let's have a look at a Mozilla source file. A simple shell script could be created along the lines of the following pseudo code:

for each line in file;
if "DONT_TRANSLATE" appears in line: skip to this line + 2;
else: translateViaGoogle(What's in between the dirks);
goto next line;

OK - it might not work straight out of the google box, but fixing the errors would more than likely be a lot quicker than manually translating a full project!
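
For anyone who wants something runnable, here is a minimal Python sketch of that pseudo code. It assumes translatable strings are double-quoted literals sitting on a single line, and placeholder_translate is a hypothetical stand-in for whatever translation service ends up being available; neither assumption comes from the Mozilla sources.

```python
import re

# Assumption: translatable strings are double-quoted and fit on one line.
QUOTED = re.compile(r'"([^"]*)"')


def placeholder_translate(text):
    """Hypothetical stand-in for a real machine translation call."""
    return "[cy] " + text


def localize(lines, translate=placeholder_translate, marker="DONT_TRANSLATE"):
    """Translate quoted strings, leaving the marker line and the line after it alone."""
    out = []
    resume_at = 0  # index of the first line we are allowed to translate again
    for i, line in enumerate(lines):
        if i >= resume_at and marker in line:
            resume_at = i + 2  # "skip to this line + 2" from the pseudo code
        if i < resume_at:
            out.append(line)  # copied through untouched
        else:
            out.append(
                QUOTED.sub(lambda m: '"' + translate(m.group(1)) + '"', line)
            )
    return out
```

This only covers the mechanical part. Format placeholders, plural forms, string length limits and context-dependent wording would all still need a human pass.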

--
Time is an illusion. Lunchtime doubly so. - Douglas Adams

IMHO it's too early for that by trandism · 2005-05-31 02:37 · Score: 1

Good luck to them, but I doubt that they are gonna make it.

OK, maybe they'll make it in 10 years or so.

I'm into natural language processing myself and it seems to me that it's very difficult to build a system that works globally on all kinds of input.

They'll have to LISP it to death!

Anyway my $0.02

--
www.lemonodor.com A mostly Lisp weblog

Re:IMHO it's too early for that by benjcurry · 2005-05-31 03:02 · Score: 1

Actually, that's the point: it won't take long (theoretically). They just need to write a set of rules with which the translation program learns the rules of translation and grammar by matching up the corresponding translations of all of the 200 billion words worth of material in the U.N. vaults. Viola!

--
BenCurry.net
Re:IMHO it's too early for that by cicho · 2005-05-31 05:38 · Score: 1

"Viola!"

You know what machine translation will never do? (For small-to-medium values of "never".) It won't easily compensate for errors in source text, like the one you've just made. Statistical approach won't help a computer when the text as-written is not the intended sense. Human translators do such compensation all the time.

--
"Only the small secrets need to be protected. The big ones are kept secret by public incredulity." - Marshall McLuhan
Re:IMHO it's too early for that by trandism · 2005-05-31 19:36 · Score: 1

So you think that everything that has to be done is find a corresponding translation for each word??

Unfortunately that is NOT true from what I've seen in the field.

Maybe it's good enough (not perfect of course) for some technical documents.

But for speeches, news etc. and especially literature this oversimplified technique sucks.

--
www.lemonodor.com A mostly Lisp weblog
Re:IMHO it's too early for that by benjcurry · 2005-06-01 06:01 · Score: 1

Viola? That's Spanish for "violate". Viola!

--
BenCurry.net
Re:IMHO it's too early for that by benjcurry · 2005-06-01 06:04 · Score: 1

No, what I'm saying is that the U.N. docs, while not representative of common language usage in all cases, are in fact translated to different languages BY HUMANS, not a word-for-word type of translation. Patterns that appear in corresponding translations will be recorded and used to infer what the translation should be for the new translation. It's a fabulous idea. Will it work? Dunno.
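
To make the "patterns in corresponding translations" idea concrete, here is a toy Python sketch. It counts how often a source word and a target word show up in the same aligned sentence pair and scores candidates with the Dice coefficient. This is only the simplest word-level version, not a description of what Google actually runs; the function names and the six-sentence corpus in the usage note are made up for illustration.

```python
from collections import Counter


def cooccurrence_table(pairs):
    """Count word occurrences and co-occurrences over aligned sentence pairs."""
    src_count, tgt_count, both = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        # Every source word is a candidate partner of every target word
        # that appears in the matching sentence.
        both.update((s, t) for s in s_words for t in t_words)
    return src_count, tgt_count, both


def best_match(word, tables):
    """Return the target word with the highest Dice score for `word`, or None."""
    src_count, tgt_count, both = tables
    scores = {
        t: 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in both.items()
        if s == word
    }
    return max(scores, key=scores.get) if scores else None


# Toy aligned corpus (invented for this example).
PAIRS = [
    ("the house is big", "la casa es grande"),
    ("the house is red", "la casa es roja"),
    ("the dog is big", "el perro es grande"),
    ("the dog is red", "el perro es rojo"),
    ("the table is big", "la mesa es grande"),
    ("the book is red", "el libro es rojo"),
]
```

With that corpus, best_match("house", cooccurrence_table(PAIRS)) picks "casa", because "casa" appears in exactly the sentences where "house" does, while "la" and "es" also turn up elsewhere. Real systems work on phrases and far more data, which is where human-translated corpora like the U.N. documents come in.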

--
BenCurry.net
Re:IMHO it's too early for that by trandism · 2005-06-01 22:18 · Score: 1

Ah I see. Yes it's a good idea but it has to be combined with other techniques in order to give decent results. But, it's a good start anyway.

I insist that it's going to take years to get good results. Maybe not 10 years (as I said in my first post) but surely more than 5.

--
www.lemonodor.com A mostly Lisp weblog

Machine Translation may never get there.. by acomj · 2005-05-31 02:40 · Score: 1

A relative worked in an "internationalization" department, creating software/manuals in many languages.

In order for machine translation to be as good as human translation, you first need to determine what the sentence "means". Oftentimes you need to track previous sentences to determine the meaning of things like the word "it". Human language is not very detailed and relies on common knowledge and experience to infer meaning.

It's very hard. Some languages are easier than others for this stuff. German/French/Spanish all change the gender of the word "the" based on the noun and give clues about how it's used in a sentence. This can help a little.

For many web pages this approach may give an understandable translation, but for literary references and books (manuals etc) machine assisted translation is now the norm.

Even using AI, determining meaning is very difficult. Google "semantic processing" to find companies that are trying. One is CYC, a stanford spin off.

http://www.cyc.com/

Lovely translation source... by isa-kuruption · 2005-05-31 02:40 · Score: 5, Funny

So when you go to translate.google.com and translate something, the result will be legalese in the resulting languages.

Spanish: "Que pasa?"
English translation: "With regards to the current situation, how is the day progressing?"

Re:Lovely translation source... by ShadyG · 2005-05-31 03:48 · Score: 4, Funny

No, it actually translates "que pasa" into "We hereby condemn these actions taken by the Israeli government."

--
Nerd Rock In Progress
Re:Lovely translation source... by Retired+Replicant · 2005-06-01 08:14 · Score: 1

Actually, I think it will never actually translate your document. It will however generate an extensive series of proclamations about how it is in favor of translating your document, but that action must not be taken too hastily, and further debate must occur on the merits of translating your document.

how do they know? by blue_adept · 2005-05-31 02:42 · Score: 1

FTA:
researchers working on this enabled the system to translate from Chinese to English without any researcher being able to speak Chinese

Hmmm.. and they know that it works because...?? ;)

--

"Is this just useless, or is it expensive as well?"

Re:how do they know? by pedantic+bore · 2005-05-31 03:13 · Score: 1

Perhaps they just look at it and ask themselves "if I was Chinese, would I say something like that?"
Seriously, given the profound differences between Chinese and English, and each language's complexity, I'd be very impressed if it did a decent job. Until I see it working, however, I remain skeptical. After all, the field of machine translation has been around longer than most Google employees have been alive, and it's still got a long way to go.

--
Am I part of the core demographic for Swedish Fish?

DVD's subtitle tracks by Jotham · 2005-05-31 02:42 · Score: 3, Funny

DVD subtitle tracks would be another good addition to help pick up slang too (most have an english track along with a couple others depending on the region)... all time-synced and easy to match up...

(I'm guessing that it'd fall under fair use and Google wouldn't have to struggle to get the movie studios' approval, even though such tech would benefit the studios too.)

Re:DVD's subtitle tracks by BullfrogJones · 2005-05-31 03:46 · Score: 3, Insightful

One serious problem I see with the 'matching source' method is that it's rare to find two sources that truly match. Movies are a great example - as a native English speaker that lived for 5 years in Spain, I can attest to the fact that the translations provided by the movie studios (used for subtitles in the theater and also for DVDs) are problematic on many levels.
It's not enough to recognize a given word in language A is such and such word in language B, and not even enough to do the same with idiomatic phrases such as 'His bark is worse than his bite' (Mucho ruido, pocas nueces in Spanish, literally 'Lots of noise but few pecans').

The problem is that the content itself is sometimes changed in translation. Cultural differences, pop culture references, names and places are all changed liberally when creating movie subtitles. This is something that is easy enough for a bilingual human to notice and disregard, but how is a computer to know what to keep and what to disregard when comparing the supposedly matching sources?

Choice of source material is extremely important here, and probably explains why they are starting with UN documents, a formal, business-like body of text with presumably less room for content differences. Unfortunately, the fact that movie translations cannot easily be used means that much of what we humans find amusing about bad babelfish translation (literal translation of slang, etc...) will continue to plague us for some time to come.
Re:DVD's subtitle tracks by kesuki · 2005-05-31 03:51 · Score: 2, Interesting

You assume DVD subtitles are in any way related to the original audio content at all. That is a pretty big assumption. As a matter of fact, subtitles are very rarely a solid translation of the words' meaning... usually they're an approximation of 'what fit in the subtitled language.' Sometimes, they're completely ad-libbed. Fansubs aren't much better, since many of them are being translated by people just learning how to translate.

I watch a lot of anime, and a lot of fansubs, subtitles are the worst way to learn a language.

--
https://www.gnu.org/philosophy/free-sw.html
Re:DVD's subtitle tracks by fuck+nwbvt · 2005-05-31 04:16 · Score: 2, Interesting

If the aim (ultimately) is to help you understand things from other languages better, then what's the problem with changing pop culture references? Someone talking in British English about Kylie Minogue's lovely bum, for example, could probably be replaced in American English with a phrase about Shakira's boobs. Which is good, because no one in the States (in my experience) knows who Kylie is, and the translation gets the concepts right. That can only be a good thing, right?
Re:DVD's subtitle tracks by cicho · 2005-05-31 05:45 · Score: 1

Actually, movie subtitles make a very poor source for statistical analysis, because of space and time limitations. An actor utters 30 words in the time most people can read maybe 12, and that's about the ratio of actual dialogue to subtitles. And if these 12 words don't fit on screen, you have to make do with ten or eight.

--
"Only the small secrets need to be protected. The big ones are kept secret by public incredulity." - Marshall McLuhan
Re:DVD's subtitle tracks by cicho · 2005-05-31 06:24 · Score: 1

And you really expect software to be able to make such substitutions, Kylie to Shakira? That requires compiling the sum total of current (as in, this year) culture into a form that the translation mechanism can then pick individual references from. If you know how to do it, you'll be richer than Bill Gates by next year.

--
"Only the small secrets need to be protected. The big ones are kept secret by public incredulity." - Marshall McLuhan
Re:DVD's subtitle tracks by pomo+monster · 2005-05-31 06:59 · Score: 1

Isn't that the whole point of this translation mechanism? If it can automatically determine that the English phrase "it's raining cats and dogs" is not to be translated literally, but rather substituted with the language-specific idiom, then there's no theoretical reason it couldn't also intelligently substitute culturally idiomatic references. Depending on the amount of hand-holding you do with the translation, the substitution of such cultural references might even happen automatically.

You'd need the right source texts, of course, and enough number-crunching to find the right associations, but these issues are of the same nature as the issues with the process as a whole.
Re:DVD's subtitle tracks by greed · 2005-05-31 07:58 · Score: 1

The GP means compare the multiple subtitle channels, not the subtitles to the spoken.
At least all the subtitle channels are given the same space constraint. Of course, every language takes a different amount of space to express a given idea, so each translation will have different trade-offs.

And then, as others said, you get cultural translation too, like:

Spoken Japanese: "I am most sorry, it is a terrible dishonor, I humbly beg for your forgiveness."
Subtitled English: "My fault."

Language choices by Anonymous Coward · 2005-05-31 02:44 · Score: 0

But can it translate Pig Latin, Bork Bork Bork!, and Klingon?

Starting Wars ! by justanyone · 2005-05-31 02:45 · Score: 4, Funny

In 'Hitchhiker's Guide to the Galaxy' (the 'trilogy' of books, not the recent movie), it's mentioned that the babelfish has effectively started many, many wars. The reasons seem to be that any being can be rude to any other being without a serious set of translations that explain exactly what the rude terms mean and how they should be regarded.

I'm highly concerned for this warmongering that Google has undertaken.

Reference Here: http://www.bbc.co.uk/cult/hitchhikers/guide/belgium.shtml

Picture this: I write a blog entry with either bad punctuation or erroneous content. Under the old system (pre-Google translation), I would receive several flames about my idiocy. With Google translations:

* People around the world will be confused and angered about my punctuation;
* Vastly larger numbers of people will complain about my erroneous content;
* Other people will step up to my defense and a massive flame war will ensue;
* Idiots everywhere (who speak other languages) will echo my idiocy by believing the erroneous content I posted;
* The signal to noise ratio of the net will rise markedly;
* I will still be unsure of whether to count on my fingers starting with my thumb or forefinger depending on which European country I'm in.

I believe this pro-war, anti-peace, conflict-ridden idea of making everyone THINK they understand each other is ripe for criticism. God made everyone else speak funny, I think it should stay that way! Only right thinking people speak my language anyway, and everyone else should just shut up and sit down!

(WARNING: above post contains carcinogenic levels of sarcasm, facetiousness, satire, irony, and adjectives. Please unplug brainstem and wipe with a clean, damp cloth before continuing.)

--
Unitarian Church: Freethinkers Congregate!

to translate: by dep01 · 2005-05-31 02:48 · Score: 1

That happens being, the Google has an updated technology and it goes, it will make a method it is a first in them,! It congratulates in them. To them being company percentage chance to this!!

--
"hey, could you pass me a paper towel? er.. I mean... DEPLOY ABSORBTION PANEL!"

Two thoughts by duffbeer703 · 2005-05-31 02:49 · Score: 0

- If they use UN documents as a guide, the Google MT engine will be excellent at translating bureaucratese between languages. I'm not sure if that's a good thing!

- It's obvious that the US Gov't is dumping money into Google -- I often wonder if Google is a front for some US gov't agency.

--
Conformity is the jailer of freedom and enemy of growth. -JFK

Re:Two thoughts by Secret+Agent+99 · 2005-05-31 03:23 · Score: 2, Insightful

If they use UN documents as a guide, the Google MT engine will be excellent at translating bureaucratese between languages. I'm not sure if that's a good thing!

Exactly. And the UN surely has fairly rigorous QA processes for its translations. Now try expanding the corpus with more translated copy.

In addition to feeding the system with translations that haven't been through formal QA (in many but not all cases), you also are now feeding it copy that has not had all the style deliberately squeezed out of it for easy translatability. (Which is the way they write in bi- and multilingual bureaucracies.)

If and when MT can handle that situation, I'll be impressed. But a "bureaucratese" translator seems like a much smaller challenge to me, relatively speaking.
Re:Two thoughts by NitsujTPU · 2005-06-03 14:26 · Score: 1

Actually, UN documents are good for this.

I have worked on similar research before. You align corpora in multiple languages, and perform a series of steps regarding regular, natural usage of the language in translation.

The aim is multifold. Honestly, the primary aim is to acquire the data cheaply, since you hopefully won't have to pay annotators to do much.

The problem is finding large bodies of text that meet the criteria. The UN corpora would be excellent for this.

hype by Lazy+Jones · 2005-05-31 02:53 · Score: 1

If anyone were capable of making a serious go of MT, that would have to be Google.

Oh, come on. I (still) like Google, but that's a bit silly, no?

--
"I love my job, but I hate talking to people like you" (Freddie Mercury)

But it's evolutionary! by Anonymous Coward · 2005-05-31 02:55 · Score: 0

People using a translator who don't take the time to familiarize with grammatical-lexical quirkies of mechanical translation and 'take offense' should be rounded up along with all those people who are so fond of taking offense on behalf of others who might be offended. Grind up for shrimp feed.

Imagine a world in which everyone stopped to consider environment, context, and cultural POV when engaged in conversation with others.

But, evolution won't let this happen. It favors numbers, rapid breeding, and in the case of humans, the hive-nest-swarm-colony-'what have you' of group focus on simplistic solutions serving fulfilment of immediate desire.

Translate that GOOGLE!

Middle East Media by rlp · 2005-05-31 02:56 · Score: 0

MEMRI (memri.org) does a nice job of translating articles, essays, and even video from various media in the Middle East.

--
[Insert pithy quote here]

Re:Middle East Media by leicaM6 · 2005-05-31 04:35 · Score: 1

Yeah, memri.org does a good job if you like Israeli-backed propaganda websites that try to make all Arabs look like fanatics. Do a little research on memri and disregard it as a source. Learn Arabic and watch al-jazeera for yourself or go to http://english.aljazeera.net/english/ . Or there was a well-done documentary about al-jazeera's coverage of the Iraq war called "Control Room"; it is circling the p2p networks.

yeah, but can it translate this? by nullset · 2005-05-31 02:56 · Score: 4, Funny

Wenn ist das Nunstruck git und Slotermeyer? Ja!... Beiherhund das Oder die Flipperwaldt gersput. be careful! If you translate this you may end up dead.....

Re:yeah, but can it translate this? by zev1983 · 2005-05-31 07:16 · Score: 1

Yes but which of these is the original and which is the translation?

--If monkeys of the jump aiming at the fall in that inverni which has with the work of the plate of Moscow of the house in the case him.

--If the crack monkeys fall in winter what flying cup runners will have houses in falling.
Re:yeah, but can it translate this? by Anonymous Coward · 2005-05-31 07:18 · Score: 0

Heh, put that at the bottom of a homework for my German professor once, just to see if he'd comment. Strangely, he was thoroughly convinced it really was German, just not a dialect he understood.
Re:yeah, but can it translate this? by Anonymous Coward · 2005-05-31 08:10 · Score: 0

Bullshit. That is not even close to sounding like German, and I believe that anyone who speaks a bit of German can easily see that.
Re:yeah, but can it translate this? by VoidWraith · 2005-05-31 10:22 · Score: 1

The majority of the simple words are precisely German... I don't see why you wouldn't think it was German if you weren't a fluent speaker. One would tend to assume that the vocabulary was just things they didn't know.

opera by Anonymous Coward · 2005-05-31 02:56 · Score: 0

Yes, but can it translate German or Italian opera to english and still have it rhyme? :-)

Wait, why? by Ieshan · 2005-05-31 02:56 · Score: 4, Interesting

"Computers don't generalize or extrapolate the known into the unknown worth a damn."

Fortunately, that's not all that google has to go on. Google has 8 billion webpages, in many different languages, most of which are written by non-speechwriters. Not only can they analyze words based on translated context, but they can analyze words based on intra-language context, to form associations between words and meanings.

The real trick is getting down two important linguistic concepts: "Sandhi Rules" (for instance, the use of "an" before a vowel and "a" before a consonant, which are totally regular but more complicated than a word-to-word matchup), and the "degree" or "quality" of words, which indicate the type of adjective most appropriate in any given context.

For instance, "erudite", "learned", "educated", "knowledgeable", "skilled", and "cunning" could all be related words, but many of them have positive or negative associations which may only really be conveyed by understanding the meaning, irony, or sarcasm of a particular phrase.

For instance, "John has been skilled in writing beautiful code for most of his adult life" is quite different from "John has been educated in writing beautiful code for most of his adult life", or "John has been erudite...". The first one is probably right if John has had a natural inclination to doing it properly, the second if he has undergone some training (though we don't know the actual state of his ability), the third (though the word doesn't even really make sense here) if he has been arrogant about his ability, shouting RTFM! every time someone asked him a question.
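
The "a"/"an" rule mentioned above is a handy example of something a word-to-word table cannot capture, because the right choice depends on the word that follows. A minimal Python sketch, assuming a spelling-based check plus two small hand-picked exception lists (both lists are made up for illustration, since the real rule follows sound rather than spelling):

```python
# Hypothetical exception lists: words whose first sound does not match
# their first letter. A real system would need a pronunciation lexicon.
VOWEL_SOUND_EXCEPTIONS = {"hour", "honest", "heir"}
CONSONANT_SOUND_EXCEPTIONS = {"university", "unicorn", "one", "european"}


def indefinite_article(next_word):
    """Pick "a" or "an" for the word that follows the article."""
    w = next_word.lower()
    if w in VOWEL_SOUND_EXCEPTIONS:
        return "an"
    if w in CONSONANT_SOUND_EXCEPTIONS:
        return "a"
    # Fall back to spelling: vowel letter means "an", anything else "a".
    return "an" if w.startswith(("a", "e", "i", "o", "u")) else "a"
```

The rule is regular, but the translator has to pick the noun (and any adjective in front of it) before it can pick the article, which is why it cannot be handled one word at a time.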

Re:Wait, why? by Anonymous Coward · 2005-05-31 05:09 · Score: 0

The real trick is getting down two important linguistic concepts: "Sandhi Rules" (for instance, the use of "an" before a vowel and "a" before a consonant, which are totally regular but more complicated than a word-to-word matchup), and the "degree" or "quality" of words, which indicate the type of adjective most appropriate in any given context.
What's the point of learning grammer rules when bloggers rarely use them?
Re:Wait, why? by databyss · 2005-05-31 05:17 · Score: 1

OUCH! That's one clear blogosphere smear right here!

--
Hmmm witty sig or funny sig? Maybe elitest techy sig!
Re:Wait, why? by Anonymous Coward · 2005-05-31 07:57 · Score: 0

Fortunately, that's not all that google has to go on. Google has 8 billion webpages, in many different languages, most of which are written by non-speechwriters. Not only can they analyze words based on translated context, but they can analyze words based on intra-language context, to form associations between words and meanings.

Yes, and just think of all the copyrighted text etc. in there. It'd be a nightmare if they actually decided to only use the texts they have permission to use. (Not that anyone would ever find out...)

Will it support Esperanto? by dsplat · 2005-05-31 02:59 · Score: 1

Since Esperanto is mentioned so prominently, I have to wonder whether the tool will support it. There has been at least one previous attempt to use Esperanto as an intermediate language for a machine translation project. The only English translation of the article I could find is now only available in Google's cache. There is an ironic symmetry to that.

--
The net will not be what we demand, but what we make it. Build it well.

Re:Will it support Esperanto? by patio11 · 2005-05-31 04:03 · Score: 1

Great idea... instead of using one lossy compression, use two! Maybe the errors introduced from the intermediate text to the final text will cancel out the errors introduced when making the intermediary text!

--
Help poke pirates in the eyepatch, arr.
Re:Will it support Esperanto? by adrianbaugh · 2005-05-31 04:34 · Score: 1

Perhaps as an intermediate language they'd be better off using lojban. It's aimed at removing ambiguity so it sounds ideal for the job, particularly for translating diplomatic stuff where ambiguity is a really bad thing....

--
"'I pass the test,' she said. 'I will diminish, and go into the West, and remain Galadriel.'"
- JRR Tolkien.
Re:Will it support Esperanto? by MrByte420 · 2005-05-31 04:36 · Score: 1

L.L. Zamenhof's idea with Esperanto was not to replace native languages but to supplement them with an easy-to-use language that people would feel natural moving to. For example, word order is not important in Esperanto because object nouns have a distinct suffix (-on) versus subject nouns (-o). However you make them all plural by adding a j (-ojn versus -oj) and it's always regular so all nouns do this. All verbs are regular, etc.
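
The noun endings above are regular enough to sketch in a few lines of Python. The function names are made up for this example, and roles() is deliberately naive: it looks only at word endings, so it would be fooled by non-nouns that happen to end the same way.

```python
def decline(noun, plural=False, accusative=False):
    """Build a noun form from the dictionary form ending in -o."""
    if not noun.endswith("o"):
        raise ValueError("expected a dictionary-form noun ending in -o")
    form = noun
    if plural:
        form += "j"  # hundo -> hundoj
    if accusative:
        form += "n"  # hundo -> hundon, hundoj -> hundojn
    return form


def roles(sentence):
    """Naively pick out (subject, object) by ending, ignoring word order."""
    subject = obj = None
    for word in sentence.lower().split():
        if word.endswith(("on", "ojn")):
            obj = word
        elif word.endswith(("o", "oj")):
            subject = word
    return subject, obj
```

Both "la hundo mordas la katon" and "la katon mordas la hundo" come back as ("hundo", "katon"): the dog bites the cat either way, because the -n marks the object regardless of where it sits.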

Maybe in another way the article does make a good comparison b/c what google is trying to do is make everyone's language the universal language.

Their programs are just so damn spiffy sometimes.

--
If religous zealots don't believe in Evolution, then why are they so worried about bird flu?
Re:Will it support Esperanto? by dsplat · 2005-05-31 05:17 · Score: 1

The problem is that you can't arbitrarily remove ambiguity in translation. In some cases, the ambiguity was not intended, where a word has multiple meanings, but given enough context, it is clear which one is correct. You can remove that. In other cases, ambiguity serves a legitimate purpose. There are times when you want to talk about an arbitrary member of a group hypothetically or generally.

Claude Piron talks about some of these issues from the perspective of a translator. He also points out that most source texts have errors in them as well.

--
The net will not be what we demand, but what we make it. Build it well.
Re:Will it support Esperanto? by dsplat · 2005-05-31 05:26 · Score: 1

L.L. Zamenhof's idea with Esperanto was not to replace native languages but to supplement them with an easy-to-use language that people would feel natural moving to.

I know. Actually, the example you gave concerning word order is both one of the greatest strengths and one of the sore points among Esperantists. I for one, forget the accusative a huge percentage of the time. Most of the Esperantists I've talked to don't have trouble understanding me with any word order that places the subject before the object (SVO, SOV, VSO). But I'm sure that I wouldn't be universally understood, and my lapses are at least going to make it harder for some listeners to follow me, so I'm trying to correct myself.

It turns out that there is a big plus in the regularity of Esperanto and the heavy use of derived and compound words besides making the language easier to learn. I've experienced this first-hand several times. You don't have to use Esperanto daily to maintain decent fluency. And you can invent words on the fly with a reasonable expectation that your listener will understand.

--
The net will not be what we demand, but what we make it. Build it well.
Re:Will it support Esperanto? by Anonymous Coward · 2005-05-31 11:20 · Score: 0

IMO this was an unnecessary complication when Subject Object Verb is practically hard wired in human brains.

Google IM by loconet · 2005-05-31 02:59 · Score: 1

As the article suggests, Google could use this if they ever decide to go ahead and launch an instant messenger. Imagine being able to chat with anyone in the world while Google does the translation in real time for you. What are the implications of this?

As an example, on the one hand my family back in Peru, who don't speak English, would be able to chat with my current gf, who doesn't speak much Spanish but still likes chatting with them. On the other hand, this would reduce both parties' motivation to learn a new language (maybe good in my case ;) ).

--
[alk]

I'm looking forward to... by Trikenstein · 2005-05-31 03:00 · Score: 1

The Adventures of *Super Monkey Car *

How long until you can pump in a raw anime file by Anonymous Coward · 2005-05-31 03:00 · Score: 0

...and get out a fansub?

Cool, but we cannot hope for miracles by mincognito · 2005-05-31 03:01 · Score: 1

I think it's great that Google is putting their resources behind this and I'm sure improvements in MT will be the result. What we can't expect though is perfect machine translations. Computers translate on the basis of syntax and semantic correspondence between words in the two languages. What's missing from the software is an understanding of context. I think Google's efforts should help here. Training the software should enable it to determine that a word is likely to have a particular value/meaning on the basis, for example, of certain words surrounding it (i.e. in previous and subsequent sentences). Current software seems only to translate at a per-sentence level -- thus the lack of coherence in translated paragraphs. What Google can't do though is account for non-textual context, for example: the character of the writer, when the text was written, for whom, purpose of the text, etc. So much of "meaning" for us humans is also the affect or force that a text has on us -- how it makes us feel -- and writers tweak what they say to bring about those particular effects. For this reason computers would find it difficult to account for stylistic variations that affect how a human would interpret. So much of meaning is implicated (i.e. not literal but tied to speaker/writer intentions) and this is what easily gets lost in translation. Even humans find translating this stuff hard. As any translator will tell you there is no such thing as a perfect translation (the translator can understand the meaning in the original language but not when trying to think it in the other). But still, I'm really excited to see what Google is going to be able to do.

--

Ludwig Wittgenstein

err: I'm looking forward to... by Trikenstein · 2005-05-31 03:02 · Score: 1

The Adventures of *Super Monkey Car [Insert Blank]

words don't really have meanings by mincognito · 2005-05-31 03:12 · Score: 5, Interesting

Some people here seem to have a false picture of how language works. Individual words do not have meanings. Not to a human interpreter anyway. Sentences used in actual contexts have meanings (unless a single word is uttered as an elliptical sentence). The "meanings" of words, as found in dictionaries, are simply abstractions from occasions of use. The idea that individual words have meanings hasn't been current in philosophy or linguistics for about 50 years. Also, the idea of St. Augustine that children learn the meaning of words by associating sounds that they hear with particular objects that they observe is now also considered rather dubious.

--

Ludwig Wittgenstein

Re:words don't really have meanings by RealAlaskan · 2005-05-31 04:16 · Score: 1

Also, the idea of St. Augustine that children learn the meaning of words by associating sounds that they hear with particular objects that they observe is now also considered rather dubious.
I've got one learning language right now.

She points to something and asks: ``Wha'sthis? Wha'sthis? Wha'sthis? Wha'sthis? ''. We tell her, and she repeats with the same something, to see what we'll say this time. Repetition is important to kids. Then she says the word back to us, and she's ready for the next object. ``Wha'sthis? Wha'sthis? Wha'sthis? Wha'sthis? ''.

I agree that doesn't give you a very sophisticated view of meaning, but it's an essential first step, I think. I can see how St. Augustine fell into his error: that's the only learning process (related to meaning, that is) that you can really see.

Just to get totally off-topic, she has evolved a unique approach to grammar. She's hearing english and chinese, and when she wants something, e.g. milk, she says something like: ``Noo-nai baby''. If she wants big sister to give her the book, she says: ``jae-jae shoo baby''. Those are ``Milk baby'' and ``Big sister book baby'', respectively. If I should give the book to Mama, it's ``Baba Mama shoo''. All nouns, and the verb is implied. Her order seems to be subject object, which works in either language.

The other two kids have kept their english and chinese separate, and have been less experimental in their grammar. This one throws words at us until she gets what she wants.

--
See what I've been reading.
Re:words don't really have meanings by fuck+nwbvt · 2005-05-31 04:19 · Score: 1

You're absolutely right. How fundamentally postmodern (just like all Google's most successful algorithms). I can't wait to see the google translator in action.
Re:words don't really have meanings by Anonymous Coward · 2005-05-31 05:40 · Score: 0

Right and wrong.

(Remember, when referring to the liberal arts, never assert generalizations unconditionally--usually :p)

Certain languages have very specifically-defined words, with very specific rules as to the situation for which the word is most appropriate. Thus it is possible for individual words to actually have individual meanings in such languages, though this does not imply the same of every word of that language.

In the end, it really depends on how the individual is trained to think about words--i.e. the first language(s) and the nature thereof. Some languages are more abstract, some are less, some are both, and some vary depending on the cultural emphasis of the idea.

My concerns are:

1) How well would such a method handle non-linear phrases? By non-linear, I mean a phrase that may take on one or many of the possible meanings.

2) How would the method take into account meaning that is conveyed only by tone, and that native speakers of that language would normally infer?

3) If a language had no written form, how would one archive it using this method, and how would such an archival method be of any use?

In my experience and opinion, producing a true translation requires several intermediate abstraction steps, which shift the abstraction pattern from one language to the abstraction pattern of the other. This is easy (for us humans) with languages that are closely related, but otherwise extremely difficult. The process of abstraction itself is probably impossible for discrete machines.
Re:words don't really have meanings by Anonymous Coward · 2005-05-31 06:11 · Score: 0

Who cares whether philosophers or linguists currently think "word meaning" is fashionable? Every day there are psycholinguists demonstrating the effects of word meaning in comprehension, production and learning; computer scientists generating realistic models of word meaning (including sense taggers); and neuroscientists narrowing down the anatomy and timing of neural networks that process single word meanings. My guess is that very few of them give a damn what the latest theory-of-the-week is in linguistics or philosophy.
Re:words don't really have meanings by Anonymous Coward · 2005-05-31 06:23 · Score: 0

That's the mark of creativity. Don't you forget it.
Re:words don't really have meanings by Anonymous Coward · 2005-05-31 10:40 · Score: 0

True.
/certified professional philosopher without delusions of meta-omniscience

you add this plus their google accelerator and... by drasfr · 2005-05-31 03:17 · Score: 1

imagine the product you have if they can do it.

You configure the web accelerator to automatically translate all the languages to yours and all the pages would be translated to YOUR language in real time.

Wouldn't that be great? And it seems to be easily doable with their technology, if they wanted to.

except, no. by mattdm · 2005-05-31 03:20 · Score: 3, Insightful

"It's all just observation," Yarowsky adds. "Children do the same thing, but they also do it through visual stimulation and feedback. They see a book and hear the word 'book,' and eventually they learn that it's a book. They see a bird with its wings flapping around and learn that is called a bird. It's the same with machines, only they have much better memories. Computers could remember exactly when and where they saw the words bird and book."

Except, no. Humans are basically generalization machines. Babies are able to grasp very quickly that words apply to categories of things -- not just that a *specific* item is a bird or a book, but to learn "I know a bird when I see it", even without necessarily being able to provide a scientific definition. Computers can be built to emulate this ability, but learning word-to-word mappings isn't *nearly* the same as learning abstract concepts and which words apply to them.

Re:except, no. by rreyelts · 2005-05-31 04:31 · Score: 2, Interesting

Babies are able to grasp very quickly that words apply to categories of things

This is so true. I remember being utterly amazed when my toddler was able to immediately spot a bird in real life based off a cartoonish caricature in one of his children's books. It just flabbergasts me how a mind so young can perform recognition that we can't achieve with a beowulf cluster of supercomputers.
Re:except, no. by databyss · 2005-05-31 05:09 · Score: 1

ahhh yes... but can your baby run linux?

--
Hmmm witty sig or funny sig? Maybe elitest techy sig!
Re:except, no. by russellh · 2005-05-31 06:03 · Score: 1

plus, machines don't have the sense feedback the way we do, and they don't get upset and need to be held, changed, etc. while their data set is still small.

--
must... stay... awake...
Re:except, no. by Anonymous Coward · 2005-05-31 07:38 · Score: 0

> Except, no. Humans are basically generalization machines. Babies are able to grasp very quickly that words apply to categories of things

Actually, that's only a problem with teaching a computer to be artificially intelligent. For translation, there is no strong need for such creative expansion of definitions. A large enough statistical mapping will work very well.
Re:except, no. by mattdm · 2005-05-31 07:52 · Score: 1

Actually, that's only a problem with teaching a computer to be artificially intelligent. For translation, there is no strong need for such creative expansion of definitions. A large enough statistical mapping will work very well.

I don't doubt that. However, the whole spiel of "this is just how children's brains work" is just hyperbole -- it's not actually like that at all.

AI wanted by Sneeka2 · 2005-05-31 03:20 · Score: 1

I guess some languages are harder to translate than others, and until they come up with a really good AI, they won't make it. Languages like Japanese simply lack a lot of concepts that are in English, German, French and the like. No plural or future tense for example. "Neko no mimi" could mean cat ears in general, the ears of a bunch of cats, a specific cat's ears, one specific ear of a cat and so on. Stuff like this is usually clarified by the context. But depending on the text, the context might be considered as understood and therefore not be specified in a sentence.
If Googlefish learns only on a sentence pattern basis, this will not really help any more in translating Japanese texts than current technology does. To adequately grasp the contents of a text and correctly translate it, a lot of AI work will need to be done for these languages...

--
Bitten Apples are still better than dirty Windows...

Google sets itself up for success by Potor · 2005-05-31 03:26 · Score: 1

Ah, the promise of all translation software!

Of course, the issue would be for me to show that I add value to what may freely (presumably) be gotten from the web. And luckily enough, no translation software has come close to providing literature-quality work.

In my mind, Google's choice of the UN indicates a confidence that they will reach a high level of accurate technical translation. This makes great business sense, as the UN is typical of markets that will require a quick turnaround on translation, and thus will be a great proving ground.

Also, those docs are all written in an argot which is highly repetitive and quite uniform. Thus, Google has, in a way, set itself up for success.

Machine Translation Language by Danuvius · 2005-05-31 03:27 · Score: 1

Rick Morneau's Lexical Semantics details his creation of a machine translation intermediary language.

Absolutely fascinating stuff if you're into that sort of thing. Though definitely a less AI-esque attempt at the problem.

---

On an unrelated topic: if the stupid captchas instituted by the idiotic editors continue for much longer, I will go out of my way and null-route all slashdot ad sources at both home and work.

--
Akarsz Magyar Gentoo fórumot? Akkor

Bad idea by slapout · 2005-05-31 03:28 · Score: 1

Great, we create an advanced translator and then use the words of politicians to train it. Now everything we run thru it still won't make sense!

--
Coder's Stone: The programming language quick ref for iPad

Hip? by Anonymous Coward · 2005-05-31 03:37 · Score: 0

Hip?

Search on long phrases like this:

http://www.google.com/search?hl=en&q=history+chocolate+has+been+associated+with+romance+and+sharing

Doesn't find this:

http://www.cadbury.co.uk/

But finds a lot of sites that clone the text, like this:
http://search.hotbot.co.uk/results/chocolate/
http://yahooshopping.rediff.com/yahooshopping/events2004/newyear/yhxmas-1-4-0-0-0-1021752.htm
http://www.jlr.co.uk/partners.htm

It's not hip to bash Google, it's deserved criticism for launching a poor result. They're getting off lightly.

Re:Hip? by eno2001 · 2005-05-31 04:05 · Score: 1

Ahhh.. but you make the mistake of thinking that the Cadbury site is an appropriate answer. I don't give a rat's ass about marketing or consumerism. I want a list of answers that might possibly provide me with EXACTLY what I'm looking for and Google does just that. If I WANTED to look at the Cadbury site, I'd just type http://www.cadbury.co.uk/. The worst search results are the ones peppered with links to related (or worse, unrelated) businesses that provide no useful info. A search engine is a research tool, not a marketing tool. At least Google respects that and keeps the commerce links off to the side where they can be deservedly ignored.

--
-"...bad old ideas look confusingly fresh when they are packaged as technology" - Jaron Lanier (Digital Maoism on Edge.o
Re:Hip? by Anonymous Coward · 2005-05-31 04:38 · Score: 0

"you make the mistake of thinking that the Cadbury site is an appropriate answer"

It is, because Google returns a search engine scrape of the Cadbury site as the top answer.

This is the same for 30-40% of long phrases, Google will not find the most important site but second and third tier sites, which for long phrases are often just spam.

Next step in learning? by MadCow42 · 2005-05-31 03:38 · Score: 1

Learning from pre-translated texts is a great start...

Step two should be human corrections to machine-translated documents (learn from your mistakes - like we do), should it not?

MadCow.

--
I used to have a sig, but I set it free and it never came back.

translations of translations by grahamsz · 2005-05-31 03:42 · Score: 1

Many bible translations aren't made from the original languages but from other modern language versions.

Thus, I'd expect you'd find a French translation of the NIV, which is quite a modern translation in the first place.

Re:translations of translations by Secret+Agent+99 · 2005-05-31 04:16 · Score: 1

Can you back that up? AFAIK, the French are just as obsessed as the English when it comes to producing "true" translations from the original texts. There is no French-language Bible that I know of that's a translation of an English translation.

In any case, the core issue isn't whether a translation is modern. While I think most would agree that it would be best to use a modern version based directly on the original languages, there is still the problem that no two translations agree fully on exactly how to read and interpret the originals, especially the Hebrew and Aramaic parts. And there are numerous competing translations made based on the original languages, at least in English.
Re:translations of translations by grahamsz · 2005-05-31 05:08 · Score: 3, Insightful

I was wrong about the French. However, the Spanish NVI appears to parallel the NIV, and I'd imagine the pair would be a pretty good candidate for this sort of analysis.

http://www.booksofthebible.com/p2390.html

I believe it's key that in the situation of

Ancient Lang A -> Modern Lang B -> Modern Lang C

that B and C will be far closer than

Ancient Lang A -> Modern Lang B
Ancient Lang A -> Modern Lang C
Re:translations of translations by Secret+Agent+99 · 2005-05-31 05:29 · Score: 1

OK, I see your point, but it renders the Bible (per se) irrelevant to the process. (Which is probably for the best, because stirring up the ire of fanatics by messing with sacred texts is not such a great idea.)

I think the critical point is that the corpus has to be fed with translations that are known and accepted to be accurate (having gone through some kind of QA process). If you feed it willy-nilly with controversial or sloppy translations, the chances of getting garbage out rise dramatically.
Re:translations of translations by Abreu · 2005-05-31 05:32 · Score: 1

However, the Spanish NVI appears to parallel the NIV

If I remember correctly, both the NVI and NIV were created at roughly the same time, from the Greek and Hebrew originals. They are sister projects.

--
No sig for the moment.

Redeeculous by Ancient_Hacker · 2005-05-31 03:44 · Score: 1

The old saw in computer translation is the huge computer that was given this line from the Bible:

"The spirit is willing, but the flesh is weak"

The computer crunched and crunched, tapes spun (this was back in the 60's) and eventually it printed out:

"The wine is pretty good, but the steak is lousy"

Therein encapsulated is all the folly of every attempt at word-matching translation.

Re:Redeeculous by Danuvius · 2005-05-31 03:57 · Score: 1

Therein encapsulated is all the folly of every attempt at word-matching translation.
What does this article/discussion have to do with "word-matching translation"?

The article describes a process of phrase matching, whereby the biggest possible portion of a given text is matched to a "trusted" translation thereof.

To go with your example, instead of matching each individual word the match would more likely be the whole sentence, or the two component phrases:

"The spirit is willing, but the flesh is weak"
OR
"the spirit is willing", "but the flesh is weak".

--
Akarsz Magyar Gentoo fórumot? Akkor
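
To make the phrase-matching idea above concrete, here is a minimal sketch of greedy longest-match lookup. It is not the system the article describes; the phrase table, the function name `translate_greedy`, and the French strings are all hypothetical stand-ins for phrase pairs that a real system would learn from trusted translations.

```python
# Hypothetical phrase table: stands in for phrase pairs learned from
# trusted translations. The entries are illustrative only.
PHRASE_TABLE = {
    ("the", "spirit", "is", "willing"): "l'esprit est bien disposé",
    ("but", "the", "flesh", "is", "weak"): "mais la chair est faible",
    ("the",): "le",
    ("spirit",): "esprit",
}


def translate_greedy(sentence, table=PHRASE_TABLE):
    """Cover the sentence left to right, always taking the longest known phrase."""
    words = sentence.lower().replace(",", "").split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest candidate span starting at position i first.
        for j in range(len(words), i, -1):
            key = tuple(words[i:j])
            if key in table:
                output.append(table[key])
                i = j
                break
        else:
            # No phrase starts here: pass the word through, visibly marked.
            output.append("<" + words[i] + ">")
            i += 1
    return " ".join(output)


print(translate_greedy("The spirit is willing, but the flesh is weak"))
# prints: l'esprit est bien disposé mais la chair est faible
```

The whole sentence is not in the table, so the sketch falls back to the two component phrases. Anything that matches no phrase at all degrades to single words or marked pass-throughs.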
Re:Redeeculous by Ancient_Hacker · 2005-05-31 04:49 · Score: 1

Well, not to get into a flame war, but I just did a Google search on "the spirit is willing". The actual phrase, it turns out, is "the spirit is indeed willing". Which showcases the other downfall of phrase-matching -- just one inserted word, whether it leaves the meaning unchanged or horribly inverts it, will trip up any dumb phrase matcher. For example, the Google matcher is more likely to glom onto the soap opera "The spirit is willing", rather than the exact biblical verse "the spirit is indeed willing". Or if the extra word is something negatory, or sarcastic, like "the spirit is NOT willing", or "The spirit is like bodaciously willing". It's unlikely that every possible permutation with 1,2,3,4 added adjectives or adverbs is going to be found in text somewhere, so there are going to be a lot of non-matches or mismatches. JMHO.
Re:Redeeculous by Danuvius · 2005-05-31 07:08 · Score: 1

Are you presuming a static (non-growing) base corpus? If so, why?

I don't see a system like this ever functioning without ongoing human review. And as human reviewers catch mistakes, the corpus would grow and probably even change as language itself changes.

No, it isn't strong AI. But the sort of mistakes you are talking about could probably be made quite infrequent on an ongoing basis.

--
Akarsz Magyar Gentoo fórumot? Akkor
Re:Redeeculous by JJ · 2005-05-31 08:25 · Score: 1

Actually, I did a master's thesis on early MT efforts with someone who was there, and he always vehemently denied that this ever happened. In two years of digging thru every source I could find, I couldn't find anything but folksy repetitions of that.

--
So long and thanks for all the fish . . . !!!
Re:Redeeculous by rlp · 2005-05-31 10:49 · Score: 1

The version I heard was:

During a demo the phrase "Out of sight, out of mind" was translated into Russian. As the audience attending the demo didn't know Russian, the Russian phrase was then translated back to English. The result was "Invisible idiot".

--
[Insert pithy quote here]

Google Stock by artlu · 2005-05-31 03:49 · Score: 1

As long as we have another Google thread started, what are some stock thoughts. Jim Cramer is advocating this stock upwards of $440 on the basis of forward P/E and a 30% earnings growth. His main thought process is "Think of this stock as a $26 stock going to $35." However, he is not taking options into account. Options hedging strategies on GOOG are extremely costly at near-the-money options, and within two trading days one can become wealthy or poor trading GOOG options.

Anyway, I am a hedge fund manager, and our fund is growing increasingly bearish on this puppy as well as the whole U.S. economy, so I just wanted to pique some interest of the /. community. However, we do not have any position in GOOG at this time.

-Aj
http://theopenfund.com/

--
-------
artlu.net

Limits by JJ · 2005-05-31 03:49 · Score: 1

I can see this working for languages with similar grammars (like English-German or even English-Chinese) but once you throw in languages with somewhat different grammars (like English-Japanese or English-Basque) I can't see how a statistical approach will succeed.

--
So long and thanks for all the fish . . . !!!

Don't forget... by ballpoint · 2005-05-31 03:49 · Score: 2, Funny

John, the cunning linguist.

--
Flourescent (adj): smelling like ground wheat.

Re:Don't forget... by Daktaklakpak · 2005-05-31 06:05 · Score: 1

Or John, the master debator. Seems a little more appropriate for slashdot ;).

Someone call a whaaaaaaambulance! by Anonymous Coward · 2005-05-31 03:51 · Score: 0

Mod parent "-1 whiny bitch", please.

I modded this as redundant, but since it's now modded 5, Interesting, I'm going to post to get my mod points back.
Jesus, you suck. Since your 'redundant' mod got overpowered, you posted to get your mod points back? I think the only thing worse than doing that is admitting you did that. What a fucktard.

It would have been much more interesting to see how the new translator would handle the news blurb.
Yes, it would. Why didn't you post that then, instead of whining about how your mod didn't matter? Oh...what's that? You don't have access to the 'new translator'? No one does? Then STFU.

Great to see a new translation engine! by Jugalator · 2005-05-31 03:55 · Score: 1

I mean... Something like half the web's translation services seem to be licensing Systran's aging engine, and the rest are even worse. Yes, there are ambiguities that are hard to take care of, but with computers great at managing a lot of data, you'd think they'd at least have more complete dictionaries. :-p And regarding the ambiguities -- it's here a good engine comes into play. The better it is, the more it'll be able to correctly resolve by analyzing the context.

--
Beware: In C++, your friends can see your privates!

Not so fascinating - those are old methods. by msbmsb · 2005-05-31 03:56 · Score: 1

Statistical MT methods are old hat and have even been used in things like automatic image annotation years ago. Parallel text correspondence learning is not novel. Short bib.

What about Language Weaver by Anonymous Coward · 2005-05-31 03:57 · Score: 0

Language Weaver (http://www.languageweaver.com/) is an INQTEL company that has statistical-based MT. How will Google's software differ from that, and will it actually be better?

MT is a VERY difficult problem space, especially for languages that are non-Western (unlike Spanish, German, English, French).

Re:What about Language Weaver by durian · 2005-05-31 06:00 · Score: 1

Not to mention people working in research/AI/etc? I wrote a system that does translation using statistical machine learning techniques in 1995, sounds like it could be similar to this system. I never got enough data to get decent results though :-)

-peter

Machine Translation by Anonymous Coward · 2005-05-31 03:59 · Score: 0

I am a translator and interpreter and part time computer geek. I have tried various computer translation programs and have never been satisfied with any of them. I hoped they would at least be good enough to give me a draft translation that would only need minor editing but none have ever lived up to the hype. I have found that it is still faster to translate the old-fashioned way.

The biggest problem I have found with these programs is not necessarily the programs' fault. People tend to write the way they speak and that is seldom grammatically correct and often filled with the jargon from their particular industry and/or slang. (a good example is this post) The programs I have used are not designed to work this way. They seem to rigidly adhere to the rules of the particular languages they are designed to translate. The other problem I have had relates directly to the jargon. It can take weeks of work to add all or most of the translations I need in order to get a relatively accurate translation out of the programs I have used, and that is too long.

At the end of all of this, my hope is that someone can get it right so I can spend less time typing translations and more time interpreting. I enjoy the personal contact and the challenge of interpreting much more than translating alone in my office. Like I said earlier, even a program that could produce a decent draft document would be helpful. So I'll keep trying them out as they improve but until they reach that point, I'll still do it the old-fashioned way because for me it's faster.

They were one of the first in the early '90s... by msbmsb · 2005-05-31 04:05 · Score: 2, Informative

The Mathematics of Statistical Machine Translation: Parameter Estimation by Brown, Della Pietra, et al. IBM was on this a while ago, and other efforts have improved upon this work, through the use of Maximum Entropy, etc.

IDIOT by SimianOverlord · 2005-05-31 04:08 · Score: 1

You need quote marks (") if you're searching for a particular phrase. Add the quote marks and your webpage is ranked....#1. Learn to use Google, kthnx.

--
Meine Schwester ist sehr, sehr reizvoll - Nietzsche

Re:IDIOT by Anonymous Coward · 2005-05-31 04:50 · Score: 0

" You need quote marks (") if you're searching for a particular phrase. "

I wasn't searching for a particular phrase, I was searching for a bunch of words. If I search for a bunch of words and Google gives me second or third tier sites, how would I know that I was not getting the best sites, or how to fix the query to get the best sites?

Fortunately, Google suggests this for "search":
http://www.google.com/search?hl=en&q=search

AltaVista at top slot, which works correctly without me being psychic.

http://www.altavista.com/web/results?itag=ody&q=history+chocolate+has+been+associated+with+romance+and+sharing&kgs=0&kls=0

Write for translation by autopr0n · 2005-05-31 04:08 · Score: 1

I'd be willing to bet that if people agreed on a simple subset of their languages, machine translation would work a lot better.

--
autopr0n is like, down and stuff.

Re:Write for translation by cicho · 2005-05-31 06:02 · Score: 1

Yeah, and if people agreed never to click certain key sequences on the keyboards, Windows would never BSOD. You being funny, right?

--
"Only the small secrets need to be protected. The big ones are kept secret by public incredulity." - Marshall McLuhan

Although, Not Quite True by Ieshan · 2005-05-31 04:09 · Score: 1

"John went to the bank to exchange some money."

Take for example the word "bank". When you read that word, do you activate either the representation of "Bank", as in, "the place where you deposit currency", or "Bank", as in, "the place where the river's edge meets the land?"

It's both, actually. Both representations activate. So it's not so clear that representations in the brain are context dependent.

Re:Although, Not Quite True by mincognito · 2005-05-31 04:39 · Score: 1

Actually, your example could be seen to prove my point. Under certain circumstances the hearer would rightly understand that John went down to the river bank to exchange some money. That's where the black market exchanges happen to occur. As for "activating representations," don't confuse the mental pictures that words create with meaning.

--

Ludwig Wittgenstein
Re:Although, Not Quite True by cicho · 2005-05-31 05:59 · Score: 1

" Both representations activate. So it's not so clear that representations in the brain are context dependent."

Nah. Both might activate initially, but context is wider than just the neighboring words in the sentence. Context includes the whole text, the historical period, culture, source of the text, etc. In a modern city context the "river" representation doesn't really register.

And anyway, context disambiguates senses, but disambiguation need not be complete. When we understand speech we extrapolate and make guesses all the time.
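
As a rough illustration of context disambiguating senses without doing so completely, here is a minimal sketch that counts cue words from the surrounding text. The cue lists and the names `SENSE_CUES`, `score_senses`, and `best_sense` are invented for this example; real sense taggers use far richer evidence than word overlap.

```python
# Hypothetical cue words for two senses of "bank"; invented for illustration.
SENSE_CUES = {
    "financial institution": {"money", "exchange", "deposit", "loan", "account"},
    "river edge": {"river", "water", "fishing", "shore", "mud"},
}


def score_senses(context, cues=SENSE_CUES):
    """Count how many cue words for each sense appear in the context."""
    words = set(context.lower().replace(".", "").replace(",", "").split())
    return {sense: len(words & cue_words) for sense, cue_words in cues.items()}


def best_sense(context, cues=SENSE_CUES):
    """Return the single best-scoring sense, or None when the context is no help."""
    scores = score_senses(context, cues)
    top = max(scores.values())
    winners = [sense for sense, value in scores.items() if value == top]
    # A tie or a zero score leaves the word ambiguous: do not guess.
    if top == 0 or len(winners) != 1:
        return None
    return winners[0]
```

With these cues, `best_sense("John went to the bank to exchange some money.")` returns `"financial institution"`, while `best_sense("John went to the bank.")` returns `None`. Note that counting neighboring words still gets the black-market reading wrong: "John went down to the river bank to exchange some money" scores 2 to 1 in favor of the financial sense, which is the wider-context problem raised in this thread.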

--
"Only the small secrets need to be protected. The big ones are kept secret by public incredulity." - Marshall McLuhan
Re:Although, Not Quite True by Ieshan · 2005-05-31 12:25 · Score: 1

Not mental pictures. PET activations.

Oh yeah... by eno2001 · 2005-05-31 04:14 · Score: 1

...thanks for the link there tiger. [ROWR!!!] The birds on the Cadbury splash page are hott!!

--
-"...bad old ideas look confusingly fresh when they are packaged as technology" - Jaron Lanier (Digital Maoism on Edge.o

Re:babelfish translation by MemeRot · 2005-05-31 04:24 · Score: 1

HELLO DEAR, WHITE I THAT THIS LETTER LIKES a SURPRISE, but, TO THEM, THERE not CONCERNS, all COMING IS PROPERTY. I AM Mr. HARRIS PETER, OFFICE LEADER FINANCIAL confidence $ bank PLC, which is IN MAURITIUS CLUTCHES OF EGGS some years ago, CAME a MAN, who WAS CALLED Mr. SHAW SMITH, that, Who FROM ITS COUNTRY, DETAILS FROM YOUR PART IS, TO MY COUNTRY (MAURITIUS) IN the RUBBER SECTOR.UNFORTUNATELY TO INVESTMENT, HE DIED IN a SELBSTCAbbruch. Mr. SHAW SMITH DIED, the SUM the DOLLAR 15MILLION US IN MY BANK LEAVING. I REQUEST HEREBY YOUR SUPPORT TO HELPING, the MONEY TO STATING. I BECOME THEM NEEDING, WHEN the COUSIN LATE SHAW SMITH FOR SERVING, BECAUSE at the moment, HE DOES NOT HAVE the FOLLOWING of the TRUNKS, SO THAT the MONEY ON BROUGHT can. IF THEM RECIEVE THE MONEY, THEM IS 40% TAKING, THE OVER DOLLAR 6MILLION LIKE YOUR PORTION AND IT GIVING ME THE OTHER 60%. The GOVERNMENT PLANS, the MONEY TO TAKING OVER, IF KEINS REPRESENTS ABOVE, SINCE ITS FOLLOWING OF KIN.I EXAMINES that EVERYTHING IS NOT UNDER CONTROL, SINCE I AM the ADDRESS MANAGER.SO THEM ANYTHING CREDIT, itself ABOUT.ALL TO CONCERNS, THEM DOING HAVING BEING SUPPOSED ME ANSWERS, IF THEY ARE INTERESTED THUS WE the NESSECARY -, DOCUMENTS FOR the TRANSMISSION TO PROCESSING BEGINNINGS ABILITY. OFFICE LEADER OF THE THANKS HARRIS PETER F.T.B

How will it translate ambiguous headlines? by Chyeburashka · 2005-05-31 04:25 · Score: 2, Insightful

I smiled when I read this recent headline:

Clinton tours devastated Bandeh Aceh.

Of course, I knew what the writer really meant. But the Babel Fish translation into French produces exactly the meaning which I first parsed when reading that headline.

Les excursions de Clinton ont dévasté Bandeh Aceh.

If machine translation becomes more common, perhaps English writers will have to be a little more careful.

Re:How will it translate ambiguous headlines? by Anonymous Coward · 2005-05-31 05:27 · Score: 0

My favorite ambiguous headlines, from when they came through the library in the "Columbia Journalism Review":

Hirohito's Body Moved

County Commissioner to Help Service Widows

Judge Says Sex With Minor Worth Felony Charge
Re:How will it translate ambiguous headlines? by Idarubicin · 2005-05-31 12:54 · Score: 1

Clinton tours devastated Bandeh Aceh.
Of course, I knew what the writer really meant. But the Babel Fish translation into French produces exactly the meaning which I first parsed when reading that headline.

I still don't know which interpretation is correct. ;-)

--
~Idarubicin

IAAT by WormholeFiend · 2005-05-31 04:34 · Score: 1

I'm a translator, and I'll be impressed by machine translations the day a computer can translate a joke so that it's also funny in the target language.

IMO, the future of written document translation lies in translation memory software, which records pairs of syntactic units as documents are translated, for future reference.

If such pairs come up in a future translation, the software can auto-replace those syntactic units within a specified tolerance, which in turn accelerates the translation process.
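
A minimal sketch of that lookup-with-tolerance idea, using `difflib.SequenceMatcher` from the Python standard library as the similarity measure. The class name `TranslationMemory`, the 0.75 threshold, and the sample segments are assumptions made for illustration; commercial TM tools use their own segmentation and scoring.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy translation memory: stores segment pairs, reuses close matches."""

    def __init__(self, threshold=0.75):
        self.threshold = threshold  # assumed tolerance, not an industry value
        self.pairs = []             # list of (source segment, target segment)

    def add(self, source, target):
        self.pairs.append((source, target))

    def lookup(self, segment):
        """Return the best stored match at or above the threshold, else None."""
        best_pair, best_score = None, 0.0
        for source, target in self.pairs:
            score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
            if score > best_score:
                best_pair, best_score = (source, target), score
        if best_pair is None or best_score < self.threshold:
            return None
        return {
            "source": best_pair[0],
            "target": best_pair[1],
            "score": best_score,
            # Anything short of an exact match goes back to a human.
            "needs_review": best_score < 1.0,
        }


tm = TranslationMemory()
tm.add("Press the red button.", "Appuyez sur le bouton rouge.")
exact = tm.lookup("Press the red button.")
fuzzy = tm.lookup("Press the green button.")
```

Here `exact` comes back with `needs_review` set to `False`, while `fuzzy` returns the stored "red button" translation flagged for review, since the suggestion is close but wrong in one word.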

Re:IAAT by cicho · 2005-05-31 06:09 · Score: 1

Well, translation memory is only as good as the translator whose work is being reused, for one thing. For another, it only works well within the same narrow semantic area, and for highly formulaic sources, such as weather reports, technical manuals or software. For legalese I would hesitate to use TM, because a subtle difference in phrasing can make a huge difference in court.

--
"Only the small secrets need to be protected. The big ones are kept secret by public incredulity." - Marshall McLuhan
Re:IAAT by WormholeFiend · 2005-05-31 11:05 · Score: 1

translation memory is only as good as the translator whose work is being reused

The same is true of machine translation software programming.

Besides, professional human translators are still superior to machines.

Which is why in TM programs, less than 100% matches have to be approved...

In a perfect TM database building environment, each translation would be proofread and approved by a couple of other qualified people before being added to the software memory.

The real question is... by thisisauniqueid · 2005-05-31 04:42 · Score: 1

How will it translate W00t?

Is it just me... by TheMadPenguin · 2005-05-31 04:45 · Score: 1

or was their choice of Arabic translation text a bit... ummm... odd? Of all the things to choose from, they chose this? Wow.

--
Linux with kernel panic...
MadPenguin.org

my reaction was funny by halfelven · 2005-05-31 04:49 · Score: 1

I read the headlines, notice the Google thing, I am not surprised at all - I'm like "yep, right on cue" - but then I'm like... hey, why am I not surprised by this thing? :-)
Such is Google, I guess...

Alizée by fr2asbury · 2005-05-31 04:58 · Score: 1

I'll be happy when it stops 'translating' the name of my favorite French pop singer into "geostrophic."

Yes I know alizé means tradewind or some such thing, but really, there IS an extra 'e' on her name. ;-)

Hopefully they've improved Japanese-English by Anonymous Coward · 2005-05-31 05:11 · Score: 0

At present, you get stuff like:
"As for the red swimming wears being few,
The Mass phosphorus (everyone phosphorus is not) with the coffee drinking, inside the ,
Because photographing started, m is (the _ _) m
Even story is funny excessively, it is with * & & & & *"

Proper names typically come through as a burst of nouns, like "Showa Water Beauty Feather".

On second thought, the actual content is probably less engaging.

(-_^)

Bollocks by p3d0 · 2005-05-31 05:13 · Score: 1

I bet you understood that word in isolation.

--
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....

Re:Bollocks by mincognito · 2005-05-31 05:54 · Score: 1

No, I understood it in reference to my post.

--

Ludwig Wittgenstein
Re:Bollocks by p3d0 · 2005-05-31 14:23 · Score: 1

Touche.

--
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....

AI by roman_mir · 2005-05-31 05:19 · Score: 1

so if Google will be able to use statistical analysis to translate any language into any other language how difficult will it be to use statistical analysis to connect actual meaning to that text? The way to train Google to understand the meaning and context of the language will be by using people with various devices attached to them: video-cameras, microphones, touch/smell/heat/acceleration/pain/pleasure sensors in order to collect statistical information on the meaning of the text.

And the first real AI will be born.

--
You can't handle the truth.

Business Model? by SnprBoB86 · 2005-05-31 05:48 · Score: 1

What is the business model on this one?

Injected ads in the translation?
Licensing the technology to corporate instant messaging and email services?
Embedding the technology into voice recognition phone systems?
Pay translation services?

--
http://brandonbloom.name

Re:Business Model? by Schwarzchild · 2005-05-31 09:06 · Score: 1

How about licensing it to the government to aid in the war on terror? I understand that Arabic translators are a scarce resource in the military, CIA, etc.

--
"sweet dreams are made of this..."

Future of programming by Raindance · 2005-05-31 05:56 · Score: 1

The language translation aspect of this system is impressive. However...

Is this a key component of the future of programming?

Give it thousands of high-level design documents. Give it the thousands of corresponding pieces of code which resulted from said documents. Do you get a system that can translate between design documents and code?

Perhaps there's going to be some pre-processing and post-processing, but I don't think this is out of the question. Think about it.

RD

Re:Future of programming by software_trainer · 2005-05-31 08:34 · Score: 1

You've made a flawed assumption here: that the design documents have anything to do with the resulting code.

[ducks flying objects]

--
User Training for Busy Programmers

Looks like Dutch by Simonetta · 2005-05-31 06:13 · Score: 1

Uebersetzoongsystems zee fullooeeng prudoocshun et zee fectury ruoote-a ooff zee A Mey 19 tu juoorneleests.

Looks like Dutch, which is somewhat close to English.

The problem with phonetic and dialect translations is that they rely on a non-standard way of expressing the phonetic sounds. A French speaker is going to have a different way of spelling out a Swedish accent than an English speaker.
Linguists have a precise set of symbols for describing the funny sounds that people make with their mouths. Unfortunately no one else knows this symbol set.
It might not be a bad idea to start using these standard symbols as a way to encode speech into text using computers. Maybe, just maybe, we can start a systematic and scientific way to approach machine translation speech-to-text-to-alternative language speech. The idea of using religious texts as the basis of multiple language translation gives me pause because (no offense, but let's be real here) most religious texts have been originally written by people with severe mental disorders whom we accept as messiahs and prophets simply because it is politically expedient and convenient for us to do so.
The more advanced the translations become, the greater the risk of incorporating the remnants of these mental disorders into our translation machines.

But ... by Anonymous Coward · 2005-05-31 06:16 · Score: 0

will it let me translate Italian cooking recipes, so I can do it right? UN documents probably won't help with that. Darn, have to learn Italian then.

Yes but... by dangrover · 2005-05-31 06:49 · Score: 1

Yes, but will it do Klingon?

Or what about other weird languages? Darmok on the water at Tanagra! Tanagra, his arms wide! Darmok and Tanagra on the ocean!

/ducks

Opportunity for efficiency? by Anonymous Coward · 2005-05-31 06:55 · Score: 0

The reliance on datasets for statistical analysis seems to be a prime opportunity for the use of the Semantic Web, where datasets could be appropriately described using an ontology and the indexing and processing of these datasets could then be completed autonomously.

Reasonable?

You can study a book as much as you like. Copyright no more excludes a statistical analysis like this than it excludes publishing an article which points out that a book has 600 pages.

As long as the text itself is not reproduced, either explicitly or implicitly, you're fine.

They've probably been using U.N. documents because it's a nice homogenous set that's already entered into a computer, or at least is all in one place. Chasing down copies of Moby Dick in Tagalog is hardly a productive use of time. It takes too much brainpower... you can have grunts handle the processing if you have all the documents in one place to start.

Thief by Hal+The+Computer · 2005-05-31 07:59 · Score: 1

That's my line!

--

int main(void){int x=01232;while(malloc(x));return x;}

MT and the FBI? by seven+of+five · 2005-05-31 08:02 · Score: 1

I keep hearing that the FBI is backlogged out the wazoo when it comes to translating Arabic, Pashto, etc for terrorist messages. A news story ("60 minutes"?) on the subject stated that the FBI is also sluggish to remedy the situation due to dumbass bureaucratic game-playing.

But MT seems now to be mature enough to step in and solve the problem almost in a single stroke.

FBI, are you listening?

Input from Google Print? by Anthony+Coward · 2005-05-31 08:10 · Score: 1

Perhaps Google is going to use the information made available in the Google Print Library Project and in the Google Print Publisher Program to feed this project with lots and lots of text in different languages.

--
This .sig is the short tail.

Bible bad corpus for training translation software by software_trainer · 2005-05-31 08:26 · Score: 1

If it were any other book, you might be able to establish a valid parallel between two different languages. However, almost every translation of the Bible is "informed by tradition." This means the translators attempted to translate the Bible in the context of what the people paying the translators believe. Almost all Bible translations are made by committees. They interpret the text through theological doctrines and dogmas that arose centuries after the Bible was written. And this "understanding" of what the Bible means can change not only from version to version, but also from culture to culture. The book is just too burdened with tradition for any two translations to parallel each other as closely as, say, two translations of Huck Finn. Andy Gaus's translation, "The Unvarnished New Testament," is the only one I've found that simply translates from the original Greek without interpretation. You would need two language versions that both attempt to suppress the author's prejudice and beliefs to use the Bible as a corpus for translation.

--
User Training for Busy Programmers

This is incredibly useful by abbamouse · 2005-05-31 08:31 · Score: 1

For the last six years, I've been collecting data on all civil wars fought since 1816 as part of an update to the Correlates of War datasets, which have been instrumental in reshaping the scientific study of international politics. Right now, the biggest obstacle to further progress is that most of the obscure wars we're considering simply aren't described in English. The only materials on many Latin American wars (e.g. the dozen or so civil conflicts in Ecuador) are in Spanish, while information on many African revolts is only available in French. This project simply doesn't have the resources to hire full-time translators, so even basic MT would be great, for it would allow me to skim through reams of documents and online articles in order to identify the materials worth the costly time of a human translator. In addition, even a modest improvement in MT would allow me to extract data from foreign-language materials myself, since I'm generally seeking quantitative data on casualties and force levels, not a detailed description of events.

--
Make cheese not war 8:)

A better solution by mattlandau · 2005-05-31 08:46 · Score: 2, Interesting

There is an arguably better solution, which is to agree on a common writing system (note that adopting a common writing system is more feasible than adopting a common language, as one need not learn any phonology). Fifty years ago, a man by the name of Charles K. Bliss developed a system that he hoped would, in the future, become universally adopted. His invention was dubbed Blissymbolics. It is currently used in the field of augmentative and assistive communication, where it gives language to those who would, due to handicap, be unable to communicate with any fluency.

The basic idea behind Blissymbolics is to use mostly indexical ideographs - that is to say, eg, the symbol for man looks somewhat like a stick figure man. There are some pure symbols, however, though they are somewhat conventional - for instance, a heart shaped symbol represents emotion. However, it is not limited to concrete meanings, and, though I doubt it could be proved, I believe it has the same capability for expression as any other writing system, including English writing, due to its compositionality. Couple that with the fact that it can be learned quite easily, and one might begin to see that yes, this is a better solution. I am dedicated to this ideal, so if you get a chance, check out http://www.activebliss.com/ for more information about the ideal of universal communication.
Cheers,
Matt Landau

Context by pbaer · 2005-05-31 09:28 · Score: 1

How will it handle words that have 2 or more very different meanings? Best example I can think of: the Spanish word fui = I was or I went. Fui al cine = I went to the movies. So from the context it should learn that Fui al cine = I went to the movies, NOT I was the movies.

But what happens when it's translating say some fantasy novel where a boy is turned into a house and it's supposed to say he was a house, how will it translate that into spanish?
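For what it's worth, the "learn it from context" idea for fui can be sketched in a few lines. This is only a toy under loud assumptions: the observations are invented rather than taken from any corpus, the only context used is the single word after "fui", and nothing here is claimed to be how Google's system works.

```python
from collections import Counter, defaultdict

# Hypothetical hand-aligned examples:
# (word that followed "fui", how "fui" was rendered in English)
observations = [
    ("al", "I went"), ("al", "I went"), ("al", "I went"), ("a", "I went"),
    ("un", "I was"), ("una", "I was"), ("feliz", "I was"),
]

counts = defaultdict(Counter)  # next word -> rendering -> count
overall = Counter()            # rendering -> count, used as a fallback

for next_word, english in observations:
    counts[next_word][english] += 1
    overall[english] += 1


def translate_fui(next_word):
    """Pick the rendering seen most often before `next_word`;
    fall back to the overall most common rendering for unseen contexts."""
    table = counts.get(next_word) or overall
    return table.most_common(1)[0][0]
```

With these made-up counts, "fui al cine" comes out as "I went", and a sentence like "fui una casa" would still get "I was", because the deciding evidence is the following word rather than how plausible the sentence is.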

--
There are 11 types of people, those who know unary and those who don't.

Interesting Choice of Languages by wintermute1974 · 2005-05-31 11:43 · Score: 1

The choice of languages used to demo the new translation tool seems to point to something interesting.

In the only four slides where translations are shown, these are the original languages which are translated into English:
Slide 137 - Chinese
Slide 138 - Arabic
Slide 139 - Arabic
Slide 140 - Arabic

As accidental as these choices may be, is Google trying to sell the new translation tool to some arm of the U.S. Federal Government?

Consider that the previous two U.S. wars were fought against enemies whose holy text is considered definitive only in Arabic.

Consider that the only nation challenging the U.S. as a global superpower is China.

Re:Interesting Choice of Languages by Anonymous Coward · 2005-06-02 05:29 · Score: 0

Well, duh. Yes, of course. Look at the post above about http://www.languageweaver.com/. Guess who funds INQTEL? I'll give you three clues: C.I.A.

Re:fascinating-- Foiling Attempts... by davidsyes · 2005-05-31 12:55 · Score: 1

Why do *I* get the feeling that this won't be available "off-line".

Imagine if some corporate or government espionage entities get subpoenas or inspection "rights" to the queries and translations in the Google and other online translation engines.

Imagine if entrepreneurial but not-yet-burnt-in-the-real-world types innocently place implicit trust in these systems, only to find out the handy-dandy idea they were translating "to go international" got ripped off and disseminated by larger, faster, lawyer-backed corporations.

Imagine a world where every off-line move you made or idea you formulated on your local PC (read: Linux PC or deprecated windoze PC) got intercepted or monitored by government agencies which had the self-accorded "right" to pre-empt your "publications" or periodicals or distributions of information.

Imagine a world where more companies slide under the sheets with governments when they realize there's a profit (in money or newly-accorded exemptions or such) in providing international espionage (*cough* domestic protection) enhancement.

(Slightly lifting thin, umm, tin-foil hat...)

If you don't do much electronic writing or distribution of information and don't fear governments and don't do marginally-interesting things, then fear not, I suppose.

(Letting thin-foil hat fall back down...)

But, if you're a rogue of types, and intent on translating them for global release, the next form of "defamation" or "discrediting" could come when the government-backed translation engine fatally alters your doc with "subtle" or "nuanced" changes in diction, grammar, word choice, and the like, or just makes the "hostile" or "ingrate" document look or sound "unprofessional" by introducing improperly spelt (spelled) worlds.

(Cocking tin/thin-foil hat again)

But, then, there always are available the various traditional brick-and-mortar (and pricey) translation services in major cities to which you can drive or overnight your documents. And, likely you can SUE them for bad effort, shoddy work, and the like so long as the product rendered is not beyond contractual protections.

(SLAMMING thin foil down, AGAIN)

Now, think of some company that can't innovate, yet steals or "co-opts" or "borrows" ideas from others and runs them out of business (ie, willing to "cut off their air supply"...), and you'll likely have Google out of business because it lacks the deep $60 Bn pockets...)

(Thin foil worn out now...)

David Syes

--
Previously: "Linux... Toward the Sunrise..." Now: "Linux... Toward the-- No, now, part of Every Sunrise"

Re:Wait, why? We have bicycles... by davidsyes · 2005-05-31 12:58 · Score: 1

Reminds me of:

"We have bicycles for boys with adjustable seats."

Who/what has "adjustable seats"? Boys or bicycles?

That was the topic of sentence structure, word choice, and sensible description of the subject vs the object...

--
Previously: "Linux... Toward the Sunrise..." Now: "Linux... Toward the-- No, now, part of Every Sunrise"

Has anyone actually tested this service? by Ogemaniac · 2005-05-31 13:10 · Score: 1

It seems as pointless as Babelfish to me. Here is the Japanese I dumped in, from an email I received yesterday. This is really simple stuff.

[Japanese text, mis-encoded in the source]

Here is Google's translation:

June 9th (the wood) for student experiment, workshop (the rice plant wood, Ida, the medium bridge, Miyoshi) o'clock of 9 it begins from 30 minutes. The person whose are inconvenient please requests.

Here is my translation:

In order to accommodate student experiments, the journal seminar (Inagi, Iida, Nakahashi, Miyoshi) will be moved to Thursday, June 9th at 9:30am. If this is inconvenient please let me know.

It didn't even get the abbreviated 'Thursday' right, even though it is written this way all the time. It also missed half the names, even though these are common ones.

C3P0 by Anonymous Coward · 2005-05-31 13:28 · Score: 0

I think google should name it after the best translator ever

FURTHER IDIOCY by SimianOverlord · 2005-05-31 22:43 · Score: 1

You searched for a bunch of words appearing in the text on a page, expecting that page to come up straight away.

Moreover, you searched for a phrase contaminated by other search engines' search pages. Altavista has obviously had human intervention to remove these from the consideration of the results, Google hasn't. I could equally find terms where Altavista was corrupted and Google wasn't - WTF does that prove?

Finally, you have to think about how google interprets what you want from your search. You wanted a web page containing those words. To get that, google has a mechanism whereby you can search for those words in that context. The way you entered the search terms was phrased as a request for information on "history chocolate romance sharing", as google redacts unnecessary words from your search terms. Google is a contextual search: it searches for pages about the history of chocolate. Does the cadbury's page contain a great deal of useful information on the history of chocolate? If so, it will appear higher. It is ranked not by keyword, but by the most useful webpage in the search that google guesses you want. So cadbury's, despite possessing all your keywords, might not cut the mustard, and so is not ranked at the top.

The difference is not in the ability of google to find the page you want and rank it accordingly. The difference stems from your inadequate understanding of search engines. Google may, or may not be inferior to Altavista. I'd rather fuck a puppy than bother to find out. Your hamfisted attempt to show that it was was pointless and misleading.

And for god's sake, if you're already using google to search, you don't want google appearing when you type in search and click "I'm Feeling Lucky" - it's no use at all. I'm not surprised google redacts itself from the search query - I'm surprised it's in there at all.

--
Meine Schwester ist sehr, sehr reizvoll - Nietzsche

Input / Job Changes by SeanDuggan · 2005-06-01 02:16 · Score: 1

*shrug* Machine translation is always going to require humans to massage the translation algorithms, I suspect. Those translators who get involved early on are more likely to be in the position to be the experts here.

--
This sig has absolutely no significance and serves only to take up screen space and waste the time of the reader.

UN Translation of Idioms by SeanDuggan · 2005-06-01 02:49 · Score: 1

This means that the system would handle idioms almost perfectly when there are corresponding idioms in the target language, and adequately even when there aren't any (since the hard work of coming up with standard translations for those has already been done by several generations of UN translators).

Somewhat off-topic in some ways, but I was amused by a story I read some years ago in a magazine, some mention made here and here, about a UN translator who, stymied by a Russian idiom which defied literal translation, drew from Shakespeare and translated it to "Something's rotten in the state of Denmark" which of course led to protests from the representative from Denmark, etc. The core idea part of the story stays intact, but the location, date, and details of the Russian idiom vary (I remember the first time reading it, it was "something about a cow and two piles of hay" and the links I've included talk about "an orange tree, a backyard, Moscow" and "an elder-bush in the garden and an uncle in Kiev"), so there's a decent chance this is an urban legend.

Personally, I'm curious less about the idioms than I am about the MT's parsing of grammar. Not all languages use Subject-Verb-Object grammar and the rules for adding adjectives, adverbs, suffixes, and the like vary greatly between languages and often aren't all that consistent. For instance, Russian doesn't have articles like English does, instead relying on order of words in the sentence to indicate whether one is referring to a generic instance of an object or a specific instance. The grammar section of Mark Rosenfelder's Language Construction Kit provides several examples of differing grammars in other languages. I'm currently taking ASL courses (which admittedly do not have a written form for this kind of translation) and I will freely admit that learning to express sentences in a "Timeframe-object-subject-verb-time signifier-query word" structure is kicking my ass, despite having done some studying of other languages in the past. Heck, just learning when and where to place adjectives before or after words usually takes years for most people. (That's one place where English does seem to shine. Adjectives are always in front of nouns, as best I can recall. Adverbs, on the other hand...)

Anyhow, I'll be eagerly watching the progress here, inasmuch as my scattered attention span will occasionally provoke me to check my bookmarks list...

--
This sig has absolutely no significance and serves only to take up screen space and waste the time of the reader.

a more mundane problem by Anonymous Coward · 2005-06-01 04:42 · Score: 0

is the quality of translation. I often see subtitles poorly done due to budget & time concerns, as well as the translator's comprehension of the source language.

If some phrases only show up once or twice in a corpus (say only once in one out of hundreds of films), and the translator didn't have the time or wits to get it right, it would be stuck for ever in the analysis.

A minor in AI? by autopr0n · 2005-06-01 09:10 · Score: 1

You certainly didn't give a very technical explanation other than to assert (yes I'm right). How exactly would one teach a Neural Network language?

--
autopr0n is like, down and stuff.

Re:A minor in AI? by Nytewynd · 2005-06-01 14:37 · Score: 1

You certainly didn't give a very technical explanation other than to assert (yes I'm right). How exactly would one teach a Neural Network language?

The concept is called priming. The neural network starts making weighted links between words. When it sees one word, it then follows the strongest link to the next word. If you feed it enough patterns like the article suggested, it starts to assign probabilities that one word = another word based on the number of times the 2 words appear in similar locations. The network can learn a strong link between the same word in different languages. With human guidance, you can verify and adjust the links. It can even learn to associate one word in one language with a small phrase in another. Obviously this takes a long time to learn, and would need tons of human correction at first. The longer the network is running, the more likely it would be to correctly link words. In effect, it can teach itself after it learns enough.
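As a rough illustration of "weighted links" between words in two languages, here is a minimal sketch. To be clear about the assumptions: this is plain co-occurrence counting with a Dice score, not a neural network; it ignores word position within the sentence; and the four sentence pairs and both function names are invented for the example.

```python
from collections import Counter
from itertools import product

# Hypothetical toy parallel corpus (English, Spanish)
parallel = [
    ("the dog eats", "el perro come"),
    ("the dog sleeps", "el perro duerme"),
    ("the cat eats", "el gato come"),
    ("the cat sleeps", "el gato duerme"),
]


def link_strengths(pairs):
    """Score each (source word, target word) link with the Dice coefficient:
    2 * (sentence pairs containing both) / (pairs with source + pairs with target)."""
    src_count, tgt_count, co_count = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        co_count.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in co_count.items()
    }


def strongest_link(word, strengths):
    """Follow the strongest link out of `word`, or return None if it was never seen."""
    candidates = {t: w for (s, t), w in strengths.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


strengths = link_strengths(parallel)
```

On this tiny corpus, `strongest_link("dog", strengths)` picks "perro": "el" appears alongside "dog" just as often, but it also appears in the sentences without "dog", which weakens that link. The human-correction step described above would amount to overriding individual entries in `strengths`.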

That is exactly how humans work. For example, there is no translation in the mind of bilingual people. Because of priming, the instant someone hears a word in English, they automatically think of the Spanish word, rather than perform a lookup. That has been proven with response tests. There was no delay at all in bilinguals fluent in both languages. The stronger the bond, the less the delay.

--
/. ++

Slashdot Mirror

418 comments