Romancing The Rosetta Stone

Sure! by Anonymous Coward · 2003-07-28 03:49 · Score: -1, Troll

Right after I romance this FP.

Re:Sure! by Anonymous Coward · 2003-07-28 04:20 · Score: -1, Offtopic

Suck YOU and Fuck MY dick! oh yeah, btw, drdink is ghey!

episode by Anonymous Coward · 2003-07-28 03:49 · Score: -1, Offtopic

did anyone see that saved by the bell episode where screech fucks albert clifford slater by tutoring kelly and denying george michael tickets?

kelly had a camel toe.

This just in... by Anonymous Coward · 2003-07-28 03:50 · Score: -1, Offtopic

Talk show radio hosts are all talking today about the sad death of Bob Hope. Bob was found dead in his home last night, he was 100 years old.

He will be sadly missed as he was truly an American icon.

oh oh... by wwest4 · 2003-07-28 03:51 · Score: -1, Offtopic

prepare for the universal translator joke onslaught :)

Re:oh oh... by Anonymous Coward · 2003-07-28 03:53 · Score: 4, Interesting

This is exactly NOT a universal translator as it uses matched bilingual texts. You need an already translated text for his system to work.
Re:oh oh... by Anonymous Coward · 2003-07-28 04:00 · Score: -1, Redundant

Not only this launching of the news of the university of southern California has a fantastic title, he also has a great content. This history is near one of its scientists, Franz Jose Och, who software aligns very above between systems of the translation. "déme enough parallel data, and you can have a system of the translation for any two languages in a question of hours," said to Dr Och, paraphrasing Archimedes. Its approach trusts two concepts, compiling enormous amounts of data, and applying statistical models to these data. It totally does not pay attention of rules and dictionaries of the grammar. the "method of Och uses the matched bilingual texts, the equivalent ones computer-codified of stone the famous inscriptions of Rosetta. Or, something, gigabytes and gigabytes of stones of Rosetta." Read my summary for more details."
Re:oh oh... by SoTuA · 2003-07-28 04:18 · Score: 1

My impression is that you need the matched texts to *train* the translator. After that, you give it one side and it tries to build the translation, based on what it learned in training (sounds like neural networks and all that to me).
Re:oh oh... by Abcd1234 · 2003-07-28 04:19 · Score: 1

You need an already translated text for his system to work.

Well, you need a pool of already matched texts. Once you have this for a given language pair, you can immediately start translating (presumably). So, it is "universal" in that, once the system is primed, it can immediately begin translating (ie, you don't have to build grammar rules, dictionaries, etc, etc).
Re:oh oh... by wwest4 · 2003-07-28 04:21 · Score: 1

but i think the training only applies to a system for translating between the languages of the datasets used.

so "training" it using parallel texts of japanese and english would produce routines for translating between japanese and english, but not french and english.
Re:oh oh... by Have+Blue · 2003-07-28 04:24 · Score: 1

Sure it is... It's not a magic psychic UT like the one on Star Trek that allows you to instantly converse with an unknown species, but it's "universal" in that it has no internal dependencies on specific languages and could be used to go between any two.
Re:oh oh... by Shads · 2003-07-28 04:34 · Score: 1

right... duh.

If you wanted french you'd have to feed it french and engrish

--
Shadus
Re:oh oh... by wwest4 · 2003-07-28 04:37 · Score: 1

Thanks!
Re:oh oh... by Man+Eating+Duck · 2003-07-28 04:45 · Score: 2, Interesting

I think what was implied was that if you already had a translation engine trained for English/Japanese, when you are training it for English/French you can use the already existing "metadata" for English/Japanese to make the process quicker (requires smaller datasets to achieve the same precision).

I might be far out here. Excuse my crappy English, btw.

--
Are you a grammar Nazi? I'm trying to improve my English; please correct my errors! :)

Where can I download his software? by georgeha · 2003-07-28 03:51 · Score: 1, Funny

Since I mistakenly borrowed some undubbed Cowboy BeBop.

Re:Where can I download his software? by Doomrat · 2003-07-28 04:41 · Score: 1

"Spike-uh!"
"ne desi begasi nada ping pong - BLOODY EYE.".
Re:Where can I download his software? by Anonymous Coward · 2003-07-28 06:08 · Score: 3, Informative

Franz Josef Och homepage is at:

http://www.isi.edu/~och/

There are links to 3 software packages for download.

Article text by Anonymous Coward · 2003-07-28 03:51 · Score: 4, Informative

Romancing the Rosetta Stone

'Give me enough parallel data, and you can have a translation system in hours'

University of Southern California computer scientist Franz Josef Och echoed one of the most famous boasts in the history of engineering after his software scored highest among 23 Arabic- and Chinese-to-English translation systems, commercial and experimental, tested in recently concluded Department of Commerce trials.

"Give me a place to stand on, and I will move the world," said the great Greek scientist Archimedes, after providing a mathematical explanation for the lever.

"Give me enough parallel data, and you can have a translation system for any two languages in a matter of hours," said Dr. Och, a computer scientist in the USC School of Engineering's Information Sciences Institute.

Och spoke after the 2003 Benchmark Tests for machine translation carried out in May and June of this year by the U.S. Commerce Department's National Institute of Standards and Technology.

Och's translations proved best in the 2003 head-to-head tests against 7 Arabic systems (5 research and 2 commercial-off-the-shelf products) and 14 Chinese systems (9 research and 5 off-the-shelf). In the previous, 2002 evaluations they had proved similarly superior.

The researcher discussed his methods at a NIST post-mortem workshop on the benchmarking held July 22-23 at Johns Hopkins University in Baltimore, Maryland.

Och is a standout exponent of a newer method of using computers to translate one language into another that has become more successful in recent years as the ability of computers to handle large bodies of information has grown, and the volume of text and matched translations in digital form has exploded, on (for example) multilingual newspaper or government web sites.

Och's method uses matched bilingual texts, the computer-encoded equivalents of the famous Rosetta Stone inscriptions. Or, rather, gigabytes and gigabytes of Rosetta Stones.

"Our approach uses statistical models to find the most likely translation for a given input," Och explained.

"It is quite different from the older, symbolic approaches to machine translation used in most existing commercial systems, which try to encode the grammar and the lexicon of a foreign language in a computer program that analyzes the grammatical structure of the foreign text, and then produces English based on hard rules," he continued.

"Instead of telling the computer how to translate, we let it figure it out by itself. First, we feed the system with a parallel corpus, that is, a collection of texts in the foreign language and their translations into English.

"The computer uses this information to tune the parameters of a statistical model of the translation process. During the translation of new text, the system tries to find the English sentence that is the most likely translation of the foreign input sentence, based on these statistical models."
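
The article does not say which statistical model Och's system uses. As a rough illustration of "tuning parameters from a parallel corpus", here is a minimal sketch of word-translation probabilities estimated with EM, in the style of the early IBM word-based models. The function names (`train_model1`, `best_translation`) and the three-sentence toy corpus are assumptions made up for this example, not anything from Och's software.

```python
from collections import defaultdict

def train_model1(pairs, iterations=20):
    """Estimate t(e|f) by EM from (foreign_tokens, english_tokens) pairs.

    Simplified word-based model: no NULL word, no alignment or length model.
    """
    # Only word pairs that co-occur in some sentence pair get a parameter.
    cooc = {(e, f) for fs, es in pairs for e in es for f in fs}
    e_vocab = {e for e, _ in cooc}
    t = {pair: 1.0 / len(e_vocab) for pair in cooc}  # uniform start

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of e aligned to f
        total = defaultdict(float)  # expected count of f aligned to anything
        for fs, es in pairs:
            for e in es:
                # E-step: spread one unit of e over the foreign words
                # in proportion to the current t(e|f).
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalise so that t(.|f) sums to 1 for each f.
        t = {(e, f): c / total[f] for (e, f), c in count.items()}
    return t

def best_translation(t, f):
    """Return the English word with the highest t(e|f) for foreign word f."""
    candidates = {e: p for (e, ff), p in t.items() if ff == f}
    return max(candidates, key=candidates.get)

# Hypothetical toy corpus (German to English), for illustration only.
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
table = train_model1(corpus)
print(best_translation(table, "haus"))
```

Nothing in the code is specific to German or English: feeding it a different parallel corpus yields a table for a different language pair, which is the point of the "any two languages" claim. A real system adds phrase-level units, a language model, and a search procedure on top of tables like this.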

This method ignores, or rather rolls over, explicit grammatical rules and even traditional dictionary lists of vocabulary in favor of letting the computer itself find matchup patterns between given Chinese or Arabic (or any other language) texts and their English translations.

Such abilities have grown, as computers have improved, by enabling them to move from using individual words as the basic unit to using groups of words -- phrases.

Different human translators' versions of the same text will often vary considerably. Another key improvement has been the use of multiple English human translations to allow the computer to more freely and widely check its rendering by a scoring system.

This not coincidentally allows researchers to quantitatively measure improvement in translation on a sensitive and useful scale.
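
The article does not name the scoring system. One common way to score a machine translation against several human references is clipped n-gram precision, the core of BLEU-style metrics. The sketch below assumes that idea; `clipped_precision` and the example sentences are hypothetical, not the evaluation's actual metric.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(candidate, references, n=1):
    """Fraction of candidate n-grams found in at least one reference.

    Each n-gram is credited at most as many times as it appears in the
    single reference that contains it most often ("clipping").
    """
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, c in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], c)
    matched = sum(min(c, max_ref[gram]) for gram, c in cand.items())
    return matched / sum(cand.values())

# Hypothetical example: two human references that word things differently.
refs = ["the cat is on the mat".split(), "a cat sat on a mat".split()]
cand = "the cat sat on the mat".split()
print(clipped_precision(cand, refs[:1]))  # scored against one reference
print(clipped_precision(cand, refs))      # scored against both references
```

With a single reference the candidate is penalised for "sat", a word only the second translator used. Adding the second reference removes that penalty, which is why multiple human translations give the computer a freer and wider check.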

The original work along these lines dates back to the late 1980s and early 1990s and was done by Peter F. Brown and his colleagues at IBM's Watson Research Center.

Much of the improvement and

Re:Article text by Anonymous Coward · 2003-07-28 04:08 · Score: 0

"Give me a place to stand on, and I will move the world," said the great Greek scientist Archimedes, after providing a mathematical explanation for the lever.

Calm down - you`re just translating one code into another!
Re:Article text by Anonymous Coward · 2003-07-28 06:57 · Score: 0
Can he translate:
- Slashdot editorese into English.
- IRC teen chat into English
- 1337-5p33k into Swedish chef.
Re:Article text by Lord+Ender · 2003-07-28 11:58 · Score: 1

'Give me enough parallel data, and you can have a translation system in hours'

Give me enough data, and I can instantly give you a translation system. And mine requires no statistical analysis or anything like that. All I need as input is every possible phrase in both languages. Nyah!

--
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.

Let me know by gazuga · 2003-07-28 03:51 · Score: 5, Funny

when it's in the form of a fish, and can fit in my ear...

--
"I turn away with fright and horror from the lamentable evil of functions which do not have derivatives."

Re:Let me know by Wordsmith · 2003-07-28 04:40 · Score: 1

sure. it's at babelfish.altavista.com.
Re:Let me know by Cruciform · 2003-07-28 04:52 · Score: 2, Funny

That's not a real Babelfish though, it's just a Beta
Re:Let me know by Elbow+Macaroni · 2003-07-28 05:08 · Score: 1

And let me know when it can translate bad English or heavily accented English.

--
-------------------------------------
Technically, we are beyond survival.
Re:Let me know by Luigi30 · 2003-07-28 07:51 · Score: 0, Offtopic

But then wouldn't that prove that God exists, and God would disappear, then the guy who proved it would prove that black is white?

--
503 Sig Unavailable

The Signature could not be accessed. Please try again later or contact the administrator

Goatse Receiver, ass contortionist, dead at 55 by Anonymous Coward · 2003-07-28 03:51 · Score: -1, Troll

Goatse Receiver, ass contortionist, dead at 55

I just heard some sad news on talk radio - ass strectching exhibitionist Goatse Receiver was found dead in CmdrTaco's bed this morning. There weren't any more details. I'm sure everyone in the Slashdot community will miss him - even if you didn't enjoy his work, there's no denying his contributions to making the intarweb a great place for millions of users. Truly an American icon.

mmm beer bot by nohear_t · 2003-07-28 03:52 · Score: -1, Offtopic

Can this be modified to seek out and fetch me a beer on a hot day like this?

I would gladly tell the robot a single command that it understood as:

1. start
2. seek beer
3. plot course
4. get beer
5. return

mmmmmmmmmmmmmm..beer

first stone! by Anonymous Coward · 2003-07-28 03:52 · Score: -1, Offtopic

First CSLib translation stone! Fnord!

First on-topic post! by CubeDude213 · 2003-07-28 03:52 · Score: 1, Funny

This could be an amazing improvement to search engines, if they could instantly translate a page before showing it in the results.

Re:First on-topic post! by Anonymous Coward · 2003-07-28 03:58 · Score: -1

You mean like google does already?
Re:First on-topic post! by Anonymous Coward · 2003-07-28 04:12 · Score: 0

bullcrap, google translates on the fly when you click on "translate".
Re:First on-topic post! by Thud457 · 2003-07-28 04:30 · Score: 0, Offtopic

How's a thermos know to keep hot stuff hot but to keep cold stuff cold?

--
the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff
Re:First on-topic post! by Anonymous Coward · 2003-07-28 06:35 · Score: 0

Magic.

Obsolete? by Lord+Kholdan · 2003-07-28 03:52 · Score: -1, Insightful

Am I the only one who thinks that translation is quickly becoming obsolete?

Almost everyone can speak, read and write at least tolerable english and most young people can have full fledged discussions in it. Just look at Slashdot, I'm quite sure I'm not the only one who doesn't have english as primary language. It's not that farfetched idea that in the (near) future everyone uses or at least knows english well enough to make translations meaningless in all but the most complicated subjects.

Re:Obsolete? by Anonymous Coward · 2003-07-28 03:54 · Score: 0, Insightful

Guess what asshat, the majority of the people on this planet don't speak English. Just because everyone you know does doesn't make it a majority or even a large minority.
Re:Obsolete? by StressedEd · 2003-07-28 03:57 · Score: 0, Funny

So you think translation is becoming obsolete do you? Perhaps you need to "get out more".

--
Be nice to people on the way up. You will meet them again on your way down!
Re:Obsolete? by BlackHawk-666 · 2003-07-28 03:57 · Score: 0, Funny

You done good be rite is. We fckuing enGlish good nows.

--
All those moments will be lost in time, like tears in rain.
Re:Obsolete? by Surak · 2003-07-28 03:57 · Score: 5, Insightful

'Almost everyone'? What *are* you talking about? You must be an American. From a recent online Harris poll, most Americans think at least half the world speaks English. This is just plain wrong. The truth of the matter is that it's more like 20%. That's it. Most people on the NET might speak English, but most people in the world? Hardly.

--
My journal has hot /. gossip.
Re:Obsolete? by Anonymous Coward · 2003-07-28 03:58 · Score: -1, Offtopic

Almost everyone can speak, read and write at least tolerable english

LOL! LOL! LOL

Fucking pig ignorant english-centric poster you are.

you're correct as long as you put "Almost everyone can speak, read and write at least tolerable english if you don't include the majority of the world who can't"

oh dear you just made my day with that funny.
Re:Obsolete? by Anonymous Coward · 2003-07-28 03:58 · Score: 2, Insightful

English may be the closest thing we have to a universally-spoken language, but it certainly isn't going to become the -only- language any time soon, if ever. If all other languages disappeared, though, we would definitely need translation for all the literature we have that isn't written in English.
Re:Obsolete? by ShadeARG · 2003-07-28 03:58 · Score: 3, Informative

Here is Japanese Slashdot, and I'm sure there are others.
Re:Obsolete? by timftbf · 2003-07-28 03:59 · Score: 2, Flamebait

If email, IRC/"chat rooms" *spit* and SMS are anything to go by, a great number of young and not-so-young people who *do* have English as a first language are barely capable of forming even simple sentences in it correctly.

Regards,
Tim. (Grumpy old man day)
Re:Obsolete? by Anonymous Coward · 2003-07-28 04:00 · Score: 2, Funny

DARPA actually proposed that a forced conversion to English policy would be more cost effective for the defense department to implement through military invasion than some complicated translation scheme. Hence congress's support for the translation project.
Re:Obsolete? by DG · 2003-07-28 04:02 · Score: 5, Funny

A man who speaks three languages is trilingual.

A man who speaks two languages is bilingual.

A man who speaks one language is American.

DG

--
Want to learn about race cars? Read my Book
Re:Obsolete? by JeffTL · 2003-07-28 04:03 · Score: 1

"Almost anyone"? Show me the data. And of course, you must remember that while many can speak English, fewer can speak it well.
Re:Obsolete? by Surak · 2003-07-28 04:06 · Score: 1

Heh. A friend of mine from India who worked for GM was fond of that joke. :)

--
My journal has hot /. gossip.
Re:Obsolete? by lildogie · 2003-07-28 04:06 · Score: 2, Interesting

> Americans think at least half the world speaks English.

Better-informed Americans (a small minority of the class) would be aware that Spanish is well on the way to becoming the predominant language in the USA.

But, IMHO, English could become the next Latin: the dead language that everybody has to learn if they're going to try and influence the world.

BTW, every "% of humanity" statistic has to consider that most humans are Chinese.
Re:Obsolete? by jpkunst · 2003-07-28 04:06 · Score: 1

Am I the only one who thinks that translation is quickly becoming obsolete?

Almost everyone can speak, read and write at least tolerable english and most young people can have full fledged discussions in it.

That isn't much help if you want to read (say) De Uitvreter by Nescio and you don't know Dutch, is it? Or for a slightly more geeky angle, if you want to read Edsger Dijkstra's Dutch texts?

JP
Re:Obsolete? by Ummite · 2003-07-28 04:06 · Score: 1

Not only is it not obsolete in the real world, it can have tremendous influence on how we can decrypt information, when it's a mix of cipher / steganography.
Re:Obsolete? by Anonymous Coward · 2003-07-28 04:07 · Score: 1, Insightful

And if it wasn't for Spanish (and South America), Americans would think 100% of the world speaks English..
Re:Obsolete? by Anonymous Coward · 2003-07-28 04:07 · Score: -1

Chill dudez, YHBT. Most Americans literate enough to post here wouldn't be retarded enough to think this.
Re:Obsolete? by Anonymous Coward · 2003-07-28 04:08 · Score: 0

Most everybody on Slashdot reads and writes fairly good english. Yep. That's true!
But guess what? The reason for that is not because everybody on the net speaks english, it's because [drum roll] Slashdot is an english web site! Can you believe it? Visitors to an english web site speak english?
I'm sure that most people in your home country (most likely the US) speak english too, I wonder what that means...
Re:Obsolete? by Anonymous Coward · 2003-07-28 04:09 · Score: 0

But, IMHO, English could become the next Latin: the dead language that everybody has to learn if they're going to try and influence the world.
Dead language? Exactly when do the Americans plan on killing all the English?
Re:Obsolete? by Anonymous Coward · 2003-07-28 04:09 · Score: 0

Not most, but closer to the highest percentage. 1.3billion chinese out of 6 billion.

Still a helluvalot, compared to english speakers.
Re:Obsolete? by Anonymous Coward · 2003-07-28 04:10 · Score: 0

Heh. A friend of mine from India who worked for GM was fond of that joke. :)

Oh yeah, I know him!

He's dead.
Re:Obsolete? by GoofyBoy · 2003-07-28 04:10 · Score: 2, Insightful

http://www.britishcouncil.org/english/engfaqs.htm#howmany

Translators are needed for 3/4ths of the world. Not what I would call close to obsolete.

--
The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.
Re:Obsolete? by kmac06 · 2003-07-28 04:10 · Score: 1

I think the poll is misleading. More accurate would be:
Most Americans think at least half the world that matters speaks English.
American attitude is if you don't speak English, you don't matter :)
Re:Obsolete? by Anonymous Coward · 2003-07-28 04:15 · Score: 0

Then you "knew him", not "know him".
Re:Obsolete? by Verteiron · 2003-07-28 04:17 · Score: 1

BTW, every "% of humanity" statistic has to consider that most humans are Chinese.

I've been thinking about this a lot lately. As in, "Should I be learning Mandarin Chinese?" China is rapidly becoming a high-tech nation. There are a lot more of them than there are of anyone else. Frankly, I think the only thing that keeps China from being the single largest world influence is its government, and that can't last forever. Sooner or later (maybe after a bloody revolution) China is going to become THE major world power, and the US is going to take a role like that of Great Britain's (no offense to any Brits reading this) today.

--
End of lesson. You may press the button.
Re:Obsolete? by Shads · 2003-07-28 04:19 · Score: 1

True but you could argue that even the english and americans can't speak it well.

--
Shadus
Re:Obsolete? by notcreative · 2003-07-28 04:22 · Score: 2, Funny

A man who speaks no known language is Dubya.
I don't think this translation program would be able to deal with his Texan affectations.
Re:Obsolete? by Anonymous Coward · 2003-07-28 04:23 · Score: 0

I'm an American - speak 4 languages - worked as a translator for 16 yrs...not ALL Americans believe "English is universal" - shit it's not universal in the USA... If I could use my family for reference, 60% of Americans speak one of the American dialects of English - which tend to vary based on generational/locational factors... and are at times very hard to translate...

Try getting help on a street corner in Frankfurt, Paris, or Dahran speaking only English - you'll find your language is definitely in the minority... (Some will help you, sure, but many more will not / can not)...
Re:Obsolete? by cdrudge · 2003-07-28 04:23 · Score: 1

Right...215.4 million Americans speak only English. 28.1 million Americans speak some form of Spanish (with or without also speaking English). I wouldn't exactly say 13% is "well on the way". English definitely isn't going to be the next Latin. The majority of science, technology, aviation, and computers is English based. It isn't going anywhere soon.
Re:Obsolete? by panda · 2003-07-28 04:24 · Score: 1

I'm American, yet I speak four languages including my native language.

What does that make me a quadrilateral? :-)

--
Just be sure to wear the gold uniform when you beam down -- you know what happens when you wear the red one.
Re:Obsolete? by red_dragon · 2003-07-28 04:25 · Score: 3, Informative

Spanish Slashdot: Barrapunto. It's been around for almost as long as Slashdot itself.

--
In Soviet Russia, Jesus asks: "What Would You Do?"
Re:Obsolete? by cybercuzco · 2003-07-28 04:26 · Score: 1

20% would be more than any other language, including chinese. So if you were going to pick a language to learn that the most people would understand, you would pick english

--
Re:Obsolete? by Anonymous Coward · 2003-07-28 04:26 · Score: 0

Am I the only one who thinks that translation is quickly becoming obsolete?

Yes.
Re:Obsolete? by Alton_Brown · 2003-07-28 04:29 · Score: 1

Fucking pig ignorant english-centric poster you are.

What's up with this? When did Yoda get such a filthy mouth? :)
Re:Obsolete? by Anonymous Coward · 2003-07-28 04:31 · Score: 1, Funny

Then you "knew him", not "know him".

I'm dead too.
Re:Obsolete? by bogado · 2003-07-28 04:31 · Score: 1

And before that everyone that was anyone knew french. Before that latin. This didn't end the need for translation. Basic english may be enough to find out where the bathroom is or how much a bigmac costs, but many people don't go much beyond this. You are forgetting that there are countries with many, many people who are not literate in their own language.

And if you are trying to get to know a culture, you MUST know the language. So for works of art, and even for not so artistic content such as movies, a good translation or knowledge of the language of the original country is needed.

--
[]'s Victor Bogado da Silva Lins
^[:wq
Re:Obsolete? by Count+of+Montecristo · 2003-07-28 04:32 · Score: 1

The fact of the matter is that the very objective of creating this kind of tool is so that English speaking individuals, regardless of national origin, who simply refuse to learn other languages can benefit from the wealth of literary beauty and knowledge amassed in other languages.
Just think of it.. the most spoken languages in the world are Chinese, then Spanish. English only comes after those two.. and then come Arabic and Bengali.
According to this more people in the world speak languages OTHER than english

--
*shower*
Re:Obsolete? by Jon+Abbott · 2003-07-28 04:33 · Score: 1

It makes you a polyglot! :^)

--
Slashdot's first reaction to VMware
Re:Obsolete? by Concerned+Onlooker · 2003-07-28 04:33 · Score: 1

You must be an American
Perhaps you missed this sentence fragment from the original poster:
...I'm quite sure I'm not the only one who doesn't have english as primary language.
It would be hard to classify this person as an American.

--
http://www.rootstrikers.org/
Re:Obsolete? by JohnsonJohnson · 2003-07-28 04:34 · Score: 2, Insightful

BTW, every "% of humanity" statistic has to consider that most humans are Chinese.

If you want to be even remotely close to statistically significant you have to include citizens of India as well most of whom are very different from those of Chinese descent. . In fact most people will probably be an Indian citizen within the next 20 years. However citizens of India are a more heterogeneous population than that of China. Then again, Chinese of the diaspora (eg. in Malaysia, Indonesia, the Philipines, Vancouver etc.) are also a large population but can be very different than mainland Chinese. So I guess in the end every % of humanity statistic that measures some culturally derived phenomenon has to be considered BS.
Re:Obsolete? by vveak · 2003-07-28 04:36 · Score: 1

I would think the "bloody revolution" would cut down their numbers a bit.
Re:Obsolete? by anonymous+loser · 2003-07-28 04:36 · Score: 1

You must be an American

The original poster said:
I'm quite sure I'm not the only one who doesn't have english as primary language.

Now, while this certainly doesn't *preclude* them from being American (there are plenty of Americans whose primary language is not English, after all), I'd suspect they are not. That being said, at least in first world countries the original poster is somewhat correct; most children in first world countries spend some number of years studying English. Whether they are capable of conversing in the language is another story.
However thanks to the huge populations accounted for by the third world, the number of English speakers per capita in the world is fairly small.
Re:Obsolete? by Anonymous Coward · 2003-07-28 04:36 · Score: 0

It's always fun to laugh at those stoopid Americans! As long as they keep sending those jobs and dollars our way - eh? And if they aren't sending us anything, close the border, eh?

Canada! The only nation with more subs in a mall (West Edmonton Mall) than in the Navy... ;-)
Re:Obsolete? by Planesdragon · 2003-07-28 04:42 · Score: 1

Better-informed Americans (a small minority of the class) would be aware that Spanish is well on the way to becoming the predominant language in the USA.

Why? Because self-identified Hispanics use it a lot?

When non-hispanic subcultures start replacing English with Spanish, THEN it'll have a chance of predominance. As it stands now, Spanish is the largest secondary-language in the nation, and may very well become a national second, but it's not about to replace English.
Re:Obsolete? by mikewolf · 2003-07-28 04:45 · Score: 0, Offtopic

you must not be american either, its conversating, not conversing in modern english (at least thats what mtv says)
Re:Obsolete? by Anonymous Coward · 2003-07-28 04:48 · Score: 0

it makes you better than me!
Re:Obsolete? by RevMike · 2003-07-28 04:49 · Score: 1

most people in the world? Hardly.
Most of the people that matter speak English :)
The fact of the matter is that the US is the dominant commercial power in the world. They speak a language similar to English in the UK (but we didn't fight a revolution so that we could ride the lift to go out and light a fag :). Add in Canada, Australia, and the up-and-coming heavyweight India, and the critical mass gets much bigger. English literacy is very high in Europe. Japan has a fairly high level of English literacy, and China does as well - at least amongst academia, r&d, and the international commerce communities.
The fundamental fact is that English is the dominant international language, the language that most people are going to choose to learn in addition to their native language.
Now if the rest of you would just start calling it soccer and drop that silly metric system the world would be even better.
Back to the grand-parent poster's point - since most of the world's technology development and commerce is happening in English already, there is little need for translation. On the other hand, cultural and literary works are still important - but likely cannot be translated mechanically. The middle ground for this technology is basic informational data - news reports, manuals, etc.
Re:Obsolete? by I8TheWorm · 2003-07-28 04:49 · Score: 1

Actually, when you consider that most Chinese and a good number of Indian people speak English, 50% is fairly accurate.

The current world population (estimated).

--
Saying Android is a family of phones is akin to saying Linux is a family of PCs.
Re:Obsolete? by JeffTL · 2003-07-28 04:50 · Score: 1

That is, I'm afraid, too often the case. The US is well on the way to being a Spanish-speaking country, and if that actually happens, I think it'll be in the best interests of Americans actually being able to understand each other.
Re:Obsolete? by Anonymous Coward · 2003-07-28 04:58 · Score: 0

Well, we ARE talking about the NET, are we not? The original post was talking about automatic translation of WEB PAGES being obsolete. Not what the rest of the world thinks.
Re:Obsolete? by Felonius+Thunk · 2003-07-28 05:02 · Score: 1

Most Chinese don't speak "Chinese" either. They can read a common language, but there are hundreds of mutually unintelligible dialects in China (at least as distinct as, say, Norwegian and Swedish, and often more so; Croatian and Serbian are just the opposite - mutually intelligible spoken language, different writing systems). The definition of what counts as a language is often politically motivated, so lumping all of China into "Chinese" and assuming all India fits into Hindi makes for some skewed numbers.

With that in mind, English ranks higher in number of native speakers than you might think, and is way, way ahead in number of non-native speakers. This is especially true for written language (where machine translation can somewhat work).

I have no idea why you think Spanish is going to be the predominant language in the USA. My guess is that either you have birth rates and # of language speakers conflated or just feel that even the slightest trend toward bilingualism requires a winner and loser.
Re:Obsolete? by JJ22 · 2003-07-28 05:09 · Score: 1

According to this more people in the world speak languages OTHER than English
Actually, if you look at the notes on the statistics, they are only the first language speakers in each country.
From a Time article last year:
Mandarin may have the largest number of native speakers (about 800 million), but English, with 1.9 billion speakers--including some 350 million native speakers--is far and away the largest global lingua franca. The next largest, Spanish, claims 450 million competent speakers worldwide, while French is spoken by a mere 130 million. The most vital statistic is that some 1.5 billion people around the globe speak English as a second language. "It has become the working language of the global village," says ESU chairman Lord Alan Watson.
For native speakers, Chinese wins. For overall comprehension, English is out in front (which doesn't affect the need for good translation tools).
Re:Obsolete? by Surak · 2003-07-28 05:10 · Score: 1

Exactly how is 20% 'most people'? If I have 20 green marbles, 10 black marbles, 5 blue marbles, 15 orange marbles, 9 plaid fuchsia marbles, 11 clear marbles, etc., are 'most' of my marbles green? No, I have more green marbles than any other kind.

So that would mean (if 20% is indeed higher than any other language, I have seen no statistics to verify this fact -- last I heard one of the main Chinese dialects was the most widely spoken language in the world, not English, but this was 20 years ago) if you were going to pick a language that was (based on what you have said) the most widely spoken language, then English it would be. But my own understanding is that this is not the case.

--
My journal has hot /. gossip.
Re:Obsolete? by Surak · 2003-07-28 05:12 · Score: 1

Most of the people that matter speak English

Okay, so 80% of the world population is irrelevant?

--
My journal has hot /. gossip.
Re:Obsolete? by alkali · 2003-07-28 05:22 · Score: 1

I've been thinking about this a lot lately. As in, "Should I be learning Mandarin Chinese?" China is rapidly becoming a high-tech nation.
I've had the same thought, but wonder if it should be Cantonese -- the south (near HK) speaks Cantonese and is at the forefront of Chinese high tech and economic development.
Re:Obsolete? by RevMike · 2003-07-28 05:29 · Score: 1

so 80% of the world population is irrelevant?
In short, yes.
Please understand that I am not saying that I don't have compassion for that 80%, or that they don't matter to me on a moral/ethical level.
I don't have any direct interaction with those people, however, and so it does not matter to me whether or not they can speak English or I can speak their native language.
People are needlessly critical of others for not being multi-lingual. But the fact of the matter is that for most Americans being multi-lingual doesn't do anything to aid them in their lives. Europe is different, of course, because of the large number of relatively small nations with their own languages. A Dane who works in Sweden and vacations frequently in Germany will derive worthwhile benefit from knowing Danish, Swedish, and German. A New Jerseyan who works in New York and vacations in Florida derives little benefit from knowing non-English languages.
The fundamental fact is that the international communities with whom I am likely to interact probably speak English already. It may not be fair that they learn 2+ languages and I only need to learn one, but life isn't fair.
Re:Obsolete? by egoff · 2003-07-28 05:35 · Score: 1

a unilingual anglophone. (say that ten times fast)
Re:Obsolete? by Anonymous Coward · 2003-07-28 05:46 · Score: 0

>A man who speaks one language is American.

correction, he speaks french
Re:Obsolete? by rtv · 2003-07-28 05:47 · Score: 1

That's quite insulting to the millions of Americans to whom English is a second language. The 1950s view of American monoculture is dating fast, at least down the coasts.
Re:Obsolete? by Anonymous Coward · 2003-07-28 05:56 · Score: 0

As an American, I can drive for 24 hours in any direction and still be surrounded by people who speak English as their primary language. Do I really need to learn another language?
Re:Obsolete? by Anonymous Coward · 2003-07-28 05:59 · Score: 0

Half using what metric? Half of the world population? Half of populated geographic areas? Half of the highly developed nations? It depends. A statistic like that is heavily skewed because of very heavily populated third-world and communist nations like India and China.

And what's with the cheap jab to America? There are civilized, intelligent people that live here, you know. I'm really getting fed up with being branded as a naive, self-absorbed, patriotic moron. "Almost everyone" is an easy heuristic to jump to considering that most developed nations (especially in Europe) speak English as a second language, so I don't know why you were so quick to jump on this guy. Besides that, Americans don't have nearly as much incentive to learn another language as someone in Italy, for example.

Would you like to know why most people in educated, developed nations choose to learn English as a second language? It's because people from English speaking nations tend to have a lot of money and come from some of the richest economies in the world. If you wish to engage in business with these peanut-butter-and-jelly-eating-idiot-americans, it would behoove you to learn their language.
-z
Re:Obsolete? by GlassHeart · 2003-07-28 06:15 · Score: 1

I can drive for 24 hours in any direction and still be surrounded by people who speak English
Ever hear of that wonderful American invention called an airplane? How about that other American invention called a telephone? What makes you think you need to drive anywhere to have use for a foreign language?
Do I really need to learn another language?
You don't need to learn anything. It's not as if billions of people are finally getting the education you take for granted in the First World, and are just dying to take your job. Nothing to worry about.
Re:Obsolete? by aminorex · 2003-07-28 06:23 · Score: 1

With Mandarin you can get by almost everywhere
in China that would have tech higher than an oxcart.
The schooling in China is done in Mandarin. Public
schooling is mandatory. As a result essentially everyone
in the country below the age of 50 is fluent in Mandarin.

Street life in HK, GZ, SZ, is in Cantonese (less in SZ, since
it is so heavy with migrants), but business, science, tech,
will use Mandarin primarily.

--
-I like my women like I like my tea: green-
Re:Obsolete? by Anonymous Coward · 2003-07-28 07:15 · Score: 0

"The truth of the matter is that it's more like 20%. That's it."

By whose count? You're making a comparison with a Harris Poll, which says Americans overestimate, with data that has no source. So how can you really say it's utterly, completely, and proveningly wrong?

Personally, I think the Harris Poll may be a bit screwed up. Yes, Americans overestimate, probably still by a lot. In many ways, 20% of the world speaking English seems high to me in some respects. However, I wouldn't be surprised if it was closer to 30% than 20%. A lot of foreigners speak English. Not as their first or primary language, but I'm always amazed when relatives come to visit and all of them speak a decent amount of English. In DC, nearly every foreigner speaks decent English, although that population is of course skewed (those visiting may be more inclined to speak English, as well as come to DC and hold jobs because they can).

Still, what is the source of your 20% statement? How did they arrive at the 20%, by looking only at primary languages, those that could pass the TOEFL, enough to carry on an understandable but broken conversation?

Or did they do some crappy percentage calculation by nation population and primary language? Most people in Quebec speak French, but a whole lot of them can also speak English. I have not yet met a German who didn't know at least a bit of English, and I have 2 friends, one who travelled there and one who lived there for a year, both of whom concur.
Re:Obsolete? by Verteiron · 2003-07-28 08:31 · Score: 1

A bit? You got that right. Let's say that, in the course of the hypothetical Chinese revolution, 50,000,000 people are killed. This is roughly the number of people who died in World War 2. That's a lot of people, well over a hundred times the entire population of the USA.

This brings the population of China down from about 1.5 billion to about... 1.45 billion.

The phrase "drop in the bucket" springs to mind.

--
End of lesson. You may press the button.
Re:Obsolete? by Jedi+Alec · 2003-07-28 11:19 · Score: 1

heh, the french have the same joke, starring themselves as the latter party.

--

People replying to my sig annoy me. That's why I change it all the time.
Re:Obsolete? by gnovos · 2003-07-28 12:00 · Score: 1

'Almost everyone'? What *are* you talking about? You must be an American. From a recent online Harris poll, most Americans think at least half the world speaks English. This is just plain wrong. The truth of the matter is that it's more like 20%. That's it. Most people on the NET might speak English, but most people in the world? Hardly.

By that same token, every country that has an international airport has English speakers in the tower... The same is not true of Chinese.

--
"Your superior intellect is no match for our puny weapons!"
Re:Obsolete? by Obfiscator · 2003-07-28 12:09 · Score: 1

Let's say that, in the course of the hypothetical Chinese revolution, 50,000,000 people are killed. This is roughly the number of people who died in World War 2. That's a lot of people, well over a hundred times the entire population of the USA.

What? The USA has around 290 million people. How is fifty million over a hundred times the entire population of the USA? Unless I'm misunderstanding something...
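For anyone who wants to check the numbers being argued over, here is a quick sketch of the arithmetic. It uses only the figures quoted in this thread (50 million deaths, roughly 290 million for the USA, roughly 1.5 billion for China); those figures are the posters' estimates, not verified statistics, and the variable names are just for illustration.

```python
# Figures as quoted in the thread; treat them as the posters' rough estimates.
deaths = 50_000_000
usa_population = 290_000_000
china_population = 1_500_000_000

# 50 million is a fraction of the US population, not a multiple of it.
share_of_usa = deaths / usa_population

# What the quoted Chinese population would be after such a loss.
china_remaining = china_population - deaths
share_of_china = deaths / china_population

print(f"Share of USA population: {share_of_usa:.0%}")
print(f"China afterwards: {china_remaining:,}")
print(f"Share of China's population: {share_of_china:.1%}")
```

So the "drop in the bucket" point survives, but the comparison with the USA runs the other way: the figure is about a sixth of the quoted US population.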

--
"Nothing shocks me. I'm a scientist." -Indiana Jones
Re:Obsolete? by Anonymous Coward · 2003-07-28 14:09 · Score: 0

Actually, people in Italy have the incentive to learn how to speak English, because Americans aren't willing to learn Italian.

However, they learn English perfectly and then find out that Americans don't speak it.

- Thjorska
Re:Obsolete? by Anonymous Coward · 2003-07-28 15:57 · Score: 0

Holy shit! I see dead people!
Re:Obsolete? by orblee · 2003-07-28 23:10 · Score: 1

Bizarrely enough, I just heard that joke being given by a man on the street about 2 hours ago on the Paramount Comedy Channel. Freaky.
Re:Obsolete? by cybercuzco · 2003-07-29 06:52 · Score: 1

It's not a majority, it's a plurality: more than any other faction. Any percent can be a plurality as long as it's more than the other factions. I never said that most people learn English, I merely said that English is the logical choice to learn if you want to be able to communicate with the most people.

--
Re:Obsolete? by Surak · 2003-07-29 07:09 · Score: 1

most
adj. Superlative of many, much.
1.
   1. Greatest in number: won the most votes.
   2. Greatest in amount, extent, or degree: has the most compassion.
2. In the greatest number of instances: Most fish have fins.
n.
1. The greatest amount or degree: She has the most to gain.
2. Slang. The greatest, best, or most exciting. Used with the: That party was the most!
pron. (used with a sing. or pl. verb) The greatest part or number: Most of the town was destroyed. Most of the books were missing.
adv. Superlative of much.
1. In or to the highest degree or extent. Used with many adjectives and adverbs to form the superlative degree: most honest; most impatiently.
2. Very: a most impressive piece of writing.
3. Informal. Almost: Most everyone agrees.
Idiom: at (the) most. At the maximum: We saw him for ten minutes at the most. She ran two miles at most.

Note that there is nothing in there about a 'plurality'. Most means greatest in number, the greatest amount or the greatest part or number. Note that 20% of anything is not the greatest part or number of the whole, while it may be the largest piece, it is not the 'most'.

I wish people would be more precise in their use of language.

--
My journal has hot /. gossip.
Re:Obsolete? by cybercuzco · 2003-07-29 08:43 · Score: 1

1. Greatest in number: won the most votes.
So for example, if candidate A wins 45%, candidate B wins 40% and candidate C wins 15%, that would be a plurality, not a majority, yet candidate A would have won the most votes. "Most" is an imprecise word: it can mean a plurality or a majority. If you truly want to be precise in your language use, use plurality or majority instead of most. I take "greatest part of the whole" to mean that of all the individual parts, this is the largest. If you cut a loaf of bread into three unequal pieces, one piece will be the greatest, i.e. larger than the other two individual pieces taken separately. Assuming you gave each piece to one person, the person who got the largest piece would then have the most bread. If two pieces were given to one person, they may or may not now have the most bread, depending on the size of the individual pieces.
Regardless of this, I think your confusion is actually coming from my use of the word "the" in my sentence. When I say "in order to be able to communicate with the most people, you would learn English", "the most" implies the maximum amount possible (while still only learning one language). Your contention that "most people don't know English" is correct, because your usage implies that most means a majority, and combines all the other languages into one (not English).
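The plurality/majority distinction above can be sketched in a few lines of Python. This is only an illustration: the function name is made up, the vote shares are the 45/40/15 example from this post, and the marble counts are the ones listed earlier in the thread (ignoring the "etc.").

```python
def plurality_and_majority(counts):
    """Return (leader, has_majority) for a mapping of option -> count.

    The leader is the option with the largest count (a plurality).
    It is a majority only if it exceeds half of the total.
    """
    leader = max(counts, key=counts.get)
    has_majority = counts[leader] > sum(counts.values()) / 2
    return leader, has_majority


# The three-candidate example: A has the most votes, but no majority.
votes = {"A": 45, "B": 40, "C": 15}
print(plurality_and_majority(votes))

# The marbles: green is the largest single group, yet far from half.
marbles = {"green": 20, "black": 10, "blue": 5,
           "orange": 15, "plaid fuchsia": 9, "clear": 11}
print(plurality_and_majority(marbles))
```

Both readings of "most" show up in the result: the first element answers "which group is largest?", the second answers "is it more than everything else combined?".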

--
Re:Obsolete? by Verteiron · 2003-07-29 14:49 · Score: 1

Whoops, you're absolutely right. I hate it when I lose sight of the forest for the trees.

At any rate, 50,000,000 people is still a lot. And the rest of my post stands.

--
End of lesson. You may press the button.

What if the two texts don't match ? by Anonymous Coward · 2003-07-28 03:53 · Score: 1, Funny

We'll have a supercharged Babelfish ?

Re:What if the two texts don't match ? by Anonymous Coward · 2003-07-28 04:02 · Score: -1

My hovercraft is full of eels.

Great summary by spectasaurus · 2003-07-28 03:53 · Score: 3, Insightful

You know, it's not really a summary when you just delete half the article.

Article text by Anonymous Coward · 2003-07-28 03:53 · Score: -1, Troll

Romancing The Rosetta Stone

Posted by Hemos on Mon July 28, 17:48
from the cool-story dept.
Roland Piquepaille writes "Not only this news release from the University of Southern California has a fantastic title, it also has a great content. This story is about one of their scientists, Franz Josef Och, whose software ranks very high among translation systems. "Give me enough parallel data, and you can have a translation system for any two languages in a matter of hours," said Dr. Och, paraphrasing Archimedes. His approach relies on two concepts, gathering huge amounts of data, and applying statistical models to this data. It completely ignores grammar rules and dictionaries. "Och's method uses matched bilingual texts, the computer-encoded equivalents of the famous Rosetta Stone inscriptions. Or, rather, gigabytes and gigabytes of Rosetta Stones." Read my summary for more details."

Did I do this right?
--
Karma-whore beginner.

DARPA by BlackHawk-666 · 2003-07-28 03:54 · Score: 2, Insightful

That reference to DARPA has me a little worried about the sort of uses this technology will be put to. I wonder, are the CIA trying to shore up holes in their translation abilities (particularly for Arabic/etc) by using software. What happens when you pair this technology up with the Echelon project? Are we going to see a dramatic rise in the ability of the government to spy on nationals and particularly foreign nationals now?

--
All those moments will be lost in time, like tears in rain.

Re:DARPA by Abcd1234 · 2003-07-28 04:04 · Score: 5, Insightful

Oh please... so many conspiracy theories. You do realize that the *internet* was originally developed by DARPA, right? My point: DARPA does a lot of work... not all of it revolves around spying on or otherwise taking away the rights of American citizens.
Re:DARPA by kmac06 · 2003-07-28 04:14 · Score: 1, Interesting

Kneejerk /. response: its a government conspiracy to take away more of our rights.

Kneejerk /. mod response: he's right.
Re:DARPA by cybercuzco · 2003-07-28 04:18 · Score: 1

Well I don't have to worry, I don't speak Arabic OR Chinese!

--
Re:DARPA by wwest4 · 2003-07-28 04:30 · Score: 5, Insightful

well, not EVERY bottle of beer at the duff plant has a nose or hitler's head in it, but i'm glad the inspector is tasked to look at every single bottle.

just because government abuse isn't guaranteed doesn't mean we shouldn't vigilantly examine the possibilities when we see them.

it all boils down to balancing powers of government and freedom of individuals, and this country (USA) was founded upon principles intended to favor the rights of individuals. i'll go out on a limb and make a value statement - that's the way to go. power to the people, man!
Re:DARPA by Anonymous Coward · 2003-07-28 04:48 · Score: 0

Are we going to see a dramatic rise in the ability of the government to spy on nationals and particularly foreign nationals now?

I sure hope so!
Re:DARPA by Detritus · 2003-07-28 04:49 · Score: 1

Just because we're part of the Defense Department, that doesn't mean that we are developing translation technology for national security applications. No Sir! We've been dumping millions of dollars into translation research for decades, just so that we can read all those dirty novels published in foreign countries without having to wait for an English edition. We're addicted to pr0n.

--
Mea navis aericumbens anguillis abundat
Re:DARPA by kcelery · 2003-07-28 05:32 · Score: 1

If pr0n is your motive, millions of dollars in research is probably wasted. Most people simply focus on the graphics.
Re:DARPA by Jeremi · 2003-07-28 05:59 · Score: 1

Kneejerk /. second-order response -- slashdotters are kooks, the US gov't would never do such a thing.

--

I don't care if it's 90,000 hectares. That lake was not my doing.
Re:DARPA by kmac06 · 2003-07-28 06:03 · Score: 1

Kneejerk /. third-order: the second kneejerk was...

nevermind
Re:DARPA by zebs · 2003-07-28 08:56 · Score: 0, Redundant

Kneejerk /. fourth-order: OOOOOooowwwwww me knee. I'm getting too old for this :(
Re:DARPA by Platupous · 2003-07-28 18:16 · Score: 1

Although I don't believe there is a "Conspiracy", I do know that there is an initiative in the military to fund this research.

The primary motivation is more efficient communications in battle, as anyone can imagine, having a translator of the enemy's* language instantly available to a foot soldier is invaluable. Think universal translator.

This also helps in spying, obviously.
Re:DARPA by bigsmelly · 2003-07-28 21:14 · Score: 1

of course, the translation software will get it wrong, and innocent people will be locked up in cuba!
Re:DARPA by jo42 · 2003-07-29 08:19 · Score: 1

The goat is in the barn, I repeat, the goat is in the barn.

Imagine a beowulf cluster of these... by mjmalone · 2003-07-28 03:55 · Score: 1, Funny

No really... what if it used a shared database and there were hundreds, or thousands, of the systems around the world... Seems like it could become a pretty sophisticated system. And maybe one day it will be available in the form of a small fish which you place in your ear?

--
Visualize the world of wine

Re:Imagine a beowulf cluster of these... by CycleMan · 2003-07-28 05:13 · Score: 1

Or how about a distributed computing solution to language translation. Instead of looking for communications from outer space your PC could crunch texts from Finland!
Finland, Finland, Finland,
The country where I want to be,
Pony trekking or camping,
Or just watching TV.
Finland, Finland, Finland.
It's the country for me.
- Monty Python's Flying Circus

Oh god... by gerf · 2003-07-28 03:55 · Score: 4, Funny

The uber-geeks are going to have a field day with Klingon...

Re:Oh god... by laughing_badger · 2003-07-28 04:04 · Score: 3, Funny

Yay! We can finally finish translating all of Shakespear into English.

--
Help children born unable to swallow - www.tofs.org.uk
Re:Oh god... by daeley · 2003-07-28 04:46 · Score: 1

Yay! We can finally finish translating all of Shakespear into English.

I'm pretty sure you can have your throat slit for saying "Yay!" near a Klingon. Do be careful. ;)

--
I watched C-beams glitter in the dark near the Tannhauser gate.
Re:Oh god... by Jeremi · 2003-07-28 05:43 · Score: 4, Funny

I'm pretty sure you can have your throat slit for saying "Yay!" near a Klingon. Do be careful. ;)

Having your throat slit is nothing compared to what Klingons do to people who put smiley-faces in their text messages...

--

I don't care if it's 90,000 hectares. That lake was not my doing.
Re:Oh god... by daeley · 2003-07-28 05:51 · Score: 4, Funny

Having your throat slit is nothing compared to what Klingons do to people who put smiley-faces in their text messages...

You're telling me! My emoticons used to have noses! Now look:

:(

Such a tragedy.

--
I watched C-beams glitter in the dark near the Tannhauser gate.
Re:Oh god... by Phroggy · 2003-07-28 08:27 · Score: 1

In December 1996 I wrote to the person in charge of the Klingon Bible Translation Project, and received a reply the next day:

> I have a question regarding your translation of the Bible into Klingon:
> are you translating from an English translation, or from the original
> Greek and Hebrew?

From the Greek, Hebrew, and Aramaic when we have translators who know these languages. For those who know only English, those of us who do know the languages will check their work against the original.

:: Kevin A. Wilson ::
:: Department of Near Eastern Studies ::
:: The Johns Hopkins University ::

(signature edited to get around Slashdot's lameness filter)

--
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
Re:Oh god... by Dread_ed · 2003-07-29 04:28 · Score: 0, Redundant

)

My EYES!!! DEAR GOD MY EYES!!!

yes mr. filter software I am yelling, you would too if someone poked out your eyes.

--
When the only tool you have is a claw hammer every problem starts to look like the back of someone's skull.
Re:Oh god... by jo42 · 2003-07-29 08:21 · Score: 1

'Shakespear' was written in English. You philistines are speaking (?) Amerikan!

A bit of a worry for privacy by Anonymous Coward · 2003-07-28 03:55 · Score: 1, Interesting

This is a bit of a worry for privacy concerns, given that if I want to keep something secret from the world and private just between me and my intended recipient I have one less option.

How long until this is able to decode things like speech, too, and convert it into something recognisable in another language? Would it still hold my voice patterns and sound like me? And if it were converted back to the English I already do speak, with mistakes, could that then be used against me in a court of law?

Scary stuff

Re:A bit of a worry for privacy by bigjocker · 2003-07-28 04:17 · Score: 4, Insightful

This is a bit of a worry for privacy concerns, given that if I want to keep something secret from the world and private just between me and my intended recipient I have one less option.

If you are using foreign languages or even lexically analyzable schemes to do your encryption, you deserve what you get.

--
Life isn't like a box of chocolates. It's more like a jar of jalapenos. What you do today, might burn your ass tomorrow.
Re:A bit of a worry for privacy by WalterDGeranios · 2003-07-28 04:40 · Score: 1

This is a bit of a worry for privacy concerns, given that if I want to keep something secret from the world and private just between me and my intended recipient I have one less option.
How long until this is able to decode things like speech, too, and convert it into something recognisable in another language? Would it still hold my voice patterns and sound like me?
Scary stuff
Well, I wouldn't worry too much about somebody compiling a huge parallel database of text in a foreign language and your speech. That would require a full-time bilingual transcriber to stalk you with a microphone for several dozen years, in which case you'd have more pressing privacy concerns.
Re:A bit of a worry for privacy by nanojath · 2003-07-28 04:55 · Score: 2, Insightful

It's time for us all to get over the fact that technology is going to end practical privacy. It's a done deal. Cameras and microphones will get smaller and smaller. Translation, electronic selectivity (i.e. snoop anybody transferring bombmaking directions) and tapping of all forms of electronic conversation will get more and more sophisticated. I've no doubt the NSA made PGP its bitch a long time ago. If they hadn't, it would be getting fought a lot harder. Assuming you can get real privacy from something on the scale of the government is just foolish.

I'm not, incidentally, saying just live with it. I'm saying, you can't stop the technology, you have to fight it on the level of policy and practice. Get interested in the work of privacy advocates, work for a constitutional amendment guaranteeing privacy in the same manner as freedom of expression, protest egregious violations of privacy (basically, be against John Ashcroft).

--
It Is the Nature of Information to Transgress Artificial Boundaries

Hello by Anonymous Coward · 2003-07-28 03:56 · Score: -1, Troll

I am a geek from Kabul. We too have rosetta stone for converting Visual Basic into Commodore 64 assembly so we can develop new DivX algorithms and then see more Baywatch. Jon promised to send me an iPod and some special pictures of himself. I never got the iPod.

Junis.

The Law of Eventuality by Speare · 2003-07-28 03:56 · Score: 3, Insightful

"Give me enough" is a key element of the Law of Eventuality. Give me enough money, and I'll solve the Microsoft monopoly threat with a hostile takeover. Give me enough time and I'll clean up almost any unnatural disaster site by leveraging nature's own methods.

Give me enough simulated neurons and enough truisms and I'll make a sentient machine.

Eventually, with enough resources, anything is possible. Throwing more time and resources to a problem is rarely exciting science. Reducing the inconveniently large values of 'eventually' and 'enough' are the real problem.

--
[ .sig file not found ]

Re:The Law of Eventuality by Abcd1234 · 2003-07-28 04:09 · Score: 2, Insightful

Err... how is this interesting or insightful? It's barely related to the discussion! If what you're referring to is the large corpus of paired texts they inject into the system, you've completely missed the point.

The cool science here is in the advancements in their statistical model and new techniques they've developed for "scoring" translations in order to improve their output. In addition, they've also demonstrated the ability to statistically translate whole phrases effectively, rather than individual words, which can also improve translation quality. The fact that you've missed all this makes me wonder if you actually *read* the press release.
Re:The Law of Eventuality by TopShelf · 2003-07-28 04:13 · Score: 1

OT, I know, but how would a hostile takeover solve the "Microsoft monopoly threat"? Sounds like one giant replacing another...

--
Stop by my site where I write about ERP systems & more
Re:The Law of Eventuality by NDPTAL85 · 2003-07-28 04:30 · Score: 1

Well if one had the money then one could simply buy Microsoft, fire everyone who works there and shut it down.

--
Mac OS X and Windows XP working side by side to fight back the night.
Re:The Law of Eventuality by Anonymous Coward · 2003-07-28 04:30 · Score: 0

Give *me* enough money and you'll be just another poor sucker.
Re:The Law of Eventuality by GoofyBoy · 2003-07-28 04:39 · Score: 1

If you give me every possible combination of statements and its translation then you can get 100% perfect translation from me just by me using a simple lookup algorithm. That's what I think the original poster meant.

Given enough time/resources/translations, anything can be done with brute force.

--
The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.
Re:The Law of Eventuality by Abcd1234 · 2003-07-28 04:43 · Score: 2, Funny

And that's not what's being done, which is why there is interesting science going on here, hence the poster not understanding what the press release is actually about.
Re:The Law of Eventuality by spuke4000 · 2003-07-28 04:53 · Score: 2, Interesting

Maybe this is offtopic, but if you want really elegant language processing you should check this out. Basically, you look at the compressiblity of given text and can determine what language it's in, or even what author produced it. This works with as few as 20 words.

I realize this isn't translation, but cool nonetheless. For further reading see here and here.
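A rough sketch of the compressibility idea, for the curious. This is not the method from the linked work, just one simple way to try it: append the unknown sample to a reference text in each language and see which reference lets `zlib` compress the sample most cheaply. The function names and the tiny reference texts in the usage are made up for illustration; a real attempt would need much larger references.

```python
import zlib


def compressed_size(text):
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def guess_language(sample, references):
    """Pick the reference whose text makes the sample cheapest to compress.

    references maps a label (e.g. a language name) to a reference text.
    The score is how many extra compressed bytes the sample costs when
    appended to that reference; fewer extra bytes means more shared patterns.
    """
    extra_bytes = {
        label: compressed_size(ref + " " + sample) - compressed_size(ref)
        for label, ref in references.items()
    }
    return min(extra_bytes, key=extra_bytes.get)


# Toy usage with hypothetical reference texts.
references = {
    "english": "the cat sat on the mat while the rain fell on the house "
               "and the dog ran over the hill to the river",
    "spanish": "el gato se sento en la alfombra mientras la lluvia caia "
               "sobre la casa y el perro corrio por la colina hasta el rio",
}
print(guess_language("the dog sat on the hill while the rain fell", references))
```

The same scoring works for authorship guesses if the references are texts by known authors rather than texts in known languages, though with references this small the result is easily fooled.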

--
This post cannot be rebroadcast without the express written constent of Major League Baseball.
Re:The Law of Eventuality by nanojath · 2003-07-28 04:59 · Score: 1

Yeah, that's why a chess computer still can't beat a grand master... oh. Wait.

"His approach relies on two concepts, gathering huge amounts of data, and applying statistical models to this data."

This actually reminds me of the story of how Robert Morris built a spell-check with no lexicon at all - it just looked for statistical anomalies. Of course it wasn't perfect, just an interesting exercise.

I'm also strangely reminded of my dad, a retired minister. In the seminary he found Latin fairly easy but Greek difficult, so he always kept a Greek/Latin New Testament handy as a cheat to help translate from the Greek. Parallel data indeed.

--
It Is the Nature of Information to Transgress Artificial Boundaries
Re:The Law of Eventuality by TopShelf · 2003-07-28 05:43 · Score: 1

Dude, whatever you're smoking, please pass it around. Or do you know somebody with $300 billion+ that they'd like to blow just to make the /. crowd giddy for a day?

--
Stop by my site where I write about ERP systems & more
Re:The Law of Eventuality by Anonymous Coward · 2003-07-28 05:43 · Score: 0

The original poster was referring to the first line of the article, ass.
Re:The Law of Eventuality by maxentius · 2003-07-28 06:54 · Score: 1

Yeah, you're right. The drugs are cheaper.

--
Imagine a Beowulf cluster of neurons.
Re:The Law of Eventuality by t · 2003-07-28 09:38 · Score: 1

"The computer uses this information to tune the parameters of a statistical model of the translation process. During the translation of new text, the system tries to find the English sentence that is the most likely translation of the foreign input sentence, based on these statistical models."
Uh, that is exactly what is being done.
Re:The Law of Eventuality by Igmuth · 2003-07-28 10:56 · Score: 1

Mind you, a Fortune 500 company disappearing overnight would most likely be a bad thing economy-wise...
Re:The Law of Eventuality by Abcd1234 · 2003-07-28 14:58 · Score: 1

You're telling me this:

If you give me every possible combination of statements and its translation then you can get 100% perfect translation from me just by me using a simple lookup algorithm.

is the same as this:

The computer uses this information to tune the parameters of a statistical model of the translation process. During the translation of new text, the system tries to find the English sentence that is the most likely translation of the foreign input sentence, based on these statistical models.

Please, tell me, how are these the same? The former is nothing more than a dictionary lookup (sort of). The latter is more akin to a neural net... taking a system and training it based on an input dataset. Yes, I *suppose* you could view the statistical model as a fancy lookup algorithm, but it's FAR more complex and interesting than that, hence my claim that there are real, interesting scientific advancements here.
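
To make the difference concrete, here is a toy sketch in Python. It is not Och's system; it only contrasts an exact-match lookup with a tiny noisy-channel style scorer that picks the English sentence maximizing P(english) * P(foreign | english). Every table and number below is invented for illustration, and the scorer assumes word-for-word, same-order translation, which real systems do not.

```python
import itertools
import math

# Exact lookup: only works for sentences seen verbatim (hypothetical data).
PHRASEBOOK = {"das haus ist klein": "the house is small"}

# Made-up translation probabilities P(foreign word | english word).
TRANSLATION = {
    "das": {"the": 0.7, "that": 0.3},
    "haus": {"house": 0.8, "home": 0.2},
    "ist": {"is": 1.0},
    "alt": {"old": 0.9, "ancient": 0.1},
}

# Made-up bigram language model P(word | previous word); "<s>" starts a sentence.
BIGRAM = {
    ("<s>", "the"): 0.5, ("<s>", "that"): 0.1,
    ("the", "house"): 0.3, ("that", "house"): 0.05, ("the", "home"): 0.05,
    ("house", "is"): 0.4, ("home", "is"): 0.3,
    ("is", "old"): 0.2, ("is", "ancient"): 0.01,
}
FLOOR = 1e-4  # probability assigned to bigrams the model has never seen


def lookup_translate(sentence):
    """Return the stored translation, or None for anything unseen."""
    return PHRASEBOOK.get(sentence)


def statistical_translate(sentence):
    """Score every word-for-word candidate and return the most likely one."""
    options = [TRANSLATION[w].items() for w in sentence.split()]
    best, best_score = None, -math.inf
    for combo in itertools.product(*options):
        # Translation model: sum of log P(foreign word | english word).
        score = sum(math.log(p) for _, p in combo)
        # Language model: sum of log P(english word | previous english word).
        prev = "<s>"
        for english, _ in combo:
            score += math.log(BIGRAM.get((prev, english), FLOOR))
            prev = english
        if score > best_score:
            best, best_score = [e for e, _ in combo], score
    return " ".join(best)
```

The lookup has nothing to say about "das haus ist alt" because it never saw that exact sentence, while the scorer can still assemble an answer from pieces it learned elsewhere. That generalization is the part a plain dictionary does not give you.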

Units of Measure by teamhasnoi · 2003-07-28 03:57 · Score: -1, Funny

How many Rosetta Stones would fill the Library of Congress etched on the head of a pin?

Could help by gerf · 2003-07-28 03:57 · Score: 0, Troll

The translation of that Harry Pooter book quicker, though perhaps not perfectly grammatical or literarily good. Me, I don't read that fodder, but translation is interesting nevertheless. Was it the Germans looking to translate it before the German release?

Re:Could help by Abcd1234 · 2003-07-28 04:14 · Score: 4, Interesting

I'm not sure this is really applicable to translating literary works. These kinds of translations require an understanding of the native culture of both the source and target languages, as well as the intent of the writer, in order to generate an understandable translation that the target group can appreciate. A computer translation system like this one is incapable of performing these sorts of analysis.

What this is really good for is on-the-fly translation of material where the reader simply wants to comprehend what was written (think the old babelfish engine). This has obvious applications on the web, as well as many other areas (on-the-fly server-side translation for IM systems, etc, etc).
Re:Could help by gerf · 2003-07-28 04:16 · Score: 1

Hey, I never said it would make a good translation. Just that it could be used.
Re:Could help by Anonymous Coward · 2003-07-28 04:41 · Score: 1, Insightful

In this case I believe the statistical analysis would work. One would just use the 4 other Potter books on the market and their subsequent translations into German. I'm sure JK Rowling writes in a similar style in all the books... so one phrase that means one thing in one book should mean the same in the new one... so for books in a series maybe this should be the first crack? (And then have actual translators correcting what may be wrong?)
Re:Could help by Anonymous Coward · 2003-07-28 05:06 · Score: 0

Smart.

sample by Anonymous Coward · 2003-07-28 03:58 · Score: -1, Troll

goatse -> translator -> man stretching his bum open

Don't Worry... by Anonymous Coward · 2003-07-28 03:59 · Score: -1

Spidey will stop him...

The Magic Eight Ball Says: by Anonymous Coward · 2003-07-28 03:59 · Score: 1, Funny

Am I the only one who thinks that translation is quickly becoming obsolete?

Yes.

The vodka is strong but the meat is rotten by zptdooda · 2003-07-28 03:59 · Score: 5, Interesting

That's an example from a few years back of an attempt to translate "the spirit is willing but the flesh is weak" from English to Russian and back to English using a different translator.

Can anyone try this on the new (or some other recent) algorithm?

BTW here's Doc Och's most recent website:

Franz Josef Och

--
Esteem isn't a zero sum game

Re:The vodka is strong but the meat is rotten by mjmalone · 2003-07-28 04:08 · Score: 1

translated to russian using systran and back using babelfish I got "spirit is willingly ready but flesh it is weak"

--
Visualize the world of wine
Re:The vodka is strong but the meat is rotten by rossz · 2003-07-28 04:13 · Score: 4, Insightful

That particular phrase translated badly because they used a word-for-word translation program. You simply can't do that, especially when dealing with euphemisms. This new system's approach is the only way text could possibly be translated properly.

My wife is a professional translator and has absolutely no respect for machine translations.

--
-- Will program for bandwidth
Re:The vodka is strong but the meat is rotten by Abcd1234 · 2003-07-28 04:16 · Score: 2, Insightful

Heh, given this is a not-uncommon phrase in the English language, it very well may be in their English-to-target-language corpus, meaning it could end up being a straight lookup-and-translate operation. Which is, of course, one of the advantages of a system like this (you can translate common idioms without having to analyze the text itself).
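
A minimal sketch of what that lookup-and-translate step could look like, in Python. This is only an assumption about how a phrase table might be consulted (greedy longest match, left to right), not a description of the system in the article. The PHRASES table is hypothetical and would normally be learned from the parallel corpus rather than typed in.

```python
# Hypothetical English-to-German phrase table keyed by word tuples.
PHRASES = {
    ("the", "spirit", "is", "willing", "but", "the", "flesh", "is", "weak"):
        "der Geist ist willig, aber das Fleisch ist schwach",
    ("the", "flesh"): "das Fleisch",
    ("is", "weak"): "ist schwach",
}
MAX_LEN = max(len(key) for key in PHRASES)


def translate(sentence):
    """Greedy longest-match lookup; unknown words are kept in [brackets]."""
    words = sentence.lower().split()
    out = []
    i = 0
    while i < len(words):
        # Try the longest chunk starting at position i first.
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if chunk in PHRASES:
                out.append(PHRASES[chunk])
                i += n
                break
        else:
            # No phrase of any length matched: pass the word through marked.
            out.append("[" + words[i] + "]")
            i += 1
    return " ".join(out)
```

Because the whole idiom is a single table entry, it comes out intact instead of being rebuilt word by word, which is exactly where "vodka" and "meat" style mistakes creep in.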
Re:The vodka is strong but the meat is rotten by Alton_Brown · 2003-07-28 04:19 · Score: 1, Interesting

With all due respect, does your wife have no respect because they currently stink compared to a human or because she'll be out of a job when they're sufficiently accurate?

Who thought computers would grow up and play chess so well? Who thought they'd be building cars? Certain jobs will go to machines, but jobs will still be there in a re-defined state. If DARPA has an interest in the technology, it's only a matter of time before the system approaches the accuracy level of a human. After all, on the translation side language is largely a logic problem. It's on the conversational side that you actually need AI.
Re:The vodka is strong but the meat is rotten by bogado · 2003-07-28 04:42 · Score: 2, Insightful

I doubt computers will ever get near a good translator, shure it can make some people lose their jobs translating math thesis, but a book, play, movie or even a conversation needs humans. Humans are the only ones that can really understand what is going on; human translators (good ones) know the culture of both countries they are translating between. They can understand the subtext and change the words so they carry the same subtext in the other language.

A good book has many things to be learned that are not written in words.

--
[]'s Victor Bogado da Silva Lins
^[:wq
Re:The vodka is strong but the meat is rotten by Anonymous Coward · 2003-07-28 04:43 · Score: 0

That is the same damn translator, dumbass. They are both fucking systran.
Re:The vodka is strong but the meat is rotten by iastor · 2003-07-28 05:08 · Score: 3, Funny

Let's see what google has to say:

English: The spirit is willing but the flesh is weak.

German: Der Geist ist bereit, aber das Fleisch ist schwach.
back: The spirit is ready, but the flesh is weak.

French: L'esprit est disposé mais la chair est faible.
back: The spirit is laid out but the flesh is weak.

Italian: Lo spirito è disposto ma la carne è debole.
back: The spirit is arranged but the meat is weak person.

Portuguese: O espírito é disposto mas a carne é fraca.
back: The spirit is made use but the meat is weak.

All I can say is this spirit person needs a better pimp!
Re:The vodka is strong but the meat is rotten by aastanna · 2003-07-28 05:15 · Score: 0, Redundant

I thought the whole point of this algorithm was that if you give it enough data, eventually it would see that phrase and remember it, giving a perfect translation.

The only problem with ignoring grammar and syntax and just getting massive amounts of data would likely be storage space and training/translation time.
Re:The vodka is strong but the meat is rotten by gwernol · 2003-07-28 05:18 · Score: 1

Another (in)famous machine mis-translation that I was taught in my Introduction to Natural Language Processing course was:

Hydraulic Ram -> Water Sheep

Ahem.

--
Sailing over the event horizon
Re:The vodka is strong but the meat is rotten by capnjack41 · 2003-07-28 05:27 · Score: 1

Computers are pretty good at chess but the game sort of lends itself to being a bunch of numbers to crunch, which of course computers handle just fine. Language doesn't really quite have the one-to-oneness with a mathematical model. So you can call the Moviephone, speak your movie name (loudly, several times) and get listings, but you won't (for a very very long time) have a machine that can accurately translate.
(offtopic) Kasparov complained about playing against Deep Blue, or whatever it is now, because it just doesn't have that human "feel" -- it doesn't get nervous and fuck up, or squirm or sweat.
Re:The vodka is strong but the meat is rotten by gotak · 2003-07-28 05:29 · Score: 1

The german one makes sense. German gramma is more similar to English then any of the other languages i have attempted to learn.
Re:The vodka is strong but the meat is rotten by micromoog · 2003-07-28 05:32 · Score: 1

The problem is hard, not impossible. Eventually machines will be as good as humans, then better. A machine could have all of this cross-cultural knowledge you talk about, not just a subset from one person's experience.
re: the vodka is strong but the meat is rotten by ed.han · 2003-07-28 05:33 · Score: 1

a good example of why translation is an art, not a science.

as the parent notes, there are many expressions in many languages which will get mangled using word for word translation. but doesn't this mean that there should be a table of expressions and other items not to translate literally?

and here's a fun little detail: although you can find bilingual attorneys when dealing in international law, do not try to get most translation services to certify (in a legally-binding sense) that their translation of a contract is identical for all intents and purposes in both languages. i used to run into this a lot a few years ago when i used to do contract work.

in response to a different reply to the same parent: in order to produce these massive quantities of parallel texts in multiple languages, someone(s) will have to keep translating a certain baseline amount of material as usage shifts and new expressions enter the language.

ed
Re:The vodka is strong but the meat is rotten by Fratz · 2003-07-28 05:53 · Score: 2, Interesting

My wife is a professional translator and has absolutely no respect for machine translations.

Most of them suck, but I worked on a system that was actually quite good. It was designed for technical documentation in the heavy equipment domain, and because of this limited use, we were able to constrain the input grammar and vocabulary, which made it easier to make very good translations.

We worked with some of the best human translators around to make it as accurate and natural-sounding as possible, but we made the mistake of allowing the human translators at our customer's company to evaluate the system. They felt threatened by it and decided they didn't like it, even when they had to criticize sentences the system generated which were word-for-word what they asked us to make the system do.

--
-- Fratz, human
Re:The vodka is strong but the meat is rotten by Wizard+of+OS · 2003-07-28 05:55 · Score: 1

'shure' ?? What language did you translate that from? :-)

--

--
If code was hard to write, it should be hard to read
Re:The vodka is strong but the meat is rotten by rossz · 2003-07-28 06:20 · Score: 3, Insightful

Because they suck, of course. She uses computers to assist her. It's just a tool. Just as you can't expect a wrench to rebuild your transmission, you can't (currently) expect a computer to create a proper translation. That will change in the future (as this article shows).

Currently, computer translations work the best in technical documents and the worst in prose (stinking turd horribly bad quality translations).

BTW, computer translations have never been any kind of competition for work. These days, competition is from untrained college students in Central Europe. All too often a Romanian student who "knows Hungarian" bids a couple of pennies per word, far under the going rate and far too little for my wife to consider as reasonable pay. The resultant translation sucks, but that's to be expected from someone who not only isn't trained as a translator, but also doesn't have a good command of either of the languages in question (Hungarian and English).

Oops, I started ranting.

--
-- Will program for bandwidth
Re:The vodka is strong but the meat is rotten by bogado · 2003-07-28 06:58 · Score: 1

Then you would be talking about a machine that is sentient. :-/

--
[]'s Victor Bogado da Silva Lins
^[:wq
Re:The vodka is strong but the meat is rotten by bogado · 2003-07-28 07:04 · Score: 1

From typo-english for sure. :-)

--
[]'s Victor Bogado da Silva Lins
^[:wq
Re: the vodka is strong but the meat is rotten by yotto · 2003-07-28 07:58 · Score: 1

as the parent notes, there are many expressions in many languages which will get mangled using word for word translation. but doesn't this mean that there should be a table of expressions and other items not to translate literally?

The problem with this is the way (At least in English) these idioms grow and change. Take "He's not the sharpest knife in the drawer." We all (well, most of us) either know the reference, or can guess what it means quickly. Knowing it, we instantly grasp similar terms like "He's not a sharp knife" or even "He's not too sharp" (Different etymology, I know, but stick with me here)
We recognize all this, but still also understand if someone asks us to find "The sharpest knife in the drawer" in their kitchen, without even thinking about the other reference (Or perhaps only to make some kind of joke, "I found the sharpest knife in the drawer. I must be the sharpest knife in the drawer!")
I'm not saying a computer will never be able to do this. We can, and we're just organic computers, but I don't think a computer can do it /now/ or even /soon/.

--
Pulp Audio Weekly - Geek News and Reviews
Re:The vodka is strong but the meat is rotten by zaphod_es · 2003-07-28 08:02 · Score: 1

My wife is a professional translator and has absolutely no respect for machine translations.

If she is expecting a professional translation she is right. On the other hand there are many reasons why people do use such programs.

I know several local residents who do not speak much Spanish and routinely scan, OCR and translate into English all "Official looking" mail. This will usually identify messages threatening to cut off the power supply, tax demands, this week's special offer from the bank and the Mayor pandering to the masses.

Sure, it would be nice to have a perfect text every time. In the meantime they are very happy to be able to get the gist of the message and not waste so much time with lawyers, agents and translators.
Re:The vodka is strong but the meat is rotten by Lars+Arvestad · 2003-07-28 08:20 · Score: 1

These days, competition is from untrained college students in Central Europe. All too often a Romanian student who "knows Hungarian" bids a couple of pennies per word, far under the going rate and far too little for my wife to consider as reasonable pay. The resultant translation sucks, but that's to be expected from someone who not only isn't trained as a translator, but also doesn't have a good command of either of the languages in question (Hungarian and English).
But this is why your wife eventually will face competition from computers. Computer translations may suck, but if they are good enough (for instance, comparable to the untrained foreign students), then there will be applications for them.
I don't think computers will replace human translators anytime soon, but a lot of routine translations could become useful very soon. Furthermore, a human translator could probably find a half decent translation to be a good starting point. Why do all that typing by hand, when some thoughtful editing is all that is necessary?

--
Reality or nothing.
Re:The vodka is strong but the meat is rotten by Anonymous Coward · 2003-07-28 09:31 · Score: 0

You must some new grammer learn.
Re:The vodka is strong but the meat is rotten by cpeterso · 2003-07-28 09:32 · Score: 1

good example. How would/should a machine translator deal with the "word" shure?

--
cpeterso
Re:The vodka is strong but the meat is rotten by Anonymous Coward · 2003-07-28 10:02 · Score: 0

That's a pokey quote.
Re:The vodka is strong but the meat is rotten by JJ · 2003-07-28 10:31 · Score: 4, Interesting

This actually is a myth. That particular text and translation was taken as anecdotal in a 1964 report. I did a masters thesis on MT at the University of Chicago and my advisor (once a major figure in MT) refused to approve my thesis until I got that statement correct.

--
So long and thanks for all the fish . . . !!!
Re:The vodka is strong but the meat is rotten by owlstead · 2003-07-28 10:32 · Score: 1

Maybe so, but most people will not be able to pay for your wife's translation. Or maybe your wife would be willing to translate my web pages? At 4 o'clock in the morning?

Since expressions change, and since there is slang, and since language is in some respects an art, your wife's job will be safe for the moment. But for texts that do not need to be exact, a computer would do fine. First of all, we need language to be understood - the necessity of for instance interpunction or good speling or fluent sentences etc comes second

But we are building a huge tower of babel at this moment (it's called the world, and it is indeed far from a tower literally speaking). Some additional translations won't hurt.

It will take some time until everybody speaks English sufficiently.
Re:The vodka is strong but the meat is rotten by BZ · 2003-07-28 10:59 · Score: 1

> shure it can make some people lose their jobs
> translating math thesis,

I have to ask. Have you ever read a real math thesis, much less translated one? Trust me, computers are even worse at this than at translating Slashdot comments.
Re:The vodka is strong but the meat is rotten by vbdutch · 2003-07-28 17:33 · Score: 1

An urban legend. Check out this. Here's the quote:
The "spirit is willing" story is a bit amusing, and it really is a pity that it is not true. However, like most MT 'howlers' it is a fabrication. In fact, for the most part, they were in circulation long before any MT system could have produced them (variants of the 'spirit is willing' example can be found in American press as early as 1956, but sadly, there does not seem to have been an MT system in America which could translate from English into Russian until much more recently - for sound strategic reasons, work in the USA had concentrated on the translation of Russian into English, not the other way round).
By the way, the rest of the site (warning: doesn't work in Mozilla), especially the project stuff is interesting as well.
Re:The vodka is strong but the meat is rotten by poincare · 2003-07-28 18:11 · Score: 1

I ran the phrase through a current translating system (Prompt), and a round trip returned the result: "The spirit wishes, but the flesh is weak."
This implies that either MT has improved drastically in the past few years, or was never that bad in the first place.
Re:The vodka is strong but the meat is rotten by bogado · 2003-07-29 04:23 · Score: 1

I happen to have read math books and actually have a bachelor's in math. Math texts usually have a few words like 'therefore', 'then', 'and', 'or', a lot of special notation (that can be translated directly with the use of a specialised dictionary) and a lot of math symbols and equations that do not require any translation. So my guess is that they are usually easier to translate.

--
[]'s Victor Bogado da Silva Lins
^[:wq
Re:The vodka is strong but the meat is rotten by HiThere · 2003-07-31 06:38 · Score: 1

That argument could be extended to claim that translation is impossible. And, it is.

You cannot do a perfect translation. Not even from one generation to another among speakers of the same language. But you can get close.

What you probably mean to assert is that people will always be able to translate people's affective states better than machines will. And that may be correct. Certainly for the near future. But most translation doesn't need to be that .. refined. Even Babelfish is widely useful, and it ought to be possible to do better than that.

--

I think we've pushed this "anyone can grow up to be president" thing too far.
Re:The vodka is strong but the meat is rotten by HiThere · 2003-07-31 06:42 · Score: 1

No, they do a worse job of translating poetry than of translating prose. Usually. There are special cases.

One case where prose would be difficult for a machine to do a decent translation of is Finnegan's Wake. (OTOH, has a person ever done a decent translation of that opus?)

But usually prose is basically (at the surface layer) straightforward, but poetry, even at the surface layer, is highly laden with indirect symbolic references.

--

I think we've pushed this "anyone can grow up to be president" thing too far.
Re:The vodka is strong but the meat is rotten by Haeleth · 2003-07-31 22:06 · Score: 1

I believe you mean "Finnegans Wake". No apostrophe.

I'm not sure it's possible to translate it, either. Translation kind of implies that the source text is in a recognisable human language. But I've heard of a Japanese version that's supposedly quite good.

Finally, the correct approach by tuxlove · 2003-07-28 03:59 · Score: 4, Interesting

I believe that using a statistical approach like this is a step in the right direction. Manually building sets of rules, dictionaries, etc., is a waste of time and hard to do. And manually built systems become stale as languages evolve, unless a lot of continuing work is done.

For me the holy grail is when I can converse with a computer meaningfully. I believe a similar approach will be required for the computer to "understand" language, and to be able to formulate a coherent and appropriate response.

Re:Finally, the correct approach by jemfinch · 2003-07-28 06:03 · Score: 1

I believe that using a statistical approach like this is a step in the right direction.

A step in the right direction for translation, perhaps, but not for understanding.

For me the holy grail is when I can converse with a computer meaningfully. I believe a similar approach will be required for the computer to "understand" language, and to be able to formulate a coherent and appropriate response.

Do you really want your computer using a statistical approach to try to understand what you're telling it to rm?

Jeremy

--
Looking for a Python IRC bot?
Re:Finally, the correct approach by tuxlove · 2003-07-28 06:38 · Score: 1

Do you really want your computer using a statistical approach to try to understand what you're telling it to rm?

Sure, why not? If I tell you what files I want to remove, it's not deterministic that you'll understand and delete the correct files. But I'd probably trust you to do it anyway.

I do not believe that computers learning behaviors based on statistics will yield results any less problematic than computers now yield. I just think the problems will be different in nature.
Re:Finally, the correct approach by Christ-on-a-bike · 2003-07-28 06:51 · Score: 1

I believe a similar approach will be required for the computer to "understand" language

I don't see how. There's no computer language in the sense that would correspond to a natural (human) language. So how do you set up the required Rosetta data set?
Re:Finally, the correct approach by tuxlove · 2003-07-28 07:45 · Score: 1

There's no computer language in the sense that would correspond to a natural (human) language.

True enough. I didn't say I thought this particular model could be applied. But the general approach seems like a step in the right direction. I think a computer being able to truly interpret and act upon a human language is a long way off. If and when it happens, it won't be due to people typing in a billion possibilities, but rather through some "learning" system.
Re:Finally, the correct approach by Steeltoe · 2003-07-28 08:02 · Score: 1

Me: "rm my .inputrc file in my homedir"

Computer: "Okay, rm'ing .Xsession file. Error: File not found"

Me: "No, damnit! That's the file I erased two days ago! Grrrr..."

Nope, don't think statistical approach alone will do it. But it CAN help in concert with a decision-tree generating machine, or something similar.

--
http://www.debunkingskeptics.com/
Re:Finally, the correct approach by jbarr · 2003-07-28 09:00 · Score: 1

...For me the holy grail is when I can converse with a computer meaningfully...
Hell, I'd be happy just to be able to converse with my wife meaningfully!

--
My mom always said, "Jim, you're 1 in a million." Given the current population, there are 7000 of me. God help us all!
Re:Finally, the correct approach by Anonymous Coward · 2003-07-28 09:11 · Score: 0

Hell, I'd be happy just to be able to converse with my wife meaningfully!

I think making computers conversant is an easier problem to solve.

Doc Och? by securitas · 2003-07-28 04:01 · Score: 1

Isn't the Doc supposed to be in the next Spiderman movie?

Re:Doc Och? by wickedj · 2003-07-28 05:31 · Score: 1

Here's a better picture.

Am I the only one who thought Star Trek? by Alkarismi · 2003-07-28 04:01 · Score: 1

Universal translator anyone?

Er, aging geek embarrassing self again, mutter...

Re:Am I the only one who thought Star Trek? by ReelOddeeo · 2003-07-28 07:42 · Score: 1

The description of how his translator works is not all that different from the description that the Star Trek The Next Generation Technical Manual gives for the universal translator.

The ST:TNG tech manual says something about having to build up a "translation matrix" between the two languages. The UT is able to build this up by itself based on examples from the two languages. (It's been years since I read the ST:TNG Tech Manual. But this is about how I remember it.)

Yet another example of science fiction predicting technological fact.

--

Those who would give up liberty in exchange for security and DRM should switch to Microsoft Palladium!

Was this article translated? by Alton_Brown · 2003-07-28 04:02 · Score: 3, Funny

From the article: his software scored highest among 23 Arabic- and Chinese-to-English translatio systems

Oops - guess we need some more parallel data (or a few more gigs of rosetta stones).

Re:Was this article translated? by Anonymous Coward · 2003-07-28 04:07 · Score: -1

For the love of god shut up you piece of shit. I can't fucking stand people like you.
Re:Was this article translated? by fliplap · 2003-07-28 07:31 · Score: 1

Actually, the translation is correct. Translatio is merely the drag queen version of fellatio.

Related Independent article last week by Anonymous Coward · 2003-07-28 04:02 · Score: 1, Interesting

The battle for the Rosetta Stone "Things are looking decidedly rocky at the British Museum - Egypt's leading archaeologist has demanded the return of the Rosetta Stone. But the museum argues that the removal of the four-foot slab that unlocked the mysteries of the pharaohs would be disastrous"

Less is more by HarmlessScenery · 2003-07-28 04:02 · Score: 1

"Read my summary for more details."

I'd rather have less detail in a summary - thanks :)

Damn Babelfish! by Zog+The+Undeniable · 2003-07-28 04:03 · Score: 5, Funny

"Most the bay only of news of the college of southern extremity California it knows an all big contents all there is this emission annular subject, it also there is a RolandPiquepaille and it writes. The Franz taxes where his software height one lyel with lines up between the translation system quite phu the Och and this history are the summary thing their scientist. The Och "it gave the data which is parallel is sufficient in me, it spread out," inside questioning the hour 2 specialties the language which it does not do of the multi Archimedes which is the possibility which there will be a hazard translation system the doctor repulsively it talked. It approach collects the sheep which data is enormous, apply the statistical model in this data a foundation in 2 concepts which it puts. It is complete and the wool of rule lu the dictionary of grammar "the m3ethode of the Och the duplex language original and the Rosetta which agree one equivalent with computer password of noble and wise pebble epitaph adopts. Or, rather, the gigaoctets and pebble gigaoctets of the Rosetta." Detail fact compared to read the hazard my synopsis.

English --> French --> English --> Korean --> English. Of course, it helps that the first sentence is munged anyway ;-)

--
When I am king, you will be first against the wall.

Re:Damn Babelfish! by almightyjustin · 2003-07-31 09:05 · Score: 1

Well, I've found that the Babelfish Korean->English translation is really fucked compared to the other language pairs, so including it as a step is a bit misleading. There's way too many identically pronounced and written words in Korean with different meanings.

--
Omnes arx vestrum sunt adiuncta nobis.
Re:Damn Babelfish! by Genda · 2003-07-31 14:54 · Score: 1

I thought we were talking about translators... not poetry generators?

Genda Bendte

- And just what does it mean by; "It approach collects the sheep which data enormous..." you'll have every able bodied sheep farmer in Scotland looking to see if his data is enormous!!!

Integration by slusich · 2003-07-28 04:03 · Score: 3, Interesting

Sounds like a brilliant idea. Hopefully this is something that could eventually be compacted enough to fit into consumer electronics. It would be great to be able to watch TV from every country without any language barrier!

--

DeviantArt Page

NSFW

Re:Integration by ahfoo · 2003-07-28 04:39 · Score: 3, Interesting

Not to sound arrogant, but I find actually learning another language by watching foreign TV with subtitles in the original language to be even more interesting than watching the dubbed or English subtitled version. It involves commitment to get to the point where you can understand the basics, but there are rewards to making a commitment to learn something new.
I like the idea of translating sentence by sentence as opposed to grammatically and word for word. I'm sure this guy is right that at some point this will produce reasonably accurate translations in many cases, but multiple languages are one of our greatest treasures.
I have read that the single most important factor in preventing senile dementia is the difference between those who continue to create novel memories throughout their lives and those who stick to what they have already learned. Learning multiple languages is a wonderful thing and once you get well into it, it is a lot of fun. It certainly increases your options for punning and rhyming and you end up with lots of aliases.
Re:Integration by duffbeer703 · 2003-07-28 06:15 · Score: 1

You really have far too much time on your hands.

--
Conformity is the jailer of freedom and enemy of growth. -JFK
Re:Integration by Anonymous Coward · 2003-07-28 07:17 · Score: 0

guess who just patented this idea for instant messaging....
Re:Integration by Anonymous Coward · 2003-07-28 07:30 · Score: 0

I have ink on my hands. Time is relative.

Old Texts by holygoat · 2003-07-28 04:04 · Score: 5, Insightful

Firstly we could consider the enormous body of work currently available in other languages.
Having this able to be translated into English or other languages could be very valuable for scholars.

Secondly, English is not the primary tongue for the majority of people on the planet - to suggest that because a lot of people can manage to converse in it that the ability to translate between other languages isn't valuable is foolish.

Also note that the article specifically mentions Arabic and Chinese, which I don't think crossed your mind. China has the largest population on the planet, remember.

Translation is far from obsolete, especially given that the majority of the Western world, and especially America, is piss poor at being bilingual.

Re:Old Texts by Anonymous Coward · 2003-07-28 04:07 · Score: 0

If we're piss poor at 'being bilingual', it's because we don't HAVE to.
Re:Old Texts by OmniVector · 2003-07-28 04:35 · Score: 2, Insightful

A friend of mine, Hani, who is from Egypt told me a joke once.
"What do you call a person that only speaks one language?" A: An American

It's quite true when you think about it. He said that when he was growing up he had a choice between going to a french school or an english school where the given language was tought just as much as arabic. Americans really need to be tought french or spanish at a MUCH younger age (say 5 right as they start kindergarden).

--
- tristan
Re:Old Texts by benoitg · 2003-07-28 04:45 · Score: 1

No, you're piss poor at 'being bilingual' because you don't have enough culture to understand there is value and knowledge in other places than your bellybutton.
Re:Old Texts by Anonymous Coward · 2003-07-28 04:58 · Score: 0

Americans really need to be tought french or spanish at a MUCH younger age (say 5 right as they start kindergarden).

I think that's the general trend these days (in the US) - start the kids learning a second language very early. When I was in school, you could only start French or Spanish in 7th grade, and then in 9th grade, my high school (a public school in the suburbs) also offered German and Latin. I agree that starting earlier would have been much more useful.

I know several elementary schools in the area where I used to live now have Spanish, Japanese, and other languages required.

I took French, and have mostly found it useless. Looking back, I would have been much better off with Spanish - I hear it every day, but the only people speaking French in New York are the tourists.
Re:Old Texts by Acidic_Diarrhea · 2003-07-28 04:58 · Score: 0

"Americans really need to be tought (sic) french or spanish "
Why? Just making a statement that something needs to be done, doesn't make it so.

--
I hate liberals. If you are a liberal, do not reply.
Re:Old Texts by lafiel · 2003-07-28 05:00 · Score: 1

I don't see why that is true in any way. Most schools try to make their students bilingual, simply because having another language at your disposal is quite an advantage. Why would anyone not want to be able to speak with another race in their own tongue?

One of the things that impresses me is when a businessman can speak to me in my own (non-english) language. It's like they're one of us, not just a businessman, but a friend.

In this world where globalization has changed the way economy used to function, not knowing another language is like cutting off an arm in the business.

Simply put, your ignorance is costing you money. And if not that, then the chance at getting 'down' with foreign women. Is that something you're proud of?

More languages, more choices of women. I rest my case, if the money argument didn't reach you yet.
re: old texts by ed.han · 2003-07-28 06:26 · Score: 1

good joke, that.

and too true that second languages should be taught earlier. some schools in the US do teach other languages in elementary/primary school, but by and large, these are private schools. i live in new jersey and have never yet heard of a public school that does.

heaven knows that enough studies were done illustrating the greater capacity younger children have learning languages than older children.

ed
Re:Old Texts by Pres.+Ronald+Reagan · 2003-07-28 06:46 · Score: -1

How about this: no one gives a shit about having "enough culture (?)" to know about places where the government babies everyone throughout life at the expense of the creative and hard-working.

--

Abortion is advocated only by persons who have themselves been born.
--Ronald Reagan
Re:Old Texts by Potor · 2003-07-28 08:51 · Score: 1

Firstly we could consider the enormous body of work currently available in other languages. Having this able to be translated into English or other languages could be very valuable for scholars.

Nope. Scholars do not need translations: the only people who truly contribute to a field do so in the primary language(s) of the text.
Re:Old Texts by Potor · 2003-07-28 08:54 · Score: 1

Why would anyone not want to be able to speak with another race in their own tongue?

You are right to some extent, but do you think that the Italians and Greeks are different races?
Re:Old Texts by lakmiseiru · 2003-07-28 10:43 · Score: 2, Interesting

I'm forced to disagree. Although reading texts in their primary languages is certainly valuable, I severely doubt every single scholar who studies ancient Mesopotamia is fluent in reading cuneiform script! Also, asking scholars to be fluent in one or two dead languages is quite a lot (according to my sister, who's a medieval scholar and speaks Latin and Medieval French)- would you have them be fluent in every single language they encounter? That's unrealistic, as well as inefficient.

Although it's certainly true that many scholars can read the primary languages of the periods they study, some do not. For example, if one were studying Culture A through the medium of Culture B's records of interactions with Culture A, one would not need to read primary sources from Culture A.

It's true that many scholars do prefer to rely on personal translations of primary sources, but for many it's a simple waste of time that could be better spent. Instead of arguing that all scholars must be able to read all primary sources of the cultures they study, I would argue that they should be able to analyze the translations of others (perhaps even the translations this system produces) with regards to the culture. If 20,000 scholars all translate a primary source and their translations are all relatively accurate (errors will be corrected in time), then 19,999 of them have wasted weeks or months.

Yes. Scholars do need translations - they help verify the scholar's own translations, provide much-needed resources, give insight into the translator's view of the culture - in short, they are a resource too valuable to put aside.

--

Access denied: Not enough clue for requested operation.
Re:Old Texts by Potor · 2003-07-28 11:44 · Score: 1

i can agree only with your last paragraph. i myself am a scholar, and i would never dream of publishing on a text that i cannot read in the original. and i certainly would not read anyone who would. cheers, p.

I expect they used many Bible versions by Adam+Rightmann · 2003-07-28 04:04 · Score: -1, Troll

The article doesn't mention it, but when you need parallel texts written in many different languages, the Bible is very convenient.

Regardless of one's feelings about the Revealed Word of God, as a linguistic resource it's amazing. Look up the story of Q to see how linguistic theologians compared different versions of the Gospels, in different languages, to see what they think Jesus actually said, and what was paraphrased, added. Of course, the snake handling heretics of the Protestant Church believe every word is sacrosanct.

Today, the biggest leader in translating Bibles into other languages is the Church of the Latter Day Saints, I guess when Mormon Elders take two or three wives, the younger academic men have lots of time to learn strange languages.

--
A. Rightmann

Re:I expect they used many Bible versions by pdxmac · 2003-07-28 04:25 · Score: 1

Wow.

You can simultaneously pimp the Christian bible, lay some smack down on those Christians you don't agree with, and disprove your own point. Impressive.

Seriously, if the many translations have altered the emphasis of the reading, or even the words of the most important speaker, then does it really qualify as parallel data? Heck, I'm only familiar with English-language bibles, and only somewhat at that, and the differences are significant. Imagine the differences (i.e., non-parallel-isms) when Bibles go from Greek to Latin to English to other languages.

(Yes, yes, I'm an American, so I am assuming that everyone works through English. But, I did mod up the joke about people speaking one language as being funny - d'oh I just killed my mod. And, I'm taking the salient part of the LDS comment as true, which would imply some English-language bias at least.)
Re:I expect they used many Bible versions by ejdmoo · 2003-07-28 04:28 · Score: 3, Insightful

Actually, I think that this may be an interesting way to translate the Bible (assuming you didn't use the Bible itself as a reference...that would skew the translation).

Think about it: every translation of the Bible is always criticized for some reason. If the Bible were translated this way it could be like the Google news of Bible translations: completely independent of human bias and editing.
Re:I expect they used many Bible versions by Shamashmuddamiq · 2003-07-28 04:38 · Score: 1

...though the Church of Latter Day Saints translates the Bible according to Joseph Smith. This is an incorrect version of the less-perfect-than-most-would-like-to-admit translation of the King James Bible, which Joseph selectively modified to fit into his "fruitcake framework".
This isn't the same Bible that has been exhaustively studied and pored over by scholars over the centuries.

--
...just my 2 gil.
Re:I expect they used many Bible versions by Anonymous Coward · 2003-07-28 04:41 · Score: 0

"Revealed Word of God"

You mean the revealed word of the Jewish Religion, combined with the ramblings of four men who were losing their fame and fortune since they couldn't leech off Jesus anymore?

Oh, don't forget the revealed word of the Assyrians, Babylonians, Macedonians..

The Bible is really good, up until the New Testament. From there on, it looks like a sloppy kernel hack.
Re:I expect they used many Bible versions by Anonymous Coward · 2003-07-28 04:50 · Score: 0

Mormons aren't Christians...

They have severely differing beliefs at the core level, so while they *started* with that stuff, Christ doesn't hold the same position that he does in Christian religions.
Re:I expect they used many Bible versions by Anonymous Coward · 2003-07-28 04:53 · Score: 0

Today, the biggest leader in translating Bibles into other languages is the Church of the Latter Day Saints, I guess when Mormon Elders take two or three wives, the younger academic men have lots of time to learn strange languages.

That's funny....last time I checked, the majority of Mormons in Utah aren't polygamists. I grew up there, I should know! And no, I'm not Mormon. Polygamy was made illegal when Utah was applying for statehood (Utah History to Go). And as far as I know, you can be excommunicated from the Mormon church for practicing polygamy.
As far as their missionaries (Elders) even having a wife, that is absurd. They don't get married until they return from their mission. Only the older missionaries (60+) are married, and both husband and wife go...
Don't propagate lies.
Re:I expect they used many Bible versions by amorsen · 2003-07-28 04:58 · Score: 1

Good luck getting a many-gigabyte database of Old Greek and Hebrew texts with translations. And better luck getting such a database that does not include the Bible in its contents.

--
Finally! A year of moderation! Ready for 2019?
re: i expect they used many bible versions by ed.han · 2003-07-28 06:32 · Score: 1

um, they have 'em. they're called parallel bibles, are available on CD-ROM and the most complete ones include both the original aramaic & latin texts along with hebrew and other translations to boot, like this one: http://www.powerbible.com/

ed
Re: i expect they used many bible versions by Marco+Rossi · 2003-07-28 06:57 · Score: 1

You mean that the Hebrew text wasn't the original?? Oh my GOD!!!!!!!!!!

--
- Marco
re: i expect they used many bible versions by amorsen · 2003-07-28 09:40 · Score: 1

My point was that for this program to work you need a whole bunch of text translated between the two languages. This bunch of text must not already contain the new stuff you want translated. So try to find a lot of Aramaic and Old Greek translated to something else, without using anything from the Bible. The article says you need gigabytes.

--
Finally! A year of moderation! Ready for 2019?
Re:I expect they used many Bible versions by Zaak · 2003-07-31 14:56 · Score: 1

...though the Church of Latter Day Saints translates the Bible according to Joseph Smith.

That is incorrect. The Church of Jesus Christ of Latter Day Saints uses the King James Version of the Bible in English, and in foreign languages we use a commonly used version of the Bible in that language (such as the Reina Valera in Spanish).

TTFN

TOTA by Anonymous Coward · 2003-07-28 04:04 · Score: 0

Romancing the Rosetta Stone
'Give me enough parallel data, and you can have a translation system in hours'
University of Southern California computer scientist Franz Josef Och echoed one of the most famous boasts in the history of engineering after his software scored highest among 23 Arabic- and Chinese-to-English translatio systems, commercial and experimental, tested in in recently concluded Department of Commerce trials.

"Give me a place to stand on, and I will move the world," said the great Greek scientist Archimedes, after providing a mathematical explanation for the lever.

"Give me enough parallel data, and you can have a translation system for any two languages in a matter of hours," said Dr. Och, a computer scientist in the USC School of Engineering's Information Sciences Institute.

Och spoke after the 2003 Benchmark Tests for machine translation carried out in May and June of this year by the U.S. Commerce Department's National Institute of Standards and Technology.

Och's translations proved best in the 2003 head-to-head tests against 7 Arabic systems (5 research and 2 commercial-off-the-shelf products) and 14 Chinese systems (9 research and 5 off-the-shelf). In the previous, 2002 evaluations they had proved similarly superior.

The researcher discussed his methods at a NIST post-mortem workshop on the benchmarking held July 22-23 at Johns Hopkins University in Baltimore, Maryland.

Och is a standout exponent of a newer method of using computers to translate one language into another that has become more successful in recent years as the ability of computers to handle large bodies of information has grown, and the volume of text and matched translations in digital form has exploded, on (for example) multilingual newspaper or government web sites.

Och's method uses matched bilingual texts, the computer-encoded equivalents of the famous Rosetta Stone inscriptions. Or, rather, gigabytes and gigabytes of Rosetta Stones.

"Our approach uses statistical models to find the most likely translation for a given input," Och explained.

"It is quite different from the older, symbolic approaches to machine translation used in most existing commercial systems, which try to encode the grammar and the lexicon of a foreign language in a computer program that analyzes the grammatical structure of the foreign text, and then produces English based on hard rules," he continued.

"Instead of telling the computer how to translate, we let it figure it out by itself. First, we feed the system with a parallel corpus, that is, a collection of texts in the foreign language and their translations into English.

"The computer uses this information to tune the parameters of a statistical model of the translation process. During the translation of new text, the system tries to find the English sentence that is the most likely translation of the foreign input sentence, based on these statistical models."
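
As a rough illustration of what "tuning the parameters of a statistical model" on a parallel corpus can look like, here is a minimal sketch in the style of the early IBM word-based models. It is not Och's system: the function names (train_model1, best_translation) and the three-sentence German-English corpus are assumptions made up for this example, and a real system would need vastly more data plus a model of English word order.

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """Estimate word-translation probabilities t(e|f) by EM.

    pairs: list of (foreign_tokens, english_tokens) sentence pairs.
    Returns a dict mapping (english_word, foreign_word) -> probability.
    """
    # Start with equal weight for every word pair seen in the same sentence pair.
    t = {}
    for f_sent, e_sent in pairs:
        for e in e_sent:
            for f in f_sent:
                t[(e, f)] = 1.0

    for _ in range(iterations):
        count = defaultdict(float)   # expected count of e aligned to f
        total = defaultdict(float)   # expected count of f aligned to anything
        # E-step: spread each English word over the foreign words in its sentence.
        for f_sent, e_sent in pairs:
            for e in e_sent:
                norm = sum(t[(e, f)] for f in f_sent)
                for f in f_sent:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalise so that the probabilities for each f sum to 1.
        t = {(e, f): c / total[f] for (e, f), c in count.items()}
    return t

def best_translation(t, f):
    """Most probable English word for foreign word f under the table t."""
    candidates = [e for (e, f2) in t if f2 == f]
    return max(candidates, key=lambda e: t[(e, f)])

# Toy parallel corpus (hypothetical, for illustration only).
corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
table = train_model1(corpus, iterations=20)
for word in ["das", "Haus", "Buch", "ein"]:
    print(word, "->", best_translation(table, word))
```

Nobody tells the program that "Buch" means "book"; it falls out of the co-occurrence statistics, which is the point of the approach described here.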

This method ignores, or rather rolls over, explicit grammatical rules and even traditional dictionary lists of vocabulary in favor of letting the computer itself find matchup patterns between given Chinese or Arabic (or any other language) texts and English translations.

Such abilities have grown, as computers have improved, by enabling them to move from using individual words as the basic unit to using groups of words -- phrases.
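
To see why phrases help, here is a small sketch comparing a word-by-word lookup with a lookup that also knows one multi-word phrase. Everything in it is an assumption for illustration: the tables are written by hand rather than learned from data, the function name translate is made up, and greedy longest-match is a simplification (real phrase-based systems score many competing segmentations instead of taking the first match).

```python
def translate(tokens, table, max_len=3):
    """Greedy longest-match lookup of source phrases in a phrase table."""
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in table:
                out.append(table[phrase])
                i += n
                break
        else:
            # Unknown word: pass it through unchanged.
            out.append(tokens[i])
            i += 1
    return " ".join(out)

# Hand-written toy tables (hypothetical, not learned from a corpus).
WORDS = {"una": "a", "cerveza": "beer", "por": "for", "favor": "favor"}
PHRASES = dict(WORDS)
PHRASES["por favor"] = "please"

sentence = "una cerveza por favor".split()
print(translate(sentence, WORDS))    # word by word
print(translate(sentence, PHRASES))  # with the two-word phrase available
```

The first call produces "a beer for favor", the second "a beer please": the only difference is that the second table treats "por favor" as a single unit.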

Different human translators' versions of the same text will often vary considerably. Another key improvement has been the use of multiple English human translations to allow the computer to more freely and widely check its rendering by a scoring system.

This not coincidentally allows researchers to quantitatively measure improvement in translation on a sensitive and useful scale.
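
The article does not say which scoring system is meant, so the following is only a sketch of one common idea: count how many of the machine output's n-grams also appear in any of several human reference translations, clipping repeated n-grams so that a word cannot be credited more often than the most generous reference uses it. The function names and the example sentences are assumptions for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(candidate, references, n=1):
    """Fraction of candidate n-grams found in at least one reference.

    Each n-gram is credited at most as many times as it occurs in the
    single reference that contains it most often.
    """
    cand = ngrams(candidate.split(), n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, c in ngrams(ref.split(), n).items():
            max_ref[gram] = max(max_ref[gram], c)
    matched = sum(min(c, max_ref[gram]) for gram, c in cand.items())
    return matched / sum(cand.values())

refs = ["the cat is on the mat", "a cat sat on the mat"]
output = "the cat sat on the mat"
print(clipped_precision(output, refs[:1]))  # scored against one reference
print(clipped_precision(output, refs))      # scored against both references
```

Against the first reference alone, "sat" counts as an error even though it is a perfectly good rendering; with the second reference added it is accepted. That is the sense in which multiple human translations let the computer check its output "more freely and widely", and the resulting number is something researchers can track from one system version to the next.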

The original work along these lines dates back to the late 1980s and early 1990s and was done by Peter F. Brown and his colleagues at IBM's Watson Research Center.

Much of the improvement and expansion

no big deal by bongoras · 2003-07-28 04:06 · Score: 1

Star Trek's had a universal translator for years...

Dialects? by dethl · 2003-07-28 04:06 · Score: 2, Interesting

How can this system compensate for the different dialects of all of the different languages?

--
"Some fight for law. Some fight for justice. What will you fight for? One day, you will see."

That's not insightful by Anonymous Coward · 2003-07-28 04:09 · Score: 0

That's lazy pseudo-philosophical intellectual masturbation, and, depending on how you choose to interpret it, is either wrong or obvious.

"Give me enough parallel data, and you can have a translation system for any two languages" has never been a true statement before.

Well, so? by k98sven · 2003-07-28 04:09 · Score: 3, Funny

What is the novelty of this?

It's hardly news that you can always find correlations in two sufficiently large sets of data.

Reminds me of the Steve Martin joke:

"Chicks go for the intellectual types. I figured the best way to impress 'em was to read a lot of books. But hey, do you know how many books there are? Why, there must be, hundreds of them. But I was already a pretty smart guy. I didn't waste my time reading all those books. Heck no.
I read, the dictionary. Hey--I figure it's got all the other books in it."

Oh, please no... by Noryungi · 2003-07-28 04:11 · Score: 1

Another IT master thinks he can invent a perfect translation system, simply based on 0s and 1s.

I have said it before, on /. and elsewhere, machine translation does not work.

A good translation is based on several non-quantifiable parameters:

Context.
Grammar.
Vocabulary.
Nuance.

Example:

"My controller has failed. He is going to be replaced" can mean:

My HDD controller is dead. I need to replace it, so that my computer can access its hard disks (For the slashdot crowd).
The financial controller of my company has failed in his/her duty. I need to fire this idiot before the SEC realizes the mess the finances of my company have become (CEO/PHB/Enron crowd).

OK, maybe the above example is not perfect, but you get my drift... Machine Translation? Bah! Humbug.

That was my "machine translation" rant/flamefest of the month. Carry On.

--
The right to offend is far more important than the right not to be offended. (Rowan Atkinson)

Re:Oh, please no... by Anonymous Coward · 2003-07-28 04:22 · Score: 0

Using "He" if you meant a hard drive controller? Yucky. Despite what most people on here think, the HD Controller is an "it".
Re:Oh, please no... by pdxmac · 2003-07-28 04:28 · Score: 1

"My controller has failed. He is going to be replaced" can mean:

* My HDD controller is dead. I need to replace it, so that my computer can access its hard disks (For the slashdot crowd).

Truly disturbing that us /.ers would assign gender to our hard disk controller. Truly. Disturbing.
Re:Oh, please no... by radish · 2003-07-28 04:43 · Score: 3, Insightful

You're right, traditional machine translation is difficult, primarily due to context. However, you're also right that the example you gave is a bad one - in English it only has one meaning (the second one you give). A HDD controller would never have an assigned gender. Of course in German for example, it would (not sure which though - neuter?).

However you're missing what I think is the most important point. If an example is so ambiguous as to confuse an "ideal" machine, it would confuse us too. What you're really saying is "it is possible to write sentences with ambiguous meaning in most languages" - which is of course true. That doesn't however make it impossible to create a machine which is at least as good as a human at translating (and wouldn't that be good enough?). When you read something you interpret it according to a set of learned rules. Obviously there's the basic syntax and vocab, but then you add context like the other clauses in the prose, the identity of the author, the subject matter. We're a long way off getting those concepts into a machine reader, but I would be very hesitant to say we'll never get there.

Besides, the article is about taking a different approach to the problem - one which should be quite happy with ambiguity. They're looking at essentially pattern matching, so provided your sample data sets include enough info to describe the ambiguity it should have a decent enough chance of working it out.

--
---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"
Re:Oh, please no... by Anonymous Coward · 2003-07-28 04:44 · Score: 0

3. I was playing Xenosaga, and ogling KOS-MOS a bit too hard. My poor controller named Bob, I must go to K-Mart and buy a new one.

Don't forget all the variations on the actual sentence, too.

"I broke my controller, now I need a new one."
"My controller's befukt, gotta go grab another."*
"That controller doesn't work, I must buy a replacement."

*Befukt and gotta aren't proper English, of course, but the nation doesn't speak proper English, now does it?

In short, I agree. Machine translation is the CS equivalent of cold fusion. It ain't happening.
Re:Oh, please no... by zenyu · 2003-07-28 06:02 · Score: 1

"My controller has failed. He is going to be replaced" can mean:

You are right, I doubt there will be a system to translate a Nabokov novel before we have machines that think and reason and hope and doubt. But I think you are ignoring the huge utility of even simple glosses like Babelfish. You can read one of those and get a good idea of what the writer meant. A gloss and a small understanding of the culture gives you about as much understanding as you would have after a year of studying the language. This system is better than a gloss with grammatical rules because it is easier to construct and it takes whole sentences into account so another poster's "hydraulic ram" would not get translated to "water sheep". But mostly because it is easier to construct, wouldn't it be nice to have even a Babelfish type translator for Quechua and Finnish and Icelandic instead of just the usual suspects?

These can also be an aid to people that speak both languages, but aren't translators. Many multiple language speakers think in whatever language they are reading or speaking at the moment, this is not good for translation. But if they could read a passage in one language for the meaning, and then fix machine translation adding back ambiguities and poetry with the machine translation as a memory aid this would be good. For me looking at the original every couple lines would cause an unwanted context switch. I personally don't quite understand how translators can do it. I get so confused going back and forth between languages that I find myself reading the captions on an American movie when visiting family overseas even though I could understand the spoken English just fine.
Re:Oh, please no... by Cobralisk · 2003-07-28 12:02 · Score: 1

That doesn't however make it impossible to create a machine which is at least as good as a human at translating...

I believe you are referring to the Turing test, or at least something close to it. Do that, and you have created (arguably) true artificial intelligence. I for one welcome our... ah screw it.

--
Waiting for ad.doubleclick.net...
Re:Oh, please no... by radish · 2003-07-28 20:59 · Score: 1

Well it's the first part of the Turing test. The test requires a machine which can take a natural language phrase, interpret it, and then form a response. That of course requires a very complete conceptual understanding of the input - the machine needs to know what the sentence means. It then has to figure out what to reply - in my (utterly inexpert!) opinion, it's the reply which is the really hard part. Besides, it's perfectly possible that the best way of translating doesn't require actually understanding the input at all - the method described in the article doesn't rely on interpretation, but rather straight substitution.

--
---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"

Statistical approach looks promising by TwistedGreen · 2003-07-28 04:11 · Score: 4, Insightful

"One of the great advantages of the statistical approach," Och explained, "is that most of the work goes into components that are language-independent. As long as you give me enough parallel data to train the system on, you can have a new system in a matter of days, if not hours."

This statistical method is probably the best approach to computerized translation. It seems to approximate how the human mind will translate a given sentence most efficiently. Language can get awfully complex, and individual words often have, at best, an ambiguous meaning when interpreted alone. One must take into account the context of that word to specify and refine its meaning. This obviously leads to a huge number of permutations to represent a huge variety of thoughts, but the relative size of this number is diminishing as computers become more powerful.

Therefore, instead of playing with messy grammars and sentence structures, we can simply have a catalogue of thoughts as represented by words, and correlate that catalogue with a different set of words to facilitate translation. This software would operate on a deeper level than it would if it operated with the words and symbols themselves. It would utilize a map of the deep structures of language, instead of a map of the less-meaningful words and grammars.

I really like this method, and while it may seem like a brute-force hack applied to translation, the simple fact that languages do not contain elegant patterns must be accepted. It also appears to be a most efficient method, as the simple comparisons involved would bring the speed of translation into realtime.

Re:Statistical approach looks promising by Anonymous Coward · 2003-07-28 04:47 · Score: 0

language is the building block of thought.

A less useful (?) but perhaps more interesting application would be to compare the mappings of one language to another to look for interesting locations. In this manner one could generate statements such as:

"japanese speak/think more about the aesthetics of weather and season than do americans."

One could look for statements (and thus thoughts) that exist in one language but not another. In this manner people could be exposed to new ideas, and thus their minds expanded.

I am interested in how language affects thought.
Re:Statistical approach looks promising by Suidae · 2003-07-28 05:05 · Score: 1

language is the building block of thought.

It's easy to think that (no pun intended), but consider that small children function before they learn language. Also there is the case of that autistic woman who thinks entirely in pictures (or at least that is the best way for us to think of it), in order to speak to us, she must first translate her thoughts into words.

We like to link thought and language very closely since most people tend to subvocalize, talking to themselves in their head, but that is certainly not the only way to think.

This is a well covered philosophical area if you care to look up some references.
Re:Statistical approach looks promising by rodentia · 2003-07-28 05:08 · Score: 1

I agree. In fact, I began to wonder about the basis for Chomsky's universal grammar as I read this. I always have had a problem with a biologically encoded grammar. It seems to me that any UG could only really be constituted by a handful of conceptual rubrics like object, action, etc. Really a matter of epistemology rather than linguistics. This research would seem to point to a way out of the UG box. There is other research about the neurological basis for language that lends itself to this conclusion. The leap to language would seem to be constituted not by the adoption of a formalism, however biologically determined, but by an advance in pattern recognition, at which big, parallel systems like the brain excel.

Far from a hack, a system like this accommodates the dynamism of language: idiomatics and tropes far more gracefully than grammar chopping code.

--
illegitimii non ingravare
Re:Statistical approach looks promising by Anonymous Coward · 2003-07-28 05:25 · Score: 0

It seems to me that any UG could only really be constituted by a handful of conceptual rubrics like object, action, etc.

Why? Because you say so? Because biology can't encode complex mechanisms? Hmm?

Sounds like you're favoring this idea because it seems to match your pre-formed opinions. On closer inspection, there's not really any linguistic problem here, considering the original data set has to come from human translators. Suggesting that human beings process language by comparing every utterance we hear with every utterance we have ever heard and then performing ranking has been postulated (years ago) and is not really supported by data. Language does have structure, or at least we act like it does. Whether we understand the structure well enough to use it in machine systems is what is at issue. Right now, the statistical method seems better.
Re:Statistical approach looks promising by rodentia · 2003-07-28 06:43 · Score: 1

No. Because there are clear problems with the structure of language. Because, in fact, we do not act as though language has structure, we act as though we impose structure on language as an ordering act, an imposition which is highly contingent. All this method does is minimize the impact of that contingency upon the success of machine translation. My point is that it raises some interesting questions about a problem that contemporary linguistics would like to have put away.

There is always a linguistic problem. That academic linguistics is at such pains to make itself a science, either empirical or statistical, does not make the problem of language go away. This is why philosophy has had to take up the slack.

Language is both structured and uncentered. The communicative act partakes of a formal structure and a generic function. Not Either/Or, Both/And. The problem with UG is not its biological determinism, but its reductio.

--
illegitimii non ingravare
Re:Statistical approach looks promising by JJ · 2003-07-28 10:36 · Score: 1

Small children do function, albeit minimally, before they acquire language, however, when they are adults, they don't consciously remember this pre-language period.

--
So long and thanks for all the fish . . . !!!
Re:Statistical approach looks promising by Suidae · 2003-07-28 16:30 · Score: 1

I seem to recall reading something about that. Something like the memories are there, but difficult to access because of how heavily we depend on language to aid recall. I'd have to look it up.

When will this stuff reach consumer level? by Anonymous Coward · 2003-07-28 04:13 · Score: 0

I want to be able to play all those crazy Japanese games that come out, but I don't understand the gibberish picture doodles they pass off as a language.

But! by Anonymous Coward · 2003-07-28 04:13 · Score: -1, Troll

What does it mean for the fish?

Re:But! by Anonymous Coward · 2003-07-28 06:57 · Score: 0

That needs no translation, the goat is understood universally.

translatio? by Lady+Jazzica · 2003-07-28 04:14 · Score: 2, Funny

University of Southern California computer scientist Franz Josef Och echoed one of the most famous boasts in the history of engineering after his software scored highest among 23 Arabic- and Chinese-to-English translatio systems, commercial and experimental, tested in in recently concluded Department of Commerce trials.

Maybe what Dr. Och should do next is write some software to double-check the work of whoever translates his press releases from the original Latin. The translator seems to have missed a few words here and there.

Re:translatio? by panda · 2003-07-28 04:33 · Score: 1

He needs a translatio studii.

(For the medievalists.;-)

--
Just be sure to wear the gold uniform when you beam down -- you know what happens when you wear the red one.

unlike you by DrSkwid · 2003-07-28 04:14 · Score: 1

English : correctly forming sentences in it I can.

--
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter

Copyright issues by PhilHibbs · 2003-07-28 04:14 · Score: 1, Insightful

I wonder if the resultant translation engine could be considered a derivative work of the texts that populated it. This system is standing on the shoulders of all the translation efforts that went into it. I think it's a great idea, but in the current IP climate, it could well be shot down in flames. How much dual-language text is available in the PD or on an open content licence?

Re:Copyright issues by blahedo · 2003-07-28 08:39 · Score: 1

IP is definitely considered in these things. While you could probably obtain an illicit copy of most of the text corpora out there just to play around with it, if you ever intend to publish your research, you need to "buy" the corpus, which gives you the right to use it to build translation systems, parsers, question-answerers, or whatever else.
Typically, academic licences are a lot cheaper than commercial ones, although the base price can vary all over the place. The Canadian Hansards (parliamentary proceedings, in French and English---a major corpus used in statistical machine translation work, including Och's) will run you $5k; the ECI Multilingual Corpus 1 is about $35. Usually, corpora are made available through either the Linguistic Data Consortium in Philadelphia or the Evaluations and Language resources Distribution Agency in Paris, although some of the free corpora are distributed elsewhere, typically from the website of the research lab that developed it.
Two major costs go into the creation of corpora: content and markup. The former is often responsible for the majority of the cost, as LDC or ELDA negotiate with the copyright holder for a redistribution licence, although the markup costs can be significant for a more elaborately annotated corpus, such as a treebank (which contains parse structure and more for all the sentences in it). However, assuming you can get enough free content, or negotiate for a free licence to the content, there's no theoretical reason there couldn't be an open corpus repository....

--
``This, too, shall pass.'' ---Eastern proverb
Re:Copyright issues by Jedi+Alec · 2003-07-28 11:46 · Score: 1

When you develop a car, do you get IP issues over the books on mechanics you studied during your education? In this case I'd say the texts are study material...

--

People replying to my sig annoy me. That's why I change it all the time.
Re:Copyright issues by PhilHibbs · 2003-07-28 20:54 · Score: 1

In this case I'd say the texts are study material...
I think that's wishful thinking. There's a difference between learning from a book that was written for people to learn from, and taking a dual-language text and building a translation engine based on the linguistic correlations. The translator contains a large proportion of the text, copied directly from it.

"The vodka is strong, but the meat is rotten" by quantum+bit · 2003-07-28 04:14 · Score: 5, Funny

You know, that actually does sound like something that would be a Russian aphorism...

Re:"The vodka is strong, but the meat is rotten" by Enonu · 2003-07-28 07:24 · Score: 1

In Arabic, I think you can say something along the lines of "my meat is sour" to mean that you're pissed off.

A poor analogy, and a poor method by jd · 2003-07-28 04:15 · Score: 3, Informative

The Rosetta stone encoded three languages, not two, where two were known in advance. Indeed, there have been many three-way translations of treaties found, now.

The use of three languages is critical. Grammar isn't consistent, and words have multiple meanings. By using two known languages, you can eliminate many of the errors thus introduced, because the chance of some error fitting both known languages in the same way is much smaller.

If you double the number of known languages, you more than quarter the number of errors, because although errors can occur in either or both, they're unlikely to be the same error. Once more information exists, you can re-scan the same text and fill in the blanks.

Me, personally - I'd require four languages, three of which were known. The number of texts required would be considerably smaller and the number of residual errors would be practically nonexistent.

They chose two languages for the obvious reason: It's simple. It's easy to find a student who knows two languages. At least, easier than finding one who knows four.

However, the price of simplicity is bad science. The volume of information they require makes their system little better than an infinite number of very smart monkeys with text editors and a grep function. That they're being paid significant money for such stuff is a joke.

If they offered me the same money (and one of those Linux NetworX clusters) I could have a superior system in a month, although (as stated above) it would require more than one known language.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

Re:A poor analogy, and a poor method by femto · 2003-07-28 04:51 · Score: 1

Don't forget this is version one. Surely future versions will be able to take in millions of works written in hundreds of languages, simultaneously? Tell it which work is a translation of which (or let it figure it out for itself) and it will then be able to translate from any one of the languages to another. In the translation process, it won't just take into consideration the relationships between the two languages being translated, but the relationships between all of the languages fed in.
For example, if translating from language A to B, useful information might also be gleaned by considering influences from the paths A->C->B, A->D->B, A->C->E->B and so on, where C, D and E are also languages.
The closest analogy I can think of is a device called a 'multiuser detector' from information theory.
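
To make the A->C->B idea concrete, here is a rough sketch of composing two word-translation tables through a pivot language. It assumes each table maps a word to a probability distribution over words in the next language, and that the choice of B word depends only on the pivot word C (a strong simplification). The function name and the toy Spanish/French/English numbers are invented for illustration and are not taken from the article.

```python
from collections import defaultdict

def pivot_table(a_to_c, c_to_b):
    """Compose P(c | a) and P(b | c) into an estimate of P(b | a).

    Assumption (for illustration only): b is independent of a once the
    pivot word c is known, so P(b | a) = sum over c of P(c | a) * P(b | c).
    """
    a_to_b = defaultdict(lambda: defaultdict(float))
    for a, c_probs in a_to_c.items():
        for c, p_ca in c_probs.items():
            # Pivot words with no entry in the second table contribute nothing.
            for b, p_bc in c_to_b.get(c, {}).items():
                a_to_b[a][b] += p_ca * p_bc
    return {a: dict(bs) for a, bs in a_to_b.items()}

# Made-up toy tables: Spanish -> French and French -> English.
spanish_to_french = {"perro": {"chien": 0.9, "chiot": 0.1}}
french_to_english = {"chien": {"dog": 1.0}, "chiot": {"puppy": 0.8, "dog": 0.2}}

print(pivot_table(spanish_to_french, french_to_english))
```

Several such paths (A->C->B, A->D->B) could then be averaged or weighted, but how to weight them is exactly the hard part and is not addressed here.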
Re:A poor analogy, and a poor method by Abcd1234 · 2003-07-28 04:51 · Score: 5, Insightful

If they offered me the same money (and one of those Linux NetworX clusters) I could have a superior system in a month, although (as stated above) it would require more than one known language.

LOL! If this problem was so friggin' easy, why are these researchers the first to demonstrate a working system using this technique (which blows away all existing systems, BTW)? Hell, if it's as easy as you say, this whole "translating text" thing must be a breeze. I wonder why so much money is spent every year on R&D in this area? Hell, why didn't they just hire you to whip up a system in a month?

Why? Because it ain't that easy and you have no idea what you're talking about. Given these are world-class researchers, I'm sure they've considered the multiple-translation route, and subsequently rejected it for very good reasons (likely far more complex than your simplistic "it's easier" excuse). Moreover, the really hard work in this area is the statistical modelling necessary to generate a working system, something which would, I suspect, be far more complex if a multiple-translation route were taken. But, hey, that's just some number crunching, right? What's so hard about that?
Re:A poor analogy, and a poor method by William+Tanksley · 2003-07-28 04:59 · Score: 4, Insightful

If you double the number of known languages, you more than quarter the number of errors

Your post is reasonable and interesting (using three-way parallelism would give better translations), but you're missing something important here.

First, none of these languages are "known" to this interpreter program. The program reads parallel texts, and when you feed it a text without a parallel, it generates the parallel for you. In other words, it can translate either way. So you don't have two known languages and one unknown; all you have is three text corpuses. (Well, in this case you have two, but you know what I mean.)

Second, yes; three would be FAR better than two; but two is also useful, and in more situations. You don't always have a Rosetta stone.

They're doing well here. Yes, there's an obvious next step to take; but no, the existence of a "next step" doesn't destroy the usefulness of this step.

-Billy
Re:A poor analogy, and a poor method by John+Harrison · 2003-07-28 05:35 · Score: 1

While your discussion of treaties is interesting, applying those thoughts to machine translation seems misguided. If you have a system that translates from Hindi to English, what is the use in training it on Spanish as well? Using texts written in all three languages would have little benefit, since the most accurate translations would come from Hindi->English matching.
If you had a large body of texts in Hindi and Spanish and another large body of texts in English and Spanish and, finally, if you lacked a sufficient number of texts in both English and Hindi, then a system that translated Hindi->Spanish->English could be useful. Another situation is if the text you wanted to translate were available in both Hindi and Spanish but not English. However, the error-reducing traits that the three-language treaty has are not present in this situation.
If the goal is a universal translator then there would be a point, but that wasn't the stated goal of the project.

--
Lasers Controlled Games!
Re:A poor analogy, and a poor method by micromoog · 2003-07-28 05:37 · Score: 1

If they offered me the same money (and one of those Linux NetworX clusters) I could have a superior system in a month, although (as stated above) it would require more than one known language.
Bitching on Slashdot about it is not likely to get your ideas implemented. Contacting the researchers and asking to get involved may (assuming you're in some way qualified and not just totally full of shit, that is).
Re:A poor analogy, and a poor method by Draxinusom · 2003-07-28 05:54 · Score: 2, Insightful

RTFA. The method described in the article is a purely statistical method, NOT a semantic one; it has zero "knowledge" of grammar, syntax, or meaning. So having more than one "known" language to start with would not help in the slightest, because the advantages that you describe are only applicable to semantic methods.

I agree though that the analogy to the Rosetta Stone is a poor one.
Re:A poor analogy, and a poor method by gmarceau · 2003-07-28 06:18 · Score: 1

The NSF funds this kind of research (assuming you are in the States). In Canada, Nserc does. If you can build a better system, write it up in a grant application, and they will give you money. It is as simple (and as hard) as that.
From the article: The original work along these lines dates back to the late 1980s and early 1990s and was done by Peter F. Brown and his colleagues at IBM's Watson Research Center.
IBM's pioneering work was written up in a student-friendly workbook available online. Feel free to try coding it and see how well you do. Do remember though, the state of the art has progressed a lot since IBM's work. This workbook only covers the basics.
You will find that debugging a statistical translation system is really hard. You can write test cases, but they take one hour to run each time. You can look at the result of your test cases, but since you cannot work the answer out by hand, you can never be sure if the numbers you are computing are correct. As an example of how tricky it can get, in Brown University's cs241 last fall, amongst the four teams, only two teams managed to correctly implement Model-3, and the workbook goes up to Model-5.
There are two reasons why a three-way translation is a bad idea. First, it is already difficult to find large amounts of text translated two-way and available in digital format. Restricting your approach to three-way translated text would reduce the amount of text you could train on so much that it would offset the advantage you would get from the three-way text.
Second, training for statistical translation is really expensive. If running one single test case can take an hour, running a full training can take a whole week. Under these conditions, you are always very careful how you spend your cpu cycles. Until better cpus come along, training three-way and cross referencing each language with the other could well take a month of processing (or two).
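
For a feel of what "the basics" look like, here is a minimal sketch of the simplest model in that family (usually called IBM Model 1), trained with expectation-maximization on a toy corpus. It is a simplification written for this thread, not the workbook's code: it leaves out the NULL word, ignores word order entirely, and the function names and the three-sentence corpus are made up for illustration.

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """Estimate t[(f, e)] = P(source word f | target word e) by EM.

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    Simplified sketch: no NULL word, no alignment or fertility model.
    """
    src_vocab = {f for fs, _ in pairs for f in fs}
    # Uniform start: every source word is equally likely for any target word.
    t = defaultdict(lambda: 1.0 / len(src_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected co-occurrence counts
        total = defaultdict(float)   # expected counts per target word
        for fs, es in pairs:
            for f in fs:
                # E-step: share one unit of credit for f among the
                # target words in the same sentence pair.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise the expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

def best_match(t, e):
    """Return the source word with the highest P(f | e) for target word e."""
    candidates = {f: p for (f, e2), p in t.items() if e2 == e}
    return max(candidates, key=candidates.get)

toy_corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
table = train_model1(toy_corpus)
for word in ["the", "house", "book", "a"]:
    print(word, "->", best_match(table, word))
```

Even this toy shows the debugging problem: the only way to check the numbers is to work an iteration or two by hand, which stops being possible as soon as the corpus is a realistic size.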

--
This post was compiled with `% gec -O`. email me if you need the sources
Re:A poor analogy, and a poor method by Anonymous Coward · 2003-07-28 07:05 · Score: 0

"If this problem was so friggin' easy, why are these researchers the first to demonstrate a working system using this technique"

Because few other people are allowed to mess with this and make it economically viable. Copyright law protects translated works, even in part, and fair use only applies to certain groups, one of which is researchers. Researchers can get grants. A commercial effort would have to gain legal permission from various copyright holders; this alone is the bottleneck and pretty much stops all reasonable efforts (unless you want to break the law, which I think is stupid in this case).

A regular person is not allowed to accumulate massive amounts of copyright text and enter them into a system, then sell it commercially, without permission from the copyright holders. You need a massive amount of information, which you can overcome by a lot of hard work. However, if a copyright holder or publisher says "No", you're done. Getting 7 parallel texts of a work isn't that hard, but getting 7 copyright notices and contracts for nonexclusive use of that work is damn near impossible.

Put another way, there is no mandatory licensing.

As you can see, overcoming the legal obstacles is a totally different matter, particularly if you are pushing for a commercial system. Even DARPA, by the letter of the law, isn't allowed to use this system actively for their own uses; even the US government has to obey copyright law in this situation. But there is enough leeway for the research to be done. I just don't see it being viable anytime soon, unless they somehow found a stash of public domain or liberally licensed works. Even Gutenberg doesn't have this.
Re:A poor analogy, and a poor method by Abcd1234 · 2003-07-28 07:28 · Score: 1

Yes, but that doesn't change the fact that this is a *hard* problem. Research, by researchers at non-commercial institutions, in the area of text translation has been going on for *years*. Heck, it's one of the many areas investigated by researchers interested in AI (well, specifically, language specialists), which has been a rather hot topic in the past. Hence, it's not as if this is untrodden territory. Which brings me back to my comment:

"If this problem was so friggin' easy, why are these researchers the first to demonstrate a working system using this technique"

Why? Because it's hard, and they've made breakthroughs. Yeah, sure, it might have gone faster if commercial interests could have gotten involved (although I would question that logic... privatization doesn't guarantee quicker results), but that doesn't change the fact that this is a hard problem, and certainly not one that could be solved in a month by some amateur who happens to know a little about the Rosetta stone.
Re:A poor analogy, and a poor method by pz · 2003-07-28 07:45 · Score: 1

Although the research reported on in the article sounds like a substantial step forward, I recall reading about a similar corpus-based system to translate between French and English. This would have been in the late 1980s or very early 1990s. The key, for the research I'm vaguely remembering, was that by Canadian law, all legislation had to be published in both languages. Voila, a HUGE hand-translated corpus that, because it was law, needed to be accurate. I was, and remain, impressed by the idea. Anyone else know more?

--

Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
Re:A poor analogy, and a poor method by blahedo · 2003-07-28 08:09 · Score: 1

Yup, that's right---the Brown et al paper (and a lot of the work since then) was on translating between French and English; these were chosen because the Canadian parliamentary proceedings (known as the Canadian Hansards---"hansard" being a general term for parliamentary proceedings within the British Commonwealth) are in both languages. Oddly enough, the corpus isn't as gargantuan as you might imagine; in order to be useful to these systems, it needs to be *at least* sentence-aligned (i.e. "this English sentence corresponds to this French sentence") and preferably word-aligned as well, but of course the originals aren't, and this needs to be done by hand.

What they are is expensive. You can order them from the Linguistic Data Consortium, but it'll run you a cool $5K---and that's if you're doing academic, non-commercial research. The nice thing about Natural Language Processing, though, is that that's the *only* real expense beyond a medium-powerful computer; no special hardware required. :)

--
``This, too, shall pass.'' ---Eastern proverb
Re:A poor analogy, and a poor method by koi88 · 2003-07-28 08:43 · Score: 1

"They chose two languages for the obvious reason: It's simple. It's easy to find a student who knows two languages. At least, easier than finding one who knows four."

I can't see a reason why three languages are necessary. Some translations are bad, even wrong, but the sheer amount of data the system checks eliminates most errors.
It learns like a human child-- if a child hears somebody making a mistake, it still doesn't necessarily make the same mistake. As long as the majority of translations are correct, errors don't matter.
However, if there are translation errors that are more frequent than the correct translation, the system will think that the error is correct.

Just like children do.

--

I don't need a signature.
Re:A poor analogy, and a poor method by Anonymous Coward · 2003-07-28 08:56 · Score: 0

I believe you will find that a Google search for "parallel corpus" will lead
you to a pleasant surprise, and I promise that that's just the tip of the
iceberg. There are gigabytes of freely-available parallel text to be had, in
tens if not hundreds of languages.

If you want corpora parallel in more than two languages, it's obviously more
limited, but you can still find millions of sentences from the UN and the
European Union. And don't forget the Bible, available in a huge number of
languages, and with multiple translations into many of them.

On top of this, remember that the copyright law has to do with redistributing
works. Even if all of the available data were under copyright, there is
nothing illegal about downloading it to your own hard drive and doing whatever
computation you want. If you then want to distribute your machine translation
system, the only remnants of the copyrighted work are in the internal tables of
the system (translation probabilities or what have you). These data are
clearly not under copyright. (Compare the clear legal precedent that compiling
a list of the words used in a copyrighted work does not infringe on the
copyright.)
Re:A poor analogy, and a poor method by bahamat · 2003-07-28 09:27 · Score: 1

Maybe I'll take a lot of flak for saying this, but...

The Bible is the most widely translated textual work, and Bible translators are very meticulous about conveying as much of the original meaning and intent as possible in each translation.

Just taking the number of languages Zondervan has translated the Bible into, should give a pretty consistent translation matrix, especially using this method. I'd be very interested to see how well this program would work at being able to translate any literary work into any language once it's been trained with every lingual translation of the Bible.

Of course, the real test would be to take a particular Bible translation and convert it into various languages and back to see how closely it matches the original. Then for some real fun, we'll directly translate the Greek and Hebrew versions to whatever language and see how it compares to the "official" translation for that language.
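
The round-trip test could be scored with something as simple as the sketch below. It assumes you already have two translation functions to plug in; the `forward` and `backward` stand-ins here are toy placeholders, not a real translation system, and word-level overlap from Python's standard `difflib` is only one crude way to measure "how closely it matches".

```python
from difflib import SequenceMatcher

def round_trip_score(text, forward, backward):
    """Translate text out and back, then compare word sequences.

    forward and backward are whatever translation functions you have;
    the result is a similarity ratio between 0.0 and 1.0.
    """
    restored = backward(forward(text))
    return SequenceMatcher(None, text.split(), restored.split()).ratio()

# Hypothetical stand-ins for real translators: "translate" to upper case,
# and come back with one word swapped for a near-synonym.
def forward(s):
    return s.upper()

def backward(s):
    return s.lower().replace("meat", "flesh")

print(round_trip_score("the meat is rotten", forward, backward))
```

A high score only shows that the two directions are consistent with each other; both could be wrong in the same way, which is why comparing against the "official" translation is the more interesting half of the experiment.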
Re:A poor analogy, and a poor method by jd · 2003-07-29 00:52 · Score: 1

The bible is a classic example of the method I was discussing. Biblical scholars use Aramaic and Greek texts from a number of sources, in order to determine the most probable meaning of uncertain segments.

A glance through the International version, which includes a quick summary of the alternative possible translations, shows that direct 1-1 translation isn't always possible, due to linguistic ambiguity, and that even multi-sourced translations can be extremely difficult to get right.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Re:A poor analogy, and a poor method by jd · 2003-07-29 01:27 · Score: 1

Instead of using a single high-power CPU, it is generally more cost-effective to use clusters of lower-power CPUs.

You don't want to "debug" translation systems, because you want to use them on unknown texts. Since part of the input is an unknown, it is impossible to determine if the analysis is correct.

A far more practical approach is to use a self-organizing system. Certain classes of neural net fall into this category, which is good because neural nets are very parallelizable.

Another parallelizable aspect is that text is usually broken down into discrete units. In most modern languages, this unit is the sentence. By parallelizing at the sentence level, it doesn't matter if you have one page or a million.

The pre-process time is going to be ((number of sentences) * (time per sentence))/(number of processors). This gives you a rough idea of what information is present. You then need to correlate the information in a given sentence with that of the other sentences, in order to identify how that information is connected. This connectivity is all that the statistical analysis really looks at. There's no magic involved.
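
As a worked instance of that formula, the sketch below plugs in some numbers. The figures (a million sentences, 10 ms per sentence, 100 processors) are invented purely for illustration, and the estimate assumes perfectly even load with no communication overhead between nodes.

```python
def preprocess_seconds(n_sentences, seconds_per_sentence, n_processors):
    """Ideal pre-process time: total work divided evenly across processors.

    Ignores scheduling and communication costs, so treat it as a lower bound.
    """
    if n_processors < 1:
        raise ValueError("need at least one processor")
    return (n_sentences * seconds_per_sentence) / n_processors

# Invented example figures: 1,000,000 sentences at 0.01 s each.
single = preprocess_seconds(1_000_000, 0.01, 1)     # one machine
cluster = preprocess_seconds(1_000_000, 0.01, 100)  # 100-node cluster
print(single, cluster)
```

The per-sentence pass scales this way only because sentences are independent at this stage; the later correlation step compares sentences with each other and will not divide up so neatly.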

Once you've pre-processed the data and produced a simple entity-relationship model, you only have to identify one of those entities in order to infer the identity of all of them. This is where the other languages come in.

Mapping identical texts as ER structures in several languages allows you to identify not just one but many points in the system. You can then verify your mapping by seeing if those points correctly map in the unknown language. If they do, the model is correct, and you can then deduce the translation of all other entities.

If they don't, you need to re-parse the unknown language for alternative ER solutions to the regions which are incorrectly mapped. If there's a workable alternative, then that's the one that should be used. If there isn't, then the system has to mark that region as unmappable on available data.

Once the mapping of identified entities is complete and verified, the rest is easy. You have the relationships in both the unknown and the multiple known sources. Because ER is independent of grammar, it is merely representative of the sum total of the information present, regardless of where and regardless of what attributes (adverbs, adjectives, etc) are used. Thus, relationships should be very similar to identical in all languages. How something is written is eliminated from the picture.

Because of this property, you simply need to match up relationships, with preference going to the most exact match. This tells you what the unknown entities must be.

Through this process, we can map out the complete ER diagram, although we don't have the "attributes" for each entity. To get those, we must re-parse the texts, this time looking for properties not yet identified, and what they repeatedly group with.

This is the only statistical part of this method. By examining the confidence with which you can relate an unknown with a known, you can determine the best-fit for those unknowns. These become the "attributes" of the entities.

By doing this repeatedly with many texts (it has to be statistically meaningful), you can improve the confidence of your identification of entities, relationships and attributes.

In the same way as a single 2-way analysis of variance is superior to performing many 1-way analysis of variance tests and combining the result, by parallelizing this entire task you essentially merge many statistical tests (each with their own probability of error) into a single test with a much lower probability of error.

The multiple knowns give you ample opportunity to perform sanity checks on your results, to back out erroneous conclusions, and to verify your results.

All in all, we're not talking about weeks, or even days. A grid computer should be able to munch on a 3-way or 5-way mapping & statistics problem in minutes or hours, because any given node only has a very tiny chunk of data to process.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

It hasn't done anything hard yet by Felonius+Thunk · 2003-07-28 04:15 · Score: 1

Translating from well-known languages such as Hindi or Arabic is all well and good, but they're already pretty easy to translate (the rules are well-known, translations are easy to check). This may still be a good way to do such translations faster, but it won't help you with: new or obscure languages (not enough translated data to feed into it), quality translation (no mention of the results, so I assume no one's going to be relying on this for publishing or journalism), or very pragmatically dissimilar languages (the rules of conversation rather than grammar). It's a good use of number-crunching, but would you want, say, your wedding vows, freelance article, or software specs done this way?

You don't get it, do you? by mossr · 2003-07-28 04:16 · Score: 2, Interesting

***WHAT THE FUCK ARE YOU THINKING?***

Look, seriously, even if everyone did speak English, there are still tonnes of literary works in other languages - the original texts of the Ancient Greek classics, for example. To read in the original language is often a much more rewarding experience. Besides, relying on past translations of non-English material can lead to errors. And consider how many different English translations of the Bible there are.

Almost everyone can speak, read and write at least tolerable english

Almost everyone can communicate using gestures, facial expressions and grunts, but is that any reason to use that as our primary communication method? I mean, to really stretch a metaphor from human languages to programming languages, we can write any computer program "tolerably" in assembler (it's Turing-complete), but that doesn't mean it's the best way to do it. If I can only speak one language "tolerably", but another exceptionally well, which one is better for conveying my ideas?

most young people can have full fledged discussions in it

I don't think we can rely on "d00d, u r so l33t" to teach people true literacy. Young people are increasingly using SMS and online chat and are actually losing their ability to correctly spell words or write grammatically correct sentences. The number of young adults I see who cannot distinguish correctly between there, their and they're is ABSOLUTELY TERRIBLE. Literacy is a major problem in English-speaking nations.

Just look at Slashdot, I'm quite sure I'm not the only one who doesn't have english as primary language

that doesn't mean you can use it well. Take a good look at slashdot - many, many people mangle the English language. The American people are probably the biggest infringers here... :)

It's not that farfetched idea that in the (near) future everyone uses or at least knows english well enough to make translations meaningless

Human languages don't map to each other 1:1. Some languages have words that basically cannot be translated without a serious loss of accuracy. (I guess you could say that no human language is Turing-complete, in that it can't totally express every conceivable human thought.) Having everything translated to English is NOT a solution. Brevity and language tricks (such as puns, rhyming, etc) cannot always be substituted across languages.

If it wasn't 2:15am in Melbourne right now, I'd try to order my thoughts and express them more clearly, but after 4 hours of Java debugging I'm off to get some sleep before uni tomorrow. Goodnight.

--
The PowerPC includes for this purpose two instructions called SYNC and EIEIO.

Re:You don't get it, do you? by Shads · 2003-07-28 04:25 · Score: 1

> Young people are increasingly using SMS
> and online chat and are actually losing
> their ability to correctly spell words
> or write grammatically correct sentences.

This is called language evolution. It's always frowned on and it always happens in the end d00d. There are some factors slowing it down right now, most specifically a lack of territorial conquest with assimilation of populace.

--
Shadus
Re:You don't get it, do you? by technothrasher · 2003-07-28 04:41 · Score: 2, Interesting

I don't think we can rely on "d00d, u r so l33t" to teach people true literacy. Young people are increasingly using SMS and online chat and are actually losing their ability to correctly spell words or write grammatically correct sentences. The number of young adults I see who cannot distinguish correctly between there, their and they're is ABSOLUTELY TERRIBLE. Literacy is a major problem in English-speaking nations.

Get off your high horse already. Unless you use English like that below, then (by your rules) your grasp of English is also "ABSOLUTELY TERRIBLE":

Hwæt! Ær þissum dæge seofon wintra and hundeahtig, ure ealdfaederas acennodon on þissum lande niw rice, geacnod on freodome and gegiefen to þæm geþohte, þæt ealle menn beoð gelice gesceapen.

(Hint: Language is an evolving tool for communication, not a political weapon to keep the ruling elite in power)
Re:You don't get it, do you? by Warbeck · 2003-07-28 05:45 · Score: 1

I think that should be "ond" not "and". Let's keep the language pure. And did Lincoln say "What!" at the beginning of the Address ?
Re:You don't get it, do you? by brwski · 2003-07-28 07:16 · Score: 1
technothrasher wrote:

Get off your high horse already. Unless you use English like that below, then (by your rules) your grasp of English is also "ABSOLUTELY TERRIBLE":

Er...yeah. Methinks you missed the point of the post to which you responded. "l33t" speech:
- belongs to a small subculture of a subculture;
- is used for either quick communications (shorter phrases, etc.) or obfuscation for the uninitiated;
- does not lend itself well to documents longer than a few sentences;
- is not used as the preferred form of language for newspapers, magazines, or their electronic equivalents.
Knowing and using a dialect or a slangset of a language is not a bad thing. In China, for example, there are dialects that vary considerably from village to village, even if they are only five miles apart. They like their hometown way of speaking, as that is what they are comfortable with and it is a perfectly fine means of communication. That, however, does not mean that those who use those dialects do not know and use the commonly-agreed-upon official "dialect" (known as putonghua). To know only one's home dialect is to be left out of the loop and isolated from the rest of society.

The same is true for native English speakers. All it takes is one generation of functional illiterates for a good number of citizens to be left in the dust economically and otherwise. Students who due knot no how two distinguish between homonyms and students who cannot spell properly will not be hired by the people who do know how to distinguish between homonyms and by those who do know how to spell properly, no matter the l33t skillz of the l33tsp33krz.

Good language skills pay off all across the board: they give access to books that might otherwise be impenetrable; they demonstrate that you are more than a wildling; they also remove barriers between yourself and those who read what you write. Often poor grammar or spelling gives the reader reason to toss what they're reading to the side and pick something else up. If you want to be read by those who are not just from your chosen subgroup, give grammar a chance.

brwski
--
brwski
"Because without beer, things do not seem to go as well''
Re:You don't get it, do you? by technothrasher · 2003-07-28 07:56 · Score: 1

Methinks you missed the point of the post to which you responded.
Right back at ya.
You're making the exact political argument that I illuded to in my original message: If you don't speak the dialect of the ruling class, you will be kept down. This was exactly my point. Grammar Nazi's are predjudiced a-holes who are afraid of losing control. If a generation of 1337 speakers manages to gain social, political, or economic control, then you may suddenly find yourself no longer speaking the 'official' dialect.
The "I'm concerned about the welfare of these young people" line is a load of crap, and you know it. So either admit that you're afraid of change, or instead be genuinely interested in the evolution of language. The way "lose" is becoming "loose", and "they're", "their", and "there" are blending into one word. To me, those things are absolutely fascinating.
I know you had a few other strange arguments in there about localized dialects (which is actually the opposite of what's happening on the internet), and somehow you tied the act of speaking a certain dialect into not being able to comprehend other dialects. But I don't see how those points are relevent.
Re:You don't get it, do you? by BiggerIsBetter · 2003-07-28 23:59 · Score: 1

You're making the exact political argument that I illuded to in my original message:

I think you mean alluded.

illude - To play upon by artifice; to deceive; to mock; to excite and disappoint the hopes of.

allude - To make an indirect reference.

This post brought to you by Grammar-Nazis-R-Us.

--
Forget thrust, drag, lift and weight. Airplanes fly because of money.
Re:You don't get it, do you? by technothrasher · 2003-07-29 00:23 · Score: 1

I think you mean alluded.

Hook, line, and sinker...
Re:You don't get it, do you? by brwski · 2003-07-29 01:31 · Score: 1

Yes, you will be kept down. This is not, however, because people are "afraid of losing control". It is because there are standards, and those standards --- especially when they have to do with communications and how one appears to peers/potential clients/etc., etc. --- are learnable, usable, and chosen for the purpose of facilitating communication.

Have you ever had to grade a paper? Have you seen first hand just how terrible the writing of most college students (not just high school or grade school students) is today? When they cannot differentiate between "they're", "their", and "there" it does not fascinate me. It saddens me. These students are the students who come to me complaining that their reading is too difficult, that they are being asked to do terribly hard assignments, such as writing a three-page paper. They have already placed themselves in the position of being unable to participate in the public arena as citizens other citizens will listen to.

[Side note on homonyms: this is not a "blending into one word": this is the result of reasserted orality in our culture. They cannot tell the difference amongst the three forms of the word that sound just like `there' not because they are changing in meaning, becoming one word that somehow can mean any of the three, but because they can barely make out differences amongst any of the words on a page. Changes in the oral use of language are fascinating. Illiteracy is not.]

Being concerned about the young folks is anything but "a load of crap". If these illiterates are going to make any difference at all in the future, they are going to have to be able to communicate their positions effectively. Look at Malcolm X --- he was, if anything, more articulate than his opponents. Did he simply reject that way of speaking because it was "of the man"? No! He spoke to "the man" in his language, and with skill "the man" couldn't match. Was he happy about that? Hard to say. But he certainly would not have had the audience he did on both sides of the issues he dealt with if he had not used the dominant language-form. It comes down to audience: do you want to speak only to your sub-sub-culture, or do you want your speech to have a (possibly) wider effect?

The localized dialects argument is relevant: though these dialects function perfectly well within villages, cities, or provinces, they fail as means of effective communication outside of their home areas. Therefore a common tongue is needed, promulgated, and used. If you want class/social/economic barriers between language groups, then by all means move to the UK, take on a low-class dialect, and then try to get yourself a job in a top-flight company. Best of luck. It's unfair, but it is how the world works. You make it clear that you know this yourself when you write: "If a generation of 1337 speakers manages to gain social, political, or economic control, then you may suddenly find yourself no longer speaking the 'official' dialect." Precisely so. Then everyone else would be in the position of the "l33t" today, and we could be having this same argument, but perhaps we would switch sides.

If obfuscation is your goal, then by all means communicate solely in your chosen subgroup's dialect. If clarity is your goal, know how to communicate well in both, if for no other reason than that it will broaden your horizons more than a little. Dialects are not bad things, to be stomped out. At the same time, however, they ought not be the limits of one's language.

brwski

--
brwski
"Because without beer, things do not seem to go as well''
Re:You don't get it, do you? by technothrasher · 2003-07-29 02:22 · Score: 1

Yes, you will be kept down. This is not, however, because people are "afraid of losing control". It is because there are standards, and those standards --- especially when they have to do with communications and how one appears to peers/potential clients/etc., etc. --- are learnable, usable, and chosen for the purpose of facilitating communication.
You still seem to be missing my point. You are claiming that using the ruling dialect will facilitate greater communication. Yes, I agree. You are claiming that not using the ruling dialect will keep you down. Yes, I agree. You are arguing that education is a good thing. How could I disagree?
My argument is not with any of those points. It's that Grammar Nazis are people who come into a discussion with the sole purpose of pushing the 'standard' dialect onto people who are communicating just fine. They have a (perhaps unrealized) political agenda.
I also think you need to explore the ideas behind a 'standard chosen for facilitating communication' a little further.
one word that somehow can mean any of the three
It's called a homograph.
They cannot tell the difference amongst the three forms of the word [...] because they can barely make out differences amongst any of the words on a page.
I've been enjoying our discussion so far, but that's just such an unneeded, unsupported, and arrogant claim that I think I'm about finished.
Being concerned about the young folks is anything but "a load of crap".
You're twisting my words. Being concerned about young folks is admirable. Claiming that Grammar Nazi behavior is due to such a concern is what is a load of crap.
If obfuscation is your goal, then by all means communicate solely in your chosen subgroup's dialect.
And if political suppression is your goal, then by all means come into my chosen subgroup and tell me my dialect is "wrong".
Re:You don't get it, do you? by brwski · 2003-07-29 05:20 · Score: 1

1.
And if political suppression is your goal, then by all means come into my chosen subgroup and tell me my dialect is "wrong".
Everything else aside, I think this is the keystone. For someone to tell speakers of a functional dialect that their dialect is somehow "wrong" is not right. We are in agreement about that.
What I am objecting to (and I don't think this is your position) is the idea that the dialect can be used in common discourse outside its home and have the one using that dialect expect to have their speech be automatically treated by those not in their group as worth listening to. Use the dialect, I don't care. Just don't expect everyone you run into to recognize what you have to say as being important. Use l33t speech for your article on the editorial page. Go for it. Just don't expect to be accorded the same level of interest as someone who doesn't. Speaking so as to be understood by one's audience isn't "selling out" --- it's common sense!
Is that right? Probably not. But I know that I have the tendency drilled into me from too many years of school to put down something I come across that is badly written. If the grammar is sloppy, odds are the thinking is as well, and I'm not going to spend my time on something that will probably not pay off. It's a bias, and likely an unfair one. There's no denying that. It's not always accurate, but it is the filter that I and many others use.
This is not about political suppression. This is about political realities. The dominant group does use their language as a club. But that club can be picked up and used just as well by those who are not dominant, and is often the only weapon that will get the attention of those in charge. Demonstrate in the street all you like --- fill your sign with obvious spelling mistakes and you've lost from the get-go.
2.
I've been enjoying our discussion so far, but that's just such an unneeded, unsupported, and arrogant claim that I think I'm about finished.
Too bad. It was something of an overstatement, certainly. But it does reflect much of what I've seen as a teacher --- many, many students have been taught how to read words but they have not been taught how to read. Give them an article of some sort and have them spell out the argument of the author. On average (from my experience), seven out of ten will have an awfully hard time piecing the whole together. They will more often than not latch onto a portion of an argument instead of paying attention to the whole thing, which can lead to some interesting misunderstandings. Literacy is much more than reading words and sentences. It's learning the mental skills to put it all together. Oftentimes this is related to their being stuck in their own "dialect", if you will: if something is presented in a way they are unfamiliar with, they just don't know how to deal with it.
If thinking that makes me a Grammar Nazi, then you haven't met a real one yet. I think that there are standards that can be set and can be met, standards that are not too difficult to meet if one wants to be heard. Does that mean you can't use a dialect, or a slangset? No! It just means that as a writer, it is important to know one's audience and how to communicate with them. One can't expect an audience to bend to the author's will unless they have been given an awfully good reason to do so.
[For example, no one would have paid a whit of attention to Joyce if Finnegans Wake had been his first book. He started with perfect English, but his stories were what was interesting. Then he began to play and play with English, and his audience was overjoyed to go along with him! But it took convincing, and he did not convince everyone that his direction was right. No one, however, would have been convinced if he had started there, instead of working towards that place.]
It all comes down to whether or not someone wants to be understood or not. Some

--
brwski
"Because without beer, things do not seem to go as well''

What about copyrights? by The+Lord+of+Chaos · 2003-07-28 04:17 · Score: 1, Interesting

The big problem I see with this scheme is how do you collect the gigs of data (i.e. content) without wholesale copyright violation or licensing (big bucks). Sure, you can get lots of content whose copyright ran out from Project Gutenberg. But that's gonna be 70+ year old stuff.

Add the fact that the Mickey Mouse Copyright Extension act and related legislation threaten to extend copyright terms for infinity minus a day and you're never gonna have much content available that reflects CURRENT usage of the languages you're trying to translate.

Re:What about copyrights? by Anonymous Coward · 2003-07-28 04:32 · Score: 0

This is Slashdot, nobody here cares about copyright unless it is open source garbage.
Re:What about copyrights? by The+Cydonian · 2003-07-28 05:43 · Score: 1

Not quite. While Dr Och might not have popular American literature, the vast majority of published writings are still copyright unprotected, and can be easily harvested.
For instance, consider all the modded-up responses to this very story [to remove certain graphically-descriptive ASCII art that keeps popping up in -1 comments ;-) ].

--
More than mere navel gazing.

Ranking System by freeze128 · 2003-07-28 04:18 · Score: 2, Interesting

Even existing translation programs could benefit from a ranking system. Wouldn't it be helpful if you could tell just how confident the translator is about a certain phrase or word? That way, you could rephrase your sentence before you foolishly ask someone to "taste" you....

Ha! by Anonymous Coward · 2003-07-28 04:18 · Score: 0

Och's ability to work quickly was tested recently in June, 2003, when researchers all over the country (and in England) raced in a "Surprise Language" exercise sponsored by the Defense Advanced Research Projects Agency to create machine translation tools to deal with texts in Hindi.

Hmm, I wonder why they chose Hindi...

The quality of his Hindi system is now being evaluated against those created by other scientists at the same time.

Ahh. So they can get some cheap Indian labor!

What? by Anonymous Coward · 2003-07-28 04:19 · Score: 0

It's hardly news that you can always find correlations in two sufficiently large sets of data.

Hello? Did you think about this AT ALL? Of course you can FIND correlations between translated works, but how are you going to use them to translate OTHER works?

(hint: it's not easy.)

Four words by Anonymous Coward · 2003-07-28 04:19 · Score: -1, Offtopic

Bite the wax tadpole

If you want a universal translator... by flicken · 2003-07-28 04:20 · Score: 4, Interesting

...here is a link to the Universal Networking Language (UNL). UNL is a computer markup language that allows the author of the text to specify how exactly the text should be translated (i.e. what the precise definition of the words in the text are). Taking this specification, a machine is able to produce a readable version of the text in a variety of languages.

It's not quite done yet, but the system does show promise. Dictionaries have already been created in Spanish, English, German, Japanese, Italian, French and several other languages.

--
20 mil and I will! Learn Esperanto with 20M others.

Re:If you want a universal translator... by IceAgeComing · 2003-07-28 06:55 · Score: 1

Note, however, that the statistical method doesn't employ grammars, whereas UNL does and doesn't go any further.

It may seem at first that grammars would greatly aid translation, but the article implies that grammars are statistically derivable, and strange quirks in phraseology are better described by mapping idioms straight across, instead of going through some kind of grammatical parser.

The proof will come when we can compare the two methods on the same text and see which is generally more readable.

ignoring grammar seems strange by meshko · 2003-07-28 04:20 · Score: 2, Insightful

I understand that this is a cool idea for building automatic translators, but is it practical? Basically what they are doing is taking a well-researched domain of languages and trying to make something new and cool in it by completely ignoring the domain knowledge. My intuition tells me that "always use as much domain knowledge as possible" is an engineering axiom.

--
I passed the Turing test.

Re:ignoring grammar seems strange by One+Louder · 2003-07-28 04:42 · Score: 1

I think the project is making the point that translating based on context is a very hard problem, such that a simple pattern-matching method is doing a better job than the more complex "smarter" systems that attempt to figure out context and grammar.
On the other hand, one could speculate that there's a certain amount of context already there because the texts it's basing the translation on are parallel, and that it is more likely to find a correct contextual match the more data it's fed.
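
To make the "simple pattern-matching on parallel texts" idea concrete, here is a minimal sketch that guesses word translations purely from how often words co-occur in aligned sentence pairs, with no grammar or dictionary. It is not the system from the article: the toy corpus, the function names, and the choice of the Dice coefficient as the score are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Toy parallel corpus (hypothetical data, invented for this sketch).
PARALLEL = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a book", "ein buch"),
    ("a house", "ein haus"),
]

def dice_table(pairs):
    """Score every (source word, target word) pair by the Dice coefficient."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)                 # sentences containing each source word
        tgt_count.update(t_words)                 # sentences containing each target word
        joint.update(product(s_words, t_words))   # sentence pairs containing both
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in joint.items()
    }

def best_translation(word, table):
    """Return the highest-scoring target word, or None for an unseen word."""
    candidates = {t: score for (s, t), score in table.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None

table = dice_table(PARALLEL)
print(best_translation("house", table))
print(best_translation("the", table))
```

With only four sentence pairs the scores already separate "house"/"haus" from "house"/"das", which is the sense in which more parallel data gives more context. A word never seen in the corpus simply gets no answer, which is the gap the "what about unseen phrases?" objections point at.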
Re:ignoring grammar seems strange by blahedo · 2003-07-28 08:22 · Score: 1

You're absolutely right---if we're talking about a finished system. For research purposes, we need to find out just how far a "dumb" system will take us; and then we can add in the domain knowledge later. If we can get a "dumb" system to outperform one with lots of domain knowledge, then A) it is _clear_ that it is the algorithmic framework of the dumb system that is better, and not just that it had better domain experts, B) if that's the level of performance you want, why waste time with the domain experts?, and C) odds are good that you can improve performance even more by integrating domain knowledge into the dumb system, if you can only figure out how.

That last part is important. More often than not, it seems, enhancing a totally stupid statistical algorithm with domain knowledge actually *hurts* at first, until you figure out just the right way to provide the knowledge to the system without breaking the statistical algorithms. This is an active and very interesting area of research....

--
``This, too, shall pass.'' ---Eastern proverb

Several Missing Details by Flwyd · 2003-07-28 04:21 · Score: 5, Interesting

As press releases tend to do, this leaves much to be desired for folks who are familiar with the discipline. As I read it, it seems to imply that the main driver is phrase-matching. What does it do with phrases it hasn't seen before? The problem is solved by throwing lots of data at it -- how much data is needed for a reasonable system? How well does it generalize to text outside the domains of the training data?

Incidentally, had my brother been a girl, he was in serious danger of being named Rosetta Stone.

-- Trevor Stone, aka Flwyd

--
Ceci n'est pas une signature.

Re:Several Missing Details by rcs1000 · 2003-07-28 12:19 · Score: 1

This is going to sound like a troll but...

Be grateful she wasn't called Tawnee...

(Apologies to Autopr0n...)

--
--- My dad's political betting
Re:Several Missing Details by compling · 2003-07-28 21:00 · Score: 1

from what i remember, it extrapolates from known instances to unseen ones using a number of metrics; the least number of deletions/insertions/subs, searching for the phrase that requires least transformation to match etc. i'm sure he must be using some decent smoothing techniques too, as he has worked for a while with Ney, who is famous for his work on the subject.

if you need more details check his papers. http://www.isi.edu/~och/
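
For anyone unfamiliar with the "least number of deletions/insertions/subs" metric mentioned above, here is a minimal sketch of word-level edit distance (Levenshtein distance) and of using it to find the closest already-seen phrase. This only illustrates the metric itself; how (or whether) Och's system combines it with other scores is not something this sketch claims to know, and `nearest_known` is a hypothetical helper name.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (e.g. lists of words)."""
    prev = list(range(len(b) + 1))          # distance from empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]                          # deleting the first i items of a
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # deletion
                curr[j - 1] + 1,            # insertion
                prev[j - 1] + (x != y),     # substitution (free if items match)
            ))
        prev = curr
    return prev[-1]

def nearest_known(phrase, known_phrases):
    """Pick the known phrase needing the fewest word edits to match `phrase`."""
    tokens = phrase.split()
    return min(known_phrases, key=lambda k: edit_distance(tokens, k.split()))

print(nearest_known("the big red house", ["the red house", "a small boat"]))
```

An unseen phrase can then borrow the translation of its nearest neighbour as a starting point, which is one plausible reading of "the phrase that requires least transformation to match".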

Statistics by Anonymous Coward · 2003-07-28 04:22 · Score: 0

It seems statistics is becoming a major force in our lives. Bayesian algos keep spam views down to almost tolerable levels, statistical analysis of texts helps with translations, weather is predictable at the macro level, Tivo collects data in the aggregate, etc. See any connections?

The Obviousness Nazi

Wordrank by chronos2266 · 2003-07-28 04:23 · Score: 2, Interesting

I always thought it would be interesting if Google applied its PageRank algorithm to provide a translation service. For example, poll the top 5 translation service sites for a translated sentence and then, based on what each of them returns, generate an 'average' or best possible result for that sentence.
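
A minimal sketch of the "poll several services and pick the consensus" idea, assuming the candidate translations have already been fetched as plain strings. This is a simple agreement vote, not PageRank: each candidate is scored by its average string similarity to the others, and the function name `consensus` is made up for this example.

```python
from difflib import SequenceMatcher

def consensus(candidates):
    """Return the candidate most similar, on average, to all the others."""
    if not candidates:
        raise ValueError("need at least one candidate translation")

    def avg_similarity(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        if not others:
            return 0.0
        return sum(
            SequenceMatcher(None, candidates[i], other).ratio()
            for other in others
        ) / len(others)

    best = max(range(len(candidates)), key=avg_similarity)
    return candidates[best]

print(consensus([
    "the cat sits on the mat",
    "the cat sits on the mat",
    "a feline is seated upon the rug",
]))
```

Note that this rewards agreement rather than correctness: if most services share the same mistake, the vote will pick the mistake.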

MOD PARENT UP by Anonymous Coward · 2003-07-28 04:24 · Score: 0

+1, Insightful

Give Me Enough Hot Grits!!! by brakk · 2003-07-28 04:24 · Score: 0, Offtopic

In Soviet Russia, enough gets you!

WTF are you talking about? by NDPTAL85 · 2003-07-28 04:25 · Score: 1

There are 1.5 billion Chinese people and 4.5 billion other humans. How do you figure the Chinese outnumber the rest of humanity?

--
Mac OS X and Windows XP working side by side to fight back the night.

Re:WTF are you talking about? by Kintanon · 2003-07-28 04:46 · Score: 1

You == Stupid.
But if you want to be pedantic the other poster should have said "The single largest cultural linguistic group is the Chinese."
They outnumber any other single group even if they don't outnumber all of the groups combined.

Kintanon

--
Check out JoshJitsu.info for Brazilian Ji
Re:WTF are you talking about? by Xentax · 2003-07-28 04:53 · Score: 1

Right. So he should have said that Chinese form a *plurality*, not a majority.

And, from what I've heard (granted, that's pretty flimsy), you can't lump all of China into one or even just a few lingual groups. I'm honestly not sure how big a hole that kicks in the argument.

China has a lot more than just an oppressive government to deal with, though...

Xentax

--
You shouldn't verb words.

GPG by Gothmolly · 2003-07-28 04:26 · Score: 1

GPG or similar, or using large, one-time pads will always work. Of course, then they just make encryption illegal. What are you trying to hide, eh? Only dishonest people need privacy! Eh!

--
I want to delete my account but Slashdot doesn't allow it.

Or a "culturally superior" Parisan Frechman. by raehl · 2003-07-28 04:26 · Score: 1, Interesting

When I lived in Europe, a friend and I went to Paris. We're both bilingual; myself German, him Spanish, but unfortunately neither of us knew French. We had occasion to ask which train we needed to be on to get somewhere, and asked (in French) if the person we were asking for directions knew Spanish, English or German. We went through a good ten people before we found someone willing to admit that they spoke something other than French.

I'm sure they thought they were being all "Ha-ha, I will not let these Americans get away with not speaking French!" but our interpretation of the situation was "We're Americans, we speak two languages, what's wrong with you?"

--
paintball

Re:Or a "culturally superior" Parisan Frechman. by Dunkalis · 2003-07-28 04:44 · Score: 1

Most French people think I'm German. Therefore, I get responses in really broken German. To which I respond with pretty fluent German (no, not fluent). If they think I'm American, that shuts them up quickly.

I seriously believe they are not trying to mess with you, but that they only speak French. We were in a restaurant, and we Americans were speaking American to each other. They started by asking us a question in German. Well, it wasn't really a question, but a SINGLE WORD. This guy knew a few words and the numbers. That's it. He seriously thought we were speaking German to each other. This was on the German border. Now THAT is sad.

This technology seems pretty cool, but it is definitely not a good tool for verbal communication.

FYI: I've never looked at a French rail schedule, but I'm guessing it's like a German one. Look at the time you wish to leave, then look for a train that takes you to the proper town. Look for the word "Track" in French at the top, then look down to your train. Run to that track. Of course, it could be really complex, knowing the French.

--
Slashdot is a waste of time. I enjoy wasting time.
Re:Or a "culturally superior" Parisan Frechman. by TheLink · 2003-07-28 04:56 · Score: 1

Uh, how about reversing the situation - a French tourist going to the US and expecting locals to know French.

What works for me is to speak to the French person in some very, very basic French and then work the rest out with some sign language (while it might help if you got your friend to speak a bit of Spanish in the background, I doubt German would help things ;) ). Anyway, if people don't bother to learn even very basic French, they shouldn't expect the French to speak English for them.

The Metro wasn't too bad. Fortunately I didn't ever have to deal with numbers and counting ;).
--
- Too many replies beneath your current threshold
Re:Or a "culturally superior" Parisan Frechman. by bogado · 2003-07-28 04:58 · Score: 1

I was in Paris last April. I am Brazilian and speak fair English. When I needed directions or something, I would approach people and politely say "Bonjour" or "excusez-moi" (the only two words I knew in French), and after that I would ask "English?". Usually the answer was "a little", and I would ask whatever I needed to know in English. The directions usually came out in a mix of bad English and French, but with a few gestures added it usually helped.

It is my understanding that when dealing with French people you must always be polite when starting a conversation (we Brazilians usually don't require these formalities). Otherwise they think you are being rude, and treat you rudely.

Anyway, after all that I was told about the French people, I was expecting to be treated with public humiliations or beatings. But my experience was the opposite. People were not as friendly as we are here, but none of them were rude or unwilling to help when I needed it.

--
[]'s Victor Bogado da Silva Lins
^[:wq
Re:Or a "culturally superior" Parisan Frechman. by Catskul · 2003-07-28 05:03 · Score: 1

I had the same problem. I speak English and Spanish. When I was visiting Paris, I arrived in the train station during this year's strike. I could not find a train schedule and the information desk was closed. I asked every train station employee I could find if they could speak English ("Parlez-vous anglais?") and none would admit that they did, although I suspected otherwise. Finally I got in the long ticket line despite the fact that I had a ticket already, in a last-resort attempt to find a train schedule. I guess there were a lot of people who had the same idea, and so the station had a representative going through the line and weeding out all the people not buying tickets. When she came to me and I asked if she spoke English, she grudgingly said that she spoke only a very little bit. When I began to explain my situation, it became obvious that she knew English just fine. I felt very angry that so many people would be so rude as to refuse to answer a simple question, or to reply that they didn't know where I could get the schedule. It really gave me a bad impression of the French, especially since everyone else on my trip through Europe was so friendly and helpful. So I say thank you to the Swiss, the Spanish, and the Italians, and to the French, I say: learn some manners.

--

Im not here now... Im out KILLING pepperoni
Re:Or a "culturally superior" Parisan Frechman. by Catskul · 2003-07-28 05:15 · Score: 1

I think the situation is not symmetrical.
France is in the midst of a rather small, yet very multicultural continent. It is more of a necessity for the French (and other Europeans) to be multilingual, for business and even social reasons. Since most other European countries' school systems teach English as the "common language", it would also make sense for the French to have learned English. Residents of the United States, by contrast, are in a fairly monocultural environment, so multilingualism is not so much a necessity. However, many US residents know Spanish, which is the second largest language group in the US. I think that if you were Spanish and asked in Spanish, you would have very little trouble finding help in most large cities, especially in the South. I don't think it is unreasonable for someone to expect to be able to find help in the language that is most used in cross-cultural communication, in the capital of France.

--

Im not here now... Im out KILLING pepperoni
Re:Or a "culturally superior" Parisan Frechman. by Anonymous Coward · 2003-07-28 05:22 · Score: 0

Let me turn that around a bit.

I grew up in the US, until high school, when my family moved to Hong Kong. I'm Chinese, and can speak Cantonese and English fluently.

Around my senior year of high school, I was going over to a friend's apartment, when this tourist approaches me. He was the stereotypical American tourist, hawaiian shirt, camera, and all. He comes up to me and goes:

"Do you speak English?"

Before I have a chance to say yes, he goes:

"ENGLISH. YOU KNOW ENGLISH? EN...GLISH"

basically screaming into my ear (Why do people think speaking louder will suddenly make them understood?) He then starts waving his hands around in some bizarre sign language he must've made up on the spot before his wife pulls him away muttering "I don't think he knows the language, dear"

Now, I'm American. I know English. And yet this guy, going only by my looks, figured I must not know any English. In that situation, do you think that I would be encouraged to talk to him, or even give him the time of day in English? I didn't think so.
Re:Or a "culturally superior" Parisan Frechman. by Jedi+Alec · 2003-07-28 11:22 · Score: 1

this just in...the French government has decided to banish the word "e-mail" from all official documents and releases. It will be replaced by courriel, which is a contraction of courrier electronique (electronic mail).

And the worst part is, this isn't even a troll...

--

People replying to my sig annoy me. That's why I change it all the time.
Re:Or a "culturally superior" Parisan Frechman. by BiggerIsBetter · 2003-07-28 23:43 · Score: 1

It is my understanding that when dealing with French people you must always be polite when starting a conversation (we Brazilians usually don't require these formalities).

You've nailed an important point there. Not specific to the French, this is a BIG THING when travelling, and it's something folks who don't leave home rarely have to deal with.

People have different cultures with different ideas about how to behave. Ever heard the term "Loud American" or "Arrogant Swede" or any number of similar phrases? Most of the time, it's simply cultural differences that you weren't aware of. What's rude to you might be perfectly normal to me, and vice-versa.

Just something to think about next time you're travelling or dealing with a foreigner.

--
Forget thrust, drag, lift and weight. Airplanes fly because of money.

The obligatory Esperanto reference by flicken · 2003-07-28 04:26 · Score: 1

If everyone learnt an international second language, such as Esperanto, then the need for translators and translating programs would be greatly reduced.

For those of you wanting to learn a language that is spoken by approximately 2 million people around the world, start learning Esperanto today!

--
20 mil and I will! Learn Esperanto with 20M others.

Re:The obligatory Esperanto reference by Anonymous Coward · 2003-07-28 04:47 · Score: 1
How about:
- Mandarin - 1 billion+
- Hindi/Urdu - 600 million
- English - 500 million
- Spanish - 350 - 400 million
- Russian - 275 million

Can it translate Bob Dylan? by SphynxSR · 2003-07-28 04:28 · Score: 1

If it can translate Bob Dylan, then it can do anything.

--

I don't suffer from insanity, I enjoy every minute of it.

Translate Pascal To C and Such by Potpatriot · 2003-07-28 04:28 · Score: 4, Interesting

How about piping various algorithms encoded in Pascal and C into the thing and seeing what it does to convert arbitrary sources? Where can I get the source? Pawel

Re:Translate Pascal To C and Such by Daniel_Staal · 2003-07-28 04:45 · Score: 1

If that works, try this: pipe in various programs in $languageofchoice and their complete English descriptions...

It would be fun to see how close to useable code it could provide.

--
'Sensible' is a curse word.

Nederlands... by MsGeek · 2003-07-28 04:30 · Score: 0, Offtopic

It's interesting how English-like Dutch is. A Dutch friend of mine, Annamiek, has a Mac running the Dutch version of MacOS 9.2.2. I had no problems navigating around on it, and the menus and dialogue boxes were fairly sensible to me even though I had to ask her about a few particular words.

Dutch is basically a cousin of English, with both being heavily influenced by Low German. Yiddish also came from the same source, in this case influenced by all the countries the Jewish Diaspora passed through, like Russia and Poland. English is basically Low German with lots of stuff that came to us from Latin-derived languages like French and Spanish and Italian.

Oh yeah, in Holland, most people speak English. A lot of people in the Netherlands speak English better than we Yanks do. For that matter, so do most non-rural Pakistanis and Indians. I wouldn't be surprised if, in my lifetime, American English morphs even further. Enough to where non-Americans who speak English will, basically, only be able to function with it like I did when I was helping Annamiek fix her computer.

--
Knowledge is power. Knowledge shared is power multiplied.

What about C++? by MobyDisk · 2003-07-28 04:30 · Score: 4, Funny

So, can I train this program with a bunch of requirements documents, and a bunch of implementations, and have it learn how to code? :-) If so, I think I am obsolete. *poof*

Re:What about C++? by dsplat · 2003-07-28 05:42 · Score: 1

I know you intended that as humor, but the serious answer is "no".

Requirements never capture every detail. There are a number of reasons. The most obvious of them are: people writing requirements often do not have specialized programming knowledge; requirements often refer to the analog real world rather than the digital world of the code; there are some things that are implementation specific; details are discovered after the requirements are written.

Using an automated system like this would be exactly like using a compiler to generate machine code with two nasty differences. C++ (or any other high-level programming language) looks like a formalized symbolic language for what the program should do. English and other languages for human communication do not. It would be too easy to assume vocabulary that the translator didn't have. Second, there is no specification for the mapping of the requirements to the code other than the examples. There is no way of easily stating what some statement in the requirements will equate to.

Imagine some worst case situations. The code adds constraints that are specific to the implementation platform, such as bounds checking for the representation of the data. For example, had this been done 30 years ago, imagine that all of the code examples represented the year as a 2 digit number. Consider the possibility that several of the implementations don't implement some statement in the requirements. Even when the implementation is complete and correct, this could happen. Just have two statements in the requirements one of which is a stricter version of the other. The stricter one is implemented and the less strict one is not. The translator may not be able to tell which one was implemented. And it may treat one of them as a null statement.

Don't even ask me how you would debug the stuff. I don't have any idea.

--
The net will not be what we demand, but what we make it. Build it well.
Re:What about C++? by luisdlc · 2003-07-28 17:50 · Score: 1

I can appreciate the funny perspective of this comment, nevertheless, it got me thinking:
Maybe the time is not so far away when you can plug a translator into your ear (or at least the headphone of your PDA) and *poof!* suddenly 'human translator' is not a well enough remunerated job...

A programmer may not lose his/her job, but many others will. The thing is, would this be a dangerous condition? Maybe not, but it could still be a negative factor in the unification of language; after all, you could then even understand Klingon.

Of course, the same reason that keeps us from traveling on pilotless planes or accepting surgeonless surgery could very well prevent general reliance on this technology.
Re:What about C++? by lfourrier · 2003-07-28 22:34 · Score: 1

You suppose that implementations respect requirements?

How do you figure? by raehl · 2003-07-28 04:31 · Score: 1

This automates the task of converting from one language with a large body of existing translations to another. If you're relying on putting your secrets in French and not having anyone who knows French read it to keep your secrets secret, you're an idiot.

As for use in a court of law, maybe, but who cares? We have an adversarial judicial system - if the translation is wrong, get someone who actually knows the language being converted to/from and refute it.

Speaking in a language that can be translated back to English isn't any more private than speaking in English. There's no loss of anything here.

--
paintball

Re:How do you figure? by Anonymous Coward · 2003-07-28 05:17 · Score: 0

As for use in a court of law, maybe, but who cares? We have an adversarial judicial system - if the translation is wrong, get someone who actually knows the language being converted to/from and refute it.

Yes, but if it sounds like my voice but says something I didn't, then how am I to prove I actually said something different?

Like, what if it takes me telling a confidante "I did not kill the president" and translates it back in my voice as "I killed the president"? It sounds like me and matches my voice print, so am I supposed to just say "Oh, that's false, I never said that"?

Doesn't work in a court of law sorry, BUDDY

Yeah, that's a spectacular idea.... by raehl · 2003-07-28 04:36 · Score: 1

If you want all those foreign web pages to come back translated into english from 1648.

--
paintball

It is called... by www.sorehands.com · 2003-07-28 04:38 · Score: 1

It is called the Bible. Not only has it been translated into many languages, but you are dealing with discrete, labelled paragraphs.

Back in 2000, there was a professor at U of MD that was working on using the Bible as a language source.

--
Fight Spammers!

Re:It is called... by blahedo · 2003-07-28 08:44 · Score: 1

The Bible is an excellent and important resource for getting started on MT (machine translation) and NLP (natural language processing) in general, especially for languages with smaller speaker bases.

The problem is, the language style used is very specific to the Bible. Even in translations that don't feature lots of "thou shalt not" and "thus spake", you get some really strange constructions that make it unsuitable for most tasks, unless you really have nothing else to work with.

--
``This, too, shall pass.'' ---Eastern proverb
Re:It is called... by Warped-Reality · 2003-07-28 13:18 · Score: 1

Or you don't use the King James version...

--
This is not the greatest sig in the world, no. This is just a tribute.
Re:It is called... by blahedo · 2003-07-28 19:05 · Score: 1

Or you don't use the King James version...

Eh? What part of "Even in translations that don't feature lots of 'thou shalt not' and 'thus spake'" did you not understand? While using the KJV certainly enhances the problem, even the more colloquial translations use a very different sort of English than that used for nearly anything else.

--
``This, too, shall pass.'' ---Eastern proverb

not a new technique by Anonymous Coward · 2003-07-28 04:38 · Score: 1, Informative

IBM tried this statistical technique years ago; it's not a new approach. They used the texts of Canadian parliamentary discussion, which is kept in both English and French. See here or just search Google for "IBM translation canadian parliament" or the like.
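
For the curious, the core of that statistical idea fits in a few lines. What follows is only a minimal sketch in the spirit of the IBM word-alignment models, not IBM's actual system and not Och's: the function names, the three-sentence toy corpus, and the omission of the usual NULL word are all assumptions made for brevity. It uses expectation-maximization to estimate t(f|e), the probability that English word e translates to French word f, from nothing but sentence pairs.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(f|e) from (english_tokens, french_tokens) pairs via EM.

    Simplified illustration: no NULL word, no alignment/distortion model.
    """
    f_vocab = {f for _, fs in pairs for f in fs}
    # Start from a uniform guess: every French word is equally likely.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) co-translations
        total = defaultdict(float)  # expected count of e being translated
        for es, fs in pairs:
            for f in fs:
                # E-step: share the credit for f among the English words
                # in the same sentence pair, in proportion to current t.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: re-estimate t(f|e) from the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


def best_translation(t, e, candidates):
    """Return the candidate French word with the highest t(f|e)."""
    return max(candidates, key=lambda f: t[(f, e)])


# Hypothetical toy "Hansard": three aligned sentence pairs.
pairs = [
    ("the house".split(), "la maison".split()),
    ("the flower".split(), "la fleur".split()),
    ("a house".split(), "une maison".split()),
]
t = train_ibm_model1(pairs)
print(best_translation(t, "house", ["la", "maison", "fleur", "une"]))
```

No dictionary or grammar goes in. "maison" wins for "house" simply because the two keep turning up in the same sentence pairs while the other candidates get explained away by "the" and "a". Real systems need vastly more text for this to work, which is why the parliamentary transcripts matter.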

Not that terribly new by Anonymous Coward · 2003-07-28 04:39 · Score: 0

Just like all kinds of other things on Slashdot, this is early 90's technology that people here are just starting to hear about, cf. any number of citeseer refs on the subject of statistical machine translation. It even says so in the article, though I'm sure most of us didn't bother reading the whole thing.
You do need parallel texts to make this work, i.e. things like the Canadian parliamentary transcripts (French and English), or computer/car/equipment manuals that were translated into several languages.
I'd bet anyone a pretty penny that this is only an incremental improvement upon what everyone's been working towards the last few decades.
It's annoying that the article was so laudatory for Mr. Och b/c it just does him and everyone else who's working on these problems a disservice when naive people expect more than they were promised.

I's from the future by newt_sd · 2003-07-28 04:39 · Score: 1

I is ahead of my time I's been completly ignoring grammer rulz for years now.

--
***I GOT NUTHIN***

Re:I's from the future by Anonymous Coward · 2003-07-28 04:45 · Score: 0

TEACHER: Johnny, can you give me a sentence that starts with "I"?

JOHNNY: I is...

TEACHER: No, no, no, never start a sentence with "I is", always say "I am".

JOHNNY: I AM the ninth letter of the alphabet.

Programming Languages? by The+Raven · 2003-07-28 04:40 · Score: 5, Interesting

I wonder how this would fare putting two computer languages side by side? I mean... take a few thousand programs, coded using the same algorithms but different computer languages... would his language translation software translate between them? Would it be able to differentiate between languages that manually allocate memory and those that use garbage collection? How about between procedural languages like C, and more esoteric and oddly structured languages like LISP?

An interesting challenge, eh?

Would there be any benefit to this?

--
"I will trust Google to 'do no evil' until the founders no longer run it." Hello Alphabet.

Re:Programming Languages? by Dan+Crash · 2003-07-28 04:55 · Score: 1

It's an interesting idea, and I imagine it might work well for translating between very similar languages, such as PHP to ASP. Broadly speaking, though, I can't see it translating efficiently between one arbitrary language and another. Coding techniques for individual languages are often so different that you may not be able to construct corresponding statements between them. Or rather, the statements you could construct might be so complex, nonintuitive and inefficient that the code wouldn't be worth using.

It would be an interesting challenge, though.

--
He who refuses to do arithmetic is doomed to talk nonsense.
Re:Programming Languages? by micromoog · 2003-07-28 05:41 · Score: 1

Translating between computer languages is a much, much easier problem. There are already many utilities that do a bang-up job of this.
And I doubt their software would have any success . . . computer languages rely very heavily on perfect and exact "punctuation", whereas in human languages it's not nearly as important.
Re:Programming Languages? by gmarceau · 2003-07-28 06:29 · Score: 1

Let's see, these translation systems require about one gigabyte of two-way translated text to train. They get it from government records from bilingual countries and from online bilingual newspapers.

Where would you find one gigabyte worth of manually translated code to train with? The Great Programming Language Shootout and the 100 bottles of beer page together hardly add up to one gigabyte.

Coding a compiler to translate between coding languages is much easier than manually translating one gigabyte worth of code. Plus the compiler will always give correct results (modulo bugs), whereas the statistical approach is merely "likely" to be correct.

Sorry to burst your bubble.

--
This post was compiled with `% gec -O`. email me if you need the sources
Re:Programming Languages? by Anonymous Coward · 2003-07-28 15:23 · Score: 0

The folks at Parrot might be your best bet.

You are probably wrong by heironymouscoward · 2003-07-28 04:44 · Score: 1

People have denied the automatability of human skills for the last ten thousand years. "They could not hit a barn at that dista...". Famous last words.

The human brain does not employ magic. It uses stratagems, hard-coded guesses, models, and logics, filled with and tuned by accumulated information and knowledge.

All this is automatable. It just takes a _lot_ of investment.

My prediction is that within ten years we will have machine translation that speaks significantly better than average people, although not nearly as well as professional translators.

And it will be as banal as playing chess against a piece of software.

--
Ceci n'est pas une signature

translation obsolete? by Anonymous Coward · 2003-07-28 04:44 · Score: 0

Wat dacht je hiervan dan? Ik kan wel engels spreken maar dat wil niet zeggen dat ik daar altijd zin in heb! (How about this, then? I can speak English, but that doesn't mean I always feel like it!)

Did you not hear about the web becoming increasingly multilingual - not less? With the advent of things like Unicode, the use of languages other than good old English will only increase. I think in software at least the predominance of English is a historical artifact.

I also think you are somewhat mistaken when you say that translations will only matter in the most complicated subjects. Personally, studying astrophysics, I find that the more specialized the subject, the more likely it is that there is only English literature on it. More mundane subjects almost always have Dutch (my mother tongue) translations. English is the lingua franca of the sciences, but for day-to-day things I am more likely to use Dutch. Plus there are some things better said in Dutch than in English (and vice versa, of course).

English as a sort of universal newspeak? No thanks!

Give me enough Slashdot antries... by Pac · 2003-07-28 04:45 · Score: 5, Funny

...and I will make pseudo-insightful comments based on the headline text without reading any of the source articles, until my karma is excellent?

Re:Give me enough Slashdot antries... by Josuah · 2003-07-28 08:09 · Score: 1

...and I will make pseudo-insightful comments based on the headline text without reading any of the source articles, until my karma is excellent?

And you'll also look like a fool because all your posts will match the bad grammar and spelling of all the other "antries". (And I know it's often considered bad form to start a sentence with 'and'.)

But Can It Do Klingon? by opti6600 · 2003-07-28 04:45 · Score: 2, Interesting

Now that would be cool.

Seriously though, this leaves only the odd tribal languages of African (and perhaps South American?) tribes that are comprised entirely of clicks and guttural sounds as not easily comprehended. Could this system's approach finally result in a Babelfish-like universality even for languages such as Chinese and Japanese? The added complexity makes it much more challenging for things like Babelfish, but if this system can do it, it's going to be a landmark discovery.

Anybody have any further research by this guy? I'm interested! Who knows, maybe I could have gotten a better grade in French thanks to this research...

NOT obsolete. Absolutely. by zedmelon · 2003-07-28 04:47 · Score: 1

Almost everyone can speak, read and write at least tolerable english

Even though your definitions of "everybody" and "tolerable" differ greatly from mine, at first glance, it appears that you're making a good point. But there are two huge problems with your post.

Firstly, while English is fairly widespread, it is by FAR not any sort of "universal" communications medium. It's my primary language (there's also the Spanish that I rarely get to use, so it languishes pitifully), but I know there are literally BILLIONS of people who don't speak a word of English, and hundreds of millions who have never been--and will never be--exposed to spoken or written English in their lifetimes.

Secondly, even if English were perceived as a solution to inter-cultural communications, I abhor the thought. As ignorant as many Americans are, and as difficult as it would be to get some Americans to consider using ANY other language, English should definitely not be the first choice as a unified "Official Planetary Language."

A worldwide movement to eliminate the use of other languages (or even let them fall to a state where they might meet the same fate as my Spanish skills) would signify an intolerable level of neglect toward so much of the diversity that makes our world interesting and colorful.

Sure, if you speak your native tongue throughout your day and type English when you browse Slashdot, English will continue to be a "secondary language." But you can't deny that if English were the language of choice in every public forum, atrophy would eventually overtake any attempts to retain fluency in one's other languages.

Language barriers can prove frustrating at times, but there are MUCH better ways to improve efficiency. Buy a pocket translator if you're hurting for ideas.

--
Mom says my .sig can beat up your .sig.

Not to mention.. by k98sven · 2003-07-28 04:47 · Score: 3, Interesting

The Rosetta Stone itself did not do much for our knowledge of the Egyptian language.
What it did do was provide insight into their method of writing.
It was the later discovery of the relation between Coptic and Egyptian that revealed most of the actual language.

(IIRC)

Re:Not to mention.. by LenE · 2003-07-28 06:44 · Score: 3, Interesting

For those who don't know, Coptic is Egyptian written in Greek, or at least the Greek alphabet. It would be similar to transcribing a language that uses glyphs for words by recording them with the phonemes and alphabet of another language.
A more modern example is what happened with the Slavic Croatian language. The original speakers had a glyph-based alphabet called Glagolitic, through the Middle Ages. This would be as foreign as Egyptian hieroglyphs to people today, and could stand in nicely for an alien text in any sci-fi movie.
Through falling under different feudal states (Venice, Austro-Hungary) the language was cast under both the Cyrillic and Roman alphabets. Today Croatian uses an accented Roman alphabet (like French), but each letter has only one pronunciation, like Russian.
-- Len

Possible users of automatic translation systems by ThufirHawat · 2003-07-28 04:47 · Score: 1

May I point out that the biggest users of this, if it works, are unlikely to be secret services (US or not) or religious folks?
The European Union will have 25 members on 1 May 2004, if all goes well, and European legislation will then have to be translated into 21 different languages (not 25, some Member States share the same language).
Can you begin to imagine how many battalions of translators we're going to need?
There hasn't been a substantial breakthrough in automatic (i.e. unaided) translation in at least 15 years, and, if Moore's law holds, I'll take this with thanks, as I would be but too happy to throw processing power at it...
I have seen it all: SYSTRAN, originally used to read Russian confidential messages, EUROTRA (sort of son-of-SYSTRAN), METAL (the Siemens system, good if the writer is a cyborg who uses standard building blocks-maybe the oilman in the White House might fancy it...), whatever.
There might be some promise in this approach, as the problems in parsing weird languages (Estonian, for instance) seem at present unsolvable.
This fellow deserves watching, methinks...

--
Thufir Hawat
Part-time Mentat

Re:Possible users of automatic translation systems by Anonymous Coward · 2003-07-28 05:26 · Score: 0

I don't think parsing Estonian will be a problem. Estonian is very close to Finnish, and there has been a lot of successful work on the parsing of Finnish. In fact, the two-level grammar approach developed for Finnish has been successfully applied to a variety of other language families that are not Indo-European.

thine ignorance doth appal me by blach · 2003-07-28 04:48 · Score: 1

You know, I just don't get why christian-bashers tend to be so bleeding ignorant.

I mean, when *I* make fun of someone, I make sure I'm educated enough about them that I don't sound like a total fool.

Maybe the Old King James (or "Authorized Version") was written in the language of the 1600s, but not only is there the New King James, there are plenty of modern-English translations, including those that have completely translated idioms into modern-day English idioms, which makes for much more interesting sit-down leisure reading than the stuffy old King James version.

Consider yourself educated.

Scientific Papers by acoustiq · 2003-07-28 04:50 · Score: 4, Informative

Being an undergrad hoping to do research in this area in the next few years, I've already read a few of Och's papers and others in the field. Some of the best that I remember are:

Improved Statistical Alignment Models (2000) - Franz Josef Och, Hermann Ney, which investigates and compares several models
A Syntax-based Statistical Translation Model - Yamada, Knight (2001), which tries to treat sentences structurally instead of just a stream of words
A Finite-State Approach to Machine Translation - Bangalore, Riccardi (2001), which uses a different way of looking at the problem than usual

Kevin Knight prepared an excellent (if now somewhat outdated) introduction to statistical machine translation that you can see in HTML or RTF (the formatting was corrupted when the RTF was converted to HTML - I recommend the RTF).

--
I romp with joy in the bookish dark

statistics is the key by gemseele · 2003-07-28 04:53 · Score: 5, Interesting

Time for inflammatory reasoning. The statistical approach will beat out the grammar- and rule-based ones, at least for English, for one simple reason:

English is not a language

Or rather, it resembles one but is more not than is, IMO. It is a large collection of idiomatic expressions that changes quite rapidly (and not only in colloquial forms, just look at what the political-correctness movement has done to phraseology). You know the story... more exceptions than rules, things that are legitimate to say language-wise are considered incorrect anyways, and vice versa, etc. etc.

That's not to say it doesn't have advantages; it's relatively easy to learn the basics of communication since it's weakly conjugated, has genderless articles, fairly simple uncased sentence structure. But, it is monstrous to master and I suspect most native speakers aren't true masters (not to mention the orthographical nightmare; is English the only language with spelling bee contests?)

The reason it's the new lingua franca (or should it be lingua angla now?) is techno-socio-political as is always the case. Stop harping on Americans for being largely mono-lingual. "Why didn't the Romans learn the local languages when they controlled Europe? Because they didn't have to." If every state spoke a different language, which would be more akin to Europe, then there would be need.

Re:statistics is the key by The+Cydonian · 2003-07-28 06:11 · Score: 2, Interesting

English is not a language... [because it]... is a large collection of idiomatic expressions that changes quite rapidly

Fair enough, English changes rapidly alright, but how would you define a language? A set of logical syntactic and semantic rules that haven't changed for the past few thousand years? I can think of only two languages like that, Latin and Sanskrit.
Nope, I can't agree with your assertion; language is much more than mere (unchanging) grammar. In many multi-cultural places, it is a strong factor for socio-political identities; throughout history, communities have fought against great powers to assert their linguistic identities.
Stop harping on Americans for being largely mono-lingual. "Why didn't the Romans learn the local languages when they controlled Europe? Because they didn't have to." If every state spoke a different language, which would be more akin to Europe, then there would be need.
Actually, there are 329 languages spoken in the United States, many of which are spoken only in the US and nowhere else.
Of course, like in other countries, most of these languages will probably end up as an anthropologist's museum specimens, but really, mono-lingualism of most educated Americans is not because you speak only English in the US. It's mostly because the numbers of other languages aren't quite there.
Which brings us to a very interesting conjecture; I'm no American (nor have I visited the area in question, so I appreciate responses on this), but if I may hazard a guess, by the 2030s, learning Spanish will be essential to live in most of the south and south-western US. That is to say, I assert that the current predominance of English in the US is only a historical accident, one that will change with shifting demographics.

--
More than mere navel gazing.
Re:statistics is the key by gte910h · 2003-07-28 07:03 · Score: 1

Which brings us to a very interesting conjecture; I'm no American (nor have I visited the area in question, so I appreciate responses on this), but if I may hazard a guess, by the 2030s, learning Spanish will be essential to live in most of the south and south-western US. That is to say, I assert that the current predominance of English in the US is only a historical accident, one that will change with shifting demographics.

I grew up there (Southern California). My grandparents ran a 7-Eleven that I occasionally worked at until I moved away at 12. You quite quickly picked up a chunk of Spanish in an environment like that (many migrant workers coming in and out). I remember being able to count back change in Spanish, as well as understand many of the items behind the counter they wanted.

Most people are able to understand a chunk of Spanish who live anywhere near the suburbs, especially middle class through poor social strata. I know most people take it as their high school language because they already know 10-40% of what they'll teach in the class.

Strangely enough, I don't remember much of what I knew, just that I used to be able to do it. Then again, a couple of friends of mine from Russia and Europe (moved at an early age) say you often forget languages when you move that young, especially if you weren't fluent.

--
Want to see every step I took to start my company? http://www.rowdylabs.com/blogs/pitchtothegods
Re:statistics is the key by Jeremi · 2003-07-28 07:20 · Score: 4, Insightful

English is not a language. Or rather, it resembles one but is more not than is, IMO. It is a large collection of idiomatic expressions that changes quite rapidly

You are actually arguing that English is not a dead language. Every language that is actually in use by large numbers of people is as you describe.

--

I don't care if it's 90,000 hectares. That lake was not my doing.

Never by griblik · 2003-07-28 04:55 · Score: 1

Almost everyone can speak, read and write at least tolerable english

That may be the case on /., because English is the primary language of the site's main audience - stands to reason that English speakers would look at an English-language site. On top of that, most people here are geeks, and English is (as far as I know) the human language most programming languages are related to.

I think it more likely that we'll end up with a lingua franca for the net combining useful bits from any language that has something useful to offer. Look at 'English' as it is now; it's heavily laden with words borrowed from the Latin group, Nordic languages, even Chinese and Japanese. And where would we be if you couldn't say karaoke? ;)

Personally, I think it'd be a terrible loss for the whole species if we all spoke the same language. There are ideas and concepts you just can't express in English that appear naturally in other languages, and I'm sure English has much to offer people who don't speak it as their first language.

Variety in all things... :)

--
Warning: May contain nuts

Huh? by Sanity · 2003-07-28 04:55 · Score: 1

If you want to transmit something secretly then encrypt it! If your idea of secret communication is speaking in another language then you sorely need to learn more about cryptography.

How dare you ask by Anonymous Coward · 2003-07-28 04:57 · Score: 4, Funny

But Can It Do Klingon?

How dare you question the honor of this program! I should kill you where you stand!

It just goes to show.. by apetime · 2003-07-28 04:58 · Score: 1

..that all my brilliant ideas have already been thought of by others.

I had this idea a few weeks ago after reading a biography of former NEC chairman Koji Kobayashi, whose dying wish to NEC engineers was to have a machine that could instantly interpret English and Japanese speech by the date of his hundredth birthday in 2007. On a napkin from the coffee shop I was in, I wrote down my brilliant idea to use only statistical matching and a huge database of texts to make a translation. I spent a few days thinking about it some more, but decided I would put it off until I had a better computer (better than my P266MMX) and could actually program. I guess I can forget about that now though... Sigh. And to think this guy had me beat before I'd even thought of it.

A poor but working method better than none... by Pac · 2003-07-28 04:58 · Score: 1

You may have methodological problems with their approach, but it works. Simple approaches are not fundamentally bad, especially simple approaches no one has tried before. From the small amount of information available, it looks like a very promising path once you have enough storage and processing power. Guess what, we now have both at consumer-level prices. So why not try the "dumb" method? Especially if it works better than all other methods available.

The alcohol is arranged... by Anonymous Coward · 2003-07-28 04:59 · Score: 0

Using babelfish with English->Spanish, then
Spanish->English we get:

the alcohol is arranged but the meat is weak

Seems similar to Bayesian spam filter programs... by jetsetscoot · 2003-07-28 05:05 · Score: 2, Insightful

... where the more available examples of actual spam and actual non-spam the better the accuracy of the result, and where you basically let the computer work out the probability, rather than feeding it hard and fast rules up front.

Can anyone say if the two procedures are technically related?
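
For comparison, here is roughly what the spam-filter side looks like. This is a minimal naive Bayes sketch with add-one smoothing, not the code of any particular filter, and the function names and four-message training set are made up for illustration. The family resemblance to statistical translation is that both just count what shows up in labelled (or paired) examples and turn the counts into probabilities. The difference is that a classifier picks one of a few labels, while a translator has to search over whole output sentences.

```python
import math
from collections import Counter


def train(docs):
    """docs: list of (label, tokens). Returns per-label word and doc counts."""
    word_counts = {}
    doc_counts = Counter()
    for label, tokens in docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
    return word_counts, doc_counts


def classify(tokens, word_counts, doc_counts):
    """Return the label with the highest log posterior for the tokens."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Prior: how common this label was in the training examples.
        score = math.log(doc_counts[label] / total_docs)
        # Likelihood with add-one smoothing, so unseen words do not zero it.
        denom = sum(counts.values()) + len(vocab)
        for w in tokens:
            score += math.log((counts[w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training examples; no rules are given up front.
docs = [
    ("spam", "buy cheap pills now".split()),
    ("spam", "cheap pills cheap".split()),
    ("ham", "meeting notes for monday".split()),
    ("ham", "lunch on monday".split()),
]
word_counts, doc_counts = train(docs)
print(classify("cheap pills".split(), word_counts, doc_counts))
```

As with the translation system, more examples of each kind sharpen the estimates, and nothing in the code knows what "spam" means.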

Translate THIS G! by Anonymous Coward · 2003-07-28 05:08 · Score: 0

I would love to see an Ebonics translation of Harry Potter...

Simulating persons' way of speech? by ivoras · 2003-07-28 05:09 · Score: 2, Interesting

Given the statistical data, this could probably be used to simulate a text written by a specific person, for example Shakespeare.

"You look nice..." --> "Shall I compare thee to a Summer's day..."

--
-- Sig down

Better translation will spark revolution. by BelugaParty · 2003-07-28 05:10 · Score: 1

I know this is a dramatic title. But better translation systems will do as much for cross-cultural communication as the internet has done for cutting through geography.
I am waiting for the day when I can read Middle Eastern texts without having them selected, censored, and biased by American publishers and editors.

Translating Bibles/Amazon.com/etc... by HanClinto · 2003-07-28 05:15 · Score: 1

Today, the biggest leader in translating Bibles into other languages is the Church of the Latter Day Saints

Because of the large volume of required input language into the system, I don't think that this system will be good for translating Bibles into new languages (think Wycliffe USA).

The advantage of this system, as it would pertain towards helping a particular religious community, is to make it easier to translate the large amount of books on religion into other languages more accurately.

The Czech schoolboys who worked so hard to make a Czech translation of the Harry Potter books would no longer have to wait so long for translations. Rather, the book could be fed through the statistical system and in under a day (minus proofreading) there could be a very nice translation into any mainstream language you wanted.

The advantage here would not be in translating Bibles into new languages, but rather translating massive amounts of books on the subject of Christianity to various languages.

Perhaps this technology would even have a use with Amazon's online digital book project in allowing all of the books on their site to be efficiently translated into other languages and marketed digitally. Interesting concept, to have all of the resources of Amazon.com in digital format for any mainstream language. That could do amazing things for the cross-country circle of ideas and thoughts.

Just my 3.14159 cents...

Respectfully,
clint :)

In SOVIET RUSSIA ... by MlBruehlly · 2003-07-28 05:16 · Score: 1

The Rosetta Stone translates YOU!

translation by BigBir3d · 2003-07-28 05:20 · Score: 1

Human style translation is nice. Machines are starting to be programmed to tackle problems like normal people do, not just like programmers do. About time.

Re:Or a "culturally superior" American. by William+Baric · 2003-07-28 05:25 · Score: 1

We went through a good ten people before we found someone willing to admit that they spoke something other than French.

And what made you think they did actually speak anything other than French? I'm French and I can say from personal experience that the percentage of adults who can speak anything other than French is quite low. Sure we all learned two foreign languages in school but after a few years most of us don't remember anything... about the only thing I remember from my Spanish class is "me llama guillermo" and I'm not even sure this is correct (it was 19 years ago).

I find it funny (kind of) when an American thinks of Frenchmen as arrogant because they don't speak English... I guess what you're really thinking is: "Ha-ha, I will not let these Frenchmen get away with not speaking English in France!".

So my interpretation of the situation is "what's wrong with YOU"

In Soviet Russia.... by Anonymous Coward · 2003-07-28 05:26 · Score: 0

...the language translates YOU!!!

It's not the French. by raehl · 2003-07-28 05:26 · Score: 1

It's the PARISIAN French.

French people were very nice/helpful everywhere else I went in France OTHER than Paris. As it turns out, the rest of France doesn't much like Parisians either. ;)

--
paintball

Re:It's not the French. by mirko · 2003-07-28 23:00 · Score: 1

Yep, but let's look at it differently:
The denser the population, the higher the probability of meeting a monolingual person.

In the Parisian metro, you easily get up to 5 persons per square meter ;)

--
Trolling using another account since 2005.

Well.. by raehl · 2003-07-28 05:31 · Score: 1

I don't expect anyone in a non-English speaking country to speak English. (Hell, sometimes, I find that expecting even native-born Americans to speak English is a bit much.)

The amusing part is that most of the people we talked to almost certainly knew one of the languages that we did, but preferred to cop the Parisian "I'm better than you because I know more than one language!" attitude - not realizing that they were essentially pretending to be stupid, not conveying that they were 'enlightened' like they were trying to.

--
paintball

been done before by Fratz · 2003-07-28 05:32 · Score: 2, Informative

They've had the same technology at CMU's LTI for years now, called EBMT. This officially stands for Example-Based Machine Translation, but those of us who worked with it called it Extremely Bad Machine Translation because it took millions of example sentences before it started to not suck, and even then it required manual tweaking and the addition of primitive grammar rules.

So yeah, this method learns fast, but it generally learns to a useless level for anything other than a rough assessment of some of the phrases that were in the original text.

--
-- Fratz, human

Actually.. by raehl · 2003-07-28 05:35 · Score: 1

I know enough (or at least I did back then) of Germanic/Romance language roots for transportation-related stuff to read train schedules in pretty much any (Western) European country. There was just some particular nuance (coupled with the fact we were in a bit of a rush) that made asking necessary.

Another trick is to make use of the automated ticket purchasing kiosks. They generally let you make your purchase in a few different languages. Problem there is they can't get you the full range of itineraries (weird connections, stops, layovers, etc) you can get at an agent; but the agent may not know a language you can speak...

So we would just buy tickets to somewhere in a language we knew, cancel the order, do it again in the local language, write down the words that corresponded to the words in the previous language, and then just present that order on paper to the agent. Worked pretty well.
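
The kiosk trick is a tiny hand-made parallel text, which is the same raw material the article's system runs on. A minimal sketch of the idea, assuming the two screens list the same fields in the same order (the screen contents and the `build_phrasebook` helper are made up for illustration):

```python
# Hypothetical kiosk screens: the same order shown once in a language
# we know and once in the local language, field by field.
known_screen = ["From", "To", "Departure", "One way"]
local_screen = ["De", "À", "Départ", "Aller simple"]


def build_phrasebook(known, local):
    """Pair up fields by position to get a known -> local phrasebook."""
    if len(known) != len(local):
        # If the screens differ in length, positional pairing is unsafe.
        raise ValueError("screens do not line up field by field")
    return dict(zip(known, local))


phrasebook = build_phrasebook(known_screen, local_screen)

# Write down the local wording to hand to the agent.
for known_word, local_word in phrasebook.items():
    print(f"{known_word} -> {local_word}")
```

Real parallel corpora are not this tidy: sentences rarely line up one to one, which is why the statistical systems have to learn the alignment instead of assuming it.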

--
paintball

sounds like work done at Fluent Machines by E Abir by camelcai · 2003-07-28 05:35 · Score: 1

The idea seems very similar to Eli Abir's, now commercialized at the company Fluent Machines.

--
jpenguin AT the google email service

It's called a compiler by IncohereD · 2003-07-28 05:36 · Score: 1

A compiler (or interpreter, for those of you into that sort of thing) takes programs written in your preferred language and translates them to machine code.

The benefit of comparing compilers is that you can see which language/compiler produces the most efficient code. Ideally all compilers should be able to produce the most efficient code, but they'll each have their own strengths depending on what they're designed to do.

Hindi's a problem?? by Bushcat · 2003-07-28 05:37 · Score: 0

The article mentions that Hindi's a problem because ...Hindi is written in a non-Latin script, which has numerous different digital encodings instead of one or two standard ones.... Yet cited successful translation pairs include Chinese (with Big5 and GB encodings), with a much larger character set than Hindi, and Arabic (with at least ISO 8859-6 and CP-1256), which has a smaller character set. Does this mean the text is being romanized before use? If so, this itself can be a major task. For example, Japanese has various encodings of its character sets including utf-8, shift-JIS, iso-2022-jp, euc-jp and of course unicode, with romanization systems including Hepburn and Kunrei-shiki.
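
To make the encoding point concrete, here is a small sketch, assuming the encoding of each input file is already known (in practice, detecting it is the hard part). It uses Python's standard codecs for four of the Japanese encodings named above. The sample string and the `to_unicode` helper are made up for illustration, and nothing here is claimed about how the system in the article actually handles its input.

```python
# Hypothetical sample text; any string of common kanji would do.
SAMPLE = "日本語"

# Python codec names for encodings mentioned in the comment above.
ENCODINGS = ["utf-8", "shift_jis", "iso2022_jp", "euc_jp"]


def to_unicode(raw: bytes, encoding: str) -> str:
    """Decode bytes in a known encoding into a Unicode string."""
    return raw.decode(encoding)


# The same text becomes a different byte sequence in every encoding...
encoded = {name: SAMPLE.encode(name) for name in ENCODINGS}

# ...but once decoded, every copy is the same Unicode string again.
decoded = {name: to_unicode(raw, name) for name, raw in encoded.items()}

for name in ENCODINGS:
    print(name, len(encoded[name]), "bytes ->", decoded[name])
```

So a corpus builder would not need romanization, only a reliable decode step per source. That step gets painful when sites do not declare their encoding or declare it wrongly, which may be what the article means by "numerous different digital encodings".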

I'd hazard a guess that this system will also have trouble with a high-context language such as Japanese.

If all the effort is expended at the point of accumulating the parallel texts, then that's simply lots of computer time but if the text has to be massaged by the user to suit the system at translation time, then that could still be a lot of work.

The approach sounds rather like Translation Memory (as used in Trados and other systems) on a grand scale: "here's a sentence I translated earlier", as Blue Peter would say.

Can't knock it if it works, of course, which it appears to do.

You're missing the point. by Eevee · 2003-07-28 05:38 · Score: 1

None of the languages you give were designed for ease of learning. Nor are they free of a whole load of cultural baggage.(1) The concept behind Esperanto is to provide a neutral, quick learning experience for newcomers.

-------

(1)Now, Esperanto does have a decided tilt towards European languages for its base, so I'm sure there's room for improvement for inclusiveness. But at least it's a higher level of cultural bias.

Keep those scientists off the streets by bigattichouse · 2003-07-28 05:38 · Score: 1

Due to unprecedented developments in technology, a call was sent out today from DARPA and MIT jointly to all aspiring sci-fi writers and directors to get off their butts. Having now created preliminary versions of just about every StarTrek, Asimov, Clarke, etc device possible, DARPA and MIT both are running out of cool ideas and will need to revert to evil geshhhtaaaapo technologies if we can't find something cool to work on.

Joe Schmo said, "Keep a scientist off the streets, write a story. Without cool sci-fi to keep them up at night they'll be building super-world-destructo bombs, and other "evil genius" devices..."

--
meh

Re:Keep those scientists off the streets by BelugaParty · 2003-07-28 09:02 · Score: 1

Well, scientists haven't finished the matrix yet. So, I'll keep my pen at my side until then.

Similar project using Analogical Modeling by Anonymous Coward · 2003-07-28 05:41 · Score: 0

A linguistics grad student (D. Hatch) at Brigham Young University has been working on a similar project using Analogical Modeling for parallel text in machine translation. Dr. Royal Skousen developed Analogical Modeling as an "exemplar-based general theory of description that uses both neighbors and non-neighbors (under certain well-defined conditions of homogeneity) to predict language behavior"

The results have been quite successful so far.

Well then... by raehl · 2003-07-28 05:44 · Score: 1

You need to get my church to invest in the latest versions.

Although personally (not like I've done an exhaustive comparison) I've found that the more modern translations lose a lot from the King James version. Seems like the goal with the King James version was accurate translation, while most of the modern translations are geared towards reaching the masses.

Regardless, I wasn't Christian bashing. It was meant to be half-funny in the "Oh look, we got the wrong English" sense, and half-serious, in the "using a particular text where the 'timestamp' for the known languages used may be different by hundreds of years, and where one 'known' language's translation may be based on an older translation from another 'known', but not source, language, might not work as well as you'd hope." sense.

--
paintball

re: well then... by ed.han · 2003-07-28 06:36 · Score: 1

um...the KJV was translated from the vulgate bible, which was written in latin. this is a second-hand source, as the new testament was originally retained orally, in aramaic and a form of pidgin greek (koine greek) and was not committed to written form until perhaps a century or so after the passing of christ from the world.

i'll confess the KJV does sound good, but as far as authenticity of translation, it leaves something to be desired. you might find the att'd of interest: http://www.zondervanbibles.com/translations.htm

ed

Re:Or a "culturally superior" American. by raehl · 2003-07-28 05:49 · Score: 2, Insightful

For starters, we specifically target young people when asking questions where a non-native language will be required. 3-4 of the people were employees, indicating at least a passing knowledge of "What track is this train on?" in a few European languages might be a job-relevant talent. Additionally, the sneer. Attitude is attitude regardless of what country you're in.

We don't expect people to know foreign languages. We *DO* find it amusing when people who are razzing *US* for not knowing THEIR language do not know any foreign languages.

--
paintball

A plan for translation? by Jeremi · 2003-07-28 05:50 · Score: 3, Insightful

Actually this system reminds me a lot of the good old Bayesian Spam detector algorithms... but instead of trying to determine what category of content an email contains, the statistical classifier is trying to determine (e.g.) what English phrase a Russian phrase most closely matches.

Given the impressive progress made by Bayesian algorithms in spam detection, I wouldn't be surprised to see impressive results from this method either.

So bravo for Franz Och! He's taken what appeared to be an intractable problem requiring magic AI to solve, and perhaps found a way to solve it effectively using the stupid brute force methods computers are so good at.

--

I don't care if it's 90,000 hectares. That lake was not my doing.

Re:A plan for translation? by durian · 2003-07-28 06:12 · Score: 1

Hey, I wrote a system like that in 1993, 1994, but because of lack of data it was never very good. Bravo to him for making a better system though...

-peter

Re:A plan for translation? by HiThere · 2003-07-31 06:48 · Score: 1

Can you prove that something like this *isn't* the basis for intelligence? Consider that all effective intelligence might be a layered system that does things like this. (It would clearly need to be a layered system, but perhaps seven layers would be enough. Part of the problem is automatic chunk detection, and each layer would need to do that in ways analogous to the preceding layer, but the lower layers would use the upper layers as their inputs...)

--

I think we've pushed this "anyone can grow up to be president" thing too far.

Needs Improvement by corgicorgi · 2003-07-28 05:51 · Score: 1

This program can be made better if it first builds the grammar and dictionary as a foundation. Then, it gets fed with the parallel text data to build the statistical references between the two languages.

It is the combination of definition and context that makes a translation more accurate. In fact, that's how humans learn. We first learn to reference words/sentences to what we see (like parallel data). Then, we also learn to understand grammar and word definitions. We wouldn't learn one way without the other.

On another note, I think this will be a great step toward achieving speech AI. Just as this program translates by making statistical parallels between languages, it could go further and make statistical parallels between a sentence and the responses to it. The reason I think this program is better is that it goes beyond plain definition: the program has to "understand" the input text by searching through its database, its "knowledge". It's still far-fetched, but I think this is one step further towards AI.

Re:Units of Measure, -1 Funny by teamhasnoi · 2003-07-28 05:51 · Score: 1

Ouch. Don't need a translator for that.

Another neat application for this technique. by attaboy · 2003-07-28 05:58 · Score: 2, Funny

1: Create a set of "Rosetta Stone" data by taking thousands of recorded phone calls to customer service/operators, etc.
2: For each call, track what the customer service rep/operator typed into their computer terminal.

The result would be natural language voice-recognition that would probably achieve a high degree of accuracy because it would be limited in scope (e.g. asking for a credit line increase, reporting a lost card, checking your balance, etc.) and be based on real queries from real customers.

Since the biggest majority of calls are for very simple problems (I forgot my password is the most common tech support call we get) this should be pretty useful.. you could probably automate "Level 1 Tech support"!

--
The facts have a liberal bias. --The Daily Show

Women. by Grendel+Drago · 2003-07-28 06:00 · Score: 1

Ah, so the DoD must be secretly staffed by slash/shonen ai-loving women. I get it now.

--grendel drago

--
Laws do not persuade just because they threaten. --Seneca

Porn sites using Slovio by yanestra · 2003-07-28 06:06 · Score: 1

Some people have decided to use Slovio (a language) on their porn sites. Their idea is based on the fact that there are 400 million potential customers, of whom only a very small part speaks English.

How's that news? by Yurka · 2003-07-28 06:14 · Score: 3, Interesting

This has already been done some years ago in Canada, where the translation system was fed the complete text of parliamentary debates for umpteen years (required by law to be translated by humans into French, if originally in English, and vice versa). I don't know how it fares when presented with a sample of parliament-speak (I concede, this is not a fair approximation of human language), but it fails miserably on a simple rhyme. Read your Hofstadter, guys.

--
I can assure you, the best way to get rid of dragons is to have one of your own.

TROLL?!!?!? by gerf · 2003-07-28 06:15 · Score: -1, Troll

the parent to your post (my post) got trolled?! my god, what kind of yuppie faggot loves harry pooper that much?! YES i blow karma, but fuck that. YOU ARE A BUNCH OF BLEEDINGASS FAGGOTS, YOU FUCKING MODS!

With great power... by Anonymous Coward · 2003-07-28 06:17 · Score: 0

Am I the only one who's glad Doc Och is working in the field of language and not robotics?

It may be Spanish Slashdot... by Anonymous Coward · 2003-07-28 06:17 · Score: 0

... but the comments are all the same.

(Well, the subject line is anyway)

DARPA is paid to find ways to kill people. by Anonymous Coward · 2003-07-28 06:18 · Score: 0

DARPA is paid to find ways to kill people and destroy property, but the employees sometimes become distracted and do something good for the world.

Re:Hindi's a problem?? -- maybe not by koi88 · 2003-07-28 06:24 · Score: 1

"I'd hazard a guess that this system will also have trouble with a high-context language such as Japanese."

I think that's exactly what his system is about: it also analyzes context, so it works similarly to human thinking. It takes the surrounding words into consideration and checks them against given translations.

E.g., wrong translations for the German word "Bank" which can mean both "bench" or "bank" are less likely, as a context like "sit, park..." would favor the translation "bench" while a context with "money, stock..." would lead the system to assume the meaning "bank" is more likely.
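
A toy sketch of that "Bank" example, assuming made-up co-occurrence counts in place of a real corpus (the numbers, the `CONTEXT_COUNTS` table and `pick_translation` are all hypothetical, and a real system would use proper probabilities rather than raw smoothed counts):

```python
from collections import Counter

# Made-up counts of how often each context word was seen near each
# English translation of German "Bank" in some imaginary parallel text.
CONTEXT_COUNTS = {
    "bench": Counter({"sit": 8, "park": 6, "money": 1}),
    "bank": Counter({"money": 9, "stock": 7, "park": 1}),
}


def pick_translation(context_words, counts=CONTEXT_COUNTS):
    """Return the candidate whose context counts best fit the given words."""

    def score(candidate):
        total = 1
        for word in context_words:
            # Add-one smoothing so an unseen word does not zero the score.
            total *= counts[candidate][word] + 1
        return total

    return max(counts, key=score)


print(pick_translation(["sit", "park"]))     # context about sitting in a park
print(pick_translation(["money", "stock"]))  # context about finance
```

With these counts the first call favors "bench" and the second favors "bank", which is the behavior described above. The weights come from data and not from a hand-written rule about benches.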

This way, it should be far superior to word-to-word based translation systems.
I hope this is true, but that's how he explained it to me about 7 years ago when he started the project :-)

--

I don't need a signature.

obligatory grognard post? by ed.han · 2003-07-28 06:29 · Score: 1

goodness, i certainly didn't think that when i first logged into /. today that i would be seeing old english...

it's astonishing just how germanic it looks to me.

ed

Sounds like A million monkey to me by EvilTwinSkippy · 2003-07-28 06:32 · Score: 1

Unfortunately a machine based on this principle will never "understand." It will, at best, manipulate tokens. It will only understand the rules of language, not the rules of reality.

Here are some things that language allows, but reality doesn't:

Jake's late uncle bought a new car
Molly pushed her car with a short piece of rope
I would like the wall painted in a blacker shade of white
The can of tuna opened Fred

IANAL (linguist), but you also run into peculiarities of language where one language lacks a concept. English does not impart gender in objects. Ancient Chinese is largely written in the present tense. Some African languages have no concept of "Should". In order to translate information you need to add, remove, or completely rephrase certain ideas.

Finally, you run into the problem of ambiguous concepts. There is no German word for "luck" or "happy". They are combined in one word "glucklich". The Chinese use one character to represent both danger and opportunity. To know which you are speaking of requires context, and once you start adding state to a statistical model it starts to become a differential equation.

A far better approach would be a digital "Esperanto". Linguists would design a universal language, and then design a filter to translate each language (and all its quirks) into the universal. Each language would also need a filter to translate FROM the universal. Even then, you still would have stuff that just plain doesn't translate.

For giggles, try picking up a copy of Sun Tsu's Art of War, or Lao Tsu's Tao Te Ching. Better yet, pick up 2 different translations. In order to make any sense out of it, you have to constantly read between the lines.

Computers are notoriously BAD at reading between the lines.

--
"Learning is not compulsory... neither is survival."
--Dr.W.Edwards Deming

Re:Sounds like A million monkey to me by HalfFlat · 2003-07-28 09:11 · Score: 1

It does pose the question: what constitutes understanding? Is there any externally observable difference between a particularly good token manipulator and a translator who understands the material?

It could be that understanding is having a suitably sophisticated token processing system (plus sufficient data), or at least be isomorphic to it.

Re:Sounds like A million monkey to me by EvilTwinSkippy · 2003-07-28 15:58 · Score: 1

It could be that understanding is having a suitably sophisticated token processing system (plus sufficient data), or at least be isomorphic to it.
I would argue it is not. Understanding is the ability to cheat and improvise new rules when the existing ones don't fit.
To borrow from Taoism again:
... Therefore when Tao is lost, there is goodness. When goodness is lost, there is kindness. When kindness is lost, there is justice. When justice is lost, there is ritual. Now ritual is the husk of faith and loyalty, the beginning of confusion. Knowledge of the future is only a flowery trapping of Tao. It is the beginning of folly. Therefore the truly great man dwells on what is real and not what is on the surface, On the fruit and not the flower. Therefore accept the one and reject the other. -Lao Tzu, Tao Te Ching, Chapter 38

In other words, simply passing tokens around is WAY at the bottom of the thought process. At least in Eastern Thought.

--
"Learning is not compulsory... neither is survival."
--Dr.W.Edwards Deming

Re:Sounds like A million monkey to me by HalfFlat · 2003-07-28 17:35 · Score: 1

One can make a distinction though between rules that we consciously apply, and rules which may govern our consciousness.

To misappropriate the opening phrase of the dao de jing, "the path which can be followed is not the eternal path; the name which can be named is not the eternal name". Interpreting this in the light of a symbolic computation model of the mind, one could say that there are rules which govern our nature, but they are not rules that are consciously followable, and that any attempt to codify them at a conscious level will by nature be inadequate.

Which is interesting given the topic: the translation software that works so well seems to be the one which does not rely upon explicit rules. Instead the rules come unbidden through statistical modelling. Indeed, how much difference is there between "yes" and "no" (cf. verse 20)? We can't explicitly say, but it seems that this translator may do a better job without such explicit rules.

I've heard of information theoretical and epistemological interpretations of the dao de jing. That one can do so probably points to one of the reasons why it is such a classic, and has survived so.

Have to say also though that there are a whole bunch of extremely free translations out there too :)

Re:Sounds like A million monkey to me by Wirr · 2003-07-28 19:46 · Score: 1

There is no German word for "luck" or "happy". They are combined in one word "glucklich".

This is not really true.
Luck is "Glück" in German. And happy is 'Froh'. The ending '-lich' is for turning nouns into adjectives/adverbs.
They're just not used as in English. In some cases they're interchangeable, in others not.

Re:Sounds like A million monkey to me by EvilTwinSkippy · 2003-07-29 01:31 · Score: 1

At which point we arrive back at the original question. At what point does a computer progress from simply following rules to walking the path?
I am hesitant to say simply having a large enough corpus of data is enough. If you note the great Lao Tsu emphasizes that words get in the way of understanding. And yet here we are designing a machine that can ONLY relate in words! Can you picture what these phrases are going to do to your statistical model:
All in the world recognize the beautiful as beautiful. Herein lies ugliness. All recognize the good as good. Herein lies evil. Therefore Being and non-being produce each other. Difficulty and ease bring about each other. Long and short delimit each other. High and low rest on each other. Sound and voice harmonize each other. Front and back follow each other. Therefore the sage abides in the condition of wu-wei. And carries out the wordless teaching. Here, the myriad things are made, yet not separated. Therefore the sage produces without possessing, Acts without expectations And accomplishes without abiding in her accomplishments. It is precisely because she does not abide in them That they never leave her. -Lao Tsu, Tao Te Ching Chapter 2

Every line is a self-contradiction. A computer trying to weight one word against the other would simply decide from the grammar that the two concepts are comparable. But they are not comparable, that is the point. The great Lao is trying to make the reader think non-verbally.
Now if you can get a statistical model to chew through that AND still produce meaningful^H^H^H^H^H fluent results, I will be impressed. Of course the system at that point would probably be demanding equal rights and hitting the talk-show circuit.

--
"Learning is not compulsory... neither is survival."
--Dr.W.Edwards Deming

mod parent up by Anonymous Coward · 2003-07-28 06:36 · Score: 0

I was going to post this but this nails it, so now I don't have to.

Urban legend by Arjen · 2003-07-28 06:49 · Score: 1

I hate to repeat myself, but this is an urban legend. According to MACHINE TRANSLATION: An Introductory Guide:

The `spirit is willing' story is amusing, and it really is a pity that it is not true. However, like most MT `howlers' it is a fabrication. In fact, for the most part, they were in circulation long before any MT system could have produced them (variants of the `spirit is willing' example can be found in the American press as early as 1956, but sadly, there does not seem to have been an MT system in America which could translate from English into Russian until much more recently --- for sound strategic reasons, work in the USA had concentrated on the translation of Russian into English, not the other way round). Of course, there are real MT howlers. Two of the nicest are the translation of French avocat (`advocate', `lawyer' or `barrister') as avocado, and the translation of Les soldats sont dans le café as The soldiers are in the coffee. However, they are not as easy to find as the reader might think, and they certainly do not show that MT is useless.

BTW, since this book is no longer available in the stores, the whole contents has been placed online, though the server appears down right now. I recommend reading this book to anyone who is interested in the subject of MT. It really is a nice introduction to the subject.

Re:Urban legend by zptdooda · 2003-07-28 07:08 · Score: 1

I'd say if you're repeating yourself with a period of 4 years, you're doing a whole lot better than I'm averaging with my little kids.

Thanks for the correction. I'll try not to use the quote again, at least not until 2007.

--
Esteem isn't a zero sum game

Re:Urban legend by HiThere · 2003-07-31 07:01 · Score: 1

I did hear about it in the 1950's, but I heard about it as an experimental machine translation.

Remember, the early programmers didn't all do only practical stuff. The "perfect checker" player came out of this same period (Semelweiss?). So, yes, they tried machine translation long before it was practical. And not all they did was well documented. But this doesn't mean it didn't happen. Now as to whether it happened at MIT...I don't remember after this long where it was attributed to. MIT did a lot, but certainly not everything. (I can accept that the author of Machine Translation doesn't believe it. And since I was quite young then I don't remember much about reading about it. But the article that presented it presented it as fact...and an example of how things could be more complicated than a naive approach would assume.)

(Also remember that Eliza did fool a person into getting quite angry at the idiot on the other end of the teletype. So Eliza was written, but nobody would consider it a reasonable attempt at AI. [I understand it was an example of how NOT to do AI.])

--

I think we've pushed this "anyone can grow up to be president" thing too far.

Re:Article text (in Babel-German) by Anonymous Coward · 2003-07-28 07:04 · Score: 0

Romancing der Rosetta Stein

' geben Sie mir genügende parallelen Daten, und Sie können ein Übersetzung System in den Stunden haben'

Universität des südlichen Kalifornien Informatikers Franz Josef, den Och ein vom berühmtesten widerhallte, rühmt sich in der Geschichte der Technik, nachdem seine Software stark unter 23 die arabische und Chinesisch-zu-Englischen translatio Systeme zählte, kommerziell und experimentell, geprüft innen in vor kurzem gefolgertem Handelsministerium Versuche.

"geben Sie mir einen Platz zum Standplatz an, und ich verschiebe die Welt,", nachdem dem Zur Verfügung stellen einer mathematischen Erklärung für den Hebel sagte den großen griechischen Wissenschaftler Archimedes.

"geben Sie mir genügende parallelen Daten, und Sie können ein Übersetzung System für alle mögliche zwei Sprachen in einer Angelegenheit von Stunden haben,", sagte Dr. Och, ein Informatiker in der USC Schule des Informationswissenschaft-Instituts der Technik.

Och sprach nach den Prüfstandversuchen 2003 für die maschinelle Übersetzung, die im Mai und Juni dieses Jahres durch das National Institute of Standards and Technology der VEREINIGTE STAATEN Handel-Abteilung durchgeführt wurde.

Übersetzungen Ochs prüften gut in den 2003 head-to-head Tests gegen 7 arabische Systeme (5 Forschung und 2 Kommerziell-weg-dregal Produkte) und 14 chinesische Systeme (9 Forschung und 5 ab Lager). Im vorhergehenden 2002 Auswertungen hatten sie ähnlich überlegenes geprüft.

Der Forscher besprach seine Methoden an einem NIST Postmortemseminar über das Benchmarking gehalten Juli 22-23 Johns Hopkins an der Universität in Baltimore, Maryland.

Och ist ein Herausragend-Exponent einer neueren Methode des Verwendens der Computer, um eine Sprache in andere zu übersetzen, die in den letzten Jahren erfolgreicher geworden ist, während die Fähigkeit der Computer, große Körper der Informationen anzufassen gewachsen ist, und das Volumen des Textes und der zusammengebrachten Übersetzungen in der digitalen Form hat, auf (zum Beispiel) mehrsprachigen Zeitung oder Regierung Netzaufstellungsorten explodiert.

Methode Ochs benutzt zusammengebrachte zweisprachige Texte, die Computer-kodierten Äquivalente der berühmten Rosetta Steinbeschreibungen. Oder eher Gigabytes und Gigabytes Rosetta Steine.

"unsere Annäherung benutzt statistische Modelle, um die wahrscheinlichste Übersetzung für einen gegebenen Eingang zu finden," Och erklärt

"sie ist zu den älteren, symbolischen Annäherungen zur maschinellen Übersetzung ziemlich unterschiedlich, die in den meisten bestehenden kommerziellen Systemen verwendet wird, die versuchen, die Grammatik und das Lexikon einer Fremdsprache in einem Computerprogramm zu kodieren, das die grammatische Struktur des fremden Textes analysiert, und produziert dann Englisch, das auf harten Richtlinien," er basiert, fortfuhr.

"anstelle, vom Computer erklärend, wie man, wir ließ ihn selbst ihn heraus darstellen übersetzt. Zuerst ziehen wir dem System es mit einem parallelen Korpus d.h. eine Ansammlung Texte in der Fremdsprache und ihre Übersetzungen ins Englische ein.

"der Computer verwendet diese Informationen, um die Parameter eines statistischen Modells des Übersetzung Prozesses abzustimmen. Während der Übersetzung des neuen Textes, versucht das System, den englischen Satz zu finden, der die wahrscheinlichste Übersetzung des fremden Eingang Satzes ist, basiert auf diesen statistischen Modellen."

Diese Methode ignoriert oder rollt eher über, finden ausdrückliche grammatische Richtlinien und sogar traditionelle Wörterbuchlisten des Wortschatzes zugunsten des Lassens des Computers selbst matchup Muster zwischen einer gegebenen chinesischen oder ara

yeah, yeah... by ed.han · 2003-07-28 07:05 · Score: 1

wish i could edit...

i didn't wanna lump it in w/ the other stuff in the list: it's sorta in a different class by itself, IMHO.

ed

Och--and a few dozen other research groups by 73939133 · 2003-07-28 07:08 · Score: 1

Neither the approach nor the results are particularly unique: statistical models are all the rage in natural language processing--and for good reasons. There are probably a few dozen research groups working on these kinds of systems.

But such systems also have well-understood limitations. Translation often does require a deeper understanding of the subject matter, and you simply cannot get that from aligning two large corpora.

Mod on crack alert... by Anonymous Coward · 2003-07-28 07:18 · Score: 0

Like, how is this offtopic? We're talking about languages and linguistics and how better computer programs can be built for translating. Someone made a comment about Dutch and about needing to know Dutch to read a few websites. MsGeek just made the comment that she could get by in Dutch simply because of Dutch's similarities to English. How the fuck is that offtopic?

A language is merely a dialect with an army by Anonymous Coward · 2003-07-28 07:19 · Score: 0

Written language and spoken lang are different beings. Written lang is usu standardized & does not nec reflect dialect.

AFAICT (having RTFA,) the software works only on parallel texts, not parallel recordings.

- anonymous linguist

Anyone notice the languages mentioned? by tarkovsky2002 · 2003-07-28 07:30 · Score: 1

Arabic, Chinese, and Hindi? It is pretty easy to imagine why DARPA would be interested in the first two with the War on Terrorism and China being a pseudo-enemy. But Hindi? Conspiracy theorists please comment...

How to speak to a Parisian by Jhan · 2003-07-28 07:43 · Score: 1

You don't get the situation. The French feel enormously threatened by the Americans, culturally speaking. Maybe the feeling is even founded on something...

If you approach them in English, or even say "Do you speak English?" in broken French, they would rather watch you being eaten alive by sharks than understand your cries for "Help!".

What you do is ask the French person whatever you need to know in your best French. Probably he will not even understand what the hell you're trying to say. Speak to him (in French) again. And again. Bring out your phrase book and shuffle the pages, speaking uselessly at the French guy.

Maybe he WILL understand parts of it. This will just cause him to unleash torrents of speedy French in your direction, which you can't understand. Look stupid. Stupidity is key.

Soon, it will be obvious that you don't know shit about the French language. NOW, and ONLY NOW, ask in your very brokenest French (no, NOT English, you will ruin everything you just accomplished!) if maybe he knows just a little English.

Of course he does. After all, all the kids study English in school, and have done so for many years. This way he can feel superior about helping the daft tourist idiot.

NB. I am not American, I'm Swedish. I had the exact same problem in Paris until I developed this method.

--

I choose to remain celibate, like my father and his father before him.

Re:How to speak to a Parisian by Darth · 2003-07-28 10:40 · Score: 1

and if you don't want to demean yourself to facilitate someone else's insecurity problem you can do this:

ask him in French whatever you need to know (if possible).
then ask him if he speaks English (or your native language).
If he says no, say something horribly insulting about his mother, him, France, the food, etc.
when he gets really pissed at you, tell him you knew he spoke English, he's an asshole for pretending not to, and punch him in the face.

seriously though, it obviously isn't a fear of Americans if they are unwilling to speak German, or Spanish. It especially isn't about America when they refuse to help a Swedish person.

--
Darth --
Nil Mortifi, Sine Lucre

GREAT! We'll need this when SETI locates a signal by Anonymous Coward · 2003-07-28 07:50 · Score: 0

And all those cpu cycles will soon be paid off.

esperanto by diesel_jackass · 2003-07-28 07:50 · Score: 1

can't we all just speak Esperanto?

(i suppose i should've written this message in esperanto to illustrate my point, oh well)

--
THERE IS NO DATA. THERE IS O

Re:esperanto by Anonymous Coward · 2003-07-28 08:55 · Score: 0

nice sig, it had me fooled for a few seconds.
Re:esperanto by diesel_jackass · 2003-07-28 09:15 · Score: 1

muhahahahahahhahaaa

;-)

--
THERE IS NO DATA. THERE IS O

I REPEAT by gerf · 2003-07-28 07:53 · Score: -1, Flamebait

the parent to your post (my post) got trolled?! my god, what kind of yuppie faggot loves harry pooper that much?! YES i blow karma, but fuck that. YOU ARE A BUNCH OF BLEEDINGASS FAGGOTS, YOU FUCKING MODS!

it's mods like you who keep the GNAA and other real trolls posting, and gaining more people every day. You lead a sad sad life mr. mod.

Free of cultural baggage... by Rorgg · 2003-07-28 08:05 · Score: 1

and free of culture. The problem with a constructed language is that people don't WANT to learn it -- there's no inherent literature, film, history, etc. that becomes available to you by doing so.

Yeah I know there was an Esperanto movie. Exactly my point of free from culture.

Doesn't all languages translate to english... by Anonymous Coward · 2003-07-28 08:06 · Score: 0

You just have to
S P E A K
L O U D
A N D
S L O W L Y. . .

Article text (in Babel-German-back-to-English) by Wraithlyn · 2003-07-28 08:17 · Score: 4, Funny

I just had to. Besides, I think it's proving a point, or something.

--

Romancing of the Rosetta stone

' you give me sufficient parallel data, and you can have translation a system in the hours '

University southern California of the computer scientist Franz Josef, which Och of most famous against-resounded, praises itself in the history of the technology, after its software counted the Arab strongly under 23 and Chinese English translatio systems, commercially and experimentally, examined inside in recently concluded Ministry of Trade of attempts.

"you indicate a place to me to the location, and I shift the world,", after to to order a mathematical explanation for the lever said the large Greek scientist Archimedes place.

"you give me sufficient parallel data, and you can have translation a system for all possible two languages in an affair of hours,", said Dr. Och, a computer scientist in the USC school of the institute for information science of the technology.

Och spoke after the benchmark tests 2003 for the machine translation, which was accomplished in the May and June of this yearly by the National Institute of Standards and Technology United States of the trade department.

Translations Ochs examined well into the 2003 head ton head tests against 7 Arab systems (5 research and 2 commercial away dregal products) and 14 Chinese systems (9 research and 5 from stock). In preceding 2002 evaluations had examined it similarly superior.

The researcher discussed his methods held at a NIST Postmortemseminar over the Benchmarking July 22-23 of John Hopkins at the university in Baltimore, Maryland.

Och is an outstanding exponent of a newer method of using the computers to touch in order to translate a language into other one, which became more successful in the last years, while the ability of the computers grew, large bodies of the information, and the volume of the text and the brought together translations in the digital form has, on (for example) multilingual newspaper or government net places of assembly explodes.

Method Ochs uses brought together bilingual texts, the computer-coded equivalents of the famous Rosetta descriptions of stone. Or rather gigabytes and gigabyte Rosetta of stones.

"our approximation uses statistic models, in order to find the most probable translation for a given entrance," Och avowedly

"it is rather different to the older, symbolic approximations for the machine translation, which in most existing the commercial systems is used, which try, to code the grammar and the encyclopedia of a foreign language in a computer program the grammatical structure of the strange text analyzed, and produced then English, which on hard guidelines," it is based, continued.

"employs, explaining from the computer, how one, we left it it out explains translated. First we draw the system it with a parallel korpus i.e. an accumulation of texts in the foreign language and their translations into English.

"the computer uses these information, in order to co-ordinate the parameters of a statistic model translation of the process. During the translation of the new text, the system tries to find English sentence which is the most probable translation strange entrance of the sentence, be based in these statistic models."

This method ignores or rolls over rather, finds express grammatical guidelines and even traditional dictionary lists of the vocabulary in favor of leaving the computer matchup samples between given Chinese or Arab (or any another language) texts and English translations.

Such abilities grew, while computers improved, by making possible for them, from using the individual words as the fundamental unit on using the groups of words to move -- cliches.

Versions of the different human translators of the same text change frequently considerably. Another key improvement was the use of repeated English human translations to permit the computer too its transmission by an ana

--
"Mind, as manifested by the capacity to make choices, is to some extent present in every electron." -Freeman Dyson

Re:Article text (in Babel-German-back-to-English) by fehlschlag · 2003-07-28 10:09 · Score: 2, Funny

Wow, that reads very similar to a lot of the /. posts I see... but with better spelling.

Ouch, stop throwing things at me!
Re:Article text (in Babel-German-back-to-English) by ScrewMaster · 2003-07-28 13:00 · Score: 1

Better watch out for that large Greek scientist.

--
The higher the technology, the sharper that two-edged sword.
Re:Article text (in Babel-German-back-to-English) by Dread_ed · 2003-07-29 04:25 · Score: 1

This reminds me of the first time I sat with a world known Greek and Hebrew scholar and heard him translate Bible verses from some of the oldest known manuscripts.

The King James Version in my hand suddenly became almost worthless.

--
When the only tool you have is a claw hammer every problem starts to look like the back of someone's skull.

English is not a language by Anonymous Coward · 2003-07-28 08:21 · Score: 0

Are you barking mad, stupid or just trolling?

English enables communication between people based on a shared code. It is therefore a language. Do you understand what I'm saying? If you do, you've proved that you are wrong.

It may be a syncretic, mongrel, illogical one, admitting degrees of mastery that few reach, but that describes most human languages to some degree.

Re:Or a "culturally superior" American. by William+Baric · 2003-07-28 08:48 · Score: 1

3-4 of the people were employees

I think it's a mistake to judge all Frenchmen based on your experience with government employees. Particularly the ones from SNCF! Everyone will tell you they are the worst!
Ok... Jokes aside, most of them are good people but we always remember the arrogant bastard. Also the truth is a lot of people blame them for whatever reason and they tend to be very defensive.

Additionally, the sneer. Attitude is attitude regardless of what country you're in.

Attitude is closely related to culture, so you must never base your interpretation of an attitude on your own culture. For example, if you met me one day, you would probably think I'm a cold person: I won't greet you with a big smile and certainly won't invite you to dinner with a pat on the back. But it doesn't mean I despise you! It only means I don't want to impose myself. For me, the guy who greets me with a great smile, shakes my hand and invites me to dinner the first time we meet is a self-centered asshole.

We *DO* find it amusing when people who are razzing *US* for not knowing THEIR language do not know any foreign languages.

Well, I'm sure you know that most people in France never use soap... Same kind of thing. It's always fun to bitch about someone different so we can feel good about ourselves. Don't take this too seriously.

Similar to natural learning? by Bodrius · 2003-07-28 08:49 · Score: 2, Interesting

Interesting method.

It seems to me this is more similar to natural learning of a language (usually at a young age) by exposure and immersion, as opposed to scholarly learning of a language in classrooms, et cetera.

It shouldn't be surprising that in humans, the first method also works best at acquiring fluency in multiple languages. As a matter of fact, it's the only method through which we come to understand our FIRST language, which is in almost every case the one we command the best.

I think most people get, by consuming huge amounts of information, a feeling of "what sounds right" and "what sounds wrong" that is more effective for them at predicting the unwritten rules and exceptions, both in translations and in original sentence-creation, than memorizing a set of grammar rules which, in the end, are just codifications of the current state of the language.

I don't think the success of the approach means the symbolic methods are pointless for this endeavor, any more than the formal study of languages and their grammars is for human translators.

Professional writers and translators do study such rules to dramatically improve their command of the different languages, and do get much better results.

But it seems to me they are more successful going from "statistical matching with massive real-use data" to "optimized grammar rules matching the data" than going backward, from "scholastic grammar rules" to "consumption of massive data to acquire exceptions, and correct and complement the rules".

What would be interesting, I think, is if one could study the state of the system after it's performing well and extract/deduce grammar rules algorithmically.

It would be interesting to see the results of a program doing that, collecting (and correcting) the grammar using the data, and using the grammar rules when no match in the dictionaries is found to, say, apply a greater weight to the grammatically correct choice among the alternatives.

If the results were good with this approach, one could consider decreasing the size of the database as the grammar gains stability. Use that memory for other processes, other languages, or new sample data that could not be examined before.

--
Freedom is the freedom to say 2+2=4, everything else follows...

Actually, it operates on a *shallower* level... by Jerf · 2003-07-28 08:53 · Score: 3, Informative

This software would operate on a deeper level than it would if it operated with the words and symbols themselves. It would utilize a map of the deep structures of language, instead of a map of the less-meaningful words and grammars.

Actually, as a result, it operates on a shallower level. In fact, it's almost like you wrote this comment for an article in a parallel universe where statistical translation was the norm, and somebody was just now proposing symbolic translation, so much so that it's almost spooky.

This translation technique is so shallow it doesn't even particularly care what languages it works with. In a way, it can't really be said to be "translating" in the traditional sense; it's just correlating phrases with no clue what they are.

Traditional symbolic translation is better described by what you said:

Therefore, instead of playing with messy grammars and sentence structures, we can simply have a catalogue of thoughts as represented by words, and correlate that catalogue with a different set of words to facilitate translation.

Word(/phrase) -> symbol -> word(/phrase) is traditional translation. This is word -> word translation.

It's working better because we've had little or no success creating the middle part of the symbolic translation; matching the symbology used in our head has proven impossible to date. This works better by skipping that step, which introduces horrible distortions by forcing the words to fit into an incredibly poor symbology (compared to what we're actually using).

However, in theory, traditional translation should still have a brighter future; this is a hack around our ignorance, perhaps even a good one, but eventually we will want to extract the symbols.

(Incidentally, it's also why this same technique can't be used to match words -> symbols; we don't know how to represent the symbols yet! This kind of technique could eventually potentially be hybridized with something else to attack that problem, but simple, direct application can't result in the complicated relationships between symbols that exist, and we'd want a computer to "understand" those relations before we'd say it was truly translating or understanding English.)

Anyways, just flip your comments around 180 degrees and you're pretty close.
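
For the curious, the "most probable translation" wording in the article is usually written down like this in the textbook treatment of statistical MT. This is the classic noisy-channel formulation, offered as a sketch; the article does not say which exact model Och's system uses, so take the symbols ($f$ for the foreign sentence, $e$ for a candidate English sentence) as my notation, not his.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \frac{P(f \mid e)\, P(e)}{P(f)}
        \;=\; \arg\max_{e} P(f \mid e)\, P(e)
```

$P(f \mid e)$ is the translation model estimated from the parallel corpus and $P(e)$ is a language model of English alone; $P(f)$ drops out because it does not depend on $e$. Note there is no "symbol" layer anywhere in it, which is the whole point above.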

El DEB está muriendo by cpeterso · 2003-07-28 09:46 · Score: 0, Offtopic

Es oficial; Netcraft ahora confirma: * El DEB está muriendo un bombshell que lisia ma's golpeó * comunidad del DEB cuando IDC confirmó que * la cuota de mercado ya cercada del DEB ha caído con todo otra vez, ahora abajo menos que una fracción de 1 por ciento de todos los servidores. Viniendo en los talones de una encuesta sobre reciente Netcraft que indica llano que * el DEB ha perdido más cuota de mercado, servicios de estas noticias reforzar a lo largo de lo que hemos sabido todos. * El DEB se está derrumbando en desorden completo, según lo apropiado ejemplificado fallando absolutamente pasado [ samag.com ] en la prueba comprensiva reciente del establecimiento de una red del sistema Admin. Usted no necesita ser un Kreskin [ amazingkreskin.com ] para predecir * el futuro del DEB. La escritura de la mano está en la pared: * El DEB hace frente a un futuro triste. En hecho no habrá ningún futuro en todos para * DEB porque * el DEB está muriendo. Las cosas están pareciendo muy malas para * el DEB. Tanto de nosotros esté ya enterado, * el DEB continúa perdiendo la cuota de mercado. La tinta roja fluye como un río de la sangre. FreeBSD es puesto en peligro más de ellos todos, perdiendo el 93% de sus reveladores de la base. Las salidas repentinas y desagradables de los reveladores largos Jordania Hubbard y Mike Smith de FreeBSD del tiempo sirven solamente para subrayar el punto más claramente. Allí conserve sea no más de largo cualquier duda: FreeBSD está muriendo. Guardemos a los hechos y miremos los números. El líder Theo de OpenBSD indica que hay 7000 usuarios de OpenBSD. Cuántos usuarios de NetBSD hay? Veamos. El número de OpenBSD contra los postes de NetBSD en USENET está áspero en el cociente de 5 a 1. Por lo tanto hay los cerca de usuarios 7000/5 = 1400 de NetBSD. Los postes de BSD/OS en USENET están sobre la mitad del volumen de los postes de NetBSD. Por lo tanto hay cerca de 700 usuarios de BSD/OS. 
Un artículo reciente puso FreeBSD en cerca de 80 por ciento * del mercado del DEB. Por lo tanto hay (7000+1400+700)*4 = 36400 usuarios de FreeBSD. Esto es constante con el número de los postes del USENET de FreeBSD. debido a los apuros del cala de la nuez, las ventas abismales etcétera, FreeBSD salió de negocio y fue asumido el control por BSDI que venden otro OS preocupado. BSDI es también muerto ahora, su cadáver turned.over a otra casa de charnel. Todos los exámenes importantes demuestran que * el DEB ha declinado constantemente en cuota de mercado. * El DEB es enfermo y sus perspectivas a largo plazo de la supervivencia son muy déviles. Si * el DEB es sobrevivir en todos lo que estará entre dabblers del dilettante del OS. * El DEB continúa decayéndose. Nada de un milagro podía ahorrarlo brevemente a este punto en tiempo. Para todos los propósitos prácticos, * el DEB es muerto. Hecho: * El DEB está muriendo

--
cpeterso

Re:statistics is the key, twice thru Babelfish by ballpoint · 2003-07-28 09:55 · Score: 1

The hour infers inflamatory for this. This statistical method will beat besides the grammar and this rule basis, at least will be England, will be favors this simple reason:

England is not the language

or something, similar one but it compared to it are, the IMO. Is idiomaticas expression was not more great collects its absolutely rapid change (and not only by familiar form, right glance does movement this politics corrects fraseologia). You compared to know to historical... are more exceptions is the rule which legitimately thinks, the matter language wise person is considered incorrectly in any event, with vice versa, and so on.

One does not have not to think its advantage; Its relatively easy academic society correspondence foundation because of it by conjugate weak, to be had article genderless, quite simple uncased official lecture structure. But, he are strangely dominate and I suspect the most localities are partial I are not the real master (do not mention this orthographical nightmare; The only language and the spelling bee's competitor is English)

This reason is frank new lingua (or it must now be lingua angla) he always is the techno partner politics likely case. Stops harping uses the American is to big scale scope monolingue. " why does Romans do not have the academic society place language when they controlled Europe? Because they do not have. " if each condition speaks one different language, that with Europe related, there then is necessary.

--
Flourescent (adj): smelling like ground wheat.

Never by Pac · 2003-07-28 10:43 · Score: 1

Correct spelling in Slashdot? You gotta be kidding, they would revoke by club membership.

(I hope it is obvious I just thought it was too small a comment for me to need preview - the most common fatal error. But I haven't started anything with "end". What do you think the two "..." mean?)

English-to-Russian Babelfish Hack by JeanPaulBob · 2003-07-28 11:53 · Score: 1

I wanted to use Babelfish to do an English-Russian-English translation on that phrase. Imagine my shock when I discovered that while there's a "Russian to English" option, there's no "English to Russian" option.

"That can't be right," I thought. So, I took the code from the "Add Babel Fish Translation to your site" link, created a web page, pasted it, and added the following option to the list:

<option value=en_ru>English to Russian

Wonder of wonders, it worked! I have no idea why Babelfish isn't displaying the E-to-R option, but this is a functional workaround.

In case you were curious, the results were disappointingly mundane. "Spirit is willingly ready but flesh it is weak."

Here's the complete HTML document.

<html>
<head></head>
<body>
<FORM ACTION=http://jump.altavista.com/searchbox4.go name=mfrm>
<input type=hidden name=doit value=done>
<table width=200 border=0 cellspacing=0 cellpadding=6 bgcolor=#93b2dd><tr>
<th colspan=2 bgcolor=#FFFFFF><a href=http://www.altavista.com>
<img src=http://a12.g.akamai.net/7/12/282/13/av.com/static/i/af/box_logo.gif border=0 width=118 height=45></a><br>
</th></tr>
<tr><td colspan=2><img src=http://a12.g.akamai.net/7/12/282/13/av.com/static/i/bf/Bfishheading.gif width=192 height=20><br><font size=2 face=arial,helvetica,sans-serif color=#FFFFFF>
<small>Type or Paste text or Web address<br> (beginning with http://) here:<br> </small>
<textarea cols=20 rows=2 name=urltext></textarea>
</td></tr>
<tr><td colspan=2><font face=verdana,arial,helvetica,sans-serif size=2 color=#FFFFFF><small>Translate from:<br></small></font>
<select name=lp>
<option value=en_zh>English to Chinese
<option value=en_fr>English to French
<option value=en_de>English to German
<option value=en_it>English to Italian
<option value=en_ja>English to Japanese
<option value=en_ko>English to Korean
<option value=en_pt>English to Portuguese
<option value=en_es>English to Spanish
<option value=en_ru>English to Russian
<option value=zh_en>Chinese to English
<option value=fr_en>French to English
<option value=fr_de>French to German
<option value=de_en>German to English
<option value=de_fr>German to French
<option value=it_en>Italian to English
<option value=ja_en>Japanese to English
<option value=ko_en>Korean to English
<option value=pt_en>Portuguese to English
<option value=ru_en>Russian to English
<option value=es_en>Spanish to English
</select></font></td></tr><tr>
<td><input type=submit value=Translate style="font-family:sans-serif;font-weight:bold;color:#FFF;background-color:#990000;cursor:hand;margin-bottom:-1px;width:85px;"></td>
<td><font face=arial,helvetica,sans-serif size=2 color=#FFFFFF><small>Powered by Systran</small></font></td>
</tr></table></form>
</body>
</html>

Are the French really assholes? No. by zedmelon · 2003-07-28 12:16 · Score: 1

Very well said, raehl. Based on my own trip to France in February 2002, I have some things to add...

1. Do the French hate Americans? I was told in advance by several people that the French look down upon everyone else--especially Americans--with a certain air of distaste. For the most part, I didn't find this to be the case. As long as you remember you're not at home (America, in my case)--that you're a guest--and behave accordingly, there's no problem. I saw a few people staring as if they didn't want us there, but not many.

I'm a pretty gregarious guy, easy going, laid back. For example, saying "Bonjour" with a friendly smile before asking for help came naturally to me. I would be surprised if my accent fooled anyone (however, I was mistaken for a random Frenchman three times in that week...heh), but I think they appreciated the effort. No doubt they could tell immediately that my "Parlez-vous Englais" was stilted by a from-across-the-Atlantic accent, but they also realized I respected the fact that I was in THEIR country. I may have been "just lucky," but nearly everyone I encountered in Paris was "just fine," attitude-wise. We enjoyed spending 30-45 minutes just chatting with the guy who managed our hotel the last three nights.

2. It's a major city. Go to any large city in the world, and you'll find people with ugly personalities. New York, Los Angeles, Tokyo, etc. Their local economy is largely tourism-based. I'm sure some people don't mind that fact, but others resent it, yet they continue to live there. And some people are just plain assholes. Any large city is bound to have a higher concentration of them. Also, to paraphrase a point someone else made today, you remember the one jerk out of ten nice people.

3. Are the French arrogant? I wonder if a large part of this perceived animosity stems from a phenomenon becoming increasingly prevalent in the United States; many of the people I meet here at home have a mentality that drips with egotism. They're not bashful at all when expressing the idea that they're not treated well enough by everyone else, even when nationality is a non-issue. I can only imagine trying to deal with this type of person and trying to remain civil.

I didn't get this impression from reading posts by raehl, by the way. I'm betting that he just ran into the wrong people. And raehl makes a valid point about expecting fluent English from the average American.

Going to the mall with a girl I used to date: When she was behind the wheel, she would impatiently complain about people walking too slowly in front of her as she circled the parking lot to find the absolute closest spot, and if she could get by without waiting, she would. Yet she was intolerant of motorists who didn't slow down to let her walk in front of their vehicles. Clearly her mode of transportation wasn't all that was pedestrian. More and more Americans have displayed this sort of behavior. I don't know if it's worldwide, but I have a feeling it's not. The only reason this feeling bothers me is that it frightens me to be part of THE country known for one-sidedness such as this.

Incidentally, the worst of the handful of bad experiences occurred when a woman approached me outside the train station a few blocks from Notre Dame. She asked me a question that sounded like a French request for directions. I asked her "Parlez-vous Englais?" (am I spelling that correctly?), After a moment of shock, she looked at me as if I had just attempted to rob her, and then she stormed away.

One of the nicest experiences started the same way: My friend and I were asked for subway assistance on Champs Elysees by two attractive young French women visiting Paris for the first time (sorry, no "dear Penthouse" story here). Like the other woman, they clearly thought I was French; maybe it's because I don't own any Hawai

--
Mom says my .sig can beat up your .sig.

I doubt it's genuine by gidds · 2003-07-28 13:11 · Score: 1

That example has been around so long that I'd always assumed it was a joke rather than an actual result.

Another one that gets mentioned is "Out of sight, out of mind" which gets reverse-translated into "Invisible idiot".

--

Ceterum censeo subscriptionem esse delendam.

Re:I doubt it's genuine by HiThere · 2003-07-31 06:52 · Score: 1

That wasn't Russian. The "Out of sight, out of mind" to "Invisible idiot" was via a Chinese intermediate.

--

I think we've pushed this "anyone can grow up to be president" thing too far.
Re:I doubt it's genuine by Anonymous Coward · 2003-07-31 12:57 · Score: 0

No, it was a "blind idiot".
If you have "Out of" it translates to the slavic equivalent of without.
Re: I doubt it's genuine by gidds · 2003-07-31 13:28 · Score: 1

So how do you know what I've heard?!
In fact, I've heard both, but I prefer the alliteration of the version I gave :)

--
Ceterum censeo subscriptionem esse delendam.

Culture of the whole world by flicken · 2003-07-28 13:33 · Score: 1

Learning Esperanto gives you access to the culture of the entire world! One day, read a Chinese newspaper; the next, listen to music from a Danish/Polish/Bosnian music group.

The whole world is literally at your finger tips. Here are a few examples.

--
20 mil and I will! Learn Esperanto with 20M others.

I couldn't have said it better myself by flicken · 2003-07-28 13:37 · Score: 1

So i won't try. (-;

--
20 mil and I will! Learn Esperanto with 20M others.

Grammar? We ain't got no stinkin' grammar! by tool462 · 2003-07-28 14:29 · Score: 1

things that are legitimate to say language-wise are considered incorrect anyways

This is the distinction between prescriptive grammar and grammar as it is used. This includes the notorious double negative and end-of-sentence prepositions. See Steven Pinker's The Language Instinct for a detailed description of the issue.

And you think it's bad in English? Try it in German or French where the governments are trying to control language evolution with legislation, instead of just textbooks.

Let's give him a real challenge.. by vudufixit · 2003-07-28 14:49 · Score: 1

The Voynich Manuscript!

A flawed approach by Oryx3 · 2003-07-28 17:51 · Score: 2, Insightful

And where are you going to find gigabytes of parallel Klingon-English texts?

No, seriously, this is the fallacy behind any statistical approach to automated translation. The news release gives the telling comment:

"Different human translators' versions of the same text will often vary considerably. Another key improvement has been the use of multiple English human translations to allow the computer to more freely and widely check its rendering by a scoring system. This not coincidentally allows researchers to quantitatively measure improvement in translation on a sensitive and useful scale."

This paragraph just doesn't make any sense to me. Either it's badly explained, or the entire approach is flawed:

You have to start with correctly human-translated and aligned texts to begin with. How many versions of the same text are you willing to pay for?
Most likely, you will have some texts well translated, and some badly translated. How do you rate the relative quality of each version? How many translators does it take to revise gigabytes of text? (One to screw in the lightbulb...)
A large percentage of existing translations are mediocre. So you are going to get mostly bad translation out, since they don't even attempt to build any linguistic knowledge into the system. GIGO rules!

Statistical methods just cannot deal with the subtlety of meaning to be found in natural language texts. It's a little like believing that you can always win at chess if you can just look ahead far enough. I believe that this approach is inherently limited and any apparent success is illusory. This news release hasn't changed my opinion.

Sorry to be a party-pooper, but that's how I feel.
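
For anyone trying to picture the "scoring system" in the quoted paragraph: the release doesn't say which metric is used, so the following is only a toy sketch of one way to score a machine rendering against several human translations at once, using clipped n-gram precision. The function names and the example sentences are made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n=1):
    """Fraction of candidate n-grams that appear in at least one reference.

    Each n-gram is credited at most as many times as it occurs in the
    single reference where it is most frequent, so repeating a common
    word does not inflate the score.
    """
    cand_counts = Counter(ngrams(candidate.split(), n))
    if not cand_counts:
        return 0.0
    # For every n-gram, remember its highest count in any one reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref.split(), n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    matched = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return matched / sum(cand_counts.values())


# Hypothetical human translations of the same source sentence.
refs = ["the cat is on the mat", "there is a cat on the mat"]
score = clipped_precision("the cat sat on the mat", refs)
```

More reference translations give the candidate more legitimate ways to be "right", which is presumably why the release treats multiple human versions as an improvement. A score like this only measures overlap with the references, though, so it inherits whatever quality they have.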

Re:A flawed approach by daBum · 2003-07-29 02:02 · Score: 1

From my understanding of the article:

The sample texts are not supposed to be perfect. They provide a set of basic info for the system to get its initial parameters for the language. After reading a few versions of the same translation (by different authors), the system has populated a database of "x=y" type data (chicken = egg). It can flag duplicates (chicken = egg, ova = egg), especially for the ones that are very dissimilar, and have a human look at the flagged table & make a decision. Or perhaps it takes the duplicates & ranks them (there were 4 nouns that = egg, and 1 adjective, so it's probably a noun), and generates a "most likely" solution, which it applies in new translations.
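
A rough sketch of that "x=y" table idea, purely as a guess at the flavor of it and not what Och's system actually does: count which words show up together in aligned sentence pairs, then rank the candidates for each source word. The tiny German-English corpus and the function names are invented for the example, and the Dice score is just one simple way to do the ranking.

```python
from collections import Counter, defaultdict


def build_table(pairs):
    """Build a ranked 'x=y' table from aligned (source, target) sentences.

    Candidates are ranked by the Dice coefficient,
    2 * co-occurrences / (source count + target count),
    so words that appear everywhere (like 'the') are not favored blindly.
    """
    src_count = Counter()
    tgt_count = Counter()
    co = defaultdict(Counter)
    for src, tgt in pairs:
        s_words = set(src.split())
        t_words = set(tgt.split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        for s in s_words:
            co[s].update(t_words)
    table = {}
    for s, t_counts in co.items():
        table[s] = sorted(
            ((2 * c / (src_count[s] + tgt_count[t]), t) for t, c in t_counts.items()),
            reverse=True,
        )
    return table


def best_guess(table, word):
    """Return the top-ranked target word, or None for an unseen word."""
    ranked = table.get(word)
    return ranked[0][1] if ranked else None


# Made-up toy corpus of aligned sentence pairs.
pairs = [
    ("das huhn", "the chicken"),
    ("das ei", "the egg"),
    ("ein ei", "an egg"),
    ("ein huhn", "a chicken"),
]
table = build_table(pairs)
guess = best_guess(table, "ei")
```

The runner-up entries in each ranked list are the "duplicates" that could be flagged for a human to look at. With only four sentences the table is trivially clean; the whole point of the gigabytes is that the counts only become trustworthy at scale.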

Or maybe there's a small Tibetan monk inside who has spent eons learning all possible languages, and the whole thing is a sham.

--
I am dyslexia of borg - your ass will be laminated.

After a few more trips through The Fish... by Anonymous Coward · 2003-07-28 18:44 · Score: 0

Those first few sentences are imbued with a new mystical quality of exposition that simply can't help but move a reader's soul to new dimensions of awe at the simple, yet complex, enlightenment of just how *bad* most translation software genuinely is. I think the translation speaks (muddily) for itself:

--

The danger that will inside load the place inside of the order that is enough Pierred U RomancingRosette ', the counterbalance that gives, it gives the system? Finally and of the translatio of software and history commerce technician, chow and 23 peoples or compared with the enemy the quality of the test that recently finishes in arabian it system of the concentration of the British inquiry and it commerce of this reading of it, ' probably California the one main parcel Och the real quality who is the college south of specialist of the hour of shipment of the Franz inside of dataprocessing, will sound to it, approximately nobleman and compliment that scholar inspected for respecting, translates. of the "o my interior that is being been situated the handspike of the emission regarding the Archimedes of the scientist of the danger the world it place of the grease, in the future the end to think interior d explaining the place, and to request it" and the visible age in the inquiry and this water index of the clause. The danger that will load the counterbalance that is enough, gives it and "it gives and it must translate the Och that inside says to the doctor of specialist dataprocessing of the school of the USC of the association of the danger all the language that 2 things are possible in the hour and the interior of the danger the company burden of the system,", it the information science technique that it is a possibility.

--

Have you ever really stopped to ponder "the counterbalance that gives, it gives the system?" Are these the words of a condescending gray-bearded UNIX guru? Have you yet implemented this counterbalance on your system? If so, do you recommend Perl?

Don't forget: "it the information science technique that it is a possibility."

Searching, and searching by corian · 2003-07-29 01:55 · Score: 1

Where's this great title we were supposed to find?

That's kinda funny... by raehl · 2003-07-29 03:45 · Score: 1

I had the same problem - having lived in Europe for a year, I was not wearing the typical brightly-colored or college/professional sports team-themed American tourist clothes, and was frequently asked for directions etc. by the locals, or by other tourists assuming I was a local. Always a blast when someone from the British Empire approaches you and asks in the local language if you speak English.

And for the record, once we were outside of Paris, virtually everyone we met in France was great.

But next time, you should put a little more effort into it when two attractive women ask you for directions. ;)

--
paintball

Re:That's kinda funny... by zedmelon · 2003-07-29 04:24 · Score: 1

yeah, we had been there about eight days, and they had just arrived, so we briefly considered (at least I did) offering to "show them around," but we were planning on meeting our other friends, so we bailed.
Also, I said they were attractive, not stunning. Might've been more motivation if...
;)
On the other hand, none of the super-mega-hotties asked me for directions.

--
Mom says my .sig can beat up your .sig.

Re:Are the French really assholes? No. by Anonymous Coward · 2003-07-31 08:22 · Score: 0

>"Parlez-vous Englais?" (am I spelling that correctly?)

"Parlez-vous Anglais ?" is the right spelling.
"Anglais" is the tongue spoken by the Angle (Angli in Latin) tribe.

486 comments