Automatic Translation Without Dictionaries

Posted by Soulskill on Saturday September 28, 2013 @09:42AM from the baby-steps-to-the-universal-translator dept.

New submitter physicsphairy writes "Tomas Mikolov and others at Google have developed a simple means of translating between languages using a large corpus of sample texts. Rather than being defined by humans, words are characterized based on their relation to other words. For example, in any language, a word like 'cat' will have a particular relationship to words like 'small,' 'furry,' 'pet,' etc. The set of relationships of words in a language can be described as a vector space, and words from one language can be translated into words in another language by identifying the mapping between their two vector spaces. The technique works even for very dissimilar languages, and is presently being used to refine and identify mistakes in existing translation dictionaries."
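
As a rough illustration of "identifying the mapping between their two vector spaces", here is a minimal sketch. It assumes the mapping is linear and is fit by least squares from a few known word pairs. The 2-D vectors, the word lists, and the names fit_linear_map and translate are all invented for illustration and are not taken from the Google work.

```python
import math

# Toy 2-D "embeddings". Every number here is invented for illustration;
# real word vectors are learned from text and have many more dimensions.
en = {
    "cat":   (1.0, 0.2),
    "dog":   (0.9, 0.4),
    "house": (0.1, 1.0),
    "car":   (0.3, 0.8),
    "one":   (-0.8, 0.1),
}
# The toy Spanish space is the English one rotated by 90 degrees: (x, y) -> (-y, x).
es = {
    "gato":  (-0.2, 1.0),
    "perro": (-0.4, 0.9),
    "casa":  (-1.0, 0.1),
    "coche": (-0.8, 0.3),
    "uno":   (-0.1, -0.8),
}
# A small seed list of known translations (hypothetical).
seed_pairs = [("cat", "gato"), ("house", "casa"), ("one", "uno")]


def fit_linear_map(pairs):
    """Return the 2x2 matrix W minimising sum ||W x - z||^2 over the seed pairs."""
    a11 = a12 = a22 = 0.0          # A = sum of x x^T (symmetric)
    b11 = b12 = b21 = b22 = 0.0    # B = sum of z x^T
    for src, dst in pairs:
        x1, x2 = en[src]
        z1, z2 = es[dst]
        a11 += x1 * x1; a12 += x1 * x2; a22 += x2 * x2
        b11 += z1 * x1; b12 += z1 * x2
        b21 += z2 * x1; b22 += z2 * x2
    det = a11 * a22 - a12 * a12
    i11, i12, i22 = a22 / det, -a12 / det, a11 / det   # entries of A^-1
    # Normal equations: W = B A^-1
    return ((b11 * i11 + b12 * i12, b11 * i12 + b12 * i22),
            (b21 * i11 + b22 * i12, b21 * i12 + b22 * i22))


def cosine(u, v):
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / (math.hypot(*u) * math.hypot(*v))


W = fit_linear_map(seed_pairs)


def translate(word):
    """Map an English vector into the Spanish space, then pick the nearest word."""
    x1, x2 = en[word]
    mapped = (W[0][0] * x1 + W[0][1] * x2, W[1][0] * x1 + W[1][1] * x2)
    return max(es, key=lambda w: cosine(mapped, es[w]))


print(translate("dog"))
print(translate("car"))
```

Neither "dog" nor "car" is among the seed pairs, yet they land nearest "perro" and "coche", because the toy Spanish space was built as an exact rotation of the English one. Real vector spaces only line up approximately, so the nearest neighbour is a guess rather than a guarantee.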

My hovercraft is full of eels! by Anonymous Coward · 2013-09-28 09:45 · Score: 5, Funny

My nipples explode with delight!
Re:Pun + Her attitude arbitrary pleases me too. by mynamestolen · 2013-09-28 09:52 · Score: 2

hmmm?? slashdot doesn't easily accommodate unicode.

--
work in progress
Re:how would by Anonymous Coward · 2013-09-28 10:02 · Score: 5, Funny

how would "tight pussy" be translated?
"Tight pussy" would be translated automatically, and without dictionaries. This is answered right in the headline.
Darmok and Jalad at Tanagra by Vanders · 2013-09-28 10:03 · Score: 4, Interesting

Finally, the team point out that since the technique makes few assumptions about the languages themselves, it can be used on argots that are entirely unrelated.
Once again, Star Trek is ahead of the curve.

--
Syllable : It's an Operating System
1. Re:Darmok and Jalad at Tanagra by Samantha+Wright · 2013-09-28 14:23 · Score: 2
  
  Incidentally, real life caught up—fortunately there's not much worth translating with such a low-bandwidth form of communication.
  
  --
  Bio questions? Ask me to start a Q&A journal. Computer analogies available for most topics!
Hofstadter? Isn't this AI, not translation? by Etcetera · 2013-09-28 10:11 · Score: 5, Interesting

Reminds me a lot of the Fluid Concepts and Creative Analogies work that Hofstadter led back in the day.
I don't see this directly working for translation into non-lexographically swappable languages (eg, English -> Japanese) very well, because even if you have the idea space mapped out, you'd still have to build up the proper grammar, and you'll need rules for that.
That being said.... Holy cow, you have the idea space mapped out! That's a big chunk of Natural Language Processing and an important step in AI development. ... Understanding a sentence emergently in terms of fuzzy concepts that are an internal and internally created symbol of what's "going on", not just using a dictionary and CYC-like rules to figure it out, seems like a useful building block, but maybe I'm wrong.
Very cool stuff. Makes me want to go back and finish that CS degree after all.

--
Hire a Linux system administrator, systems engineer,
1. Re:Hofstadter? Isn't this AI, not translation? by phantomfive · 2013-09-28 11:18 · Score: 4, Interesting
  
  I don't see this directly working for translation into non-lexographically swappable languages (eg, English -> Japanese) very well, because even if you have the idea space mapped out, you'd still have to build up the proper grammar, and you'll need rules for that.
  According to the paper, this translation technique is only for translating words and short phrases. But it seems to work well for languages as far apart as English and Vietnamese.
  
  --
  "First they came for the slanderers and i said nothing."
Re:Sounds good, but we need a robust plug by Finallyjoined!!! · 2013-09-28 10:14 · Score: 4, Funny

it gets full of lint

What's it got in its pocketses?

--
If I had an Ass, I'd call it Fanny Bottom, then I could slap my Ass; Fanny Bottom, on the Arse.
Re:Sounds good, but we need a robust plug by caseih · 2013-09-28 10:16 · Score: 2

Agg. firefox put me on the wrong story... bye bye karma
Re: make that the cat wise! by Anonymous Coward · 2013-09-28 10:17 · Score: 2, Funny

Yes exactly. For sayings google translate works not so good now. But perhaps with this technique it will be to plums in the future.
Re:Sounds good, but we need a robust plug by icebike · 2013-09-28 10:23 · Score: 3, Insightful

Firefox had nothing to do with it.
It was PEBCAK, pure and simple.

--
Sig Battery depleted. Reverting to safe mode.
Dolphinese Will Now Be Understood by MacroSlopp · 2013-09-28 10:34 · Score: 4, Funny

With this technology we should be able to understand Dolphin-talk.
It should also allow us to detect future ape rebellions before they happen.
1. Re:Dolphinese Will Now Be Understood by Vanders · 2013-09-28 11:02 · Score: 2
  
  This has to be done.
  
  --
  Syllable : It's an Operating System
Re:how would by Jane+Q.+Public · 2013-09-28 10:56 · Score: 4, Funny

"tight pussy" be translated?
"The cat has drunk a saucer of wine."
Old idea, new implementation? by Theovon · 2013-09-28 11:00 · Score: 5, Interesting

When I was in grad school, studying linguistics, computational linguistics, and automatic speech recognition, I recall hearing more than once the idea of using latent semantic analysis and such to do this kind of translation. So am I correct in assuming that this hasn't been done well in the past, and Google finally made it work well because they have larger corpora of translated texts?
Old news by richwiss · 2013-09-28 11:03 · Score: 4, Informative

This is old news, going back to 1975. Yawn. http://en.wikipedia.org/wiki/Vector_space_model
Re:the spirit is willing but the flesh is weak by icebike · 2013-09-28 11:17 · Score: 3, Interesting

Yes, the pretty vectors (nothing but lists of words) still have to be assembled by humans for the most part. Maybe not EVERY association, but enough of them that you can build relationships and associations indirectly, and achieve a round-about translation, even if you end up having to go through 2 or 3 related languages to get there.
After a few words of context are translated you can, perhaps, deduce the rest. But the idea that you can do so without a dictionary is ridiculous. And putting your dictionary into digital form and calling it a vector doesn't change the fact that you still have a dictionary associating an English word with a French word and a Mandarin word.

--
Sig Battery depleted. Reverting to safe mode.
Re:Cat by blue+trane · 2013-09-28 11:21 · Score: 3, Insightful

jazz musician
Re:Summary wrong (again) by hey! · 2013-09-28 11:26 · Score: 4, Insightful

Simply because you embed your dictionary in something you choose to call a vector doesn't make it any less of a dictionary.
True, but calling a dictionary a vector space doesn't make it so. For example, how "close" are the definitions of "happiness" and "joy"? In a dictionary, the only concept of "closeness" is the lexical ordering of the word itself, and in that sense "happiness" and "joy" are quite far apart (as far apart as words beginning with h-a are from words beginning with j-o). But in some kind of adjacency matrix that shows how often these words appear in some relation to other words, they might be quite close in vector space; "guilt" and "shame" might likewise be closer to each other than either is to "happiness", and each of the four words ("happiness", "joy", "guilt", "shame") would be closer to any other of those words than it would be to "crankshaft", though probably closer to "crankshaft" (a noun) than to "chewy" (an adjective).
Anyhow, if you'd read the paper, at least as far as the abstract, you'd see that this is about *generating* likely dictionary entries for unknown words using analysis of some corpus of texts.
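
The notion of "closeness" described above can be sketched with plain co-occurrence counts and cosine similarity. This is only an illustration: the five-sentence corpus is made up, and the helper names (cooccurrence_vectors, cosine) are hypothetical, not anything from the paper.

```python
import math
from collections import Counter, defaultdict

# A made-up corpus; a real one would be millions of sentences.
corpus = [
    "happiness is a warm feeling",
    "joy is a warm feeling",
    "guilt is a heavy feeling",
    "shame is a heavy feeling",
    "the crankshaft turns the engine",
]


def cooccurrence_vectors(sentences):
    """For each word, count the other words that share a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vecs = cooccurrence_vectors(corpus)
for other in ("joy", "guilt", "crankshaft"):
    print(other, round(cosine(vecs["happiness"], vecs[other]), 2))
```

In this toy corpus "happiness" sits closest to "joy", somewhat further from "guilt", and nowhere near "crankshaft", which alphabetical order in a dictionary could never tell you.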

--
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
Re:And what's the algorithm complexity? by SuricouRaven · 2013-09-28 11:38 · Score: 4, Funny

Statistical translation is always going to have issues like that, but it can perhaps reach the 'good enough' point to hold a conversation with.
I can easily see it getting confused by formal vs informal use. If it goes on association, eventually it's going to get 'lawyer' and 'extortionist' confused.
Re:how would by SuricouRaven · 2013-09-28 11:40 · Score: 2

Depends on source corpus. If they trained it using one of the usual formal collections of publications, it would only have built up associations based on the slang-free usage and so would translate it as 'Tight cat.' If they have instead fed it a broader selection, perhaps culled from a web spider, it may pick up the other meaning.
Re:And what's the algorithm complexity? by Anonymous Coward · 2013-09-28 12:07 · Score: 4, Funny

I too get lawyer and extortionist confused.
Re:Pun + Her attitude arbitrary pleases me too. by Kjella · 2013-09-28 12:16 · Score: 2

Welcome to /. where we still party like it's 1999. We'll have colonies on Mars before this site gets unicode support.

--
Live today, because you never know what tomorrow brings
Like so many of these algorithms by holophrastic · 2013-09-28 12:21 · Score: 3, Interesting

They do a great job of improving the precision of what used to be mediocre. And then, as a direct result, they not only make the errors worse, they make the errors undetectable.
CAT: small, furry, pet.
BIG CAT: big, furry, pet.
Um. Both are orange. One's a tabby. One's a tiger.
It's not good enough that your translation system has a 99% accuracy whereas the old one has a 90% accuracy. What matters is that the old one's 10% error rate sounded like an error (e.g. tiger becomes monster), whereas your new one's 1% passes the turing test and can't be discerned by an intelligent listener (e.g. tiger becomes tabby).
"My friend owns a monster." -- Your friend owns what? I don't think you meant a monster. -- "eh, you know, a very big dangerous jungle cat" -- oh, like a lion -- "not a lion, it has stripes" -- oh, a tiger.
"My friend owns a tabby." -- Ok.
1. Re:Like so many of these algorithms by flimflammer · 2013-09-28 13:38 · Score: 2
  
  "My friend owns a monster." -- Your friend owns what? I don't think you meant a monster. -- "eh, you know, a very big dangerous jungle cat" -- oh, like a lion -- "not a lion, it has stripes" -- oh, a tiger.
  Do you frequently converse with machine translators that elaborate the meaning of their mistranslations? Would be interested in knowing which one is capable of that. See when I use them it's what-you-see-is-what-you-get and I have to pick at the original source text with a dictionary to learn monster actually means tiger. That they can nonchalantly narrow the meaning down for you in a Star Trek-esque computer conversation is leaps and bounds ahead of what I'm used to!
  Sarcasm aside for a moment, you're actually complaining that machine translators may eventually get so convincing that you might not even notice the errors anymore? Really? Sign me up for that scenario. Nothing should replace native translators anyway for precision work.
Re:Pun + Her attitude arbitrary pleases me too. by tepples · 2013-09-28 12:32 · Score: 2

Slashdot has a fairly strict code point whitelist because there were problems in the past with trolls using directionality override characters to break Slashdot's layout and big blocks of foreign characters to make not-ASCII ASCII art.
Still needs dictionaries by raju1kabir · 2013-09-28 13:16 · Score: 2

Anyone who regularly uses Google Translate has seen the problems that come with this approach.
It "translates" analogous terms in ways that make no sense. Translate "Amsterdam" from Dutch to English and it often gives you "London". Same with kilometres / miles, and other things that significantly change the meaning of the text.
Without some hand-crafted guidance, the outcome can be much less useful than the rougher-sounding word-by-word machine translations from days of yore.

--
"Patriotism is your conviction that this country is superior to all other countries because you were born in it." -- GBS
Re:Cat by dkleinsc · 2013-09-28 14:04 · Score: 2

Rimmer, Lister

--
I am officially gone from /. Long live http://www.soylentnews.com/
Re:Synonyms by Panoptes · 2013-09-28 14:07 · Score: 2

Synonyms are only the tip of the iceberg: there are so many other problem areas. Collocations (words that 'go together'): we can say 'a tall boy', but not 'a high boy'; 'a large beer', but not 'a big beer'. Connotations (attitudes, feelings and emotions that a word acquires): compare 'a slim girl' with 'a skinny girl'. Idioms: 'hot potato' and 'red herring' cannot be translated directly into any other language. Add irony and sarcasm to the mix, class and regional usage, dialects, diglossia (for example, demotic and classical Arabic), puns and plays on words - the list goes on. Machine translation is a chimera.
Re:Synonyms by manu0601 · 2013-09-28 16:29 · Score: 2

I understand that collocations are addressed by their model: they study texts to discover that 'boy' may be preceded by 'tall' but not by 'high', and that in French, 'garçon' may be preceded by 'grand' but not 'haut'. That enables them to translate without a hitch.
But even adjective handling may come with traps. Adjectives in French may appear before or after a noun. You may say 'un grand garçon' or 'un garçon grand'; the meaning is the same most of the time. But there are exceptions! 'un type pauvre' is a poor guy, 'un pauvre type' is a mediocre person. Even 'grand garçon' vs 'garçon grand' may carry a subtle difference, as a father will tell his son he is 'un grand garçon' now (which means he is not a child anymore), but he will probably not tell him he is now 'un garçon grand' (which just means he is tall). I guess this can be handled by their statistical model, but at some point they will need to add some logic to handle it. I guess it falls in the idiom category.
Puns and irony are probably the most difficult part of the game. Even human translators have a hard time with them.
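
The collocation point can be sketched with nothing more than bigram counts. This is plain counting over a made-up English corpus, shown only to illustrate the idea; it is not claimed to be how the Google model works, and the names bigram_counts and preferred_modifier are hypothetical.

```python
from collections import Counter

# A made-up corpus; real collocation statistics need far more text.
corpus = [
    "a tall boy and a tall girl",
    "the tall boy ran home",
    "a high wall",
    "the high mountain",
    "a high fence",
]


def bigram_counts(sentences):
    """Count every adjacent word pair (previous word, next word)."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        counts.update(zip(tokens, tokens[1:]))
    return counts


def preferred_modifier(noun, candidates, counts):
    """Pick the candidate seen most often directly before the noun."""
    return max(candidates, key=lambda adj: counts[(adj, noun)])


counts = bigram_counts(corpus)
print(preferred_modifier("boy", ["high", "tall"], counts))
print(preferred_modifier("wall", ["tall", "high"], counts))
```

Counting like this captures 'tall boy' versus 'high boy', but it cannot tell 'un pauvre type' from 'un type pauvre' unless word order is treated as part of what gets counted, which is where the traps above come in.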
Re:Sounds good, but we need a robust plug by plover · 2013-09-29 03:18 · Score: 2

With this story being about automated translations getting it very wrong, there was a 95% chance people would have thought you were just making a joke about Apple doing language translations!
If you had posted a follow-up like "That's what Apple translate gets when I wrote 'Orchards of apple trees have fans to spray microscopic poison dust on all trees'", it would have been perfectly believable.

--
John