Slashdot Mirror


Automatic Translation Without Dictionaries

New submitter physicsphairy writes "Tomas Mikolov and others at Google have developed a simple means of translating between languages using a large corpus of sample texts. Rather than being defined by humans, words are characterized based on their relation to other words. For example, in any language, a word like 'cat' will have a particular relationship to words like 'small,' 'furry,' 'pet,' etc. The set of relationships of words in a language can be described as a vector space, and words from one language can be translated into words in another language by identifying the mapping between their two vector spaces. The technique works even for very dissimilar languages, and is presently being used to refine and identify mistakes in existing translation dictionaries."
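To make the summary's "mapping between their two vector spaces" concrete, here is a minimal sketch. It assumes the mapping is a linear transformation fitted from a small seed of known word pairs, which the summary does not spell out. The two-dimensional vectors, the word lists, and the function names are all invented for illustration; real word vectors are learned from large corpora and have hundreds of dimensions.

```python
import math

def apply_map(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def fit_linear_map(pairs, lr=0.1, steps=2000):
    """Fit W minimising the mean of ||W x - z||^2 over (x, z) pairs
    with plain gradient descent. Hypothetical helper, not the paper's code."""
    d_in, d_out = len(pairs[0][0]), len(pairs[0][1])
    W = [[0.0] * d_in for _ in range(d_out)]
    for _ in range(steps):
        grad = [[0.0] * d_in for _ in range(d_out)]
        for x, z in pairs:
            pred = apply_map(W, x)
            for i in range(d_out):
                err = pred[i] - z[i]
                for j in range(d_in):
                    grad[i][j] += 2.0 * err * x[j] / len(pairs)
        for i in range(d_out):
            for j in range(d_in):
                W[i][j] -= lr * grad[i][j]
    return W

def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def translate(word, W, src_vecs, dst_vecs):
    """Map a source word into the target space, return the nearest target word."""
    mapped = apply_map(W, src_vecs[word])
    return max(dst_vecs, key=lambda w: cosine(mapped, dst_vecs[w]))

# Toy vectors (assumed): the "Spanish" space is the "English" space rotated 90 degrees.
en = {"cat": (1.0, 0.2), "house": (0.1, 1.0), "dog": (0.9, 0.4)}
es = {"gato": (-0.2, 1.0), "casa": (-1.0, 0.1), "perro": (-0.4, 0.9)}

# Seed dictionary: only two known pairs; "dog" is held out.
seed = [(en["cat"], es["gato"]), (en["house"], es["casa"])]
W = fit_linear_map(seed)
guess = translate("dog", W, en, es)
```

In this toy setup the held-out word "dog" is mapped using only the two seed pairs, which is the sense in which the approach can propose dictionary entries it was never given.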

13 of 115 comments

  1. My hovercraft is full of eels! by Anonymous Coward · · Score: 5, Funny

    My nipples explode with delight!

  2. Re:how would by Anonymous Coward · · Score: 5, Funny

    how would "tight pussy" be translated?

    "Tight pussy" would be translated automatically, and without dictionaries. This is answered right in the headline.

  3. Darmok and Jalad at Tanagra by Vanders · · Score: 4, Interesting

    Finally, the team point out that since the technique makes few assumptions about the languages themselves, it can be used on argots that are entirely unrelated.

    Once again, Star Trek is ahead of the curve.

  4. Hofstadter? Isn't this AI, not translation? by Etcetera · · Score: 5, Interesting

    Reminds me a lot of the Fluid Concepts and Creative Analogies work that Hofstadter led back in the day.

    I don't see this directly working very well for translation between languages that aren't lexicographically swappable (e.g., English -> Japanese), because even if you have the idea space mapped out, you'd still have to build up the proper grammar, and you'll need rules for that.

    That being said.... Holy cow, you have the idea space mapped out! That's a big chunk of Natural Language Processing and an important step in AI development. ... Understanding a sentence emergently in terms of fuzzy concepts that are an internal and internally created symbol of what's "going on", not just using a dictionary and CYC-like rules to figure it out, seems like a useful building block, but maybe I'm wrong.

    Very cool stuff. Makes me want to go back and finish that CS degree after all.

    1. Re:Hofstadter? Isn't this AI, not translation? by phantomfive · · Score: 4, Interesting

      I don't see this directly working very well for translation between languages that aren't lexicographically swappable (e.g., English -> Japanese), because even if you have the idea space mapped out, you'd still have to build up the proper grammar, and you'll need rules for that.

      According to the paper, this translation technique is only for translating words and short phrases. But it seems to work well for languages as far apart as English and Vietnamese.

      --
      "First they came for the slanderers and i said nothing."
  5. Re:Sounds good, but we need a robust plug by Finallyjoined!!! · · Score: 4, Funny

    it gets full of lint

    What's it got in its pocketses?

    --
    If I had an Ass, I'd call it Fanny Bottom, then I could slap my Ass; Fanny Bottom, on the Arse.
  6. Dolphinese Will Now Be Understood by MacroSlopp · · Score: 4, Funny

    With this technology we should be able to understand Dolphin-talk.
    It should also allow us to detect future ape rebellions before they happen.

  7. Re:how would by Jane Q. Public · · Score: 4, Funny

    "tight pussy" be translated?

    "The cat has drunk a saucer of wine."

  8. Old idea, new implementation? by Theovon · · Score: 5, Interesting

    When I was in grad school, studying linguistics, computational linguistics, and automatic speech recognition, I recall hearing more than once the idea of using latent semantic analysis and the like to do this kind of translation. So am I correct in assuming that this hasn't been done well in the past, and Google finally made it work well because they have larger corpora of translated texts?

  9. Old news by richwiss · · Score: 4, Informative

    This is old news, going back to 1975. Yawn. http://en.wikipedia.org/wiki/Vector_space_model

  10. Re:Summary wrong (again) by hey! · · Score: 4, Insightful

    Simply because you embed your dictionary in something you choose to call a vector doesn't make it any less of a dictionary.

    True, but calling a dictionary a vector space doesn't make it so. For example, how "close" are the definitions of "happiness" and "joy"? In a dictionary, the only concept of "closeness" is the lexical ordering of the words themselves, and in that sense "happiness" and "joy" are quite far apart (as far apart as words beginning with h-a are from words beginning with j-o). But in some kind of adjacency matrix that shows how often these words appear in some relation to other words, they might be quite close in vector space; "guilt" and "shame" might likewise be closer to each other than either is to "happiness", and each of the four words ("happiness", "joy", "guilt", "shame") would be closer to any other of those words than it would be to "crankshaft"; and probably closer to "crankshaft" (a noun) than to "chewy" (an adjective).
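    The contrast between alphabetical distance and vector-space distance can be sketched in a few lines. This is only an illustration: the six-sentence corpus is made up, the "adjacency matrix" is reduced to same-sentence co-occurrence counts, and cosine similarity is one assumed choice of closeness measure among several.

```python
from collections import Counter, defaultdict
import math

def cooccurrence_vectors(sentences):
    """For each word, count the other words sharing a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            for j, context in enumerate(words):
                if i != j:
                    vectors[word][context] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Invented toy corpus; a real one would be millions of sentences.
corpus = [
    "she felt happiness and smiled",
    "she felt joy and smiled",
    "he felt joy and laughed",
    "he felt happiness and laughed",
    "the mechanic replaced the crankshaft in the engine",
    "the engine crankshaft was worn",
]

vecs = cooccurrence_vectors(corpus)
sim_joy = cosine(vecs["happiness"], vecs["joy"])
sim_crank = cosine(vecs["happiness"], vecs["crankshaft"])
print(sim_joy > sim_crank)
```

    Here "happiness" and "joy" end up close because they keep the same company, even though they sit far apart alphabetically, while "crankshaft" shares no context words with either.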

    Anyhow, if you'd read the paper, at least as far as the abstract, you'd see that this is about *generating* likely dictionary entries for unknown words using analysis of some corpus of texts.

    --
    Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
  11. Re:And what's the algorithm complexity? by SuricouRaven · · Score: 4, Funny

    Statistical translation is always going to have issues like that, but it can perhaps reach the 'good enough' point to hold a conversation with.

    I can easily see it getting confused by formal vs informal use. If it goes on association, eventually it's going to get 'lawyer' and 'extortionist' confused.

  12. Re:And what's the algorithm complexity? by Anonymous Coward · · Score: 4, Funny

    I too get lawyer and extortionist confused.