More on Statistical Language Translation
DrLudicrous writes "The NYTimes is running an article about how statistical language translation schemes have come of age. Rather than compile an extensive list of words and their literal translations via bilingual human programmers, statistical translation works by comparing texts in both English and another language and 'learning' the other language via statistical methods applied to units called 'N-grams', e.g. if 'hombre alto' means tall man, and 'hombre grande' means big man, then hombre=man, alto=tall, and grande=big." See our previous story for more info.
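For the curious, here is a minimal sketch of how the 'hombre alto' inference could fall out of plain counting. It assumes a word-level expectation-maximization scheme in the style of IBM Model 1, which is a swapped-in textbook technique and not necessarily what the systems in the article use; it also works on single words rather than longer N-grams. The function names and the two-sentence corpus are made up for illustration.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Estimate prob[(target, source)] = P(target word | source word) by EM.

    `pairs` is a list of (source_words, target_words) sentence pairs.
    Assumption: a simplified IBM Model 1 (no NULL word, no word order).
    """
    src_vocab = {s for src, _ in pairs for s in src}
    tgt_vocab = {t for _, tgt in pairs for t in tgt}
    uniform = 1.0 / len(tgt_vocab)
    # Start with no knowledge: every target word is equally likely.
    prob = {(t, s): uniform for s in src_vocab for t in tgt_vocab}

    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: split each target word's "credit" among the source
        # words in the same sentence pair, weighted by current beliefs.
        for src, tgt in pairs:
            for t in tgt:
                norm = sum(prob[(t, s)] for s in src)
                for s in src:
                    share = prob[(t, s)] / norm
                    count[(t, s)] += share
                    total[s] += share
        # M-step: renormalize the collected credit into probabilities.
        prob = {(t, s): count[(t, s)] / total[s] for (t, s) in prob}
    return prob

def best_translation(prob, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {t: p for (t, s), p in prob.items() if s == source_word}
    return max(candidates, key=candidates.get)

# Toy parallel corpus taken from the example in the story.
corpus = [
    (["hombre", "alto"], ["tall", "man"]),
    (["hombre", "grande"], ["big", "man"]),
]
model = train_ibm_model1(corpus)
for word in ("hombre", "alto", "grande"):
    print(word, "->", best_translation(model, word))
```

Simple co-occurrence counts alone leave 'alto' tied between 'tall' and 'man'. The iteration breaks the tie: because 'hombre' shows up with 'man' in both pairs, it soaks up most of the credit for 'man', which leaves 'tall' as the better explanation for 'alto'.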
If this happens, I suspect this technology will be illegal...
For example, the English word "pattern" can be translated in French by any of (please excuse the lack of accents, they were stripped when I submitted): modele, exemple, type, schema, dessin, motif, maquette, patron, plan, disposition, groupement, repartition, combinaison, diagramme, gabarit, echantillon, tendance, figure, circuit (and probably others as well) depending on the context -- and not just the lexical context, but the meaning.
Previous attempts to automate translation focused on giving computers grammatical and semantic knowledge, in the hope that they could infer some meaning from this and so choose the right equivalents. Despite some success, this approach failed in general, putting machine translation (MT) firmly in the realm of AI. I believe this statistical approach is a step in the wrong direction (back to purely lexical means of analyzing texts with a view to translation). Further progress in MT will come from AI.
This doesn't detract from the ways in which computers have been useful to translators in the area of computer-assisted translation (translation memory, localization, terminology databases, etc.).
The other point is it's a lot harder to get a good-quality parallel corpus than you'd think (even in the Internet age -- most of the stuff on the Internet is crap anyway).
It's not the idea of using computers in translation that I think is limited, just this approach.
Artificial neural nets are one way to do this, but statistical methods are more or less analogous and have the advantage of being highly optimizable. Personally I don't understand the details, but Very Smart Mathematicians have found ways to optimize models like Singular Value Decompositions (SVDs) so that they can be calculated orders of magnitude faster than models that cannot be represented as formally using mathematics.
The bottom line is that statistical methods are probably the way that we will end up producing brain-like behavior on computers, and the fact that there are promising results already is heartening. Yes, for truly intelligent behavior a lot of domain knowledge will also be needed, as you point out. But I don't see any reason why the extraction and mapping of this knowledge couldn't also be achieved with large training corpora and statistical methods, rather than hand-crafting.
Peer Pressure
It may be possible for this approach to address that issue somewhat. Statistics can be collected not only on associations of words with other words, but also on associations of groups of words or phrases with others. So if the translator has learned from documents in which the phrase "put it down" appears near the word "ill" and the word "dog," and from other documents in which the phrase is associated with the word "heavy," it can make a good guess.
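A rough sketch of that idea, assuming a naive Bayes style scorer with add-one smoothing (my choice for illustration, not something the article describes). The training examples, the sense labels "euthanize" and "set_down", and the function names are all invented; a real system would learn from a large corpus and pick among actual target-language phrases.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count which context words appear alongside each sense of a phrase.

    `examples` is a list of (context_words, sense) pairs.
    """
    context_counts = defaultdict(Counter)
    sense_counts = Counter()
    for context_words, sense in examples:
        sense_counts[sense] += 1
        context_counts[sense].update(context_words)
    return context_counts, sense_counts

def guess_sense(model, context_words):
    """Pick the sense whose training contexts best match context_words."""
    context_counts, sense_counts = model
    vocab = {w for counts in context_counts.values() for w in counts}
    total_examples = sum(sense_counts.values())

    def score(sense):
        counts = context_counts[sense]
        # Add-one smoothing so an unseen word does not zero out a sense.
        denom = sum(counts.values()) + len(vocab)
        log_p = math.log(sense_counts[sense] / total_examples)
        for word in context_words:
            log_p += math.log((counts[word] + 1) / denom)
        return log_p

    return max(sense_counts, key=score)

# Hypothetical contexts in which "put it down" was seen (invented data).
examples = [
    (["ill", "dog", "vet"], "euthanize"),
    (["dog", "old", "ill"], "euthanize"),
    (["heavy", "box"], "set_down"),
    (["heavy", "suitcase", "floor"], "set_down"),
]
model = train(examples)
print(guess_sense(model, ["dog", "ill"]))
print(guess_sense(model, ["heavy", "crate"]))
```

Note that "crate" never appears in the training data; the smoothing lets "heavy" carry the decision on its own, which is roughly the kind of educated guess being described.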
Clearly, it would need to learn from a tremendous amount of input data before it could begin to approach the experience of a human, and hence make guesses of similar quality to a human translator. However, the amount of available source material is increasing so rapidly that it may be possible for a translator to get pretty darn smart this way.