The big name in MT from IBM who fired linguists may have hired them for the wrong purpose.
Your assertion omits a fundamental point: machine translation (MT) ONLY works well within uniform, specialized fields of knowledge.
This is because specialized human languages behave much like programming languages, so computers are more likely to process them well.
Moreover, technical writers have learned how to write for the machine: they isolate particular, recurrent actions and always express each one with the same isolated sentence.
An example of that kind of sentence: "Click with the right button of the mouse". In ordinary text, this sentence could easily be embedded in a longer sentence or subjected to some variation. But technical writers know that they must write a single invariant sentence, ending with a period, to suit the machine's abilities.
They try to make natural languages behave the way programming languages do, like context-independent languages.
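To make the "single invariant sentence" idea concrete, here is a minimal sketch in Python. It assumes a tiny exact-match translation memory; the dictionary, the function name `translate_controlled` and the Spanish strings are all made up for illustration and do not describe any real MT product. An invariant sentence is found in the memory; the same words embedded in a variation are not, and get flagged for a human.

```python
# Hypothetical translation memory: each invariant source sentence maps to
# exactly one approved translation. The entries are illustrative only.
TRANSLATION_MEMORY = {
    "Click with the right button of the mouse.":
        "Haga clic con el botón derecho del ratón.",
    "Close the window.": "Cierre la ventana.",
}


def translate_controlled(text):
    """Translate sentence by sentence using exact matches only.

    Returns (translations, unmatched): translations holds None for every
    sentence that is not in the memory, and unmatched lists those
    sentences so a human translator can review them.
    """
    # Naive splitting on the final dot, which is all the controlled
    # style described above requires.
    sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
    translations, unmatched = [], []
    for sentence in sentences:
        if sentence in TRANSLATION_MEMORY:
            translations.append(TRANSLATION_MEMORY[sentence])
        else:
            translations.append(None)
            unmatched.append(sentence)
    return translations, unmatched
```

Calling `translate_controlled("Now click with the right button of the mouse.")` returns no translation and reports the sentence as unmatched, which is exactly the kind of variation a technical writer avoids.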
But don't forget that even in specialized fields, MT must be used carefully because of neologisms, which can make up a large share of the lexicon (almost half) in high-tech fields.
Right?
So, do corpus-based statistics work here?
Believe me, if you translate manuals for equipment that can threaten its user's life (the only ones that MUST be translated under international law), and you keep thinking only in terms of corpus-based statistics, you should be fired immediately! Otherwise you'll kill people!
MT does perform well when used with restrictions, on texts that behave much like the formal languages used to program it.
Corpus-based statistics apply in some specialized contexts, to the part of a language that can be reduced to a formal language, which is rather narrow.
Anyway, I think the point here is somewhat different:
http://science.slashdot.org/comments.pl?sid=594853&cid=23940075
Regards,
Netpolyglot
Gryphia,
Your explanation of the difference between prediction based on statistics and on cause seems very clear to me.
However, I disagree that Google is mainly based on statistics, even if that is what Google claims.
If you have some time, please have a look at my post on the subject:
http://science.slashdot.org/comments.pl?sid=594853&cid=23940075
I hope my post will be as clear to you as yours has been to me.
Regards,
Netpolyglot
Maybe this definition of the word REALPOLITIK could help:
http://www.merriam-webster.com/dictionary/realpolitik
I've read most of your posts, and it seems that my approach is quite different, mostly focused on what the article says about Google's fundamental method, a.k.a. PageRank.
If Craig Venter's sequencing of the genome revealed that the environment can heavily influence inheritable genetic traits, that influence must still exist, and be found, in the environment.
Google dominates the Internet environment, which happens to be, at present, one of the dominant environments (or contexts, at least) in which human beings develop relationships and communicate.
What this article presents as Google's philosophy, 'we don't know why this page is better than that one...', is actually Google's declared, or confessed, philosophy.
Contrary to Google's pretended ignorance (or maybe there is some gap in my knowledge of Klingon), the company's success relies fundamentally on the analysis of hyperlinks as a sign of relationship and hierarchy between sites, then between communities of Internet users, then between social groups in a human environment. Google understood that hyperlinks are an unambiguous, mechanically exploitable expression of socially determined human relationships. So it is absolutely wrong to say that Google requires "No ... semantic analysis". To work, it does require it; yes, it absolutely requires semantic analysis, and above all semantic analysis mixed with social analysis based on hyperlink observation.
Google does not know, indeed, whether this page is better than that one. And that is precisely what explains its success. It simply understood that humans, who consciously make meaningful hyperlinks between webpages, know better than Google ever will which pages are the best. Google made a brilliant bet on that human knowledge, and speculated heavily on it as its foundation for providing some quality, instead of spending time and power calculating random numeric relationships between words and text elements as AltaVista did, in vain.
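For anyone who wants to see what "betting on human hyperlinks" looks like mechanically, here is a minimal sketch of the textbook PageRank iteration in Python. It is a simplified illustration, not Google's actual system: the four-page graph, the damping value of 0.85 and the iteration count are assumptions chosen for the example. Note that the code never looks at a page's words, only at who links to whom.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by repeated redistribution of rank.

    links maps each page to the list of pages it links to; every link
    target must also appear as a key. Returns a dict page -> rank, with
    the ranks summing to 1.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page passes its rank on, split among its links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three authors chose to link to "c".
example_links = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
example_ranks = pagerank(example_links)
```

In this toy graph "c" ends up ranked highest simply because three authors decided to point at it, and "a" comes second because "c" points back at it.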
As regards the term 'raw data' mentioned in the article, everyone will agree that hyperlinks have nothing to do with raw, undetermined data. On the contrary: nobody unconsciously inserts a hyperlink tag on a webpage pointing to another webpage, which cannot be said of many of the words we use, which always have many possible interpretations, including unintended meanings (heard of a slip of the tongue?). Hyperlinks are definitely pre-structured data.
Google is not a method to find truth, indeed; none is needed, since humans are much better at filling Google with their truths... and beliefs! Google provides a method, as servile as it is brilliant and acute, to reproduce and certainly worsen, through ranking, power relationships just as they already exist and are inherited in real human societies. Google pretends that it just devours the relationships between us to rank us, and spits out the (our) truth.
Google owns the Internet, OK, but human social reproduction is still what determines the Internet. Google knows that; that's why it wants to control the Internet environment, to learn more about how human determination works, and maybe how to influence it. This is definitely more a political than a scientific method. It would be better, instead, to use the already old expression "realpolitik"; it is well proven that it can be pretty useful, oh yes.
What I sensed in this article is some irony, in particular when the author says that Google can translate perfectly from Klingon to English. Don't you?