Slashdot Mirror


Google Faces Plagiarism Questions Over Chinese Software

yaohua2000 writes "Google's laboratory in China has launched its first product, a Pinyin Input Method Editor. The software lets users type romanized Pinyin on a QWERTY keyboard and converts it into Chinese characters. Users soon discovered that the data Google used for the product was unusually similar to the data used by a Chinese rival, Sogou. Google has evaded the question about software similarities, reports PC World. 'The similarities, which included an error involving the name of a celebrity, were noted on a Google Labs discussion board about its Pinyin IME. Users noted that entering the Pinyin pinggong into the Google IME incorrectly produced the name of Feng Gong, an actor and comedian.'"

5 of 187 comments

  1. not saying it's the case by creativeHavoc · · Score: 5, Insightful

    While I am not insisting that it is the case, it seems like it could easily be the same logic flaw. Different algorithms and code can produce the same mistake if they rest on the same misguided logic. That's why you see the same bugs in students' code at university, even when they work separately during a lab.

    --
    insight through the mind
  2. Re:This wouldn't be the first time... by Anonymous Coward · · Score: 5, Insightful

    the dozens of person-years that went into writing the actual dictionaries for aspell were simply co-opted by Google. Get off your high horse - you're just another holy roller.

    Thousands of people donate their time, money, and code to GPL-licensed projects. As one of those contributors, I can tell you that I don't believe that Google is doing anything wrong at all with aspell. The terms of the license are clear. Users are in no way required to give attribution. In fact, there is not even a suggestion, hint, or implication that attribution would be nice. You suggesting that it should be that way is fine, but to state that aspell was "co-opted" is factually incorrect and falsely implies that Google is doing something against the GPL license.

    If you, as a contributor to aspell, don't like aspell's license terms, you are free to start another project with similar goals under different license terms.
  3. Combing by eMbry00s · · Score: 5, Insightful

    Everybody who says something along the lines of "bah, Chinese complaining about stealing" should note that Chinese people are not connected into one single conscious entity; they are different individuals.

    The people who own this IP need not have stolen any other IP.

    It is as dumb as saying that all Americans are Christian, gun-toting, fat fuckasses.

  4. Re:Identical typos... by Plutonite · · Score: 5, Insightful

    Not really. I'm not defending Google here, but you seem to be talking about an essay, not an algorithm. If you have algorithms that are similar enough, they do not even need to be "structured the same way" to produce the same output (errors included). Anybody who has been to an ACM contest will tell you this.

    As such, this story is useless. The internet needs no more speculation as it is; it's hard enough arguing what is wrong or right when concrete evidence is available. Our flamewars should be founded on solid ground.

  5. Re:Identical typos... by ReallyEvilCanine · · Score: 5, Insightful

    According to TFA, Sohu has patents in several areas related to how popular Internet search terms can be used for predictive text input. Google does, too. And unlike most others, Google constantly tweaks its algorithms. Have you noticed how the Google Toolbar now predicts your search terms? And every time you deviate, they make modifications for you personally and tabulate in general to see if others are also going after similar versions.

    I work in I18N and deal with IMEs all the time, from the basic, non-learning MS Windows versions, to the ones which come with the NJ Star and give preference to lesser-used terms previously selected, to various other proprietary variants. There are only so many ways to write an IME, and there are only so many ways to do good prediction. If I type "go" in Japanese, my first choice will usually be "5" followed by the symbol for "language" and the game "Go", then various other possibilities. Only when I next type a "z" or a "g" do the symbols for a.m. and p.m. move to the front. Now if I'd written an IME and wanted to protect it, I might have it always bring up "Mifune Go" as the fifth selection or, more subtly, bring up "Go" as the fifth possibility if you typed a "G" or "Go" after "Mifune". This isn't the case here.

    With Google's work and implementation of prediction methods, I find it hard to accuse the company of plagiarism for having the same bug (which comes as a result of predictive methods) as some other company. This is a bug, not some zyzzyx or easter egg which a programmer included to catch thieves. It was unintentional on Sogou's part and likely equally unintentional on Google's.

    Then again, there's a lot of pressure to excel at Google and maybe someone gave in to temptation despite working for a company that knows more about data than anyone else out there. Unlikely, but possible... and if Google issue a statement that someone did indeed plagiarise Sohu's work, fine. It could happen anywhere. Doesn't make Google bad, only one programmer. It makes the company culpable, but it hardly looks malicious.