Slashdot Mirror


Google Faces Plagiarism Questions Over Chinese Software

yaohua2000 writes "Google's laboratory in China has launched its first product, a Pinyin Input Method Editor. The software allows the romanized characters to be translated to more traditional Chinese symbols , via entering on a QWERTY keyboard. Users soon discovered that the data Google used for the product was unusually similar to the data used by a Chinese rival, Sogou. Google has evaded the question about software similarities, reports PC World. 'The similarities, which included an error involving the name of a celebrity, were noted on a Google Labs discussion board about its Pinyin IME. Users noted that entering the Pinyin pinggong into the Google IME incorrectly produced the name of Feng Gong, an actor and comedian.'"

8 of 187 comments (clear)

  1. Identical typos... by pedantic+bore · · Score: 4, Insightful
    Funny, that's how we catch students who plagarize, too.

    Coming up with the same algorithm isn't terribly unlikely. Structuring it in the same way is not uncommon either. Making exactly the same mistakes, however, is hard to believe.

    --
    Am I part of the core demographic for Swedish Fish?
    1. Re:Identical typos... by Plutonite · · Score: 5, Insightful

      Not really. I'm not defending Google here, but you seem to be talking about an essay not an algorithm. If you have algorithms that are similar enough, they do not even need to be "structured the same way" to produce the same output(errors included). Anybody who has been to an ACM contest will tell you this.

      As such this story is useless. The internet needs no more speculation as it is, it's hard enough arguing what is wrong or right when concrete evidence is available. Our flamewars should be founded on solid ground.

    2. Re:Identical typos... by ReallyEvilCanine · · Score: 5, Insightful
      According to TFA, Sohu has patents in several areas related to how popular Internet search terms can be used for predictive text input. Google does, too. And unlike most others, Google constantly tweaks algorithms. Have you noticed how the Google Toolbar now predicts your search terms? And every time you deviate, they do modifications for you personally and tabulate in general to see if other's are also going after such similar versions.

      I work in I18N and deal with IMEs all the time, from the basic, non-learning MS Windows versions to the ones which come with the NJ Star and give preference to lesser-used terms previously selected to various other proprietary variants. There are only so many ways to write an IME, and there are only so many ways to do good prediction. If I type "go" in Japanese, my first choice will usually be "5" followed by the symbol for "language" and the game "Go", then various other possibilities. Only when I next type a "z" or a "g" do the symbols for a.m. and p.m. move to the front. Now if I'd written an IME and wanted to protect it I might have it always bring up "Mifune Go" ( as the fifth selection or, more subtly, bring up "Go" as the fifth possibility if you typed a "G" or "Go" after "Mifune". This isn't the case here.

      With Google's work and implementation of prediction methods, I find it hard to accuse the company of plagiarism for having the same bug (which comes as a result of predictive methods) as some other company. This is a bug, not some zyzzyx or easter egg which a programmer included to catch thieves. It was unintentional on Sogou's part and likely equally unintentional on Google's.

      Then again, there's a lot of pressure to excel at Google and maybe someone gave in to temptation despite working for a company that knows more about data than anyone else out there. Unlikely, but possible... and if Google issue a statement that someone did indeed plagiarise Sohu's work, fine. It could happen anywhere. Doesn't make Google bad, only one programmer. It makes the company culpable, but it hardly looks malicious.

  2. not saying it's the case by creativeHavoc · · Score: 5, Insightful

    while i am not insisting that it is the case, it seems like it could easily be the same logic flaw. Different algorithms and code can produce the same mistake if you are using the same mis guided logic behind the problem. Thats why you see the same bugs in students' code in university, even when worked on separatly during a lab.

    --
    insight through the mind
  3. Re:Ironic, isn't it? by ScrewMaster · · Score: 4, Insightful

    Why is it that saying anything negative about another country is always turned into a discussion about racism and bigotry? It immediately poisons further dialog when it is applied without reason. If you have some reason to think the OP is prejudiced I'd like to hear it, because I didn't read that into his comment. I hear a lot of negative comments about the United States on Slashdot (yours, for one, which is interesting) but I don't immediately conclude that prejudice is the root of it. Sometimes it is, but it's nice to find that out first before jumping to any conclusions.

    The unfortunate fact of the matter is that China's government and industry are completely unconcerned about the source of the technology that they mass-produce and sell to everyone. They just don't care, period, and I suppose when you get right down to it there's no reason they should. On the other hand, that just means there's no reason why we should respect their "intellectual property" either, and when their scientists and engineers come up with something good they damn well shouldn't expect us to concern ourselves over their rights either. If Google did indeed rip off their Chinese counterparts my feeling is ... more power to 'em.

    So, it's not a statement of prejudice (e.g. "I dislike Chinese people because they are Chinese, or have yellow skin, or slanted eyes, or talk funny") but a legitimate observation on the state of affairs in that country.

    Just watch it when you start playing the race card without a good reason ... it prejudices any argument you make after that point.

    --
    The higher the technology, the sharper that two-edged sword.
  4. Or, basically... by mattgreen · · Score: 4, Insightful

    "This is our groupthink, it doesn't need to make sense. Now shut up and conform so you get your mod points!"

  5. Re:This wouldn't be the first time... by Anonymous Coward · · Score: 5, Insightful

    the dozens of person-years that went into writing the actual dictionaries for aspell were simply co-opted by Google. Get off your high horse - you're just another holy roller.

    Thousands of people donate their time, money, and code to GPL-licensed projects. As one of those contributors, I can tell you that I don't believe that Google is doing anything wrong at all with aspell. The terms of the license are clear. Users are no way required to give attribution. In fact, there is not even a suggestion, hint, or implication that attribution would be nice. You suggesting that it should be that way is fine, but to state that aspell was "co-opted" is factually incorrect and falsely implies that Google is doing something against the GPL license.

    If you, as a contributor to aspell, don't like aspell's license terms, you are free to start another project with similar goals under different license terms.
  6. Combing by eMbry00s · · Score: 5, Insightful

    Everybody who says something along the lines of "bah, chinese complaining about stealing" should note that all Chinese are not connected into one single conscious entity, but are different individuals.

    The people who own this IP need not have stolen any other IP.

    It is as dumb as saying that all Americans are christian, guntouting, fat fuckasses.