Slashdot Mirror


Google Faces Plagiarism Questions Over Chinese Software

yaohua2000 writes "Google's laboratory in China has launched its first product, a Pinyin Input Method Editor. The software lets users type romanized Chinese (Pinyin) on a QWERTY keyboard and converts it into Chinese characters. Users soon discovered that the data Google used for the product was unusually similar to the data used by a Chinese rival, Sogou. Google has evaded the question about software similarities, reports PC World. 'The similarities, which included an error involving the name of a celebrity, were noted on a Google Labs discussion board about its Pinyin IME. Users noted that entering the Pinyin pinggong into the Google IME incorrectly produced the name of Feng Gong, an actor and comedian.'"

8 of 187 comments

  1. This is big news in China by Anonymous Coward · · Score: 5, Informative

    Unfortunately, since the IME is only used by Chinese speakers, most reports and discussions about this are in Chinese as well. For example, Sina has published an announcement (in Chinese) from Google admitting that they indeed "used data from non-Google sources" during the testing stage.

    There was actually much more evidence than the PC World article mentioned, the most convincing being that Google IME included the names of many of the developers of Sogou IME.

    According to other users, though (I don't use Google Pinyin myself now, or Windows for that matter), the error has been fixed - and those developer names have been removed - in the most recent version of Google IME (1.0.17.0).

    Ming

  2. Re:"Google's" ? by Anonymous Coward · · Score: 1, Informative

    How long has laboratory been a verb? The title previously read "Google's Faces Plagiarism Questions Over Chinese Software"

  3. Re:not saying it's the case by eggstone · · Score: 4, Informative

    Well, if it were some kind of programming bug, that reasoning would be fine. However, Google is simply using Sogou's dictionary. In fact, Sogou's dictionary contains several of its developers' names, such as Tong Zi Jian, Zhao Li Yang, Lv Jie Yong, and Ru Li Yun, which come up as the first choice when you type their Pinyin. Sogou's developers' names could not have ended up in Google's dictionary unless Google simply copied the whole dictionary. Note that although those names were in Google Pinyin 1.0.15.0, they were removed in the newer version 1.0.16.0.
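
    The reasoning above can be sketched in a few lines. This is only a toy illustration, not how anyone actually ran the comparison: the function name, the dictionary layout, and the placeholder values (NAME_A, NAME_B, OTHER, HELLO) are all made up. The idea is that entries which have no business being in an independent dictionary act as markers, and finding them as the first candidate in a second dictionary points to copying.

```python
# Toy sketch: every dictionary entry below is an invented placeholder.
def shared_markers(markers, suspect_lexicon):
    """Return marker keys whose first candidate in suspect_lexicon matches."""
    hits = []
    for pinyin, expected in markers.items():
        candidates = suspect_lexicon.get(pinyin, [])
        # Only count it if the marker is the top suggestion, as in the report.
        if candidates and candidates[0] == expected:
            hits.append(pinyin)
    return hits


# Hypothetical marker entries from the original dictionary.
markers = {
    "tongzijian": "NAME_A",
    "zhaoliyang": "NAME_B",
}

# Hypothetical second dictionary being checked.
suspect = {
    "tongzijian": ["NAME_A", "OTHER"],
    "zhaoliyang": ["OTHER", "NAME_B"],
    "nihao": ["HELLO"],
}

print(shared_markers(markers, suspect))
```

    One match could be a coincidence; several unlikely names all appearing as first choices is much harder to explain away.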

  4. Re:"Google's" ? by Anonymous Coward · · Score: 1, Informative

    The link you mentioned specifically refers to gerunds. A gerund (verb ending in "ing") is not the same thing as a standard (not ending in "ing") verb. If you're going to correct the grammar police, at least make sure you've got your own grammar correct...

  5. Re:Google Should Defend Themselves the OpenBSD Way by Anonymous Coward · · Score: 2, Informative

    The story didn't come from Sohu/Sogou. The copying was originally discovered by bloggers and BBS posters, and Sohu only made their statement once the story had crossed over into the mainstream media and they were being asked about it by journalists. They didn't give any comment at all for the first couple of days.

  6. Clarifications. by MaWeiTao · · Score: 2, Informative

    There seem to be a few misunderstandings here regarding Chinese text entry. First, this is China, and the official language is Mandarin Chinese. This means there are 37 distinct syllables, not the hundreds some have claimed. The distinction is that in addition to those there are 5 tones. This doesn't mean there are that many syllables times the number of tones. Think of tones as accents. Additionally, certain syllables only appear in certain places in a word. So typing Chinese on a computer isn't quite as overwhelming a task as you'd think.

    The keyboards used in China, Taiwan, Singapore and even Japan are almost always QWERTY, but that's irrelevant. Virtually nobody except Westerners uses that layout to type. Printed on Chinese keyboards are 4 sets of characters. The first set is our alphabet, and the next 3 sets include characters for different text entry methods.

    I don't know about China, but in Taiwan one of the sets is Zhuyin fuhao. That system, as I've seen mentioned here, is a set of simple characters, each corresponding to a distinct sound, 21 consonants and 16 syllables. It's the closest thing to a Chinese alphabet in existence. It's only really used for educational purposes, but I don't see why it isn't widely adopted in the same way the Japanese use hiragana or katakana.

    Anyway, that system is comparable to Pinyin, which is more or less a romanized version of the same thing. Pinyin is what is used for signage in China, and now in Taiwan as well. This is the method a Westerner is more likely to use to type Chinese.

    The funny thing about Chinese is that the same word could have many different meanings, each of which has a distinct character. So you type the word, including the appropriate tones, and up comes a list with all the corresponding characters. Then one character is chosen from the list. It's kind of like predictive text. In some cases, when a set of characters produces a meaning, the user is given a list of additional characters upon entering the first one. It's all done, obviously, to speed up the typing process.
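
    As a rough sketch of the lookup just described: a tiny made-up lexicon maps a typed syllable to a ranked candidate list, the user picks one by position, and a second table offers likely follow-up characters. The function names, the entries, and their ordering are assumptions for illustration only; tones are ignored here to keep it short, and a real IME dictionary is vastly larger and ranks candidates by usage statistics.

```python
# Toy lexicon: typed syllable -> candidate characters, assumed most common first.
LEXICON = {
    "ma": ["吗", "妈", "马", "骂", "麻"],
    "shi": ["是", "时", "事", "十", "市"],
}

# Toy follow-up table: chosen character -> characters that often come next.
FOLLOW_UPS = {
    "妈": ["妈"],        # 妈妈 (mother)
    "马": ["上", "路"],  # 马上 (right away), 马路 (road)
}


def candidates(pinyin):
    """Return the ranked candidate list for a typed syllable (empty if unknown)."""
    return LEXICON.get(pinyin.lower(), [])


def pick(pinyin, index):
    """Return the candidate the user selected from the list by position."""
    return candidates(pinyin)[index]


def suggest_next(char):
    """Return follow-up characters offered after a character is chosen."""
    return FOLLOW_UPS.get(char, [])


chosen = pick("ma", 2)           # user picks the third candidate
print(candidates("ma"))
print(chosen, suggest_next(chosen))
```

    The quality of such a tool depends almost entirely on the lexicon and its ranking, which is why the dictionary data is the valuable part.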

    So, this input method can be sufficiently quick, comparable to typing English. However, there are other entry methods, based on different factors, which can be more precise and significantly quicker. I have no idea how to use any of those, but it's my impression that typing with those methods can be considerably faster than most people typing in English.

    Of course, this raises the question: why did Google bother coming up with their own system? Things are always a bit of a mess with all the options out there.

    As for the possibility of code being plagiarized, I'm really not surprised at all. This is one of the consequences of outsourcing. The company might have a policy against this sort of thing, but the programmer clearly didn't care. He probably thought he could save himself a bit of trouble and ultimately saw nothing wrong with it. I've experienced similar things firsthand. Unless you have a team you trust, there needs to be a lot of oversight and careful management.

    1. Re:Clarifications. by MaWeiTao · · Score: 2, Informative

      You might be right. I can't speak for everyone in China. In Taiwan, and other regions, however, no one uses Pinyin or any other romanized systems, not even with mobile phones.

  7. The plagiarism has been confirmed by Google by gam3cub3 · · Score: 3, Informative

    Plagiarism has been confirmed officially by Google, Sohu and IDG news reporter Sumner Lemon.

    Google admits word database came from third party - Network World

    http://www.networkworld.com/news/2007/040907-update-google-admits-word-database.html

    An earlier report by the same reporter: Sohu to Google: Take down copycat software
    http://www.networkworld.com/news/2007/040707-sohu-to-google-take-down.html

    Google China's Official Apology to Sohu.com (in Chinese)
    http://googlechinablog.com/2007/04/blog-post.html