Google Faces Plagiarism Questions Over Chinese Software
yaohua2000 writes "Google's laboratory in China has launched its first product, a Pinyin Input Method Editor. The software allows the romanized characters to be translated to more traditional Chinese symbols, via entering on a QWERTY keyboard. Users soon discovered that the data Google used for the product was unusually similar to the data used by a Chinese rival, Sogou. Google has evaded the question about software similarities, reports PC World. 'The similarities, which included an error involving the name of a celebrity, were noted on a Google Labs discussion board about its Pinyin IME. Users noted that entering the Pinyin pinggong into the Google IME incorrectly produced the name of Feng Gong, an actor and comedian.'"
Blame the Sogou authors, and call them inhuman. Also say it isn't plagiarism because it's beta.
Let me be the first to say... WHAT?
Coming up with the same algorithm isn't terribly unlikely. Structuring it in the same way is not uncommon either. Making exactly the same mistakes, however, is hard to believe.
Am I part of the core demographic for Swedish Fish?
While I am not insisting that it is the case, it seems like it could easily be the same logic flaw. Different algorithms and code can produce the same mistake if you are using the same misguided logic behind the problem. That's why you see the same bugs in students' code in university, even when worked on separately during a lab.
insight through the mind
Unfortunately, since the IME is only used by Chinese speakers, most reports and discussions about this are in Chinese as well. For example, Sina has published an announcement (in Chinese) from Google admitting that they indeed "used data from non-Google sources" during the testing stage.
There was actually much more evidence than the PC World article mentioned, the most convincing being that Google IME included many names of the developers of Sogou IME.
Although according to other users (I don't use Google Pinyin myself now, or Windows for that matter), the error has been fixed - and those developer names have been removed - in the most recent version of Google IME (1.0.17.0).
Ming
Why is it that saying anything negative about another country is always turned into a discussion about racism and bigotry? It immediately poisons further dialog when it is applied without reason. If you have some reason to think the OP is prejudiced I'd like to hear it, because I didn't read that into his comment. I hear a lot of negative comments about the United States on Slashdot (yours, for one, which is interesting) but I don't immediately conclude that prejudice is the root of it. Sometimes it is, but it's nice to find that out first before jumping to any conclusions.
... more power to 'em.
... it prejudices any argument you make after that point.
The unfortunate fact of the matter is that China's government and industry are completely unconcerned about the source of the technology that they mass-produce and sell to everyone. They just don't care, period, and I suppose when you get right down to it there's no reason they should. On the other hand, that just means there's no reason why we should respect their "intellectual property" either, and when their scientists and engineers come up with something good they damn well shouldn't expect us to concern ourselves over their rights either. If Google did indeed rip off their Chinese counterparts my feeling is
So, it's not a statement of prejudice (e.g. "I dislike Chinese people because they are Chinese, or have yellow skin, or slanted eyes, or talk funny") but a legitimate observation on the state of affairs in that country.
Just watch it when you start playing the race card without a good reason
The higher the technology, the sharper that two-edged sword.
"This is our groupthink, it doesn't need to make sense. Now shut up and conform so you get your mod points!"
Thousands of people donate their time, money, and code to GPL-licensed projects. As one of those contributors, I can tell you that I don't believe that Google is doing anything wrong at all with aspell. The terms of the license are clear. Users are in no way required to give attribution. In fact, there is not even a suggestion, hint, or implication that attribution would be nice. Suggesting that it should be that way is fine, but to state that aspell was "co-opted" is factually incorrect and falsely implies that Google is doing something against the GPL license.
If you, as a contributor to aspell, don't like aspell's license terms, you are free to start another project with similar goals under different license terms.
Everybody who says something along the lines of "bah, chinese complaining about stealing" should note that all Chinese are not connected into one single conscious entity, but are different individuals.
The people who own this IP need not have stolen any other IP.
It is as dumb as saying that all Americans are Christian, gun-toting, fat fuckasses.
Care to release those words that prove that Google uses Aspell? I don't see any proof in your post, just claims that are impossible to verify because you give very little information. You're an author of some dictionary that's used in Aspell, you put intentionally misspelled words in your dictionary, but you don't tell us which dictionary or which words, so what do we have to go by? Why is your post any more trustworthy than any other AC post? Furthermore, it's pretty suspicious that you claim that you INTENTIONALLY put incorrect words in your dictionary to catch people using it as part of a larger project, when such use is perfectly legal. Things like that undermine Aspell's credibility as a reference tool, which, as a contributor, I would think you'd care about.
Karma: Contrapositive
This confirms it: meta-discussion of Slashdot makes for karma whoring. Now, can I recurse again and have that be the case?
Just fucking google it ;)
Chinese is a complex language to write. It doesn't use an alphabet (like most Western languages). It doesn't even use syllables (like, for example, two of the Japanese writing systems); it uses logographs: in an over-simplified way, we can say it uses one symbol for every different word/idea/etc.
This makes for thousands of different symbols (according to Wikipedia: a little less than 50k variants in the Kangxi dictionary).
This ISN'T something you can put on a regular occidental 107-key keyboard.
Therefore you have several solutions:
- Custom keyboards :
Use special keyboards where the couple of thousand most frequently used symbols are present.
Not very practical (symbols are harder to find compared to looking for a letter on a 107-key keyboard). Wikipedia has a picture.
- By shape of characters :
Either by handwriting recognition, or by decomposing characters (into their different strokes) and putting those on a regular keyboard layout.
- By sound of words :
Either by using something like Zhuyin, which is a system that was invented to help teach Chinese. It has 37 symbols, one for each consonant or vowel sound in Chinese. As such, it can be used for other purposes, like putting it on a keyboard: the person types the sound and the software guesses the corresponding word/logogram.
Or, alternatively, by using Pinyin: it uses Latin letters to write the sounds. (And thus it is interesting for computers, on which Latin keyboards are widespread.)
The mapping of sounds to logographs isn't completely straightforward. For example, Chinese is a tonal language, but some systems don't require the writer to specify tones using marks. Some software work is required, and this software isn't infallible.
Google released such a piece of software. Users can phonetically type Chinese on any occidental keyboard using (tone-less) pinyin, and the software tries to convert it to actual Chinese characters.
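To make the ambiguity concrete, here is a minimal sketch of a tone-less pinyin lookup. It is not how Google's or Sogou's IME works internally; the `CANDIDATES` table, its frequency counts, and the `lookup` function are all made up for illustration. The point is only that one tone-less syllable maps to several characters, so the software has to rank its guesses using some kind of data.

```python
# Toy tone-less pinyin lookup. The table and the frequency counts below are
# invented for illustration; a real IME ships a far larger, data-driven table.
CANDIDATES = {
    "ma": [("妈", 500), ("吗", 900), ("马", 400), ("骂", 100)],
    "shi": [("十", 500), ("是", 1000), ("事", 450), ("时", 600)],
}


def lookup(pinyin):
    """Return the candidate characters for one syllable, most frequent first."""
    entries = CANDIDATES.get(pinyin.lower(), [])
    ranked = sorted(entries, key=lambda entry: entry[1], reverse=True)
    return [char for char, _count in ranked]


if __name__ == "__main__":
    # Without tone marks, "ma" could be any of several characters.
    print(lookup("ma"))
    print(lookup("shi"))
```

Because the ranking depends entirely on the underlying data, two IMEs built on the same data would be expected to offer the same candidates in the same order.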
This software produces the same correct results as another popular one. (Hopefully. If the Google software didn't give the correct results, there would be problems. It wouldn't be a functional pinyin input system.)
Sometimes the software hesitates and gives a choice of possibilities, most of the time the same ones as the competitor. (Possibly explained by the fact that both programs have to process the same user input, using the same pronunciation system, which isn't unambiguous.)
But sometimes the Google software is plain wrong, and produces the same errors as the competitor. And THIS is suspicious, because maybe some part of the software uses pieces from the competitor (part of the algorithm? statistical data?).
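The reasoning about shared errors can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionaries below are tiny hand-made stand-ins, not the real Google or Sogou data, and `shared_errors` is a hypothetical helper. The only entry taken from the story is "pinggong" producing the name of Feng Gong (冯巩). An input missing from the reference is treated as having no expected output.

```python
def shared_errors(ime_a, ime_b, reference):
    """Return inputs where both IMEs give the same output and it is wrong.

    An output counts as wrong when it differs from the reference entry,
    or when the reference has no entry for that input at all.
    """
    suspicious = {}
    for key in ime_a.keys() & ime_b.keys():
        output = ime_a[key]
        if output == ime_b[key] and reference.get(key) != output:
            suspicious[key] = output
    return suspicious


# Hand-made stand-in data (assumption, not the real dictionaries).
reference = {"fenggong": "冯巩", "nihao": "你好", "zhongguo": "中国"}
ime_a = {"fenggong": "冯巩", "pinggong": "冯巩", "nihao": "你好", "zhongguo": "中国"}
ime_b = {"fenggong": "冯巩", "pinggong": "冯巩", "nihao": "你好", "zhongguo": "种果"}

if __name__ == "__main__":
    # Agreeing on correct answers proves nothing; agreeing on a wrong one stands out.
    print(shared_errors(ime_a, ime_b, reference))
```

Note that the made-up "zhongguo" mistake in `ime_b` is not flagged, because only one of the two gets it wrong. Only identical wrong answers are reported, which is exactly the kind of coincidence that is hard to explain by independent work.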
The company is suing Google on the grounds that if both programs behave the same down to the bugs, maybe some part could have been illegally copied.
Meanwhile, adepts of Google Seppuku rejoiced worldwide over a cheap and easy-to-find piece of software that could also be used to produce random Chinese characters to be subsequently imported into Google as Kanji.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Google has learned how to do business in China.
Congrats to them.