Poor Spelling Beats Google's China Filter
antifoidulus writes "CNN's money section contains a blurb (among other blurbs) about how poor spelling can beat Google's Chinese filter. The example given in the article is that a search for "Tiananmen" will yield peaceful pictures of the square, but a search for common misspellings such as "Tienanmen" will yield plenty of photos of tanks."
Kind of reminds me of when Napster installed that half-assed search filter. Midonna and Mitallica suddenly became quite popular.
People who want to get information will get it, and you can't stop them.
This is a perfect example of why I've been saying all along that Google is making the right decision in cooperating with the Chinese Government: http://yro.slashdot.org/comments.pl?sid=175251&cid=14571383
Who would have thought a technique spammers use to beat filters would have real-world value?
Is Google's filter Bayesian-based?
Ignorance is curable, stupid is forever.
It would probably be better to *NOT* point these things out.
...and so the weakness of computers is revealed: people and their presumption of perfection.
Sig? - yeah, whatever.
Google has really good suggested search terms for typos. Hint, hint. Skeet, skeet.
A NYC lawyer blogs. http://www.chuangblog.com/
SHUT UP!
Do you want to ruin it?
Come on, damnit! Shutupabout it.
Consider this the "getting your foot kicked under the table" move.
In Chinese, a single character ( for example -- though I'm not sure if this will display properly) represents a whole syllable (as well as a meaning or idea), rather than a consonant or vowel, as most English letters do (some are unpronounced, or just change the sound of another letter).
This eliminates certain types of bad spellings, obviously, but opens certain avenues that aren't available in English, such as choosing characters with similar meanings but different sounds, or similar sounds but different meanings.
For the Tiananmen example, the characters for TianAnMen (天安门) mean "Heaven," "Peace," "Gate." Heaven could be replaced with "Sky," which has a completely different sound, or "Money," which (if I recall correctly) is pronounced "Qian" (Q sounds close to English CH). This could also happen with the other two characters in this word, and of course for many other 'bad' words.
Misspellings like "pr0n" have become associated with porn because a community of users agreed upon that particular misspelling, and the same can and WILL happen in China to evade whatever filters search engines use. There is no way to have an even semi-open search system that doesn't allow human ingenuity to overcome its filters, and the brief history of the internet in the West indicates that these filters will, ultimately, be only partially and temporarily effective.
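To make the point concrete, here is a rough Python sketch of why an exact-match blocklist loses to a typo. Nothing here reflects how Google's filter actually works; the blocklist, the function names, and the 0.8 similarity cutoff are all made up for illustration.

```python
import difflib

# Hypothetical blocklist; the real filter's contents and design are unknown.
BLOCKED_TERMS = ["tiananmen"]


def exact_filter(query):
    """Return True if the query is blocked by a plain exact-match lookup."""
    return query.lower() in BLOCKED_TERMS


def fuzzy_filter(query, cutoff=0.8):
    """Return True if the query is close to any blocked term.

    Uses difflib's similarity ratio; the cutoff is an arbitrary assumption.
    """
    matches = difflib.get_close_matches(
        query.lower(), BLOCKED_TERMS, n=1, cutoff=cutoff
    )
    return bool(matches)


if __name__ == "__main__":
    for query in ["Tiananmen", "Tienanmen"]:
        print(query, exact_filter(query), fuzzy_filter(query))
```

The exact-match version blocks "Tiananmen" but lets "Tienanmen" straight through. The fuzzy version catches that one typo, but it only raises the bar: it will start blocking innocent near-matches as the cutoff drops, and it does nothing about swapping in a different character with a similar sound or meaning.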
Although the moon is smaller than the earth, it is farther away.