Slashdot Mirror


Poor Spelling Beats Google's China Filter

antifoidulus writes "CNN's money section contains a blurb (among other blurbs) about how poor spelling can beat Google's Chinese filter. The example given in the article is that a search for "Tiananmen" will yield peaceful pictures of the square, but a search for common misspellings such as "Tienanmen" will yield plenty of photos of tanks."

14 of 248 comments

  1. Obvious by poeidon1 · · Score: 5, Interesting

    that not everything can be filtered, but this is a search using the English alphabet. How good (read: horrible) is the filter for searches in the Chinese language?

    --
    They called me mad, and I called them mad, and damn them, they outvoted me. -Nathaniel Lee
    1. Re:Obvious by Anonymous Coward · · Score: 1, Interesting

      I don't know if there's a Chinese equivalent to 13375p34k, but there is in Japanese. Gal Characters (in Japanese, but with examples of Chinese character obfuscation) seem to fill the same need for intentional misspelling.

  2. Exploiting Google's Page Rank by eldavojohn · · Score: 5, Interesting

    As we all know, Google has a patented page ranking system that calculates the correlation of words with websites. It does this (primarily) by reading links from all of its cached websites and parsing HTML links to determine what words are being used to describe the page in the link.

    A while back, this was known as Google Bombing, and certain individuals exploited Google's system very effectively by linking to pages with words that, by all rights, were not very accurate. After all, do a Google search for the word 'failure' and the top result is George W. Bush's biography on the White House domain.
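The link-text mechanism described in this comment can be sketched roughly as follows. This is only an illustration of counting the words used in links that point at a page, not Google's actual ranking system; the class name `AnchorTextCollector`, the function `collect_anchor_words`, and the example.org URLs are all hypothetical.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collects the words used inside <a> tags, grouped by link target."""

    def __init__(self):
        super().__init__()
        # Maps each href to a Counter of words seen in links pointing at it.
        self.anchor_words = defaultdict(Counter)
        self._current_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")

    def handle_data(self, data):
        # Only text that appears inside a link with an href is counted.
        if self._current_href:
            for word in data.lower().split():
                self.anchor_words[self._current_href][word] += 1

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None


def collect_anchor_words(pages):
    """Feed several HTML documents and return the per-target word counts."""
    collector = AnchorTextCollector()
    for html in pages:
        collector.feed(html)
    return collector.anchor_words


if __name__ == "__main__":
    # Hypothetical pages that all describe the same target with one word.
    pages = [
        '<p>A total <a href="http://example.org/bio">failure</a></p>',
        '<a href="http://example.org/bio">miserable failure</a>',
    ]
    for target, words in collect_anchor_words(pages).items():
        print(target, words.most_common())
```

Under this simplified model, the more pages that link to a target with the same word, the more strongly that word is associated with the target, which is the effect a Google Bomb relies on.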

    So what do you do to help the Chinese? Perhaps you could make a page with two columns. In one column would be the correct text with no link and the key word. In the other column would be all the permuted misspellings with links to the real sites. You could host this on your website and send it to friends, asking them to also host it. They would need to slightly alter it and host it, but it would effectively provide the page ranks for the misspellings and allow anyone in China (who has access to your page) a key if they need it.
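One way to produce the list of misspellings for such a page might look like the sketch below. It assumes a very narrow definition of "misspelling" (swapping one vowel for another), which is an assumption made for illustration and not something the comment specifies; the function name `vowel_variants` is hypothetical.

```python
VOWELS = "aeiou"


def vowel_variants(word):
    """Return every spelling that differs from `word` by one swapped vowel."""
    variants = set()
    for i, ch in enumerate(word):
        if ch in VOWELS:
            for replacement in VOWELS:
                if replacement != ch:
                    variants.add(word[:i] + replacement + word[i + 1:])
    return sorted(variants)


if __name__ == "__main__":
    # The misspelling from the article, "tienanmen", is one of the results.
    for variant in vowel_variants("tiananmen"):
        print(variant)
```

Real misspellings also include dropped, doubled, and transposed letters, so a fuller list would need more rules than this.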

    --
    My work here is dung.
    1. Re:Exploiting Google's Page Rank by LiquidCoooled · · Score: 2, Interesting

      And then Google will be politely asked to remove the domain.

      They aren't stupid.

      --
      liqbase :: faster than paper
    2. Re:Exploiting Google's Page Rank by gavri · · Score: 2, Interesting

      That is only for you people: http://www.google.com/search?num=100&hl=en&safe=off&as_qdr=all&q=failure&btnG=Search&meta=

      Here, in India, it's still Bush: http://www.google.co.in/search?num=100&hl=en&safe=off&as_qdr=all&q=failure&btnG=Search&meta=

      Google has never before given me different search results for google.co.in and google.com

      This is the first time I'm seeing different results for these two domains.

  3. Interesting. by BoneFlower · · Score: 5, Interesting

    Now, was this simply a failure of the filter method used, or did Google deliberately create a weak filter to subvert the effort?

    1. Re:Interesting. by LiquidCoooled · · Score: 3, Interesting

      Google have done exactly what they were asked to do.
      It's like when the RIAA/MPAA ask to filter results from torrent sites: the exact request is blocked, but variations continue.

      Censorship is futile and those who want the information can get it.

      --
      liqbase :: faster than paper
  4. Tanks by capnspanky · · Score: 2, Interesting

    ...search for common mis-spellings such as "Tienanmen" will yield plenty of photos of tanks.

    So I did a Google search and all those pictures of tanks are basically one photo hosted on different sites.

    1. Re:Tanks by magarity · · Score: 4, Interesting

      It's not just any picture of tanks; it's the picture of that guy who paused on the way home from shopping to stand in front of four tanks. You know, big metal machines that can squash a pedestrian flat without noticing? Amazingly, as famous as this picture is, it is unknown inside China. My Chinese friends in college had never seen it or anything of those ill-fated demonstrations, despite being in Beijing when it was happening. The word on the street in town during the protests was simply that 'something is happening' and everybody better stay in their homes if they know what's good for them. The Chinese government's crackdown on the media is impressively (depressingly?) comprehensive.

  5. Type of filter by 19061969 · · Score: 4, Interesting

    So (serious question to those more knowledgeable) does this mean that the Google filters are simple keyword matches, then? I'm surprised, because I would have thought that they might have used something more complicated like cluster analysis. For example, latent semantic analysis could well have noted misspellings of words and clustered them together with the correct spelling, thus allowing the misspellings to be filtered out too.

    LSA is useful for dealing with synonyms, so I cannot see any reason why it wouldn't work with misspellings (assuming that they're common).

    --
    bang goes my karma... again...
    1. Re:Type of filter by Anonymous Coward · · Score: 1, Interesting

      Google probably could have done that, but it would have meant more effort to do a morally undesirable thing. So they went with the cheapest and quickest thing, that just barely fulfilled the design requirements.
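The difference this thread is asking about, an exact keyword match versus a filter that also catches near-misses, can be sketched as below. Note that this uses plain string similarity from Python's `difflib` as a stand-in, not latent semantic analysis, and nothing here is known about how Google's filter works; the blocklist, the function names, and the 0.8 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical blocklist holding a single exact term.
BLOCKED_TERMS = {"tiananmen"}


def exact_filter(query):
    """Block a query only if one of its words is exactly on the blocklist."""
    return any(word in BLOCKED_TERMS for word in query.lower().split())


def fuzzy_filter(query, threshold=0.8):
    """Block a query if any word is sufficiently similar to a blocked term."""
    for word in query.lower().split():
        for term in BLOCKED_TERMS:
            if SequenceMatcher(None, word, term).ratio() >= threshold:
                return True
    return False


if __name__ == "__main__":
    for query in ("Tiananmen square", "Tienanmen square", "square"):
        print(query, exact_filter(query), fuzzy_filter(query))
```

With these assumptions, the exact filter lets "Tienanmen" through while the similarity-based one catches it, which matches the behavior the article describes for a simple keyword match. The trade-off is that a lower threshold starts blocking unrelated words.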

  6. Re:Valuable Lesson from Spammers by ceeam · · Score: 2, Interesting

    First - I don't think it would have any "real-world value". Using words like "warez" may have some "real-world value" but I think the moment some misspelled word becomes a dissident symbol, Google would have to filter it out.

    Second - let's not forget that the Chinese don't quite "spell" it when writing. I don't know how well (if at all) Bayesian filtering and the like would work for "kanji" (or whatever they call it).

  7. Re:This is exactly why I said Google was good! by Anonymous Coward · · Score: 1, Interesting

    As I recall, the exact same arguments were made by corporations such as Coca Cola who did business in South Africa under the Apartheid regime. They claimed they were helping bring about reform from within, giving good jobs to blacks, etc. And incidentally promoting the regime and helping to undercut the resistance.

  8. Does Google filter other languages? by IAAP · · Score: 2, Interesting

    They're filtering English misspellings, but what about French, Spanish, or German? A Chinese person could just search for what they're looking for in different languages. Granted, English is taught to everyone in Chinese schools, but the folks who know other languages can start getting things and spreading them to the others.