Slashdot Mirror


Poor Spelling Beats Google's China Filter

antifoidulus writes "CNN's money section contains a blurb (among other blurbs) about how poor spelling can beat Google's Chinese filter. The example given in the article is that a search for "Tiananmen" will yield peaceful pictures of the square, but a search for common misspellings such as "Tienanmen" will yield plenty of photos of tanks."

6 of 248 comments

  1. Obvious by poeidon1 · · Score: 5, Interesting

    that not everything can be filtered, but this is a search using the English alphabet. How good (read: horrible) is the filter for searches in the Chinese language?

    --
    They called me mad, and I called them mad, and damn them, they outvoted me. -Nathaniel Lee
  2. Exploiting Google's Page Rank by eldavojohn · · Score: 5, Interesting

    As we all know, Google has a patented page ranking system that calculates the correlation of words with websites. It does this (primarily) by reading all of its cached websites and parsing their HTML links to determine what words are being used to describe the page each link points to.

    A while back, this was known as Google Bombing, and certain individuals exploited Google's system very effectively by linking to pages with words that, by all rights, were not very accurate. After all, do a Google search for the word 'failure' and the top site is George W. Bush's biography on the White House domain.

    So what do you do to help the Chinese? Perhaps you could make a page with two columns. In one column would be the key word with its correct spelling and no link. In the other column would be all the permuted misspellings, each linked to the real sites. You could host this on your website and send it to friends, asking them to also host it. They would need to alter it slightly before hosting it, but it would effectively provide the page ranks for the misspellings and give anyone in China (who has access to your page) a key if they need it.

    --
    My work here is dung.
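
The link-text idea in comment 2 can be sketched in a few lines of Python. This is only an illustration of counting which words are used to link to which pages; it is not Google's actual ranking method, and the `example.org` URLs, the sample pages, and the names `AnchorTextCollector` and `anchor_index` are all made up for the example.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collects (href, anchor text) pairs from one HTML document."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip().lower()
            self.pairs.append((self._href, text))
            self._href = None


def anchor_index(pages):
    """Map each anchor word to a Counter of the URLs it was used to link to."""
    index = defaultdict(Counter)
    for html in pages:
        collector = AnchorTextCollector()
        collector.feed(html)
        collector.close()
        for href, text in collector.pairs:
            for word in text.split():
                index[word][href] += 1
    return index


# Hypothetical "two column" pages: the correct spelling is plain text,
# the misspellings are links to the real site.
pages = [
    '<p>Tiananmen | <a href="http://example.org/history">Tienanmen</a></p>',
    '<p><a href="http://example.org/history">Tienanmen</a> '
    '<a href="http://example.org/history">Tianenmen</a></p>',
    '<p><a href="http://example.org/other">Tienanmen</a></p>',
]

index = anchor_index(pages)
best = index["tienanmen"].most_common(1)[0]
print(best)
```

In this toy index the misspelling becomes associated with whichever page is linked most often under that word, while the unlinked correct spelling never enters the index at all.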
  3. Interesting. by BoneFlower · · Score: 5, Interesting

    Now, was this simply a failure of the filter method used, or did Google deliberately create a weak filter to subvert the effort?

    1. Re:Interesting. by LiquidCoooled · · Score: 3, Interesting

      Google have done exactly what they were asked to do.
      It's like when the RIAA/MPAA ask to filter results from torrent sites - the exact request is blocked but variations continue.

      Censorship is futile and those who want the information can get it.

      --
      liqbase :: faster than paper
  4. Type of filter by 19061969 · · Score: 4, Interesting

    So (serious question to those more knowledgeable) does this mean that the Google filters are simple keyword matches, then? I'm surprised, because I would have thought that they might have used something more complicated, like cluster analysis. For example, latent semantic analysis could well have noted misspellings of words and clustered them together with the correct spelling, thus allowing the misspellings to be filtered out too.

    LSA is useful for dealing with synonyms, so I cannot see any reason why it wouldn't work with misspellings (assuming that they're common).

    --
    bang goes my karma... again...
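
The difference comment 4 asks about, between a simple keyword match and a filter that also catches misspellings, can be sketched as follows. This is a rough illustration, not a description of how Google's filter works. It swaps in plain string similarity from Python's standard `difflib` module instead of latent semantic analysis, and the blocklist, the function names, and the 0.8 threshold are assumptions made up for the example.

```python
from difflib import SequenceMatcher

# Hypothetical blocklist containing only the correctly spelled term.
BLOCKED = {"tiananmen"}


def exact_filter(query):
    """Block a query only if a word matches a blocked term exactly."""
    return any(word in BLOCKED for word in query.lower().split())


def fuzzy_filter(query, threshold=0.8):
    """Block a query if any word is sufficiently similar to a blocked term.

    The threshold is an arbitrary assumption; a real system would have
    to tune it to avoid blocking unrelated words.
    """
    for word in query.lower().split():
        for term in BLOCKED:
            if SequenceMatcher(None, word, term).ratio() >= threshold:
                return True
    return False


for query in ("Tiananmen square", "Tienanmen square", "tank square"):
    print(query, exact_filter(query), fuzzy_filter(query))
```

With these assumptions the exact filter lets the misspelled query through and the similarity-based one does not, which is the behavior the article describes versus the behavior the comment expected.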
  5. Re:Tanks by magarity · · Score: 4, Interesting

    It's not just any picture of tanks; it's the picture of that guy who paused on the way home from shopping to stand in front of four tanks. You know, big metal machines that can squash a pedestrian flat without noticing? Amazingly, as famous as this picture is, it is unknown inside China. My Chinese friends in college had never seen it, or anything else of those ill-fated demonstrations, despite being in Beijing when it was happening. The word on the street in town during the protests was simply that 'something is happening' and everybody had better stay in their homes if they knew what was good for them. The Chinese government's crackdown on the media is impressively (depressingly?) comprehensive.