Google Warns Users About "Unsafe Sites"
Dynamoo writes "The BBC is reporting that Google will start to warn users about unsafe websites, in particular those that host spyware or have privacy implications. The technology to do this has been developed in partnership with StopBadware, and appears to be an alternative to the popular McAfee SiteAdvisor application. Perhaps this will help keep slimeware-ridden sites from peddling their wares. But it will be interesting to see how Google rates some of its own products, including the potentially risky Google Desktop."
How do you handle sites where the bad pages are hidden behind a robots.txt file? The front page may be crawlable, but the page with the malware isn't.
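To make the robots.txt concern concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the example URLs and the "Googlebot" agent string are made up for illustration; the point is only that a crawler which honors robots.txt will never fetch the disallowed page, so it has nothing to scan there.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the front page is crawlable, /downloads/ is not.
ROBOTS_TXT = """\
User-agent: *
Disallow: /downloads/
"""


def crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if a crawler that honors robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/", "http://example.com/downloads/setup.exe"):
        print(url, crawlable(ROBOTS_TXT, url))
```

A well-behaved crawler sees the front page but skips the download, which is exactly the blind spot the question describes.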
How do they handle redirects? If I have a site that redirects a user to bad content, is the original page flagged as bad? Combined with a page that isn't crawled, how would they know to flag it?
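One naive way a rating service *could* propagate a verdict through redirects is to follow the chain from the original page and flag it if any hop is a known-bad URL. This is a hypothetical sketch, not how Google or StopBadware actually do it; the redirect map, the `bad_urls` set and the hop limit are all assumptions.

```python
from typing import Dict, List, Set


def resolve_chain(start: str, redirects: Dict[str, str], max_hops: int = 10) -> List[str]:
    """Follow a (hypothetical) redirect map from start; stop on loops or after max_hops."""
    chain = [start]
    seen = {start}
    current = start
    for _ in range(max_hops):
        nxt = redirects.get(current)
        if nxt is None or nxt in seen:  # end of chain, or a redirect loop
            break
        chain.append(nxt)
        seen.add(nxt)
        current = nxt
    return chain


def is_flagged(start: str, redirects: Dict[str, str], bad_urls: Set[str]) -> bool:
    """Flag the original page if any URL in its redirect chain is known to be bad."""
    return any(url in bad_urls for url in resolve_chain(start, redirects))


if __name__ == "__main__":
    redirects = {
        "http://a.example/": "http://b.example/go",
        "http://b.example/go": "http://c.example/payload.exe",
    }
    print(is_flagged("http://a.example/", redirects, {"http://c.example/payload.exe"}))
    # If the payload was never crawled, it is not in bad_urls and nothing gets flagged.
    print(is_flagged("http://a.example/", redirects, set()))
```

The second call shows the combined problem raised above: if the final hop is never crawled, the chain looks clean and the original page goes unflagged.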
How are they going to handle obfuscation? Or new malware? This might not be a show-stopper, but I think it is a technical issue that should be addressed.
How are they going to handle the lag between crawling and new content? My server gets crawled about once a week. So I would have ~6 days to host bad content before switching it back to look legit for my next Google crawl.
What system are they going to have to handle complaints or appeals? If my site is flagged incorrectly, Google risks liability for flagging it that way. It seems that if they exercise due diligence to keep false positives low, there will be more false negatives.
These are just off the top of my head and I am sure there are a lot more issues that I haven't thought of.
Which* standards does Google support?
I mean, MSN Search does a better job of meeting the W3C's "standards" than Google does.
* When I clicked that link I got a validation check for google.co.jp, but google.com has the same "Optimized so it downloads better on my 2400 baud modem" approach to its source.