Google Cans Comment Spam
fthiess writes "Comment spam is in many ways even more annoying than regular email spam, since you generally have to do more than just hit the delete button to get rid of it. Its defining characteristic is that spammers abuse websites where the public can add content (blogs, wikis, forums, and even top referrer lists) to increase their own ranking in search engines. It seems, however, that the days of content spam are numbered: today Google announced that, in partnership with MSN Search and Yahoo!, that they have implemented a way to block content spam." (More below.)
"Briefly, you just change your blogging/wiki/forum/etc. software so that any hyperlinks in publicly-contributed text have a new rel=nofollow attribute added to any anchor tags. Google, MSN, and Yahoo! will now no longer index any such links, so the motive for content spamming disappears. Especially hopeful is the fact that a slew of makers of blogging software, including Six Apart, have announced they are supporting the new attribute."
It certainly will help filtering some of the spam sites out of Google rank and so on, but the links will still be there in blog comments, bulletin boards, etc. The Googlebot will not follow the links, but human readers won't see the NOFOLLOW tag - and they'll click. It means that moderators still have manual work to do.
Well, since MSN Search seems to apply the same policy as Google it would do them no good either.
Does HTML/XHTML allow "rel" attributes on links? And if so, is "nofollow" an allowed value for that tag?
The Tao of math: The numbers you can count are not the real numbers.
Slashdot could implement something like this, it would make article comments meaningful again.
--- "When I think back on all the crap I learned in high school, it's a wonder I can think at all..."
I'm not really into blogging so I don't know how big of a problem this is. I get some spam in my guestbook, which I promptly remove. The spam iteself is what's really irritaing, not the potential "elevating" of the spamvertised site in search-engines, where I've never personally run across one that I can remember.
Am I correct in assuming that these sites pops up and down relatively often? Maybe it'd be possible to use temporal component to the rating. Say if the link points to a site which was just registered two days ago, it's given a very very low weight, and then you ramp up as time goes by. As spam gets deleted from blogs and guestbooks, time would work against these spammers. Or? I dunno.
Belief is the currency of delusion.
Actually, are there any plugins already in existence that modify the appearance of a link based on a regexp match?