Google Cans Comment Spam
fthiess writes "Comment spam is in many ways even more annoying than regular email spam, since you generally have to do more than just hit the delete button to get rid of it. Its defining characteristic is that spammers abuse websites where the public can add content (blogs, wikis, forums, and even top referrer lists) to increase their own ranking in search engines. It seems, however, that the days of content spam are numbered: today Google announced that, in partnership with MSN Search and Yahoo!, they have implemented a way to block content spam." (More below.)
"Briefly, you just change your blogging/wiki/forum/etc. software so that any hyperlinks in publicly-contributed text have a new rel=nofollow attribute added to any anchor tags. Google, MSN, and Yahoo! will now no longer index any such links, so the motive for content spamming disappears. Especially hopeful is the fact that a slew of makers of blogging software, including Six Apart, have announced they are supporting the new attribute."
It's nice to see Google, MSN, and Yahoo cooperating on this effort.
The NSA: The only part of the US government that actually listens.
Don't forget to put that attribute in your track-back links either :)
Simon.
It certainly will help filtering some of the spam sites out of Google rank and so on, but the links will still be there in blog comments, bulletin boards, etc. The Googlebot will not follow the links, but human readers won't see the NOFOLLOW tag - and they'll click. It means that moderators still have manual work to do.
Well, since MSN Search seems to apply the same policy as Google it would do them no good either.
Does HTML/XHTML allow "rel" attributes on links? And if so, is "nofollow" an allowed value for that attribute?
The Tao of math: The numbers you can count are not the real numbers.
Slashdot could implement something like this, it would make article comments meaningful again.
--- "When I think back on all the crap I learned in high school, it's a wonder I can think at all..."
I'm not really into blogging so I don't know how big of a problem this is. I get some spam in my guestbook, which I promptly remove. The spam itself is what's really irritating, not the potential "elevating" of the spamvertised site in search engines, where I've never personally run across one that I can remember.
Am I correct in assuming that these sites pop up and disappear relatively often? Maybe it'd be possible to add a temporal component to the rating. Say the link points to a site which was registered just two days ago: it's given a very, very low weight, and then you ramp it up as time goes by. As spam gets deleted from blogs and guestbooks, time would work against these spammers. Or? I dunno.
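To make the ramp idea concrete, here is a rough sketch in Python. Nothing here reflects how any search engine actually weights links; the function name `link_weight`, the 180-day ramp, and the 0.01 floor are all invented numbers for illustration.

```python
def link_weight(domain_age_days, ramp_days=180.0, floor=0.01):
    """Weight a link by the age of the domain it points to.

    Hypothetical scheme: a brand-new domain gets `floor`, and the weight
    rises linearly until it reaches 1.0 once the domain is `ramp_days` old.
    """
    if domain_age_days <= 0:
        return floor
    fraction = min(domain_age_days / ramp_days, 1.0)
    return floor + (1.0 - floor) * fraction
```

With these made-up defaults, a link to a two-day-old domain counts for about 2% of a full link, and one to a domain older than six months counts in full.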
Belief is the currency of delusion.
Hmmm...if a malicious program adds the tag to links served by a compromised html server, you could have an interesting and different sort of denial of service attack, although it would be slow to take effect.
The NSA: The only part of the US government that actually listens.
Forums and Blogs often contain very useful links. What about them? What about all those sites that are *only* linked to from blogs and forums, and that actually are great and useful sites?
I just don't trust anything that bleeds for five days and doesn't die.
Actually, are there any plugins already in existence that modify the appearance of a link based on a regexp match?
for ch3aP Can.adi n v31g.r a?
Monstar L
You are assuming that Microsoft talks to Microsoft. With so many divisions and levels of middle management, it is possible for something like this to happen: they know they are competing with Google, but they never got the memo that MSN Search is blocking these sites.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
RTFA. Slashdot could modify slashcode to automatically add the attribute to all links posted in comments. Comment spammers can't do anything about it, so they'll move away to other sites.
No normal links (i.e. not in visitor contributed content) should have the attribute. So slashdot will still be full of normal links; only the links in the comments will have the attribute.
This is your sig. There are thousands more, but this one is yours.
Wikipedia already implemented this feature. See here.
This is not a solution as far as I'm concerned.
Why stop the indexing of relevant links from blogs to make Google's life easier?
99% of the links posted in comments are relevant and would be beneficial to index. Why stop this for the 1% of jackasses out there?
The domains contained in the links from blogspam are well known, and there are plenty of blacklists out there. Why doesn't googleyahoomsn just remove these sites from its database? It's such an easy solution. I believe they already do this in some circumstances for link trading systems whose only goal is to get higher pagerank.
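A blacklist check of that kind could be as small as this Python sketch. The function name `is_blacklisted` and the example domains are hypothetical, and a real blacklist would need to be kept up to date as spammers register new domains.

```python
from urllib.parse import urlsplit


def is_blacklisted(url, blacklist):
    """Return True if the URL's host, or any parent domain of it, is in blacklist."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Check "www.spam.example", then "spam.example", then "example".
    return any(".".join(labels[i:]) in blacklist for i in range(len(labels)))


# Hypothetical blacklist entry, for illustration only.
known_spam_domains = {"spam.example"}
```

Here `is_blacklisted("http://www.spam.example/pills", known_spam_domains)` is True because the parent domain matches, while a link to any domain not in the set passes through.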
For the largest company in the world, even the flattest organizational structure can still be big. I am sure most employees don't report to Ballmer or Gates. I don't know how flat it is, but say, for example, the MSN Search team belongs to the MSN team, which reports to Gates. Then there is the FrontPage team, which works for the Office team, which belongs to the software development team, which reports to Gates. For a company that size, that structure is very flat. But there is still middle management involved, and information may not spread across divisions.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
While this will prevent spammers from bumping up their sites' Page Rank (probably their primary motivation for comment spam anyway), it doesn't prevent their bots from spamming targeted blogs etc. in the first place. That is still best handled by the blog software providers.
For example, WordPress has a variety of different plugins for handling comment spam. The best one I've seen renders a series of characters graphically (a la TicketBastard) which the user (a human, of course) has to type into a text field on the comment form before their comment is accepted. Blogs implementing this type of mechanism typically have spam coming from bots drop down to zero.
There are a lot of people out there who understand the PageRank system, and complain that if they add outgoing links on their site then their previous PageRank will be "leaked" to other sites, rather than their own internal pages.
Well, luckily Google has now released a way for people to link to each other without leaking PageRank. Yes, the nofollow relation. So, now everyone can link to each other, and no-one gets any benefit out of it whatsoever.
This tag is not a bad idea, but I think the good things it could stamp out weren't considered anywhere near as much as the few bad things it can stamp out.
Oh the horrors. And what if I created a site with 5 million links to my competitor's website, then stuck it behind a firewall on my corporate intranet so google couldn't search it at all?! Just think of how much damage I'd be doing to them, with all of those unindexed links!
Don't blame me; I'm never given mod points.
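It is pretty easy to make rel="nofollow" visible to normal users too in modern web browsers using CSS. You could pair an attribute selector such as a[rel~="nofollow"] with the :before pseudo-element and set its content property to the URL of an image.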
That will display the given image before any links marked as nofollow.
Sysadmin Geeks who have to clean up the messes left by shoddy Microsoft products, day after day, hate their products because they make extra work for us. We hate Outlook, IE, and IIS because of their penchant for spreading worms and viruses. We hate service packs which break more than they fix. We hate FrontPage because of the non-standard, blecherous, broken HTML it spews forth. We hate the lackadaisical attitude Microsoft has about security and quality in general.
Libertarian-minded geeks hate Microsoft for their flagrant disregard for the law and the courts. We hate them for the way they blatantly infringe on other companies' patents and lawyer their way out of it. We hate the way they bankrupt or buy out anyone making a product which actually competes with them. We hate the way they use puppet companies (SCO, BSA) as hired thugs to bully other companies on their behalf.
Anti-corporate geeks hate Microsoft because it's a prime example of corporate greed run amok and of the dangers of unfettered capitalism.
Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?
Whenever I'm searching for technical information, a couple of sites always come up that are useless to me. They have a question/answer format: questions are left in the clear for search engines, while the answers require registration. What I need is a way to filter those sites out of my searches, so that they simply don't show up in any result set. Hmm, might be a good excuse to play with writing Firefox plug-ins... :-)
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?