Google Cans Comment Spam
fthiess writes "Comment spam is in many ways even more annoying than regular email spam, since you generally have to do more than just hit the delete button to get rid of it. Its defining characteristic is that spammers abuse websites where the public can add content (blogs, wikis, forums, and even top referrer lists) to increase their own ranking in search engines. It seems, however, that the days of content spam are numbered: today Google announced that, in partnership with MSN Search and Yahoo!, they have implemented a way to block content spam." (More below.)
"Briefly, you just change your blogging/wiki/forum/etc. software so that any hyperlinks in publicly-contributed text have a new rel=nofollow attribute added to any anchor tags. Google, MSN, and Yahoo! will now no longer index any such links, so the motive for content spamming disappears. Especially hopeful is the fact that a slew of makers of blogging software, including Six Apart, have announced they are supporting the new attribute."
It's nice to see Google, MSN, and Yahoo cooperating on this effort.
The NSA: The only part of the US government that actually listens.
Don't forget to put that attribute in your track-back links either :)
Simon.
It will certainly help filter some of the spam sites out of Google's rankings and so on, but the links will still be there in blog comments, bulletin boards, etc. The Googlebot will not follow the links, but human readers won't see the nofollow attribute, and they'll click. It means that moderators still have manual work to do.
Well, since MSN Search seems to apply the same policy as Google it would do them no good either.
Does HTML/XHTML allow "rel" attributes on links? And if so, is "nofollow" an allowed value for that attribute?
The Tao of math: The numbers you can count are not the real numbers.
Slashdot could implement something like this; it would make article comments meaningful again.
--- "When I think back on all the crap I learned in high school, it's a wonder I can think at all..."
But what will happen then to the "miserable failure" and "weapons of mass destruction" Google bombs? Can't anyone efficiently bomb Google anymore?
Make even shorter URLs - 8LN.org
I'm not really into blogging, so I don't know how big of a problem this is. I get some spam in my guestbook, which I promptly remove. The spam itself is what's really irritating, not the potential "elevating" of the spamvertised site in search engines; I can't remember ever personally running across one in search results.
Am I correct in assuming that these sites pop up and disappear relatively often? Maybe it'd be possible to add a temporal component to the rating. Say the link points to a site that was registered just two days ago: it's given a very, very low weight, which then ramps up as time goes by. As spam gets deleted from blogs and guestbooks, time would work against these spammers. Or? I dunno.
Belief is the currency of delusion.
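The age-based ramp suggested above could look something like this. The linear shape and the 180-day ramp are arbitrary assumptions for illustration, link_weight is a hypothetical name, and the registration date would have to come from a WHOIS-style lookup that is not shown here.

```python
from datetime import date

RAMP_DAYS = 180  # hypothetical: age at which a domain's links count in full


def link_weight(registered: date, today: date, ramp_days: int = RAMP_DAYS) -> float:
    """Weight for a link, scaled by the age of the domain it points to.

    A domain registered today (or with a bogus future date) gets 0.0; the weight
    then grows linearly and is capped at 1.0 once the domain is ramp_days old.
    """
    age_days = (today - registered).days
    if age_days <= 0:
        return 0.0
    return min(1.0, age_days / ramp_days)


# A spam domain registered two days ago barely counts; an old one counts fully.
print(link_weight(date(2005, 1, 17), date(2005, 1, 19)))  # 2/180, about 0.011
print(link_weight(date(2001, 6, 1), date(2005, 1, 19)))   # 1.0
```

A linear ramp is the simplest choice; any monotonic curve would express the same idea that time works against short-lived spam domains.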
Hmmm... if a malicious program adds the attribute to links served by a compromised HTML server, you could have an interesting and different sort of denial-of-service attack, although it would be slow to take effect.
Google, MSN, and Yahoo! will now no longer index any such links
Not quite. What happens is that the link won't add anything to the ranking of the site in question. As you probably all know, most search engines rank pages by incoming links; it's not just Google. By adding this attribute, the incoming link won't count.
I think this is a great idea. It will probably break W3C compliance, but hey, anything to piss off a spammer.
Underholdning.info
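A toy version of the "rank by incoming links" model described above, assuming each link is recorded together with its rel values. Link and count_inbound are hypothetical names; real engines weight links far more elaborately, but the effect of the attribute is the same: the link stays on the page, it just stops counting.

```python
from collections import Counter
from typing import FrozenSet, Iterable, NamedTuple


class Link(NamedTuple):
    source: str
    target: str
    rel: FrozenSet[str] = frozenset()  # rel values, e.g. frozenset({"nofollow"})


def count_inbound(links: Iterable[Link]) -> Counter:
    """Count incoming links per target page, ignoring rel="nofollow" links."""
    counts: Counter = Counter()
    for link in links:
        if "nofollow" in link.rel:
            continue  # the link is still on the page; it just earns the target nothing
        counts[link.target] += 1
    return counts


links = [
    Link("popular-blog", "useful-site"),
    Link("popular-blog", "spam-site", frozenset({"nofollow"})),  # posted in a comment
    Link("news-site", "useful-site"),
]
print(count_inbound(links))  # Counter({'useful-site': 2})
```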
Forums and Blogs often contain very useful links. What about them? What about all those sites that are *only* linked to from blogs and forums, and that actually are great and useful sites?
I just don't trust anything that bleeds for five days and doesn't die.
Actually, are there any plugins already in existence that modify the appearance of a link based on a regexp match?
Will this be implemented on Slashdot as well? Perhaps those with karma lower than neutral would get a rel="nofollow" tag added to the URLs they post?
for ch3aP Can.adi n v31g.r a?
Monstar L
You are assuming that Microsoft talks to Microsoft. With so many divisions and levels of middle management, it is possible for something like this to happen: they know they are competing with Google, but they never got the memo that MSN Search is blocking these sites.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
RTFA. Slashdot could modify slashcode to automatically add the attribute to all links posted in comments. Comment spammers can't do anything about it, so they'll move away to other sites.
No normal links (i.e. not in visitor contributed content) should have the attribute. So slashdot will still be full of normal links; only the links in the comments will have the attribute.
This is your sig. There are thousands more, but this one is yours.
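The policy suggested in the last few comments (nofollow on every visitor-contributed link, optionally relaxed for posters at or above some karma threshold) fits in one small function. rel_for_link and its parameters are hypothetical; this is not how Slashcode actually structures its link rendering.

```python
from typing import Optional


def rel_for_link(user_contributed: bool,
                 poster_karma: Optional[int] = None,
                 karma_threshold: Optional[int] = None) -> Optional[str]:
    """Return the rel value to emit for a link, or None for a plain link.

    Editorial links (the story itself) are never marked. Visitor-contributed
    links get "nofollow", unless a karma threshold is set and the poster's
    karma is known and at least that high. Anonymous posters (karma None)
    are always marked.
    """
    if not user_contributed:
        return None
    if (karma_threshold is not None and poster_karma is not None
            and poster_karma >= karma_threshold):
        return None
    return "nofollow"


print(rel_for_link(False))                                     # None
print(rel_for_link(True))                                      # nofollow
print(rel_for_link(True, poster_karma=-1, karma_threshold=0))  # nofollow
```

Leaving karma_threshold unset gives the "mark every comment link" behavior; setting it gives the karma variant.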
It's an interesting idea, but it's probably only a matter of time before MS starts to use this to cut out non-MS sites.
well, it does, kinda
people spam comment boards on sites with high pageranks.
Google's logic here is: if a high-ranked site links to site X, X's ranking also gets higher. If your site is spam/ad-ridden, this is step 3. Profit!
With rel=nofollow in place, this tactic no longer works.
No Revenues -> No reason to spam
QED
Exercise caution when modding this message up: the author acts like a jerk when his karma is excellent.
Wikipedia already implemented this feature. See here.
This is not a solution as far as I'm concerned.
Why stop the indexing of relevant links from blogs to make Google's life easier?
99% of the links posted in comments are relevant and would be beneficial to index. Why stop this for the 1% of jackasses out there?
The domains contained in blogspam links are well known, and there are plenty of blacklists out there. Why doesn't googleyahoomsn just remove these sites from its database? It's such an easy solution. I believe they already do this in some circumstances for link-trading systems whose only goal is to get higher PageRank.
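The blacklist approach suggested above amounts to checking each link's host, and its parent domains, against a list of known spam domains before the link is counted. A sketch, assuming a plain set of lowercase domain names; the function names are hypothetical and the .example domains are placeholders, not entries from any real blacklist.

```python
from typing import AbstractSet, Iterable, List
from urllib.parse import urlsplit


def is_blacklisted(url: str, blacklist: AbstractSet[str]) -> bool:
    """True if the URL's host, or any parent domain of it, is on the blacklist."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # www.spam.example -> checks "www.spam.example", "spam.example", "example"
    return any(".".join(labels[i:]) in blacklist for i in range(len(labels)))


def drop_blacklisted(urls: Iterable[str], blacklist: AbstractSet[str]) -> List[str]:
    """Keep only the links that should still count toward ranking."""
    return [url for url in urls if not is_blacklisted(url, blacklist)]


BLACKLIST = {"pills.example", "casino.example"}  # placeholder entries
comment_links = [
    "http://www.pills.example/cheap",
    "http://en.wikipedia.org/wiki/PageRank",
    "http://notpills.example/",
]
print(drop_blacklisted(comment_links, BLACKLIST))
# ['http://en.wikipedia.org/wiki/PageRank', 'http://notpills.example/']
```

Matching whole domain labels, rather than doing a substring test, keeps notpills.example from being caught by the pills.example entry.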
So if the big blogs use the attribute, spammers will go after the slow-to-upgrade folks; in self-defense, most of them will upgrade eventually.
Really, even for a custom-designed visitor book or blog, it is not that hard to add the attribute to every hyperlink in user comments. Most such programs already do mangling and vetting of submitted HTML.
[Set Cain on fire and steal his lute.]
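As the comment above says, most comment systems already filter submitted HTML, so the change is a few lines in that filter. A minimal sketch, assuming the comment HTML has already been sanitized so that attribute values are quoted and contain no stray ">" characters; add_nofollow is a hypothetical name, not part of any blogging package. An existing rel value is extended rather than left alone, so a spammer cannot dodge the rule by supplying rel="me".

```python
import re

# An opening <a> tag; group 1 holds everything between "<a" and ">".
# Assumes sanitized input: quoted attribute values with no ">" inside them.
ANCHOR_TAG = re.compile(r"<a(\s[^>]*)?>", re.IGNORECASE)
# An existing quoted rel attribute: prefix, quote character, value.
REL_ATTR = re.compile(r"(\srel\s*=\s*)([\"'])(.*?)\2", re.IGNORECASE)


def add_nofollow(comment_html: str) -> str:
    """Ensure every anchor tag in user-contributed HTML carries rel="nofollow"."""

    def rewrite(match: "re.Match[str]") -> str:
        attrs = match.group(1) or ""
        rel = REL_ATTR.search(attrs)
        if rel is None:
            # No rel attribute yet: append one.
            return f'<a{attrs} rel="nofollow">'
        values = rel.group(3).split()
        if "nofollow" in (v.lower() for v in values):
            return match.group(0)  # already marked, leave untouched
        # Extend the existing rel value instead of adding a second rel attribute.
        quote = rel.group(2)
        merged = " ".join(values + ["nofollow"])
        new_rel = f"{rel.group(1)}{quote}{merged}{quote}"
        return f"<a{attrs[:rel.start()]}{new_rel}{attrs[rel.end():]}>"

    return ANCHOR_TAG.sub(rewrite, comment_html)


print(add_nofollow('<a href="http://spam.example/">cheap pills</a>'))
# prints <a href="http://spam.example/" rel="nofollow">cheap pills</a>
```

Only visitor-contributed text should go through this function; the site's own links stay untouched.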
For the largest company in the world, even the flattest organizational structure can still be big. I am sure most employees don't report to Ballmer or Gates. I don't know how flat it is, but say, for example, that the MSN Search team belongs to the MSN team, which reports to Gates. Then there is the FrontPage team, which works for the Office team, which belongs to the software development team, which reports to Gates. For a company that size, that structure is very flat. But there is still middle management involved, and information may not spread across.
While this will prevent spammers from bumping up their sites' PageRank (probably their primary motivation for comment spam anyway), it doesn't prevent their bots from spamming targeted blogs etc. in the first place. That is still best handled by the blog software providers.
For example, WordPress has a variety of different plugins for handling comment spam. The best one I've seen renders a series of characters graphically (a la TicketBastard), which the user (a human, of course) has to type into a text field on the comment form before their comment is accepted. Blogs implementing this type of mechanism typically see spam from bots drop to zero.
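Rendering the distorted image is the hard part of such a plugin and is not shown here; the sketch below only covers the bookkeeping, assuming a server-side store keyed by a random token that is embedded in the comment form. ChallengeStore is a hypothetical name, not a WordPress API. Each challenge is single-use, so a bot cannot solve one and replay the answer.

```python
import secrets
import string
from typing import Dict, Tuple

ALPHABET = string.ascii_uppercase + string.digits


class ChallengeStore:
    """Server-side bookkeeping for typed-characters challenges (image rendering not shown)."""

    def __init__(self) -> None:
        self._pending: Dict[str, str] = {}  # form token -> expected answer

    def issue(self, length: int = 6) -> Tuple[str, str]:
        """Create a challenge; the token goes in the form, the answer is drawn as an image."""
        token = secrets.token_urlsafe(16)
        answer = "".join(secrets.choice(ALPHABET) for _ in range(length))
        self._pending[token] = answer
        return token, answer

    def verify(self, token: str, typed: str) -> bool:
        """Check the typed answer; each token can be used only once."""
        expected = self._pending.pop(token, None)
        if expected is None:
            return False  # unknown or already used token
        attempt = typed.strip().upper()
        return secrets.compare_digest(expected.encode(), attempt.encode())


store = ChallengeStore()
token, answer = store.issue()
print(store.verify(token, answer.lower()))  # True: case-insensitive match
print(store.verify(token, answer))          # False: token already consumed
```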
If it's merely a question of "rank", shouldn't the attribute be norank instead of nofollow? I expect a link tagged "nofollow" to, well, not get followed.
There are a lot of people out there who understand the PageRank system, and complain that if they add outgoing links on their site then their previous PageRank will be "leaked" to other sites, rather than their own internal pages.
Well, luckily Google has now released a way for people to link to each other without leaking PageRank. Yes, the nofollow relation. So, now everyone can link to each other, and no-one gets any benefit out of it whatsoever.
This tag is not a bad idea, but I think the good things it could stamp out weren't considered anywhere near as much as the few bad things it can stamp out.
There is a plugin for WordPress. However, the problem is that now even legitimate comment links won't have an effect, which is going to skew Google's results to favor only story links. I'm not sure we appreciate the full ramifications of this quite yet.
It makes me curious. Are people against Microsoft because of their business practices, their product, or just because they are a large company who did extremely well?
If only it was so simple. Links are the basis of the PageRank algorithm. This isn't just some random coding task, but at the very forefront of computer science. Anyone can code some random condition for a good page, but the trick is to invent something that is feasible to compute for billions of pages and that gives even remotely good results for a good proportion of them.
It was a real advance when the Google founders invented the PageRank algorithm. Before then, the state of the art was based on counting the words appearing on a page, and you had to go through several pages of search engine results to get something even approximately relevant (remember when AltaVista was the hot stuff?)
Right now every major search engine uses some variation of the PageRank algorithm; the Google founders were generous enough to publish the theory behind its operation back when they were just graduate students at Stanford. Even AltaVista uses a related algorithm now. This is why it wasn't just Google that was working on this rel=nofollow stuff, but other search engine people too.
In brief, it takes a lot of genius, luck, and experimentation to find a better algorithm. I'm sure people at google are working on it, but we're talking about real research-level stuff here, it's not something you can guarantee your success at.
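For readers who want to see the idea in code, here is a small power-iteration PageRank over a link graph in which nofollow links are simply dropped before rank is distributed. That treatment of nofollow is an assumption made for illustration; the announcement only says such links won't help the target's ranking. The damping factor of 0.85 is the value suggested in the founders' published paper, and pagerank is a hypothetical helper, not any engine's code.

```python
from typing import Dict, Iterable, List, Tuple

# (source page, target page, link has rel="nofollow")
Edge = Tuple[str, str, bool]


def pagerank(edges: Iterable[Edge], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank in which nofollow links pass on no rank."""
    edges = list(edges)
    nodes = sorted({page for src, dst, _ in edges for page in (src, dst)})
    if not nodes:
        return {}
    # Assumption: nofollow links are dropped from the graph entirely.
    out: Dict[str, List[str]] = {page: [] for page in nodes}
    for src, dst, nofollow in edges:
        if not nofollow:
            out[src].append(dst)

    n = len(nodes)
    rank = {page: 1.0 / n for page in nodes}
    for _ in range(iterations):
        # Pages with no counted outgoing links spread their rank over all pages.
        dangling = sum(rank[page] for page in nodes if not out[page])
        new = {page: (1.0 - damping + damping * dangling) / n for page in nodes}
        for src in nodes:
            if out[src]:
                share = damping * rank[src] / len(out[src])
                for dst in out[src]:
                    new[dst] += share
        rank = new
    return rank


# A high-ranked blog links to a real site in a story and to a spammer via a comment.
graph = [
    ("blog", "real-site", False),
    ("blog", "spam-site", True),
    ("real-site", "blog", False),
]
for page, score in sorted(pagerank(graph).items()):
    print(f"{page:10} {score:.3f}")
```

Flipping the comment link's flag to False lets the blog pass half of its rank to spam-site, which is exactly the incentive the attribute removes.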
This can be done whether it is linked in a blog or not, and will improve the overall quality of the search database.
- Erwin
What about simply not linking to the people such jackasses would want to annoy? Same result (no extra PageRank), just simpler.
Your abuse scheme seems a bit convoluted to me, or am I missing something?
blah
- I first had a Yahoo account and recently got a Gmail account. What was interesting is that Yahoo put the Gmail request into its spam folder...
Don't worry too much; this might not be on purpose but out of stupidity. I am using a (different) big free mailer, and they manage to put _their_own_ announcements into the spam folder... How is that for content filtering?
Oh the horrors. And what if I created a site with 5 million links to my competitor's website, then stuck it behind a firewall on my corporate intranet so google couldn't search it at all?! Just think of how much damage I'd be doing to them, with all of those unindexed links!
Don't blame me; I'm never given mod points.
Hey! This is the first time that I can comment spam and have it not modded off topic!
Ha, ha! Nobody ever says Italy.
It is pretty easy to make rel="nofollow" visible to normal users too in modern web browsers using CSS. You could use something like this:
That will display the given image before any links marked as nofollow.
Sysadmin geeks, who have to clean up the messes left by shoddy Microsoft products day after day, hate their products because they make extra work for us. We hate Outlook, IE, and IIS because of their penchant for spreading worms and viruses. We hate service packs that break more than they fix. We hate FrontPage because of the nonstandard, blecherous, broken HTML it spews forth. We hate the general lackadaisical attitude Microsoft has about security and quality in general.
Libertarian-minded geeks hate Microsoft for their flagrant disregard for the law and the courts. We hate them for the way they blatantly infringe on other companies' patents and lawyer their way out of it. We hate the way they bankrupt or buy out anyone making a product which actually competes with them. We hate the way they use puppet companies (SCO, BSA) as hired thugs to bully other companies on their behalf.
Anti-corporate geeks hate Microsoft because it's a prime example of corporate greed run amok and of the dangers of unfettered capitalism.
Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?
This new attribute will not keep blogs from coming up in search results. It only affects the spamvertised sites' placement in search results, not the blog with the spam links.
Those who would give up liberty in exchange for security and DRM should switch to Microsoft Palladium!
"(an example here)"
:-( So much stuff you can't do without making it look broken to IE users (though I guess you could check the user agent string via PHP and modify the page based on that...)
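The user-agent check mentioned above might look like this. The comment suggests PHP; this sketch uses Python, but the string test is the same. IE of this era sends an "MSIE <version>" token (for example "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"), and Opera, which could masquerade as IE, still included "Opera" in its string. The function and stylesheet names are hypothetical, and user-agent sniffing is easy to fool, so treat it as a fallback rather than a fix.

```python
def is_internet_explorer(user_agent: str) -> bool:
    """Rough IE test for browsers of this era, based on the User-Agent header."""
    # IE sends an "MSIE <version>" token; Opera could claim to be MSIE too,
    # but still put "Opera" in the string, so exclude it.
    return "MSIE" in user_agent and "Opera" not in user_agent


def stylesheet_for(user_agent: str) -> str:
    # Hypothetical file names: a simplified sheet for IE, the full CSS for everyone else.
    return "ie-fallback.css" if is_internet_explorer(user_agent) else "full.css"


print(stylesheet_for("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))  # ie-fallback.css
print(stylesheet_for("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0"))  # full.css
```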
Wow! Thanks for that link! That site is awesome! It's amazing what you can accomplish using pure CSS magic!
Too bad IE still doesn't even support all of CSS1.
Again, thanks for the link! Everyone who's into web design should be forced to read that site before they start ruining the Internet.
Whenever I'm searching for technical information, a couple of sites always come up that are useless to me. They have a question/answer format: questions are left in the clear for search engines, while the answers require registration. What I need is a way to filter those sites out of my searches, so that they simply don't show up in any result set. Hmm, might be a good excuse to play with writing Firefox plug-ins... :-)
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?