Google Cans Comment Spam
fthiess writes "Comment spam is in many ways even more annoying than regular email spam, since you generally have to do more than just hit the delete button to get rid of it. Its defining characteristic is that spammers abuse websites where the public can add content (blogs, wikis, forums, and even top referrer lists) to increase their own ranking in search engines. It seems, however, that the days of content spam are numbered: today Google announced that, in partnership with MSN Search and Yahoo!, they have implemented a way to block content spam." (More below.)
"Briefly, you just change your blogging/wiki/forum/etc. software so that any hyperlinks in publicly-contributed text have a new rel=nofollow attribute added to any anchor tags. Google, MSN, and Yahoo! will now no longer index any such links, so the motive for content spamming disappears. Especially hopeful is the fact that a slew of makers of blogging software, including Six Apart, have announced they are supporting the new attribute."
It's nice to see Google, MSN, and Yahoo cooperating on this effort.
The NSA: The only part of the US government that actually listens.
How is that for content filtering?
Don't forget to put that attribute in your track-back links either :)
Simon.
How soon before MS frontpage puts this in all URLs, to nobble google...
It certainly will help filtering some of the spam sites out of Google rank and so on, but the links will still be there in blog comments, bulletin boards, etc. The Googlebot will not follow the links, but human readers won't see the NOFOLLOW tag - and they'll click. It means that moderators still have manual work to do.
Easy! I read about this at Cnet and it looks like if you don't have a plugin it might be hard to implement.
Dashboard Widgets
Does HTML/XHTML allow "rel" attributes on links? And if so, is "nofollow" an allowed value for that tag?
The Tao of math: The numbers you can count are not the real numbers.
Slashdot could implement something like this, it would make article comments meaningful again.
--- "When I think back on all the crap I learned in high school, it's a wonder I can think at all..."
And it's great to see most of the major blogging software makers are supporting this. Now, how long until everybody upgrades? That's the real question, and whoever can come up with a solution to that will really solve the problem.
That's what *I* want to know!
Slightly reminiscent of the 'evil bit' RFC - only they appear to be serious.
When Guns are outlawed, only outlaws will have guns ...
:)
Soon you'll see that only links with tag rel="nofollow" will count as genuine links because the spammers do NOT use that as much as the regular users.
Just like spammers were the first people to implement DomainKeys
Quidquid latine dictum sit, altum videtur
But what will happen then with the miserable failure and weapons of mass destruction? Can't anyone efficiently bomb google anymore?
Make even shorter URLs - 8LN.org
I'm not really into blogging so I don't know how big of a problem this is. I get some spam in my guestbook, which I promptly remove. The spam itself is what's really irritating, not the potential "elevating" of the spamvertised site in search engines, where I've never personally run across one that I can remember.
Am I correct in assuming that these sites pop up and down relatively often? Maybe it'd be possible to use a temporal component in the rating. Say if the link points to a site which was just registered two days ago, it's given a very very low weight, and then you ramp up as time goes by. As spam gets deleted from blogs and guestbooks, time would work against these spammers. Or? I dunno.
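To make the ramp idea concrete, here is a minimal sketch in Python. It is not how any search engine is known to work; the function name, the one-year ramp and the 0.01 floor are all made-up values for illustration.

```python
from datetime import date


def age_weight(registered: date, today: date,
               ramp_days: int = 365, floor: float = 0.01) -> float:
    """Weight for a link based on how long its target domain has existed.

    Hypothetical scheme: a brand-new domain gets `floor`, and the weight
    rises linearly to 1.0 once the domain is `ramp_days` old.
    """
    age_days = max((today - registered).days, 0)
    ramp = min(age_days / ramp_days, 1.0)
    return floor + (1.0 - floor) * ramp


# A domain registered two days ago barely counts; an old one counts fully.
new_site = age_weight(date(2005, 1, 16), date(2005, 1, 18))
old_site = age_weight(date(2001, 1, 18), date(2005, 1, 18))
```

With these numbers a two-day-old domain gets a weight of roughly 0.015, so a spam link that is deleted within a week or two never contributes much.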
Belief is the currency of delusion.
Hmmm...if a malicious program adds the tag to links served by a compromised html server, you could have an interesting and different sort of denial of service attack, although it would be slow to take effect.
The NSA: The only part of the US government that actually listens.
I'm glad that Google has stepped up to deny spammers Page Rank. It's the greatest feeling in the world, knowing that all their effort in spamming people was for nothing. I have WordPress, and there's been no mention of a plugin for it yet, so until someone creates a plugin, I won't be able to use the rel="nofollow", short of manually editing URLs. But it's not like anything gets through, anyways. I have a spam filter set up, and a few other tricks on my blog, and it stops 99% of spam.
Google, MSN, and Yahoo! will now no longer index any such links
Not quite. What happens is that the link won't add anything to the site in question. As you probably all know, most search engines rank pages by incoming links - it's not just google. By adding this tag, the incoming link won't count.
I think this is a great idea. It will probably break the w3c compliance, but hey - anything to piss off a spammer.
Underholdning.info
There is too much custom BLOG software out there, and many of these programmers don't read slashdot (or may not read it today), don't check with Google, Yahoo or MSN, and aren't concerned with their blogging software messing with page ranks. There are also way too many people who will not upgrade their BLOG software because it is not worth the hassle. There are still people who run Windows 3.1 or Apple ][ or Commodore 64; expecting people to upgrade their software is not going to happen any time soon. Maybe most will be upgraded in 20-30 years. But still, some people make blogging software that will not even check to see if the html is parsed; they just want to make it quick and easy.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
Forums and Blogs often contain very useful links. What about them? What about all those sites that are *only* linked to from blogs and forums, and that actually are great and useful sites?
I just don't trust anything that bleeds for five days and doesn't die.
the days of "Texas Holdem Poker" are over.
Actually, are there any plugins already in existence that modify the appearance of a link based on a regexp match?
Google's pagerank algorithm may have worked well in the distant past, but it's not 2001 anymore. Things have moved on. We have blogs and trackbacks and - as the article says - comment spam.
What does this mean? It means Google's scoring system is broken. It's time Google implemented a new, better system, where blogs count for nothing, and only real sites that people read count for anything. It should not be up to the users to change the functionality of the internet to boost a greedy corporation's profits.
Will this be implemented on Slashdot as well? Perhaps those with karma lower than neutral would get a rel="nofollow" tag added to the URLs they post?
for ch3aP Can.adi n v31g.r a?
Monstar L
Wouldn't this have the same effect as adding for example "Disallow: /forums/" to your robots.txt?
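As far as I understand it, not quite: robots.txt tells crawlers not to fetch whole pages on your own site, while rel="nofollow" is set per link and leaves the page itself crawlable. A small sketch with Python's standard robotparser shows what the Disallow rule does; the bot name and example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The rule from the question, parsed as if it were the site's robots.txt.
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /forums/",
])


def may_crawl(url: str) -> bool:
    """True if a well-behaved crawler may fetch this page at all."""
    return parser.can_fetch("ExampleBot", url)


forum_ok = may_crawl("http://example.com/forums/thread1")  # whole page blocked
index_ok = may_crawl("http://example.com/index.html")      # still crawlable
```

So with the robots.txt approach the forum threads themselves would drop out of the crawl, not just the links inside them.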
So, one of the things that Google really has going for it is the fact that they assign "value" of a link based on how it is referenced. If we mute the voice of the average blogger in that calculation, don't we lose quite a bit? Granted, the cost is having the first few links owned by content spammers, but that seems like a small price to pay, and there should be other, less absolute ways of dealing with it....
Correct me if I'm wrong, but this doesn't actually can spam... all it does is stop the embedded links _within_ the spam from getting googleized...
Seems to me the only real way to kill the spam is to have a captcha system which most blog engines are starting to move to anyway...
Would the /. crowd like this extra attribute (rel="nofollow") on all the links in comments?
./
This way you cannot gain some revenue anymore for your little website by putting a link in your sig.
I PROPOSE A POLL!
should we add rel="nofollow" to comment links on
1. WTF!?
2. Yes, of course.
3. No! that will kill the google rating for my homepage!
4. Yes, except links pointing to OSTG websites
5. Yes, except links pointing to cowboyneal.org
cheers,
_cies.
None of the links in the article has a "nofollow" attribute. But all the links look sort of weird anyway. Example:
href="http://www.google.com/url?sa=D&q=http%3A%2F%2Fwww.livejournal.com%2F" LiveJournal
;-)
Does that mean Google is applying another strategy against content spam?
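Assuming the fragments in the example belong to one link that goes through www.google.com/url, the q parameter is just the percent-encoded destination. A quick Python sketch to decode it (the helper name is mine):

```python
from urllib.parse import parse_qs, urlsplit


def redirect_target(url: str) -> str:
    """Return the destination carried in the `q` parameter of a redirect URL."""
    query = parse_qs(urlsplit(url).query)
    return query["q"][0]


target = redirect_target(
    "http://www.google.com/url?sa=D&q=http%3A%2F%2Fwww.livejournal.com%2F"
)
```

Here target comes out as the plain LiveJournal address, so the link points at Google's redirector rather than directly at the site.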
I'd LOVE to hear somebody explain exactly why they are not theoretically screwed here. As near as I can tell, no matter what they do, people are still going to be able to make the "miserable failure" trick work, and if they can do that, all of the spammers can/will end the usefulness of search engines as we know it.
A good point.
:)
However: It wouldn't take long for it to be reversed once people found out what FrontPage was doing.
Also: This would only appear in new versions of Frontpage.. which would be a year? from now. Who's to say what will happen in that time.. MSN Search could die an unbereived death
You have a sick, twisted mind. Please subscribe me to your newsletter.
I could see how this could definitely be abused. What if I wanted to be a jackass and put that in all of my links on my site? It probably still works out the same, but a group of people could potentially boycott PageRank for someone.
"Your search - ch3aP Can.adi n v31g.r a - did not match any documents."
Guess you're screwed. Um.. as it were.
For example, <A HREF=... REL=next>
Here, the linked to document is defined as being "next" in relation to this document.
Nice feature. I like it. The link won't add pagerank to the linkee, but will it also not drain the linker?
It's an interesting idea, but it's probably a matter of short order before MS starts to use this to cut out non-MS sites.
Unfortunately, I don't know how quickly it would get to Slashdot even if it went into Slash code today.
But what an awesome statement. After all, Slashdot is a huge referrer.
Wikipedia already implemented this feature. See here.
This is not a solution as far as I'm concerned.
Why stop the indexing of relevant links from blogs to make google's life easier?
99% of the links posted in comments are relevant and would be beneficial to index. Why stop this for the 1% of jackasses out there?
The domains contained in the links from blogspam are well known, and there are plenty of blacklists out there. Why doesn't googleyahoomsn just remove these sites from its database? It's such an easy solution. I believe they already do this in some circumstances for link trading systems whose only goal is to get higher pagerank.
I am afraid that many people could use the nofollow tag in a commercial way. I have outlined my thought in my blog. What do you think about this possibility?
I would worry that in a technical discussion that legitimate links would no longer be followed.
Many times those making a comment will include a link to reference materials or documentation. Searching those types of links could be very useful.
I wonder if Google could apply some type of filtering or text analysis of a link and its surrounding text to give it a spam number before deciding to follow it or not. Of course, there would be the CPU price and a constant updating of criteria.
Damn spammers keep breaking everything good about the web. Maybe *I* should do something about them.
----- If communism is a system where the government owns business, what do you call a system where business owns govern
(of course not, this is an intelligent discussion)
The whole point is, a google bomb works. If you type cisco, and cisco comes up first, it is because of a google bomb.
Google ranks a link with the same word linking to it higher, which makes sense.
Why do spammers use blogs to post comments? Because they have a high pagerank, because many people link to them. If a spammer makes his own blog, then no matter how many times he links to www.viagra-for-goats.com he won't budge its page rank unless someone slashdots the twat (why doesn't slashdot add this by tomorrow? what does google browse at? +5 ?? I hope the whole of /. is no-follow actually, as /. doesn't need to be indexed by google. :-) )
Back to the point, spammers need high page rank sites to leech page rank off them, they cannot do it alone easily.
Now, I suggested way back that people install a server side url obfuscator that would fsck over google (it wouldn't read viagra-for-goats.com but mysite.com?golink=foo.blah) and kill google from hitting those links (on the no-follow of the redirect page)
this was working then, but people are too lazy to upgrade their blog/site software.
I noticed now a few blogs are doing it, and by validating email addresses it stops people putting web urls as email addresses, with keywords as their name (this should be highlighted to all devs as well).
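For anyone wondering what such a server side obfuscator might look like, here is a rough Python sketch. The golink parameter and mysite.com come from the comment above; everything else (the function name, the regex, treating only double-quoted hrefs) is my own assumption, and a real implementation would want a proper HTML parser.

```python
import re
from urllib.parse import quote, urlsplit

# Only handles double-quoted href values; good enough for a sketch.
HREF = re.compile(r'href="([^"]+)"', re.IGNORECASE)


def obfuscate_links(html: str, own_host: str = "mysite.com") -> str:
    """Rewrite external hrefs so they go through a local redirect page."""

    def rewrite(match):
        target = match.group(1)
        host = urlsplit(target).netloc.lower()
        if not host or host == own_host:
            return match.group(0)  # relative or internal link: leave alone
        return 'href="http://%s/?golink=%s"' % (own_host, quote(target, safe=""))

    return HREF.sub(rewrite, html)


rewritten = obfuscate_links('<a href="http://viagra-for-goats.com/">x</a>')
```

The redirect page itself would then be kept away from crawlers, which is the "no-follow of the redirect page" part.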
spam I got today: Courtney Cox already has a rolax! Get yours! antisemitic
Stupid spammers. I already have 3 rolaxs.
#hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
While this will prevent spammers from bumping up their sites' Page Rank (probably their primary motivation for comment spam anyway), it doesn't prevent their bots from spamming targeted blogs etc. in the first place. That is still best handled by the blog software providers.
For example, WordPress has a variety of different plugins for handling comment spam. The best one I've seen renders a series of characters graphically (a la TicketBastard) which the user (a human, of course) has to type into a text field on the comment form before their comment is accepted. Blogs implementing this type of mechanism typically have spam coming from bots drop down to zero.
If it's merely a question of "rank", shouldn't the attribute be norank instead of nofollow? I expect a link tagged "nofollow" to, well, not get followed.
Belief is the currency of delusion.
There are a lot of people out there who understand the PageRank system, and complain that if they add outgoing links on their site then their previous PageRank will be "leaked" to other sites, rather than their own internal pages.
Well, luckily Google has now released a way for people to link to each other without leaking PageRank. Yes, the nofollow relation. So, now everyone can link to each other, and no-one gets any benefit out of it whatsoever.
This tag is not a bad idea, but I think the good things it could stamp out weren't considered anywhere near as much as the few bad things it can stamp out..
I'll bet TargetAlert could easily be tweaked to do this.
This is my sig. There are many like it but this one is mine. My sig is my best friend. It is my life.
The end of the blurb got my hopes up; I really need someone to fix the "content spam" on the internet these days.
So much spam=cr*p and not enough content!!!
I hate to be a cynic, but the only thing this will do is create a new kind of spam. Still, it's nice of Google to work to fix a problem they created.
Assuming this works and kills the incentive to overload my blog with spam comments, the spammers will just come up with another way to annoy the crap out of me, though. And I'm sure it will still involve posting 200 spam comments overnight.
Here are some things I did which have helped, though:
I killed my comments RSS feed.
Wordpress has a plugin which allows you to close all comment threads older than x number of days. I set it to two weeks, giving my visitors plenty of time to comment on "new" stories, but covering the bullseye on the older ones.
I banned inter.net.il.
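The close-old-threads rule is simple enough to sketch in a few lines of Python. This is not the WordPress plugin's code, just the same idea with made-up names and the two-week cutoff mentioned above.

```python
from datetime import datetime, timedelta


def comments_open(posted_at: datetime, now: datetime,
                  max_age_days: int = 14) -> bool:
    """True while a story is young enough to accept new comments."""
    return now - posted_at <= timedelta(days=max_age_days)


now = datetime(2005, 1, 19, 12, 0)
fresh = comments_open(datetime(2005, 1, 10, 9, 0), now)   # 9 days old
stale = comments_open(datetime(2004, 12, 1, 9, 0), now)   # 7 weeks old
```

The comment form handler would check this before accepting a post, so bots hammering old permalinks get rejected.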
XFN already uses the rel attribute to establish relationships to people you are linking to.
Free iPods - now in the UK!
Okay, can someone more capable than I write a firefox extension that automatically blocks any link with a nofollow tag from even appearing in the browser?
A more effective solution would be to remove the comment system from your blog. You won't get spam. You won't get argued with or corrected. You won't endure lesser intellects (cough) posting their inarticulate garbage (cough, cough) on your site. And you won't have those embarrassing "Comments [0]" links all over your home page anymore. Sorted.
/
If anyone still wants to take issue with your sterling advice, let 'em put it in an email, where it can be deleted more easily.
Ade_
Big Bubbles (no troubles) - what sucks, who sucks and you suck
This can be done whether it is linked in a blog or not, and will improve the overall quality of the search database.
- Erwin
For those who are familiar with SEO (search engine optimization); does this mean that you can be linked to from a linkfarm and then linkback to the linkfarm with this new tag, and your site will not be punished ?
Bodø community site
...Advertising. And that's what it is. It's not spam, it's a blatant advertisement, like a sticker placed on a door of a public train car that shouldn't be there.
I've always hated unsolicited advertising on blogs, but honestly, once I see the pitch, "Need new hair?" then I immediately glance over it and move on. Not a big deal.
However, anyone who is dumb enough to actually patronize ANYone who uses blogs, unsolicited emails or POP-ups to purchase anything is only going to be hurting themselves, when there are so many other reputable companies online that do not solicit your business but provide everything you need - without annoying spam / popups / blatant unwanted advertising.
Brooklyn.
Well if it helps "purify" their search results I'm all for it. But another problem I'd love to see Google address is what I call "merchant spam results".
This is where you enter your search text in Google and the results include page after page of search results pointing to "price comparison" sites.
Fer fricks sake, there is nothing more infuriating than clicking on a Google result and getting a page from some lame, brain dead and retarded "compare prices" site that says "your search for returned no results".
And I'm not even going to name any of the sites as they don't deserve the publicity.
So please Google either filter this shit out or give me the option to create a blacklist so I can say "never return results from these sites".
Thankyou...
Sky subscribers are morons. They pay to be advertised at !
Hey! This is the first time that I can comment spam and have it not modded off topic!
Ha, ha! Nobody ever says Italy.
On the subject of Google spam in general: I was looking over my wife's shoulder recently while she searched for something on Google. I wanted to pull my hair out as I watched her following an endless ring of "click here to search for..." result links.
Evil is the money of root.
The idea sounds good in dealing with spammers embedding links in blogs, etc to bump up search engine rankings. But what about the people who visit the blogs - it does not solve the spam problem for them. I have seen blogs/sites/guestbooks (probably misconfigured) that are spammed to oblivion by bots inserting random spammy URLs. The signal to noise ratio on these sites is so low that the sites become pretty unusable very quickly.
Some alternative suggestions.
1)Moderation of search engine results.
The page rank system is modified to subtract ranking based on user submissions that the link is to web-spam and hence spam itself. The usual one vote per IP will legitimize votes. Only negative votes allowed. So, I do a search and the first 10 results are CR^P. I check the CR^P box next to each of them. This system is good in that it only kills the really annoying sites and not the moderately annoying (possibly useful) sites ranking. Some significant number of votes relative to ranking gets a no-follow tag added to the link referring page(s) which may or may not be limited to 'comment' type pages.
2)Compute intensive, but statistical relevance analysis of linked sites could be useful. If the linked site contains much of the same crap that gets email tagged as CR^P then the nofollow tag is added as above.
3)MONEY could be offered to the search engine to remove all the no follow tags for a particular site. This was mentioned as a negative by another /.er but it really has the same effect as paying for placement in the first place. Which as we all know is one of the revenue generating aspects of search engines.
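Suggestion 1 could be sketched roughly like this in Python. The class and method names are invented, and I use a fixed vote threshold where the suggestion says "relative to ranking", purely to keep the example short.

```python
from collections import defaultdict


class SpamVotes:
    """Negative-only votes on search results, one vote per IP per URL."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold          # hypothetical cutoff
        self._voters = defaultdict(set)     # url -> set of voting IPs

    def flag(self, url: str, ip: str) -> bool:
        """Record a CR^P vote; returns False if this IP already voted."""
        before = len(self._voters[url])
        self._voters[url].add(ip)
        return len(self._voters[url]) > before

    def penalty(self, url: str) -> int:
        """Number of distinct IPs that flagged the URL."""
        return len(self._voters.get(url, ()))

    def is_nofollowed(self, url: str) -> bool:
        """True once enough distinct IPs have flagged the URL."""
        return self.penalty(url) >= self.threshold
```

Since only negative votes exist, a site nobody bothers to flag is never touched, which matches the "only kills the really annoying sites" goal.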
another nifty blogging type site using the nofollow thingy.. fotopages.com
I stopped getting comment spam when I installed a version of my blog with a captcha requirement. Now I get the expected one comment every couple months. :)
The problem now is that there's a comment spammer going around who tries to spam my blog anyway. In the process this spammer throws up hundreds of bogus referer values and makes my referer logs virtually useless. Plus (s)he's wasting my bandwidth in the futile process of trying to comment spam.
Blocking is tough because the spammer uses so many domains and IPs from all over the place (likely due to open proxies).
Several bloggers are following the problem, but the best details are found here.
I don't like the low-level implementation of this feature. First of all, it requires me to parse comments, find the links, and edit them. Automated editing of user-contributed text equals bad. As just one way it could go wrong, note that spammers can make links without using the A tag, by way of IMG, EMBED, and God knows what else. You're fighting a losing battle if you expect blog software developers to know about and correctly edit all possible link-making tags. Why couldn't they instead make it look something like this?
<google:usercontributed>User contributed text goes here <a href="http://goatse.cx/">This is a spam link.</a> Blah blah blah...</google:usercontributed>
With that markup, user-contributed text doesn't have to be edited as much (which is a good thing); it also has the advantage of marking ALL the user-contributed content, not just the links, so a clever search engine could also have the option of assigning less weight to the words in the user-contributed section of the page. It's less work and less chance of mistakes for the blog-software developer, more useful information for Google, and better engineering practice.
That example illustrates another point, which is that this ought to be inside a namespace. There's no excuse for polluting the top-level namespace of REL values, especially not with something uninformative like "nofollow". Does this attribute really mean "Do not follow this link?" No, it actually means "Hey, Google and anyone following their standard, the following stuff is user contributed!". The markup should reflect that.
Why isn't the attribute named "robots=nofollow"?
That would make more sense to me.
Chip H.
considering the distribution of attacks these days... I'm pretty convinced at least some of this stuff is being done by trojans, or some other hijack method. This isn't a webserver scanning the web. Most of these are home (comcast, roadrunner, etc. etc.)
The problem is there's no motivation to stop what they have going. Not everyone will patch. So there's still some benefit.
If they made the tag give negative karma... perhaps when found in high doses (more than the normal commenter)... perhaps there would be incentive to cut out the attacks.
I can see this removed the big motive to spam... but it didn't motivate them to stop the attacks taking place already.
litigious bastards? Or miserable failure? Will I still be able to find relevant information on these terms???
Just this morning I received an email newsletter, commenting on advertising moving into 'blogs.
From the newsletter:
Advertising has always been part of our society. Online it's no different, and the number of ad options seems to increase daily. The latest target of advertisers is Blogs/RSS.
This new type of advertising is still growing and evolving and the majority of ads being served are not graphical but text-based in nature. Advertising in Blogs and feeds has a slow growth area, as Blogs still have not reached the mainstream, primarily due to the fact that readers are not yet included within a browser's interface. In the future, browsers will come with this option built in. End of excerpt.
So, do away with "comment spam" and replace it with ads. Sneaky, but ads would make 'blog owners a few dollars. I understand that the moves to place a "nofollow" tag would not change the link in the 'blog itself, but it may slow down those that place links in order to gain pagerank. In the end, 'blogs would have more relevant links, and ads. And don't take me to task for the comment in the article about 'blog readers, apparently the author is using an older browser (not Firefox).
Pete Carr Owner Chatmag.com
Considering the small amount of comment spam vs. the large amount of trolls and crapfloods, removing all the comment spam would have a negligible effect upon the comments.
However, I do agree with you - adding this attribute to the links would be a good idea if only to slightly discourage the "Help me get a free $THING via this multi-level-marketing scam" sigs.
www.eFax.com are spammers
If you were an intelligent carbon-based lifeform, you would illustrate your point by posting an IMG- and EMBED-enhanced comment right here.
MovableType has had a feature like this for a long time, in which web links are redirected so they do not influence Google's PageRank algorithm.
It does not matter at all. MT blogs still get comment spam on a regular basis, even though that spam is ineffective.
For a spammer, it is not cost-effective to check each blog to see whether spamming will have an effect or not, it is easier to just spam everyone and whatever works, works. This will be true of Google's solution as well.
Want to make $$$ with your computer? No risk! Simply press shift-4 three times in a row!!
Unknown host pong.
Use the new attribute and make your site less attractive to spammers. But that means hobbling the power of your site's voice in the Internet.
If your blog uses the attribute, then it no longer has the ability to affect the rest of the net. If your blog is on politics, and your readers post links to some leaked government memo online, those links will not increase the visibility of that memo. Your site becomes a curio - nice to look at, but ultimately irrelevant.
It's hard to soar like an eagle when you're surrounded by turkeys.
Either disable new posts to the archives entirely, or make it difficult to automate by requiring human email and/or captcha security image verification.
--
Power to the Peaceful
Where the hell do they get the idea that this blocks spam links? I've been doing this for months now (redirecting external links so the links do not get pagerank), and it hasn't slowed down the spammers at all.
It is pretty easy to make rel="nofollow" visible to normal users too in modern web browsers using CSS. You could use something like this:
That will display the given image before any links marked as nofollow.
so, this is just another (small) reaction to an ever-growing problem. how long do you honestly think it will be before some little cracker-wannabe figures out how to ocr the 'graphical characters' currently used to stop spam-bots? what then? what we need is a more permanent method of stopping this type of refuse from invading our lives. the type of scum that performs these activities obviously gets paid a lot of money. and like a lot of people who get paid a lot of money, they sit around thinking of ways to make more. so they should be punished, severely, and stopped at their source.
.
instead of just 'reacting' to the next new thing these maggots use to spam, we should act proactively to stop them. i think if a site is using spam bots to post comment spam, shut it down. disable the domain name and punish the owner. hold ISP's and webhosting companies responsible for the content of their customers. force ISP's to spam-filter their outgoing mail. blacklist any ISP that doesn't do these things and enable browsers to block the offending IP addresses. set up blacklists on dns servers so that they don't reference producers of spam. if they can physically capture and imprison a spammer, how much easier would it be to just shut down their source of income?
obviously things are not that simple. ideas like these could easily be subject to abuse and a whole new form of DoS, but that doesn't mean something like this can't be worked out. if a president can set up "free speech zones" that blatantly violate constitutional rights, enacting stricter policies on what is allowed on the internet should be a piece of cake (especially since most spam apparently originates in the US).
there are already limits on what can be broadcast on the radio, what can be shown on broadcast/cable tv, and what you can and can't do in public, and i don't hear many people complaining about those. ( and yes, i know things are different outside of the US, and other countries are far more liberal in what they allow. ) but the concepts are still there.
just like any other virus or nagging problem, you have to attack the source, not just treat the symptoms.
now, i need some cawfee, tawk amongst yerselves . .
Just because you're paranoid, it doesn't mean that they're not out to get you.
This is not about people who write their own web pages, it's about people who post in public forums. It's those forums which will automagically insert the extra tag.
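For the curious, the automatic insertion could look roughly like the Python sketch below. It is regex-based and only meant to show the idea; the names are mine, and real forum software would be better off with a proper HTML parser and would also have to think about other link-carrying tags.

```python
import re

# Opening tag of an anchor element, e.g. <a href="...">.
ANCHOR_OPEN = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
# An existing rel attribute (double-quoted, single-quoted or bare value).
REL_ATTR = re.compile(
    r"""(?<=\s)rel\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""", re.IGNORECASE
)


def add_nofollow(html: str) -> str:
    """Return html with nofollow present in the rel of every <a> tag."""

    def fix_tag(match):
        attrs = match.group(1)
        rel = REL_ATTR.search(attrs)
        if rel is None:
            return '<a%s rel="nofollow">' % attrs
        # Keep existing rel values (e.g. "next") and append nofollow once.
        values = (rel.group(1) or rel.group(2) or rel.group(3) or "").split()
        if "nofollow" not in [v.lower() for v in values]:
            values.append("nofollow")
        new_rel = 'rel="%s"' % " ".join(values)
        return "<a" + attrs[:rel.start()] + new_rel + attrs[rel.end():] + ">"

    return ANCHOR_OPEN.sub(fix_tag, html)
```

The forum would run every user-submitted comment through add_nofollow before storing or rendering it, so posters never have to type the attribute themselves.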
so when will this A element's attribute become part of the XHTML spec? =)
i think its a great idea, and when powerhouses like Google, MSN and Yahoo! are promoting this sort of thing, blogging software's bound to jump on the bandwagon eventually.
Enjoy an e-piphany
Except that, to stop the spam, the spammers need to actually realise that you apply rel="nofollow" attribute to links. Somehow, I suspect they won't bother examining previous comments on your site to see. They'll just spam you anyway on the off chance.
This will only work if everyone does it. We need a critical mass of sites that apply rel="nofollow" to user submitted links, so that there is no longer any profit in spamming sites.
You're right. This "solution" will do about as much as against Comment SPAM as recent US Federal Legislation has done for e-mail SPAM.
The BLOGS that collect this kind of SPAM in their comments are being run by people who either don't know how to, or don't care enough to update to the most recent version of, for example, MovableType. This "solution" requires action on the part of the people hosting the BLOG, something that I can guarantee will not happen with the idiots who can't even be bothered to take the existing rudimentary steps necessary to limit BLOG SPAM.
Anything that requires action on the part of the people administering the BLOG will fail to make an impact, plain and simple. How many MovableType BLOGS are out there with literally pages upon pages of SPAM comments? How many of them are EVER edited, moderated or subjected to a little house cleaning?
No, a far better solution would be to NOT index ANY BLOGS unless the blogger takes action, such as adding something a SPAMMer can't inject with comment SPAM, like a specific Metatag in the document Head.
I hear a bunch of people whine about how this will end up restricting BLOGs from coming up in search results. Fine, I'm OK with that. If the people administering the BLOG can't be bothered to take action to reduce BLOG SPAM, then the site is unlikely to contain anything I'd be interested in anyway.
"Live Free or Die." Don't like it? Then keep out of the USA
Why not make it so you could apply it to the parent element? Eg,
<div rel="nofollow">
[insert comments here]
</div>
So that even custom apps can easily add it?
Pain lasts, kid. Its how you know you're alive. Sometimes I think this growing up thing is just pain management-TheMaxx
I suggested this idea on my blog about a month and a half ago. It's awesome that Google, Yahoo, and MSN were able to work together to turn it into reality.
The potential goes far beyond curbing blog spam. Perhaps more importantly, it provides a means for publishers to link to information without lending credence to it.
One of the most prominent examples of a reason publishers would want to do this is educational sites that link to sites such as www.martinlutherking.org in order to teach children about misinformation on the Internet. In doing so, they inadvertently raise its search engine rankings, propagating the misinformation itself.
Read on for more...
this article is worth checking.
Critical Mass being every site with a sufficiently high pagerank
A more interesting question really is:
If a certain Search Engine's database is polluted because of a loophole in its ranking system, and users of that search engine are hindered, is it not the search engine's responsibility to make sure this doesn't happen?
Is this not 'Passing the buck' or better, 'Jamming the Tag down the Throat' of people with bona fide message boards which suffer under the Mass of the almighty Google?
Exercise caution when modding this message up: the author acts like a jerk when his karma is excellent.
This should disincentivize one class of blogspam: the pagerank pimps.
However, it doesn't prevent clicking on those links, or having to scroll past general pointless posts.
Phishing links are popping up regularly, as are the "Help me get my free iPod" crap. This does not get reduced by adding the nofollow tag.
Design for Use, not Construction!
For the guys without PageRank based algorithms, this doesn't really improve consumer experiences.
In those cases, it's just MSFT and Yahoo wanting to charge small sites for mere inclusion in their search engines.
This may sound win/win/win for Google/competitors/losers.
But in reality, it's Win for Google (easier programming), Win for MSFT and Yahoo revenue, and lose for owners of small web sites.
I don't understand. Give me an example or two.
Oh crap! Then I'd best put in all of the shameless plugs that I can!
Looking for Open Source linux/windows software piracy protection usb security key dongles?
Look no further.
Quando Omni Flunkus Moritati
Guy told me about this as a way to get better search engine rankings.
I said, "That's just about functionally identical to peeing in the street repeatedly until the whole town knows your name."
He said, "Well, yeah, it is."
I hope the initiatives taken will eliminate the problem.
Astro
I find it sad that we have come to the point where we have to alter the content of our pages (or comments, in this case) to combat spam.
From an engineering point of view, there is nothing less satisfying than adapting the entire content of the WWW to help one specific algorithm with a big, nasty problem that will not go away even if we have a remedy for that aspect.
"We have the best B-B-Q in town!!" 8D
Now that we have a way to describe the marketing that goes on here at /., comment spam, with some companies being very egregious in their trolling and posting of veiled rave reviews for dubious products, will /. now have a moderation for some posts as comment spam?
/., can you implement a moderate as comment spam feature?
a post that reads as follows could be marked as comment spam:
Yes, I understand how the tsunami was so terrible, and while I was on line I downloaded a song on my new product name here which is from that great company name of a fruit here from Cupertino, and aren't they the best company in the world and I think they rock.
So,
How hard could it be?
You clearly have no idea what you're talking about. Microsoft is not even a large company. What do they have, 10,000 employees or something? IBM has well over 300,000. IBM takes in way more revenue than Microsoft, too. There are companies that employ more people than medium-sized governments. Microsoft is not one of them.
I like to call them free speech ghettos.
This is designed to stop googlebombing, which is not necessarily spam. It won't stop spammers, because they'll still get people following links who read the blog on their own. E.g. if you post a message to slashdot with a link to some site, you'll get thousands of people seeing/clicking on your link without google's help. This will only stop incidental visitors, and really won't act as a deterrent to spammers. It will only help search engines, not blogs. In fact, it will reduce the number of visitors to blogs, which may not be a good thing. Think of all the *good* links that will be missed...
A little off topic:
I love your sig because it sticks it to the comment spammers from that litigious company in CA.
And I have often thought that it would be good to have a way to spider the content to catch all references to that product that you mention. But if I did that I wouldn't see your awesome parody of the offensive content.
We need a way to surround a parody or joke with something that says that what you have is a parody and not comment spam. But then if we have that, what stops people from using that to push their message?
If style sheets are working one could set up the default style for A tags to have rel="nofollow"
Then the page should always include that as part of the A tag attribute list.
My experience with style sheets is that they don't work consistently across browsers. So while they are a good idea, it is important to be able to use the attribute list as well.
If style sheets work, though, a rel : nofollow in the style for the link should take care of the issues that you mention. But style sheets don't work consistently as I have noticed by viewing pages in different browsers.
godamn it just when i start my comment spam career someone ruins the industry. arrgggghhhhh??
Soundproofing Acoustics noise
Microsoft has about 58,000 employees.
Well, not to detract from the main point, "Comment spamming creates a lot of false results on google;" however, I have to say that I have found some fairly obscure or rare files and information in forum posts and comments. If all the links in this medium get nofollow status to the bots, it makes finding these rare items much more difficult.
Well, the beauty of it is it doesn't have to work in all browsers, it just has to not make the page look bad. Browsers shouldn't do anything based on the nofollow tag, only search engines. As long as google/msn/yahoo all respect whatever hack is used to insert it, it doesn't matter if some browsers don't understand it.
Pain lasts, kid. Its how you know you're alive. Sometimes I think this growing up thing is just pain management-TheMaxx
I thought the same thing. It's an interesting idea for sure but letting users define which links affect pagerank and which don't is very open to abuse. And if enough people abuse it, it will totally destroy their whole pagerank system.
:P
To be totally honest, I'm probably going to do this with a few links on the sites that I'm in charge of. I know, I know, totally evil and abusive, but seriously, if I have the ability to lower other sites of interest in my "genre" in search engine results, without hurting my own results, why WOULDN'T I do that??
Something that might work would be for their crawlers to be able to identify when a page has user comments and to automatically ignore all links in that section. Of course it would be impossible to recognize all areas on sites like this automatically, but they could probably get 90-95% of them pretty accurately. That would work pretty well I think.
Joseph?
I run a phpBB forum, and one thing I noticed (prior to setting up the addon that puts up a random number image for validation) was that accounts on my forum were being automatically generated, usually 2-3 a day (which I promptly deleted). At first I was confused as to why someone would do that, but then I noticed that these accounts all had their homepages set up to point to various websites. Some of the accounts pointed to the same site, some pointed to different sites, but generally I'd always find a duplicate account pointing to the same site. It seems the person(s) doing this realized that with phpBB, if you set up a URL in your account's homepage field, you can get the URL linked on that account's info page. When Google comes along and caches the page, you get one more rank bump.
If you run phpBB and you have a problem with auto-generated accounts (and still have them after you set up the random number image validation addon), you might want to add this new tag to the home page links displayed on an account's profile page (as well as to URLs in posts, etc).
All I know about Bush is I had a good job when Clinton was president.
To my knowledge, CSS can only be used to define layout and look, it can't change actual content. Probably need a regex.
Joseph?
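For what it's worth, the regex idea above can be sketched in a few lines of Python. This is only a rough illustration, not what any particular forum package does: the function name add_nofollow and both patterns are made up for the example, and a real HTML parser would cope better with odd markup (self-closing tags, "rel=" appearing inside an href, and so on).

```python
import re

# Matches an opening <a ...> tag; group 1 holds the raw attribute text.
ANCHOR_RE = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
# Detects an existing rel attribute so it is not added twice.
REL_RE = re.compile(r"\brel\s*=", re.IGNORECASE)


def add_nofollow(html):
    """Add rel="nofollow" to every <a> tag that has no rel attribute yet.

    Hypothetical helper for user-submitted HTML; the name and the
    patterns are assumptions made for this sketch.
    """

    def fix_tag(match):
        attrs = match.group(1)
        if REL_RE.search(attrs):
            return match.group(0)  # leave tags that already carry a rel
        return f'<a{attrs} rel="nofollow">'

    return ANCHOR_RE.sub(fix_tag, html)
```

You would run something like this over the comment or profile HTML just before it is written to the page, so the stored text stays untouched.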
Movable Type already, by default, includes a link redirect through any URLs left in comments, so that the links won't drive up the spammers' pageranks. Sadly, the spammers haven't caught on that what they're doing is futile, and continue to slam my server when they attempt to spam me. I also use MT-Blacklist, but even blocking URLs doesn't keep my server from getting slammed when they hit the comment script.
If you're worried that you might be increasing traffic to some other site, wouldn't your number one action be to not link to that site?
Here's something funny that happened to me... I was doing the NYT crossword puzzle and typed in one of the clues (I think it was "prince valiant's wife") to Google - I know, that's cheating - and the top hit was a Russian site selling Viagra or something. I wonder - do they enter in all the clues to these puzzles as hidden text on their pages and get them Google-indexed in time to catch the many lazy people who do the same thing I did?
content or comment? Big difference!
Employee-wise, in the scheme of things it's a baby.
Of course that's why they're all so filthy rich. Bugger all staff and monster loads of income means huge huge profit.
Excuse the Unicode crap in my posts. That's an apostrophe, and slashdot is busted.
"(an example here)"
:-(. So much stuff you can't do without making it look broken to IE users (though I guess you could check the user agent string via PHP and modify the page based on that...)
Wow! Thanks for that link! That site is awesome! It's amazing what you can accomplish using pure CSS magic!
Too bad IE still doesn't support all of CSS1 even
Again, thanks for the link! Everyone who's into web design should be forced to read that site before they start ruining the Internet.
All this little value for the rel attribute does is help Google make their Page Rankings pure. The bots that post comment spam are already out there and aren't configured to see if it is a waste of time to post to a site that employs the rel="nofollow" attribute in anchor tags. This one is already out of the box and it appears to me that Google is just trying to get everyone else to clean up a problem Google started.
CSS can't change the structure of HTML, only the presentation.
http://shit.slashdot.org/article.pl?sid=05/01/19/0516246
No, this is the first time you can comment spam and have it modded off topic incorrectly! :)
WWJD?
JWRTFM!
That would still classify it as a large company. I work for a company of 5 people, so Microsoft employs over 11,000 times as many people as mine.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
Whenever I'm searching for technical information, a couple of sites always come up that are useless to me. They have a question/answer format, questions are left in the clear for search engines, while the answers require registration. What I need is a way to filter those sites out from my searches, so that they simply don't show up in any result set. Hmm might be a good excuse to play with writing Firefox plug-ins... :-)
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
As far as I am concerned they should stop indexing blogs altogether. Most of it is self-centered rubbish nobody is interested in anyway. This will solve blog spam problems by default.
Modify your web server to only put this attribute in links when Google's robot comes to visit. Remove it when anyone else visits. Then you can tell all your link buddies that you're linking to them, yet secretly hoard your pagerank.
I think I can get Apache to do this in a few minutes, wonder how I should leak it. Anon to webmasterworld might do.
If you don't like the look of a link, don't click it!!
If it's confirmed spam, the author will delete it.
I don't see how graying helps at all!
no,
it was an insta-geek reaction to the news of the rel=nofollow tag.
There's no point. Spam doesn't stay in wikipedia long enough for it to be indexed very often.
of course other wikis using mediawiki might be helped.
The best thing, I think, is using it to block 'edit this page' links and 'histories'. Even though these have noindex anyway, they still get spidered heaps.
outgoogle me.
Honestly,
who cares if google follows a link!
Does it really hurt you that much?
Apart from controlling what links of your own site are spidered. This is a little geek tantrum about people spamming their wiki!
How do you get rid of the mischievous links at my blog?...m /stats/referers
http://GuideToProblematicalLibraryUse.buzzword.co
It is on a blog template provided free to bloggers.
This is pretty much useless. Within a week of opening up comments on my blog, I was getting blogspam. I went to war immediately; the first thing I did was to submit all comments to an approval queue. No spam has appeared on my blog since. I noted this fact in an article, and in comments around the comment submission form, and the result of POSTing a comment tells you it's been submitted for approval.
But this did nothing to stop the flood of incoming blogspam.
I blocked, and still block, a few of the repeat offending IPs. But these days, my comment log looks like:
[Thu Jan 20 02:03:39 2005] Rejected spam from 213.121.209.14: carroll
[Thu Jan 20 02:18:59 2005] Rejected spam from 61.221.15.131: cleotilde
[Thu Jan 20 05:08:55 2005] Rejected spam from 211.57.209.225: tera
[Thu Jan 20 05:09:07 2005] Rejected spam from 61.221.15.131: lawanda
[Thu Jan 20 05:09:30 2005] Rejected spam from 66.160.17.189: deangelo
[Thu Jan 20 05:09:41 2005] Rejected spam from 193.251.169.174: raymonde
[Thu Jan 20 05:10:03 2005] Rejected spam from 66.250.69.7: tynisha
[Thu Jan 20 05:11:02 2005] Rejected spam from 211.57.209.225: corrie
[Thu Jan 20 05:37:47 2005] Rejected spam from 85.64.61.191: Online Poker
[Thu Jan 20 08:14:10 2005] Rejected spam from 211.250.80.2: heike
So, blocking by IP is pretty useless. I was in no mood to try word filters or statistical filters or any such, so I simply added a hidden field to each page, based on the time the page was requested and a secret token. When a comment is submitted, it is rejected if the hidden field is not present, or if it is from a time that is too old. This immediately blocked 95% of comment spam.
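A hidden field like that could look roughly like the Python below. This is a sketch under my own assumptions, not the poster's actual code: the post only says the field is built from the request time and a secret token, so the HMAC-SHA256 signature, the one-hour limit, and the names SECRET, MAX_AGE, make_token and token_is_valid are all invented for the example.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"  # hypothetical server-side secret, never sent to the client
MAX_AGE = 3600         # assumed limit: reject fields older than one hour


def make_token(now=None):
    """Build the hidden-field value as '<timestamp>:<signature>'."""
    ts = str(int(time.time() if now is None else now))
    sig = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    return f"{ts}:{sig}"


def token_is_valid(token, now=None):
    """Accept a comment only if the field is present, authentic and fresh."""
    if not token or ":" not in token:
        return False  # field missing or malformed
    ts, sig = token.split(":", 1)
    expected = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison; bytes avoid errors on non-ASCII input.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False  # timestamp was not signed with our secret
    current = time.time() if now is None else now
    return 0 <= current - int(ts) <= MAX_AGE
```

make_token() would go into the comment form when the page is rendered, and token_is_valid() would run on the POSTed value before anything else. A bot that posts blind, or replays an old form, fails the check.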
Some few people were persistent, fetching a page and then posting back to it. So I checked my referrer logs; seems blogs to spam are found by Googling for typical strings, and posting in an expected format. So I made the Subject field mandatory.
I now have a close to 100% spam block rate. Why would I add a "nofollow" tag to my links, when spammers won't stop spamming just because their spam isn't being read (they don't stop now, and their spam isn't even being accepted!) and when real comments would suffer from it?
I believe forum posts and so on much more accurately reflect sites people are talking about and finding useful than what you get from only scanning the pages of the site itself. Forum posts and comments are made by the people who use and require the information that they seek when searching, so it seems backwards to ignore them.
For example, an extremely resourceful but static page may have several other relevant webpages link to it at the time it is launched. But only that once - a webpage rarely has other sites post news about the simple fact that it is still there. On the other hand, the page may be so useful to the users of those sites that its link is being given in forum responses on a daily basis.
If setting rel=nofollow by default becomes common practice on comments and forums, then I fear search engines will be giving much poorer results. If it is not set by default, then I suspect it will hardly be used and hence will have negligible effect either on search results or at discouraging spam.
I suppose it could work well if rel=nofollow is only set for "Anonymous Coward" comments by default, or some other sensible usage that will probably rarely happen because those who would have to code that sort of thing probably don't have much incentive.
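The "Anonymous Coward only" rule is not much code. A minimal Python sketch, assuming the site already knows whether the poster is logged in; render_link and its author_is_anonymous flag are hypothetical names, not part of any real forum software:

```python
from html import escape


def render_link(url, text, author_is_anonymous):
    """Render a comment link, adding rel="nofollow" only for anonymous posters."""
    # Hypothetical policy: links from logged-in users keep their full weight.
    rel = ' rel="nofollow"' if author_is_anonymous else ""
    return f'<a href="{escape(url, quote=True)}"{rel}>{escape(text)}</a>'
```

The same flag could just as easily come from karma, account age, or moderation score instead of login status.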
Excellent idea... *heads to tweak vBulletin right now*
:)
I was recently hit by a wave of porn spam referral bots. The page they are displaying themselves on is PR 0, and pretty much hidden, but it's not overly nice to have links to such sites in your referral logs.
It wasn't too difficult to modify the script slightly so that it didn't record the offenders, so once they appear once, they can never get on again.
http://www.nzboards.com - Online Kiwi Community
I took a somewhat more drastic step to combat the rampant comment spam on my bloggg. Spammers insert their target URLs (the payload) randomly in comments, presumably to increase the target site's Google Pagerank by giving the appearance of thousands of sites linking to it. Block the payload and you remove the incentive. So I have my bloggg detect URLs and present a message saying that URLs are not allowed in blog comments, blocking the post. (I suppose humans can post their URLs in a more creative, although non-clickable way, but spambots aren't likely to catch on.) A little drastic, but it's been highly effective.
Caveat Emptor is not a business model.
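The "block the payload" check described above can be approximated in Python as below. This is a guess at the shape of it, not the poster's code: the pattern URL_RE and the function check_comment are assumptions for the sketch, and the pattern is deliberately strict, so it will also reject harmless comments that merely mention "www.".

```python
import re

# Assumed pattern: a scheme, a "www." host, or an HTML anchor counts as a URL.
URL_RE = re.compile(r"(https?://|www\.|<a\s)", re.IGNORECASE)


def check_comment(body):
    """Return (accepted, message) for a submitted comment body."""
    if URL_RE.search(body):
        # Refuse the whole post rather than stripping the link.
        return False, "URLs are not allowed in blog comments."
    return True, "Comment accepted."
```

A human can still write "example dot com", which is exactly the loophole the post mentions.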
Actually they're not all as bad as they seem, but they are (deliberately) misleading.
I've seen a few sites like that where the answers are on the actual page in question. However they don't exactly go out of their way to make this obvious. You get a big "Register to view answers" link followed by what look like a stack of end-of-page adverts and links.
Scroll past this garbage and you can still find the discussion including the submitted suggestions.
It is irritating, though. The answer is there on some of these sites but they make it seem like you have to register - and I'd guess that many people end up registering due to this deception.
Tiggs
"120 chars should be enough for everyone..."
The Google Blog post shows that the posters, working for Google, forgot the rel="nofollow" attribute themselves when posting. :)
Maybe they just wanted a higher ranking for their friends
Extended from the original topic.. like All Good Things (?) the point was made that if this takes off in the blogging world then the next step would be to effect it in the webmaster world. It could happen.. wouldn't be that hard for Microsoft to automatically add the tag to all hyperlinks in pages created by FrontPage (as the parent poster noted).
You have a sick, twisted mind. Please subscribe me to your newsletter.
Actually, it's slightly more complicated.
Document stylesheet trumps user. Unless the user specifies the "!important" modifier, in which case user trumps document. I find this very useful, and use it to override a lot of b0rkeness on the Web.
Also, in some browsers (Galeon among them) it's possible to create a set of stylesheets which can be applied to any arbitrary page, only when specified. I actually use this to tweak the "light" Slash code to format it more like the default. Fun thing is you can apply it to default websites. Screenshots linked from http://lists.svlug.org/pipermail/svlug/2005-January/048897.html
This and other userContent.css tricks at UserContentCSS TWikIWeThey page.
There's also the greasemonkey Firefox extension, which extends this concept somewhat.
What part of "gestalt" don't you understand?
One of the higher-ups at Microsoft recently made the mistake of telling the employees in the Windows division that each person there helped generate more than a million dollars per year. Of course, all the employees then looked at each other wondering whether they were underpaid, considering very few were actually millionaires.
Because the most troublesome blog spammers use software to spam the blogs, adoption of the "nofollow" tag will increase spamming of blogs rather than decrease it. The spammers will need to hit more blogs to get the number of inbound links that they want, so they simply will hit more blogs.
Legitimate bloggers who have contributed comments to related blogs or give them links and trackback will see their search engine rankings fall. It appears that the big three search engines see this as a good thing, but bloggers who consider the long term implications do not.
In the long term, reducing the search engine rankings for blogs makes blogs and blogging less visible to the world as a whole. This is a concern even for bloggers who get most of their visitors from other blogs. Attrition is inevitable; no one blogs forever. For blogging virtual communities to thrive, it is important for new people to start blogs as other bloggers move on. Many bloggers were first introduced to blogs when they used a search engine and the results showed a blog. Pushing blogs out of the search engine rankings will have a powerful and negative long-term impact on blogging.
To learn more go here:
http://netinstitute.com/archives/2005/01/20/bloggers-cheer-google-as-their-search-rankings-plummet/