Smart Spam Filtering For Forums and Blogs?
phorm writes "While filtering for spam on email and related media seems to be fairly productive, there is a growing issue with spam on forums, message boards, blogs, and other such sites. In many cases, sites use prevention methods such as captchas or question-answer values to try to restrict input to human-only visitors. However, even with such safeguards (and especially with most forms of captcha being cracked fairly often these days), it seems that spammers are becoming an increasing nuisance in this regard. While searching for plugins or extensions to SpamAssassin and the like, I have had little luck finding anything not tied into the email framework. Google searches for PHP-based spam filtering tend to come up with mostly commercial and/or email-related filters. Does anyone know of a good system for filtering spam in general messages? Preferably such a system would be FOSS, with a daemon component (accessible by port or socket) to offer quick response times."
Akismet
reCAPTCHA was fairly effective, useful, and easy to install.
Dave Barnes 9 breweries within walking distance of my house
Or am I misunderstanding what FOSS really is about?
"Does anyone know of a good system for filtering spam in general messages?"
..
Yea, design an email system that is immune to spam and make the ISPs responsible for blocking spam, phishing and such attacks
davecb5620@gmail.com
Two apps come to mind, provided you are not running on shared hosting. They may be overkill, but both have spam rules and can help protect web apps from attackers. modsecurity - http://www.modsecurity.org/ snort - www.snort.org Both are free and GPL. Snort does have a paid subscription service, but if you just register for free you get fairly up-to-date rules.
I've been thinking about modifying my VSDB software to do something like this...
meh
There are a number of things you can try:
For a small site I helped set up, they went to complete SSL and client certificates, where users had to obtain a cert from Verisign or Comodo before they would get access. This stopped spam, and one can obtain a client cert for free or a low cost. However, this can't be done for most forums or blogs.
For larger sites, a lot have ended up moving to an approval type of system where a human approves the creation of the user, then a limit on how many posts a first time user could do, and how many features the user can access.
Finally, one site just went to a paid subscriber system where for any access at all, people had to pay $5 to $10 via PayPal. This at least forced spammers to pony up cash (or commit credit card fraud) before they would get access.
Akismet is the best thing for blog spam prevention ever. I can't believe you've never stumbled across it before. It uses statistical analysis to identify spam, and the more people use it, the better it gets. If everyone used it, the blog spammers would just disappear because their attacks would be completely ineffective.
mollom
I discovered this one through Drupal. I thought it was completely free, but apparently for high-traffic sites it isn't.
I think all your user generated content is sent to them and checked for spaminess against the other submissions they are receiving and they give you back a rating.
Any method you use can be broken. Your only chance is to reduce the likelihood that your site is worth the effort.
Basically, if you use a common solution, whether FOSS or commercial, then there will be a thousand other sites that use it too. This attracts attackers because they know that once they break it, they can reuse the attack everywhere.
However, if you handcode something, no matter how primitive, it likely lasts a lot longer because nobody bothers hacking into your site...
Of course that doesn't work if you have a large site like myspace - there, a single site is worth the effort by itself.
Anyway, two things usually work: a really fast-moving animated GIF, and silly challenges where you ask people to identify items.
I help out with a site that randomly picks five pictures of cats and dogs and asks you to identify which image contains the most kittens. We hardly ever get spam through, which, with almost 20K attempted submissions by non-humans a day, makes us pretty happy.
Peter.
Forums and blogs are susceptible to post spam because most run large open source or commercial scripts. The fact that there are thousands of identical installations makes them a prime target for spammers. The trick to preventing forum spam is to confuse the spam bots. Almost all bots will fail to register a user if given something unexpected. For the forums I administer, I wrote a vBulletin mod that requires a bit of human intuition to solve (not much, mind you), such as a text input field that must remain blank. This simple measure has prevented nearly 99% of forum post spam.
Windows Vista Help Forum
I have a series of 4 tests to block spam on my website. So far it has stopped over 30,000 attempts in the last year.
Test one is, does the last name = the first name. For some reason almost all spammers do this.
Second, do they use a keyword from a list of about 15 words.
Third, do they fill out a hidden inputbox? This is sort of the reverse captcha.
Finally do they use more than 4 "http" in a post. Almost all comment spam is an SEO effort to increase their pagerank.
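The four tests above are simple enough to sketch in a few lines. This is only an illustration of the idea, not the poster's actual code: the field names (first_name, last_name, message, website_hp) and the keyword list are assumptions, since the post does not name them.

```python
import re

# Hypothetical keyword list; the post mentions about 15 words but does not list them.
BLOCKED_KEYWORDS = {"viagra", "casino", "payday"}
MAX_HTTP = 4  # the post rejects anything with more than 4 occurrences of "http"


def looks_like_spam(form):
    """Return True if a submitted form trips any of the four tests."""
    first = form.get("first_name", "").strip().lower()
    last = form.get("last_name", "").strip().lower()
    body = form.get("message", "")

    # Test 1: first name and last name are identical.
    if first and first == last:
        return True

    # Test 2: the message contains a word from the blocked list.
    words = set(re.findall(r"[a-z0-9']+", body.lower()))
    if words & BLOCKED_KEYWORDS:
        return True

    # Test 3: the hidden input box (assumed name: website_hp) was filled in.
    if form.get("website_hp", "").strip():
        return True

    # Test 4: more than MAX_HTTP occurrences of "http" in the message.
    if body.lower().count("http") > MAX_HTTP:
        return True

    return False
```

The cheap checks run first and the function stops at the first hit, so a rejected post costs almost nothing.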
"During My Service In The United States Congress, I Took The Initiative In Creating The Internet." -Al Gore
Project Honeypot's HTTPBL has been good to me:
See: www.projecthoneypot.org/httpbl.php
The fastest way is probably to just slow down user registration. Permit anonymous posting, but make it moderated/screened by default (ie. not visible to other users until the forum owner flags it as OK). When a user goes to register (so they can get their posts visible immediately), do not send them the confirmation e-mail immediately. Batch your confirmations up and send them out twice a day at odd times (ie. not midnight and noon, something like 3:47am and 3:47 pm) (you could do it 4 times a day, but not much faster than that since the idea's to introduce a delay in the registration process). Make sure to tell the user on the registration screen what sort of time-frame they can expect their confirmation to arrive in. Ordinary users who plan on using the forum long-term won't be inconvenienced much by this. Spammers... won't tolerate the delay, they want to get their message in fast and get out. With their automated scripts they might not even notice things are failing. Also, don't include a direct confirmation link in the e-mail. Include a URL to a form and make the user copy-and-paste the confirmation number from the e-mail. That'll be trivial for humans, but not easy for an automated script to handle without human assistance.
None of that will stop a determined spammer, but most of them are more interested in volume than anything else and they won't bother spending time/effort on just one forum when they could hit 10 others instead.
There is a well working semi-dynamic plugin for wordpress. It has served me well. It is called YAWASP and you can find it here: http://wordpress.org/extend/plugins/yawasp/. The author also describes the common problems & shortfalls with traditional captcha-like methods.
how IT is changing the world - http://max.zamorsky.name
It's got a field that says "I am a robot" checked off by default. A human should obviously see that and uncheck it. Those registrations that come in with it checked are blackholed. It's definitely cut down on the SPAM accounts since they enabled it.
Not so easy for you, of course: make the forum moderated and require post approval, then get good mods to approve posts.
But some PHP tinkering and you could probably do something to pass comments through spamassassin using a socket or something.
Spamassassin would need updating though to work with content-only data.
Wonder if anyone's ever thought of this before?
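For what it's worth, here is a rough sketch of what "pass comments through spamassassin using a socket" might look like. It is written in Python rather than PHP, and everything about the wire format (the CHECK verb, the Content-length header, the "Spam: True ; score / threshold" reply line, port 783) is my recollection of the spamd protocol, so treat it as an assumption to verify against the SpamAssassin docs. The addresses and subject line are made up; the comment is wrapped in a minimal email message because that is what SpamAssassin expects to see.

```python
import socket
from email.message import EmailMessage
from email.utils import formataddr


def wrap_comment(author, text):
    """Wrap a forum comment in a minimal email message (placeholder addresses)."""
    msg = EmailMessage()
    msg["From"] = formataddr((author, "commenter@example.invalid"))
    msg["To"] = "forum@example.invalid"
    msg["Subject"] = "Forum comment"
    msg.set_content(text)
    return msg.as_bytes()


def build_check_request(message):
    """Build a spamd CHECK request (format assumed, see the note above)."""
    header = f"CHECK SPAMC/1.2\r\nContent-length: {len(message)}\r\n\r\n"
    return header.encode("ascii") + message


def parse_check_reply(reply):
    """Parse a reply line like 'Spam: True ; 15.0 / 5.0' into (is_spam, score, threshold)."""
    for line in reply.decode("ascii", "replace").splitlines():
        if line.lower().startswith("spam:"):
            verdict, _, scores = line[5:].partition(";")
            score, _, threshold = scores.partition("/")
            is_spam = verdict.strip().lower() in ("true", "yes")
            return is_spam, float(score), float(threshold)
    raise ValueError("no Spam: header in spamd reply")


def check_comment(author, text, host="127.0.0.1", port=783):
    """Send one comment to a running spamd and return (is_spam, score, threshold)."""
    request = build_check_request(wrap_comment(author, text))
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(request)
        conn.shutdown(socket.SHUT_WR)  # signal end of request
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_check_reply(b"".join(chunks))
```

As the comment above says, the stock rules are tuned for mail headers, so scores for content-only data will likely need a lower threshold or custom rules.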
Arguably, it is Mollom. Especially if you are using Drupal.
Akismet is 'rotting on the vine' in many ways, including development updates. Mollom is a commercial web service, with a free version for non-profit and small-volume sites/users.
The Drupal module is explained here:
http://drupal.org/project/mollom
The Mollom site:
http://mollom.com/
"Flyin' in just a sweet place,
Never been known to fail..."
...there are companies out there that use a Bayesian filter to sort posts into low scoring and high scoring, and then they have their employees manually sort through the high scoring messages.
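A toy version of that Bayesian triage, to make the idea concrete. This is a bare-bones naive Bayes scorer with add-one smoothing, not any particular company's filter; the class name, the 0.5 threshold, and the tokenizer are all assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Minimal two-class naive Bayes scorer (labels: "spam" and "ham")."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            # Log prior and log likelihoods, both with add-one smoothing.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            scores[label] = score
        # Turn the two log scores into P(spam); clamp to avoid overflow in exp().
        diff = max(-700.0, min(700.0, scores["ham"] - scores["spam"]))
        return 1.0 / (1.0 + math.exp(diff))


def triage(posts, spam_filter, threshold=0.5):
    """Split posts into (publish, review): high scorers go to the human queue."""
    publish, review = [], []
    for post in posts:
        target = review if spam_filter.spam_probability(post) >= threshold else publish
        target.append(post)
    return publish, review
```

The point of the human queue is that the filter never deletes anything by itself; people only look at the posts it is suspicious about.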
This is really a very good test. As others have mentioned in this thread, it's the sort of thing that spammers will circumvent if it becomes widespread, but for now it's great.
There's something else I've found to be really quite effective: deliberately misnaming my form fields. For instance, give the input field that's labelled "First Name" an input name of "phone number." Humans don't use input names to determine what text to enter, but spambots do. Then check that input: if the first name field contains a phone number, you know you've got yourself a spammer.
I've used solely the combination of these two things to run one of my websites for two years now, and I get a vanishingly small amount of spam.
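The server-side half of the misnamed-field trick can be sketched roughly like this. The field name phone_number and the "seven or more digits" rule are assumptions; the original post only says to check whether the first-name field contains a phone number.

```python
import re

# Only digits and common phone punctuation, at least 7 characters long.
PHONE_LIKE = re.compile(r"^[\d\s()+.\-]{7,}$")


def is_bot_submission(form):
    """The input named "phone_number" is labelled "First Name" on the page.

    A human types a name into it; a bot that trusts the input name
    submits something that looks like a phone number.
    """
    value = form.get("phone_number", "").strip()
    digits = sum(ch.isdigit() for ch in value)
    return bool(PHONE_LIKE.match(value)) and digits >= 7
```

Only the name attribute is misleading here; the visible label still says "First Name", so people filling in the form see nothing unusual.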
I had a similar problem in the comments area of my site. It was all fun and games, until one day I checked, and there were something like 1000 spams for every real message.
I wrote my own system to deal with it. It's not very hard, assuming you know how your site works (of course you do, right?)
I ended up making two blacklists. One was for words and phrases. The spammers tend to post (and repost, and repost) the same crap. My blacklist rules had some simple regular expressions that I could run queries with. Like, "%http://%spamsite%" and "%v%gra%". You get the idea. The second list was IP's that were known spammers.
At the time, I allowed both anonymous comments, and comments from logged in users. I eventually did away with the anonymous comments, as they were a headache. This was the best cure.
So, when my script ran (once a minute), if it matched a message, it would delete the message, and append the IP to the IP blacklist. If it was posted by a user account, the user account got suspended, so they could no longer log in, nor post.
After its detection and cleanup run, it then ran back over the IP list and pruned out every post by that IP. Sometimes they'll do practice runs saying silly things like "nice site". I thought they were real user compliments at first, until I saw the same posting verbatim coming from the same IP to multiple news stories, and then that IP would start spamming later.
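The two-pass cleanup described above could look something like the sketch below. It works on an in-memory list instead of a database, the Post class is made up for the example, and the LIKE-to-regex helper only understands the % wildcard. Account suspension is left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Post:
    ip: str
    text: str


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern (only the % wildcard) into a regex."""
    parts = (re.escape(part) for part in pattern.split("%"))
    return re.compile("^" + ".*".join(parts) + "$", re.IGNORECASE | re.DOTALL)


def cleanup(posts, phrase_patterns, ip_blacklist):
    """Return the posts that survive; ip_blacklist (a set) is updated in place."""
    regexes = [like_to_regex(p) for p in phrase_patterns]

    # Pass 1: any post matching a phrase rule puts its IP on the blacklist.
    for post in posts:
        if any(rx.match(post.text) for rx in regexes):
            ip_blacklist.add(post.ip)

    # Pass 2: prune every post from a blacklisted IP, including the
    # harmless-looking "nice site" practice runs.
    return [post for post in posts if post.ip not in ip_blacklist]
```

Patterns as loose as "%v%gra%" will also match some innocent text, which is the trade-off discussed in the next paragraph.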
Some people will argue that the IP cleanup run was not nice, polite, or even fair. People use proxies. Sure, they do. We got a lot of abuse from anonymous proxies, and no real messages from them. The spammers didn't seem to like to use AOL.
When I implemented this, I posted a very brief description of what I was starting ("We're starting advanced anti-spam protection"), with an apology for real messages that were deleted. I never received one complaint about real comments disappearing.
How brutally you do it is really up to you. I built my method by manually doing it for a while, and then letting the script do it on its own. Occasionally, I would have to go in and add new words and/or site names to the words blacklist.
I noticed the spammers hit more common software more often. It's worth it for them to make automated systems to abuse a piece of software that's deployed on tens of thousands of sites. When I rewrote my site from scratch, then abuses dropped down to 0 for a long time. Now, they manually submit "news" items which are just ads for their own sites. It appears to be manual, and since we won't run them as news stories (our editorial staff decides what does or doesn't show up as news, and if it needs to be edited first), they give up pretty quickly.
Serious? Seriousness is well above my pay grade.
I like honeypot links that blacklist anyone who clicks it. Seems to take out spam spiderbots effectively, until they learn how to avoid the honeypot links.
“Common sense is not so common.” — Voltaire
I've had good luck using Drupal with a couple of techniques.
First is the Spam module. Pretty much out of the box.
Second is to have all comments subject to moderation.
Third is IP-address filtering via the .htaccess file. Blocking entire continents for some Drupal sites with a US geographic audience has kept the spam down to a very low level.
TypePad antispam is a great alternative to Akismet.
I have implemented something similar, but I haven't been checking the number of blocked messages. All I know is that I used to get spam, and now I haven't gotten any for years. I use this for forums and the Contact Us page.
My rules are:
1) The text boxes for things like name and subject are actually called junk.
2) There are hidden textboxes called name and subject (1 hidden by javascript and one by CSS) that if they are populated the post is ignored.
3) A third hidden field is the result of a simple javascript math equation that is checked on the server side. If the value is wrong, the post is thrown out.
As others have said, if your site is small these types of things are good enough to prevent spam because the spammers won't bother to figure it out. These concepts would never work for any of the larger sites or 3rd party forum software.
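A minimal sketch of the server side of rules 2 and 3, assuming a per-visitor session dictionary and a hidden field called math_answer (both names are made up). The operands would be embedded in the page, and a line of JavaScript would write their sum into the hidden field before the form is submitted.

```python
import secrets


def new_challenge(session):
    """Pick two operands and remember the expected answer in the session."""
    a = secrets.randbelow(90) + 10
    b = secrets.randbelow(90) + 10
    session["math_expected"] = str(a + b)
    return a, b  # rendered into the page for the JavaScript to add up


def accept_post(form, session):
    """Apply the hidden-field rules to one submitted form."""
    # Each challenge is single-use: remove it whether or not the post passes.
    expected = session.pop("math_expected", None)

    # Rule 2: the hidden decoy fields "name" and "subject" must stay empty.
    if form.get("name") or form.get("subject"):
        return False

    # Rule 3: the JavaScript-computed answer must match the server's value.
    return expected is not None and form.get("math_answer", "") == expected
```

A bot that does not run JavaScript never fills in math_answer, and one that fills in every field it finds trips the decoys.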
Maybe use something that lets users moderate posts up or down with labels like Informative, Funny, Flamebait, and Offtopic. Also, don't forget to include one involving a fictional bridge dweller that will be misused by just about every moderator for anything they don't like.
On phpbb boards I run the most productive things are:
1. Do not allow external links in profile of newly registered / non validated users
2. Do not allow registrations with gmail.com email addresses
3. Ensure "valid" timezone and country settings are selected by users.
L/
you might be interested in this chart:
http://www.opendemocracy.net/blog/economics/admin/2008/11/18/mollom-beats-spam
that shows what Mollom did to our forum spam after just 2 months. The interesting thing is that the spammers actually seem to have stopped trying to attack this url.
Tony
I run a site where my Renaissance faire guild talks and plans things. We had tons of message board spam until I implemented a simple solution: a password is required to register. If it is not entered, registration fails. The password is posted elsewhere on the site in my case, but you could communicate it only to people who need access if the site is small enough.
Not a sentence!
The comment- and trackback-spam blocking techniques in Pivot blogging software are, from my limited personal experience, 100% effective. There's even an extension that uses the enormous Project Honeypot database (http:BL) to weed out IP addresses of identified harvesters and comment spammers. That's just for entertainment, though, since the basic techniques are completely effective.
Mollom is free for low to medium traffic sites. They have plugins for the major CMSes out there (Drupal, Joomla, Wordpress, and a bunch of others).
It is relatively new, but I use it on several sites and it works well. See the score card for some fun.
The founder of Mollom is Dries Buytaert, the founder of Drupal, the CMS.
2bits.com, Inc: Drupal, WordPress, and LAMP performance tuning.
If a spammer really wants to, he can test his attacks against the site until he beats your filter. Filtering works impossibly well, but only if the output of the filter is private. Spammers may not be doing these attacks now, but if everyone started using Akismet, no doubt they would start.
As someone who once used text browsers, I can only advise everyone not to do this - it breaks accessibility at a fundamental level: I got banned from a forum once because they mislabeled fields.
What works really well for comment spam, however, is a simple question like "What is the name of Barack Obama?".
I've been told that Bad Behavior is the shiznit. http://www.bad-behavior.ioerror.us/
What I did in the past is to do a reverse CAPTCHA. Basically just a regular one but with a little text below: "Enter the letters shown in the image _in reverse order_". Stopped 100% of all spambots.
CRM114 is an option you might want to consider.
Plusses and minuses:
+ REALLY FREAKIN' ACCURATE. trains to 10x better than human.
+ REALLY FREAKIN' FAST. 20-50 milliseconds/posting without even being daemonized.
+ REALLY OPEN SOURCE. GPLed. Free forever.
+ REALLY FLEXIBLE. Has about a dozen built-in classifiers, most of which work on any human language (including Chinese, Japanese, Korean, etc. in their native formats).
- Arcane control language. "like 'awk' on meth". "grep bitten by a radioactive spider". You get my drift here?
- Not a drop-in solution for blogposting. You'll have to do some coding.
- Needs to be trained, with both positive and negative examples. When it wakes up, it knows _nothing_.
It's at "crm114.sourceforge.net"; there's mailing lists as well as an IRC channel (#crm114) on freenode.
I rarely see spam here...or is it just quickly modded down to oblivion?
Take a look at StopForumSpam.com. I've got it installed on a vBulletin forum and it works very, very well to prevent spambots from registering. Every now and then one sneaks through, but it's a lot less than I was seeing before.
Warning: This signature may offend some viewers.
I'm using the old "Fake Textarea" trick. If anyone fills in the fake textarea, the post is rejected. The fake textarea comes up first, but is hidden with CSS. I also modified the forum software so that the fake text field has the same form name as what the forum traditionally uses for the real field.
I'm also using this in conjunction with blocking posts containing URLs from guests or users with no posts.
Of course, this is all useless against Stock Ticker symbol spammers.
... 90% of all spam would be eliminated.
hey there
if you want to filter for humans simply present a bunch of images and have the person spot the cat among the dogs
then apply the spam filtering (simple stuff really works; you can even just use SpamAssassin plugins for content) to get rid of the spammers posting URLs and rubbish, and deny based on IP if you catch spam, unless they contact you somehow
regards
John Jones
http://www.johnjones.me.uk
it doesn't work well/isn't very smart, either. better days ahead. keep it to yourself, to avoid the embarrassment of being 'filtered'.
Recently, one of my users got infected with some spam-spewing bot malware, which resulted in my company being listed on at least four RBLs. It is annoying, but I can't hold it against the list services as I use them myself in my own filtering.
I have to wonder if RBLs of some sort could also be applied to web browsing especially on forums? But since most people are on dynamic IP addresses, I can only assume that without some very clever ideas to go along with it (perhaps some sort of cumulative scoring + fingerprint method?) RBLs for dynamic IPs is a rather bad idea. Still, it would be nice if there were some means of simply blocking "infected" IP addresses or at the very least calling the problem to the user's attention in some way.
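For reference, checking a visitor's address against a DNS-based list from a web app takes only a few lines. This sketch follows the usual DNSBL convention (reverse the IPv4 octets, append the list's zone, and treat an answer in 127.0.0.0/8 as "listed"). The zone dnsbl.example.org is a placeholder, not a real list, and the dynamic-IP problem raised above is not addressed at all.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up, e.g. 192.0.2.99 -> 99.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for anything else
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone="dnsbl.example.org"):
    """Return True if the list answers with a 127.x.x.x address for this IP."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # No such record (or the lookup failed): treat as not listed.
        return False
    return answer.startswith("127.")
```

Failing open on lookup errors means a DNS outage lets everyone through rather than locking everyone out.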
Last year I wrote my own software for a blog to be viewed by just family and friends. I figured security by obscurity. Was I ever wrong. As soon as I was discovered the spam just flooded in at a rate of about 250 posts a day (which completely blew me away).
I noted that it was all designed to raise pagerank. Just random words and several links. So I wrote a simple function that checked every submitted post to see if there were more than two links. After all, Grandma won't be sending me tons of links anyway.
If it detects more than two links, the function spawns a separate thread which waits ten seconds and then deletes the post. In short, the post lasts just long enough for the spammer to test whether the posts are still sticking. After I installed the filter, about 16k posts were filtered out (which also blew me away) until the spam stopped altogether.
I was actually pleased to realize that I was spammed that much, because it meant 16k posts were absorbed by my blog rather than an unprotected one.
If I could think of a better way to take up a spammer's bandwidth/processes I would do that as well.
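The delayed-delete trick described above might look roughly like this. It is a sketch only: the store is a plain dictionary standing in for whatever the blog really uses, and the function names are made up.

```python
import re
import threading

LINK_RE = re.compile(r"https?://", re.IGNORECASE)


def count_links(text):
    return len(LINK_RE.findall(text))


def submit_post(store, post_id, text, max_links=2, delay_seconds=10.0):
    """Save the post; if it has too many links, schedule its quiet removal.

    Returns the timer for a doomed post, or None for a normal one.
    """
    store[post_id] = text
    if count_links(text) > max_links:
        # The post stays visible just long enough for the spammer's
        # "did it stick?" check, then disappears.
        timer = threading.Timer(delay_seconds, store.pop, args=(post_id, None))
        timer.daemon = True
        timer.start()
        return timer
    return None
```

Passing a default to pop means the timer does not blow up if a moderator already removed the post by hand.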
The phpbb forum I administer fell victim to spammers more than a year ago so I tried a bunch of MODs that implemented a couple of changes to foil attempts of automatically registering. Spam slowed down a bit, but still was strong enough to be a problem... it seems that whatever script spammers use to post in phpBB already implements most standard MODs.
phpBB's own captcha is no good either... ... So what I did was implement my own validation, which requires the user to enter a fixed word ("Dragon") in a text box. It's not a captcha... it simply states "Enter the word 'dragon':" in plain text. That did the trick and spammers completely disappeared from the forum. ... So as long as you're not running a high profile site, a custom mod should be enough.
As a Slashdot discussion grows longer, the probability of an analogy involving cars approaches one.
I do something rather simple in my forums (about 30 of them) that seems to work very well: I disallow any user from posting a message with a URL in it until they've made a certain number of posts (usually about 20 or so). Until they have that many posts under their belt any post with a URL in it just returns them to a preview screen. Since their goal in life is to drop a link, this really frustrates the *&$%! out of them. :)
None of them so far has ever bothered to make 20 real posts in order to get by this limitation. On the rare occasion (and I do mean rare) that they have started to post stuff in order to work towards reaching the limit, they have their posts removed by mods or admins, which resets them back to zero.
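The rule itself is tiny. A sketch, assuming the forum keeps a count of each user's surviving posts (the function name and the URL pattern are made up; 20 is the threshold mentioned above):

```python
import re

# Treat anything starting with http://, https:// or www. as a URL.
URL_RE = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)
MIN_POSTS_FOR_LINKS = 20


def may_publish(text, surviving_post_count, min_posts=MIN_POSTS_FOR_LINKS):
    """Return False if a new user is trying to post a URL.

    The caller would send the user back to the preview screen on False.
    """
    if surviving_post_count >= min_posts:
        return True
    return URL_RE.search(text) is None
```

Because the count only includes posts that mods have not removed, deleting a spammer's filler posts automatically pushes them back below the threshold.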
Like I said, this really frustrates the *&$%! out of them. lol
At the risk of being labeled a spammer, you can see it in action here: www.grouptopic.com
Same with my contact forms- no links allowed. It just stops 'em dead, and if they REALLY need to send a link, they can contact me first and say so. Works like a charm.
Mike
I use Spam Karma and Bad Behavior on my not-very-popular Wordpress blog and have no spam comments at all, and I used to have lots of them. SK has recently been abandoned by its founder, but still works on WP 2.7.
Comment removed based on user account deletion
Considering the complexity of the Internet, I have real and increasing difficulty understanding how the spammers manage to survive. They require an entire chain of support services to stay in business. Not just ISPs who let them access the Web, but also hosting services to hold their websites, DNS providers, and the domain registrars. They need lots of help to link their spamvertised websites to the spam, just on the minimal chains. (I've noticed that more complicated chains seem to be less frequent these days.) In addition, the spammers are strongly constrained by their need to reach suckers and provide ways for the suckers to reach back to them. They can't hide or obfuscate their spam too much, or the human suckers won't be able to figure out how to send the money.
The strength of a chain is the weakest link. Right now that seems to be the domain registrars. If the technical honchos of the Internet scanned the spam to find the largest spam-supporting registrar of the month and the rest of the Web then stopped talking to that registrar, that would seem to be rather harmful to the spammers' so-called business model, eh?
- Manual, less mutable, and more than short sig -
I only stop by /. when I'm feeling sufficiently acerbic and have a few minutes to waste. Used to be /. was good for humorous or sardonic moods. My basic feeling is that the average wit of the /. participants has greatly declined. As a metaphor, the residual wit on /. is far below critical mass. Creating a new meme is laudable, applying an old meme in a new way can be somewhat witty, mindlessly repeating tired old memes is *NOT* witty nor amusing.
I welcome thoughtful or witty rejoinder, but if you are a typically witless /. contrarian and simply lack the mod points for a spineless and anonymous censorship mod, then please don't waste our time with a reply. In particular, if you are one of the morons who still wants to defend Dubya's miserable failures, please just designate me your foe, and I'll gladly ignore you. One can never be designated as "foe" by too many fools.
Freedom = (Meaningful - Coerced) Choice != (Speech | Beer^2), and sad sock puppets' bad mods avail them naught.
> why would they agree to sign up for such a service
> in the first place?
If they don't, then they stay in the zombie spam ghetto where they belong - fine.
I forgot to mention these 2 plugins:
SABRE: against spam registrations on your blog ( http://wordpress.org/extend/plugins/sabre)
and
Simple Trackback Validation: a trackback validation tool for wordpress ( http://wordpress.org/extend/plugins/simple-trackback-validation/ ).
how IT is changing the world - http://max.zamorsky.name
I run a phpBB forum. I added a question to the sign-up sheet that requires a user to pull down a menu and select their time zone. It seems the default for most spam bots is to select the first option in a pull-down menu. The first option is a time zone in the Pacific with no human habitation. It has brought our spam down to one or two a month.
Seems to confuse human spammers also. Also, adding a report spam button to allow human users to report spam has eliminated the rest. I also implemented site wide, no questions asked, no links until at least 10 posts or you are banned policy. Most spammers are after links, so eliminate the links and the spammers tend to go away.
TypePad Antispam is an open source project and a commercial (but free) service. The core is released as open source (GPL2) so you can install your own instance of TypePad Antispam in your servers. It has an Akismet compatible API and plugins already exist for Movable Type, WordPress and other CMSs. The free service is what TypePad uses, and has some extensions not released in the open source version, so has some advantages to a single installation.
Víctor R. Ruiz
rvr(at)blogalia.com
That's exactly what Sblam! does.
It's PHP-based filter for web forms that detects spam based on content (bayesian filter + specific rules), behavior and uses 3rd party blacklists.
It's absolutely transparent to the user (well, 99.8% of them).
I'm afraid you misunderstand me. Again, only the field name is affected, not the label for the field. I've used text-only browsers regularly since 1994 (Mosaic over a 14.4k modem, Lynx, and now Links), and I have yet to encounter one that displays the name attribute of an input field to the user.
> Considering the complexity of the Internet, I have real and increasing difficulty understanding how the spammers manage to survive.
Complexity begets niches. Niches are difficult to actively manage, hence, spammers thrive.
> If the technical honchos of the Internet scanned the spam to find the largest spam-supporting registrar of the month and the rest of the Web then stopped talking to that registrar, that would seem to be rather harmful to the spammers' so-called business model, eh?
That type of approach was demonstrated to be minimally effective recently. Google mccolo and esthost for examples of how enforcement actions by the community with backing from the "technical honchos of the Internet" had minimal effect on spammers in 2008. As long as there are other spammy registrars, hosting providers, transit providers etc. for spammers to go to, the greatest effect from closing down a provider is to add a line to a black list, without having sustainably altered the situation.
With respect to registrars, it's often not viable for a number of reasons to poke spammy registrars that are also closely linked to ccTLD registrars (or similar) for particular countries (or similar). Individual organizations can implement or subscribe to particular DNS tricks, but there are barriers against large eyeball networks (TW, AOL, VZN, RR, etc.) doing so. (It's interesting that Internet censorship and information lensing of this sort are not well tolerated in general, but are simultaneously explicitly demanded to curtail spammers.)
With respect to structurally breaking spamming methods, having forums/blog software authors universally deploy basic things like robots.txt, nofollow, etc. in conjunction with Google and the other search engines would be much more harmful to the link spammers' business model overall.
There are 1.1... kinds of people.
I run a blogging site. When the spammers discovered it, I started getting several thousand automated spam comments per day.
I solved the problem (ie. absolutely no automated spam) with a two-step process:
First, I wrote a REALLY quick text analysis script in PHP which looked for the presence of links and other suspicious text. This reduced the spam by 95%, with no false positives.
Since I had to keep examining the spam that got through and improving the filter, I wanted a system that didn't require constant maintenance, and one which did not incur the risk of false positives.
The solution was dead simple: my comment forms now have two sets of inputs for the comment and the commenter's info. The first one is hidden through CSS. The second one is visible.
Real people see only the 2nd form. The spambots see the first one. If there's any data in the first form when it's posted, the comment is dropped on the floor with no filtering, no hits to the SQL database, no nothing.
I haven't had a problem with spam since. Perhaps there'll be a day when the spambots are tuned for my site, but I've been spam-free for two years.
Sitting in my day care, the art is decopainted.
I always wanted to make a spam-filtering method based upon the age-verification questions in Leisure Suit Larry 1.
I moderate for the Ubuntu forums, and even with hundreds of legitimate posts per day, we have no trouble keeping the spam count very low. The reason is that it's very easy to remove it. We use Spam Decimator for vBulletin and it's literally two clicks to go from "This is Spam" to "This is deleted and the IP has been banned". You probably can't prevent spam, but you can make your life easier by finding a way to deal with it so effectively that it becomes pointless for the spammers to spam.
They'll have no trouble with it at all. I have yet to encounter a screen reader that reads the name attribute of an input field. Perhaps it would if the field was otherwise devoid of any descriptive text, but you'd have to be a real jackass to provide an unlabeled input field and expect anybody to know what to do with it. :)
http://stupidfilter.org/
Works for me.