Preventing Forum Spam-bots?
A concerned reader asks: "Recently it seems that forums have become the new target for spam bots advertising everything from porn to casinos. The forums that I admin are constantly harassed by these bots even though you must enter the visual confirmation code (the picture with letters/numbers) as well as reply to an e-mail in order to register. This only started a few months ago so I'm suspecting that some new spam program was released that somehow gets around these anti-bot measures. How can I get rid of these annoying bots?"
kittens
For the record, those blurred/skewed letters and numbers are called a "Completely Automated Public Turing test to tell Computers and Humans Apart" - Captcha.
To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
tedious (I hope, it's kinda hard to read)
Use a Captcha, which is a tool that displays a distorted image containing a word, or sequence of letters. The user must enter in the correct sequence in order to post.
Captchas aren't perfect: some have readability problems even for sighted people, and they completely exclude blind people unless you use an audio captcha as well.
Require completing captcha to create a new account, or post as a guest. Once users have an account and are logged in, you can drop the requirement to use the captcha on every post.
Maybe have a grace period between the time one registers and the time they are allowed to post or post replies?
Just display a confirmation page with the goatse.cx picture.
Anyone who can still click on the confirm button is not human.
What's to stop a spammer/script kiddie from making a script that does all the registering except for the visual code, giving an average reg. time of maybe 5 seconds per site?
...it's patented. (and Turing is spinning in his grave...)
Add hidden variables to submission forms that change every day. This will force the bot software to do page scraping for your specific web forum, which probably isn't worth their time. They will go to the easier targets first.
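One way the daily hidden variable could be sketched, assuming a server-side secret (the SECRET_KEY value and both function names here are made up for illustration): the form carries a value derived from the date, and the server accepts today's or yesterday's value so a form loaded just before midnight still works.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical server-side secret; in practice load it from configuration.
SECRET_KEY = b"change-me"


def daily_token(day, secret=SECRET_KEY):
    """Return the hidden-field value for the given day."""
    return hmac.new(secret, day.isoformat().encode(), hashlib.sha256).hexdigest()


def token_is_valid(submitted, today, secret=SECRET_KEY):
    """Accept today's or yesterday's token; reject everything else."""
    candidates = (
        daily_token(today, secret),
        daily_token(today - timedelta(days=1), secret),
    )
    # Compare as bytes so odd input cannot raise inside compare_digest.
    return any(hmac.compare_digest(submitted.encode(), c.encode()) for c in candidates)
```

A bot that replays a form it scraped last week fails the check, while a scraper that fetches the form fresh each time still gets through, which matches the "not worth their time" reasoning rather than being a hard barrier.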
But if they are defeating captcha, there is probably someone who just sits there manually spamming forums through anonymous proxies. The amount of money that can be made by doing this spamming is probably enough to pay people with lower standards of living to just do it manually. And if that's so, there's just no way to get around it. I started logging how many bots the captcha and hidden variables were catching, and it was tons. Still, I get spammers. Just not nearly as many.
Maybe they've hired a bunch of folks in India, Mexico, wherever, to just manually register. It'd be cheaper than hiring a coder to figure it out. Also, that would be some really great image scanning software to read those words with all the crap that's drawn through them. I can barely read the ones for /. when I post as AC.
Sometimes, it is a vocabulary lesson, though :-)
One is to write a program which recognizes the characters in the captcha. Algorithms for a surprising number of captcha types exist, so you may simply need to look for a better/harder captcha generator.
The other method is to provide a popular service and guard it with a remote captcha. This is usually done with free porn sites. The site promises to show the visitor some pictures, but only if he proves that he isn't a bot by entering the letters from the captcha. The captcha is the one from the forum that the porn site admin wants to post to. When the visitor solves the captcha, the forum post is made and the result of the captcha test on the forum site is taken as the result for the porn site as well. Since porn sites have a steady stream of visitors, they can spam many forums, so long as they use a standard posting verification scheme. A way around that may be to obfuscate the fact that you're using a captcha and what the captcha image is (compared to a standard installation of your forum software).
Good: CAPTCHA
Better: dynamically change the names of form fields ("subject", "message", etc) based on the current time. MD5 hash the current hour with the field name, and have the software only check the current and previous values. Spam bots generally have to be told what field names to look for.
Best: have good moderators who kill spam and block IP's more or less instantly. Not practical for smaller sites, of course.
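A rough sketch of the "Better" option, under some assumptions: the per-site salt, the "f_" prefix and the 12-character truncation are my own additions, not part of the suggestion above. Field names are the MD5 of the hour plus the real name, and the reader side checks the current and previous hour.

```python
import hashlib

# Hypothetical per-site salt so outsiders cannot compute the names themselves.
SITE_SALT = "example-salt"


def hour_bucket(now):
    """Map a Unix timestamp to the number of the hour it falls in."""
    return int(now // 3600)


def field_name(base, bucket, salt=SITE_SALT):
    """Derive the public form field name for a real field such as 'subject'."""
    digest = hashlib.md5(f"{salt}:{bucket}:{base}".encode()).hexdigest()
    return "f_" + digest[:12]


def read_field(form, base, now, salt=SITE_SALT):
    """Look up a submitted value under the current or previous hour's name."""
    bucket = hour_bucket(now)
    for b in (bucket, bucket - 1):
        name = field_name(base, b, salt)
        if name in form:
            return form[name]
    return None
```

When rendering the form you would call field_name("subject", hour_bucket(time.time())); a submission that uses the literal name "subject" simply reads back as missing.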
-b
If I wanted a sig I would have filled in that stupid box.
Don't use phpbb, vbulletin or whichever other forum software everyone uses. Don't name your registration page "register.php" or something similarly easy to guess. Don't give your username and password fields name and id attributes of "username" and "password". Etc, etc. There is no security in obscurity, but there sure as hell is lots of convenience and freedom from automated harassment.
The rewards for writing scripts that can handle the subscription process for all the big software packages are simply too large. Yes, these software packages will now start up the arms race, same as has happened with weblogs and email and referer spammers (does anyone else have the feeling we've won that last one, btw?). You can try and follow along and update your forum software every other day. But it's much more convenient to simply duck under the radar. Chances are no spammer is going to bother figuring out how to register at your custom-built/modified forum.
If they are using something like hotmail, then maybe just disallow hotmail. Nobody with a brain uses it anymore anyway.
If they are using gmail, then maybe google would be nice enough to start a service where you could report addresses that bots are using. The great thing about google requiring invites is that google now has this neat chain of responsibility. If they see a pattern where all of the addresses created by invites from a certain person's account have been used as bots, then they could delete all those accounts and all the accounts they invited. That would seriously screw the spammers.
I'm guessing you're using phpBB. I've actually been hit by these guys on my boards; it wasn't a problem for me until they started to post. It appears to be actual people and not robots. I should also note I didn't have this problem until I added Google AdSense to my boards. After I did that, I started to get two or three of these spammers each week. Another phpBB board I administer hasn't gotten a spam user yet.
What worked for me was checking the registration e-mail addresses of these people and putting in bans for "*@mail.ru" and "*@*.info". On phpBB, you'll have to manually add these to your ban list table in the forum database. Given that a US board isn't likely to have legitimate users coming from Russia or with .info e-mail addresses (.info generally being the Internet equivalent of the sleazy parts of a big city), I don't think I'm really affecting potential new users. I haven't gotten any complaints or new spam users yet, so my technique seems to be working.
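For anyone not on phpBB, the same wildcard bans could be approximated at registration time roughly like this. It uses Python's shell-style matching, which may not behave exactly like phpBB's own ban list; the function name is hypothetical.

```python
from fnmatch import fnmatchcase

# The two patterns mentioned above.
BANNED_EMAIL_PATTERNS = ["*@mail.ru", "*@*.info"]


def email_is_banned(email, patterns=BANNED_EMAIL_PATTERNS):
    """Return True if the address matches any banned wildcard pattern."""
    address = email.strip().lower()
    return any(fnmatchcase(address, p) for p in patterns)
```

Lower-casing first means "Bot@Mail.RU" is caught as well, and an address like info@example.com is not affected because only the domain part ends the pattern.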
The Freelance Wizard
There are a number of options you have, depending on how aggressive you want to be. You may have implemented some of these suggestions already, but they may help other forum admins in a similar quandary.
Firstly, disable anonymous posting. What works for slashdot does not necessarily work for phpbb. This may sound obvious, but a forum I check on now and again is slowly haemorrhaging members due to guest bot spam.
Secondly, find yourself a list of public proxy servers. Ban them. Find some more. Ban them too. Also, take note of the IPs the spambots were using to post. Ban them as well (unless they are AOL IPs -- be smart and do an nslookup). Keep this list of banned IPs, and share it with the blacklist groups, or other forum admins you know. You help them, they help you.
Thirdly, augment your signup process. You say you are using CAPTCHAs, but if the bots are getting around or through them, you have to do more. Write a few hundred straightforward questions; you can get your community to help you with this one. Have one or two of those questions displayed at registration time, along with the CAPTCHA. For example:
Which of these is not one of the seven dwarves?
Or would you like another question?
Keep this as simple as possible. "What color is the sky?" is about the level you are looking for. A bot won't be able to answer these unless it is specifically programmed to. Need I say you should serve a random question?
For bonus points on this one, make the questions something to do with the topic of the forums. If the forums were about widgets, you could ask something (really basic) like "What is the most common color of widget?". Or make some of the questions about the TOS. You know, the thing everyone checks the box saying "I agree to abide by the TOS". This may alienate some people, though, which you may or may not want. Also remember to consider non-native English speakers.
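The random-question idea might look something like the sketch below. The two-entry question bank and the function names are placeholders; a real bank would hold the few hundred questions suggested above.

```python
import random

# Hypothetical question bank mapping each question to its accepted answers.
QUESTIONS = {
    "What color is the sky?": {"blue"},
    "How many legs does a cat have?": {"4", "four"},
}


def pick_question(rng=random):
    """Serve one question at random for the registration form."""
    return rng.choice(list(QUESTIONS))


def answer_is_correct(question, answer):
    """Compare leniently: ignore case and surrounding whitespace."""
    accepted = QUESTIONS.get(question, set())
    return answer.strip().lower() in accepted
```

Accepting several spellings per question ("4" and "four") is one small way to go easier on non-native English speakers.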
If you are still getting those darned bots, consider manually approving all registrations. This will obviously depend on how many new signups you get, and what kind of manpower you have (think moderators and "trusted community members"). On the other hand, you should be able to spot and stop bots right off the bat.
But why stop there? Be even more proactive! Set up a honeypot. Disallow a certain directory with robots.txt, and ban all IPs that find their way there. Include an invisible link to the disallowed location and see what falls in the trap. Remember that blacklist you started earlier? Add (and share) these IPs!
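A bare-bones sketch of the honeypot, assuming a trap directory called /trap/ (any name works as long as robots.txt disallows it) and an in-memory ban set; a real setup would persist the list and hook into the web server or forum software.

```python
# Hypothetical trap path. It is disallowed in robots.txt and linked from pages
# through a link that human visitors cannot see.
TRAP_PREFIX = "/trap/"

ROBOTS_TXT = "User-agent: *\nDisallow: /trap/\n"


class Honeypot:
    def __init__(self):
        self.banned_ips = set()

    def handle(self, ip, path):
        """Return an HTTP status code for the request and record trap hits."""
        if ip in self.banned_ips:
            return 403
        if path.startswith(TRAP_PREFIX):
            # Well-behaved crawlers obey robots.txt, so whoever lands here gets banned.
            self.banned_ips.add(ip)
            return 403
        return 200
```

The banned_ips set is what you would export and share with other admins.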
Finally, let your community know what you are doing. They will appreciate the effort (If you have noticed the spam, so have they). Set clear guidelines, and encourage community vigilance.
In the end, remember: spam is beatable.
If all you have is a grenade, pretty soon every problem looks like a foxhole -- MightyYar
"Captcha" techniques aren't bulletproof. If someone can automate all but the "captcha test" part of the posting process, then someone can sit and repeatedly answer the captcha test and still post spam pretty efficiently.
The only truly effective way to stop this crap is to require a certain amount of time to elapse before being able to post another post, like the way Slashdot does it, and to implement some kind of moderation+filtering system so the crap can all be modded down by vigilant users. Combine that with a couple other requirements (you must have a user account to post, and new users can't post for the first 48 hours), and you'll easily squash the spam problem.
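The two time-based rules could be checked along these lines. The 48 hours comes from the comment above; the two-minute gap between posts is an assumed value, and the function name is made up.

```python
MIN_SECONDS_BETWEEN_POSTS = 120        # assumed value, tune per forum
NEW_ACCOUNT_WAIT_SECONDS = 48 * 3600   # the 48-hour wait for new users


def may_post(now, registered_at, last_post_at=None):
    """Return (allowed, reason) for a logged-in user trying to post at time `now`."""
    if now - registered_at < NEW_ACCOUNT_WAIT_SECONDS:
        return False, "account too new"
    if last_post_at is not None and now - last_post_at < MIN_SECONDS_BETWEEN_POSTS:
        return False, "posting too fast"
    return True, "ok"
```

All times are Unix timestamps in seconds; the moderation and filtering half of the suggestion is not covered here.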
Moderator hint: a comment is neither "Flamebait" nor "Troll" if it is true.
I won't echo the above (kittens, and altering HTML templates to make a more unique code process, both well worth it), but I'll say that on one site I used to run, anyone with 1000 posts became a member of a screening club, and every new user had to have their posts screened before they were posted. Once an account got to 10 non-spam posts, its group changed to allow normal posting.
I do recommend you use your community to help your community
.. and odds are, they'll help as well
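A toy model of that screening scheme, just to show the flow. The class and function names are invented, and the 1000-post rule for who may screen is left out.

```python
PROBATION_POSTS = 10  # approved posts needed before posting normally


class Member:
    def __init__(self, name, approved_posts=0):
        self.name = name
        self.approved_posts = approved_posts

    @property
    def on_probation(self):
        return self.approved_posts < PROBATION_POSTS


def submit_post(member, screening_queue, text):
    """Queue posts from probationary members; publish everyone else's directly."""
    if member.on_probation:
        screening_queue.append((member, text))
        return "queued"
    return "published"


def approve_next(screening_queue):
    """A screener approves the oldest queued post and credits its author."""
    member, text = screening_queue.pop(0)
    member.approved_posts += 1
    return text
```

Rejected posts would simply be dropped from the queue without crediting the author.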
Robert Anton Wilson
I'm certainly no expert in such things, but here are some suggestions. The idea, of course, is to make life difficult for the spam-bot (or the spam-bot writer I suppose) without making life hell for your users. You seem to already be using a CAPTCHA, but you could switch to a different one. Every time you switch, the bot-writer has to update his code. This is annoying for him but is no big deal for your users, since they are humans and can pass whatever simple visual test you give them. You might also consider making small changes to the HTML of those "make new account" pages. It's likely that the bot is making many assumptions about how your page is organized. Changing the names of forms (or having random names), or changing subtle things about the layout (things that a human wouldn't even notice, but which would break an HTML parsing program that was expecting your page to be organized in a certain way) are also good ways to slow down the bots. Make the HTML obfuscated. Include bogus hidden forms, for instance.
Perhaps the best way to fix your site is to attack it yourself. Try to write a simple bot that automates the login process, and see what happens. You may suddenly notice a subtle hole in your security (maybe the filename for the captcha gives away what it is... or maybe after a successful verification, the same cookie can be used to create another account... or something). In the process of attacking your own site you may uncover something you've missed before.
I host a phpbb2 bulletin board to help coordinate a team of amateur game developers. It's not linked anywhere, nor is it installed in the default directory. Still, one of these spam bots managed to find it and within a week had 50+ registrations of people with bogus web addresses.
My solution was to implement the visual check that everyone's talking about. I still get some registrations, but far fewer. What's crazy is that by default, these users can hardly do anything. Unfortunately, creating spam is basically free on a per-bulletin-board basis.
I'm tempted to post some of them, just so they can feel the mighty power of Slashdot, but my account would probably be banned for life as I bet many of these sites have malware all over 'em.
...but those moderators burn out pretty damned quickly under the load that a concentrated attack can bring - every damned day.
The most recent batch to hit the site where I'm one of the mods, often use a *@mail.ru e-mail address and eight to ten character random character strings as the registered name.
Most of those we are getting link to sites like the following:
http://www.drugsn.com/
http://phentermine.snow-send.com/
http://internet-casino-gambling-online.snow-send.com/
http://xanax.crasn.com/
http://www.drugname.net/
http://adipex.crasn.com/
Be nice to be able to nuke 'em from orbit...
--
Tomas
First of all, check the user agents of the users/bots doing it; this is fairly obvious for them to check for and change, but it's worth a look anyway. Another idea is to prevent all new users from posting links for a week or so, or even anything that looks like a link, such as anything that contains "http://", "www", "w w w", and the like: anything that you can block that won't restrict normal conversation on the forums too much. Although I suppose it's possible that they may then turn to using gibberish, like the gibberish encountered in spam.
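The "anything that looks like a link" check for new accounts might be sketched like this. The pattern and the seven-day cutoff are my own guesses at what that means, and the pattern is deliberately crude: it will also trip on words like "awww".

```python
import re

LINK_BAN_SECONDS = 7 * 24 * 3600  # "a week or so"

# Matches "http://", "https://", "www" and spaced-out variants such as "w w w".
LINK_PATTERN = re.compile(r"h\s*t\s*t\s*p\s*s?\s*:\s*/\s*/|w\s*w\s*w", re.IGNORECASE)


def looks_like_link(text):
    return LINK_PATTERN.search(text) is not None


def post_allowed(text, account_age_seconds):
    """Older accounts may post anything; new accounts may not post link-like text."""
    if account_age_seconds >= LINK_BAN_SECONDS:
        return True
    return not looks_like_link(text)
```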
I saw a forum which required that you post a (non-'shopped) picture of yourself holding a 45 rpm record of the artist the forum was about before getting an account...best signal/noise ratio I ever saw with rec.guns, which seems to be moderated by gods because of the very high flame and spam potential!
Google passes Turing test : see my journal
Easy way of fixing the problem:
Install mod_security for Apache. Install the current development version (2.0.0-dev1) and use DNSBL with mod_security to block most of those spam-bots. Go to got root? and download their rule set and include it in your mod_security configuration.
That's it! This gives you a good tool set to fight the spam bots. With the above-mentioned setup I was able to block ALL spam bots and all the annoying linkdumper bots, without any problems.
Some of these are glitchy, and the code can be obtained from hidden form values or the image URL.
I'll probably get downmodded for this but some GNAA members (a couple of them are MIT students) developed OCR tools that defeat captchas, very long ago.
Block recurring IP addresses used by spammers, non-browser programs (yes, bots do tend to identify themselves in access logs), and those who seem to have accessed the post page directly (bookmarked, perhaps) from nowhere.
DEAD DEAD DEAD DELETE ME
www.cheapmeds.com
Go ask the porn webmasters which CAPTCHAS work and which don't.
A better idea is to ask the people who spend their time brute forcing porn sites. They'll know what is undefeatable and what isn't, where the webmaster may only be worried about limiting the damage instead of preventing it outright.
[Fuck Beta]
o0t!
(well, how i *used* to do it)
...
1) set up some cheapo site (keygens, torrents, whatever)
2) have captchas for every torrent; use the images from your target
3)
4) Accounts!
No matter how fancy you make your captcha, human labor is cheap. This is especially true when you consider the lengths people are willing to go to get free internet porn. The most genius way I've heard of to beat CAPTCHAs:
1. Find links to a handful of free thumbnail galleries
2. Set up a webpage with links to said galleries
3. Make every outgoing link require filling in a CAPTCHA
When your page gets a hit, you pull down the CAPTCHA image (or whatever) from the target site, and serve it up to the masturbator. He/she (using left hand only) types the answer to the CAPTCHA, and gets free porn. You relay the answer to the target site, and get your account on. SPAM ahoy!
You know, it's possible the spam-bots are using human-based systems to bypass your "computer can't recognise it" authentication method. Here are two ways:
1. Spammer farms out registration to third world sweatshops - for US$1 per day, a person just sits there and fills in registrations then passes them on to the bot system to use.
2. Spammer's system redirects your challenge to a "Free Porn Sign Up" page - now nudie-hungry humans are filling it in so they can see free naughties.
Either way is not impossible to figure out and implement - the former costs a small amount to run but could churn out heaps of applications. The latter would produce as many registrations as there are people hanging for "Free Porn" (a rather large number, no? :)
Me thinks a previous poster's comment about throwing in random questions based on the forum topic/theme/etc would help in either of these situations, no?
I left my body to science, but I'm afraid they've turned it down...
But I'm never taken seriously; see this. What I can say is that it's got nothing to do with images, questionnaires, JavaScript, Java, ActiveX, Ajax etc. I took a different approach, being an ex email, forum and blog spammer who needed to create tens of thousands of accounts per day. I know how to get around CAPTCHA devices using OCR techniques etc., and I have thoroughly tested my code and it works. I need help to protect my idea, and I would have expected someone from one of the bigger companies to at least contact me. My system is Section 508 friendly and is transparent to the end user. I'm not asking for any cash to be sent to me, just legal advice and any copyright/patent cost to be covered. For this I would be offering a percentage of royalties, and with the number of sites requiring such a system it would be a very lucrative business venture. So if someone signs my disclosure I will explain how it works and show examples in exchange for information and the cost involved in copyrighting/patenting this software worldwide. My email is delusrexpert(i already get)@(heaps of spam)hotmail.com. With companies like Google, Yahoo, Hotmail, Lycos using Captcha I thought they would have contacted me straight away. Note my system is totally transparent, as stated above, to the end user (unless they look at the HTML source code, which really shows nothing unusual) and all processing is carried out on the server side. I have also created an ASP model so I can deliver forms from my server that can be placed in third-party websites, so as to keep all the IP in-house. I expect a number of numb nuts to flame me, but I have it and you don't. Ignorance is bliss.
My bank's system allows you to listen to a computer generated .wav instead.
I've got a small phpBB bulletin board setup to support some of my websites. For the last 4 or 5 months I've been using a CAPTCHA. It has done almost nothing to reduce SPAM because most of the spammers are from people in "cheap labour" areas who get paid to post. I assume that there's some sort of exchange market out there where people can hire people to make posts for pennies a piece. I don't know how they would track performance, but I assume that they've got that figured out.
I run a quiet phpBB for forum support of some websites of mine. For the last few months SPAM has outnumbered real posts by a large margin. I tried a CAPTCHA module (I think it was the built-in one) and it did next to nothing - they aren't programs, the posts are from humans who have (low paying) jobs to post links on message boards.
I had reasonable success by limiting posts to people who have verified their email address -- I think that that was also a feature of a recent phpBB update.
But the spam still outnumbered posts, so in the last two weeks I've added these two phpBB mods:
http://www.phpbbhacks.com/download/4878 - this mod checks each registration IP address against the dns blacklists. I think that it improved the situation, but it didn't stop the problem out right, and I still had to clean up the board once in a while.
http://www.phpbbhacks.com/download/6208 - this mod gives a really easy way to delete a user and all of their posts at once. It's not a fix, but it's turned out to be the best solution. It only takes a few seconds to undo the damage from any one individual, no matter how many spam posts they have made. A person could spend 20 minutes registering and posting 20 messages and I have to spend 20 seconds nuking the account and all its posts. It's a fair trade, and I get some small satisfaction in that!
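For the curious, a DNS blacklist check of the kind the first mod performs generally works by reversing the IPv4 octets and looking the result up under the blacklist's zone. This is a generic sketch, not that mod's code; the zone name is a placeholder, and the resolver is passed in so it can be swapped out.

```python
import ipaddress
import socket

# Hypothetical blacklist zone; substitute the DNSBL you actually use.
DNSBL_ZONE = "dnsbl.example.org"


def dnsbl_query_name(ip, zone=DNSBL_ZONE):
    """Build the lookup name: IPv4 octets reversed, followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone=DNSBL_ZONE, resolve=socket.gethostbyname):
    """A listed address resolves to some answer; an unlisted one fails to resolve."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        # Covers lookup failures; note a network outage also lands here.
        return False
```

Treating lookup errors as "not listed" fails open, which seems the safer default for a registration form.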
I've had quite good luck by using Apache mod_security (modsecurity.org) to filter web activity. Yes, all the suggestions people have been giving about CAPTCHAs, blocking people with addresses in high spam domains, etc., are all good and useful, but mod_security lets you cover a base those approaches are missing: it lets you block spammers from posting spam, even if they somehow manage to get through your registration defenses. I use a mod_security ruleset based on one published at http://gotroot.com/tiki-index.php?page=mod_security+rules which watches POST content for URLs and terms commonly used in spam postings, and blocks them, in addition to rules that are more traditional for mod_security, such as blocking phpBB exploits, for which I've also found it invaluable.
I administer several forums and wikis that were having quite bad problems, even with CAPTCHAs, email verification, and so on. . . but the problems pretty much went away once I pulled mod_security into the battle.
Spam in forums should be dealt with like email spam: delete it with filters.
Add spam to text filter sets to reduce all future spam posts to blanks.
Sure, it's hard and time-consuming, plus it takes its share of CPU power, but it's the most user-friendly approach.
No CAPTCHAs, just text filtering.
All spam forms can be catalogued and their strings added to blocklists.
E.g. (question marks indicate any letter): if you post something containing the string "Am?z?ng op?or?un?ty", you get banned for a week.
Or if you post "ch?ap Vi?gra substitut?", it gets text-filtered to blank/_____ (underscores, so it can be found with search and mass deleted or removed by script).
Now, if the user persists, you can IP-block him (after 4 posts in a minute, IP-block for a week), etc.
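The question-mark patterns above translate fairly directly into regular expressions. In this sketch "?" matches any single character rather than strictly a letter (an assumption on my part, since spammers also substitute digits), and the function names are made up.

```python
import re


def wildcard_to_regex(pattern):
    """Compile a pattern in which '?' stands for any single character."""
    parts = ["." if ch == "?" else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts), re.IGNORECASE)


BAN_PATTERNS = [wildcard_to_regex("Am?z?ng op?or?un?ty")]
BLANK_PATTERNS = [wildcard_to_regex("ch?ap Vi?gra substitut?")]


def filter_post(text):
    """Return (action, text): 'ban' for banned phrases, else the text with blanked phrases."""
    if any(p.search(text) for p in BAN_PATTERNS):
        return "ban", ""
    blanked = text
    for p in BLANK_PATTERNS:
        blanked = p.sub("_____", blanked)
    return ("blanked" if blanked != text else "ok"), blanked
```

The per-minute post count and the week-long IP block would sit on top of this and are not shown.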
As for KittenAuth, if successful it will lead to sweatshops of Chinese kids furiously clicking on kittens (Click the kitten with a hat to submit!) to post spam (which is authorized if the poster is human).
IP addresses: The big boys use open proxies all over the world. You'll often get spam which is clearly from the same source but comes from IP addresses all over the place.
User agent strings: Again, the big boys use proper user agents so that they look like regular browsers.
Referrers: Those are unreliable even with human visitors, as proxies (as e.g. used by companies) often filter those out. By relying on referrers you'll block a good portion of your regular visitors.
Having said that, there are tools like Bad Behavior which take a closer look at the HTTP requests, checking for non-conforming HTTP requests and typical indications of spam bots that do work quite well most of the time.
Use something like reply e-mail activation and plain text only for n00bs, then moderator review to get past n00b. On one forum I joined, briefly, I as a n00b couldn't post in HTML, upload an avatar or use smilies (like I cared about that).
Sig Hansen?
If you don't want to use phpBB, PunBB is great. It's much easier to make themes for since it's XHTML 1.0 Strict compliant, so most of the changes you can make are done with just the CSS.
One good idea that I've seen on a forum once was that new users can't make a new topic until they make at least 2 replies first. Most bots are set up to make new topics and not replies, although I guess they could change that. I've even seen one forum that makes you wait 48 hours before you can ever post.
Another idea is to make all links use rel=nofollow, so search engines won't follow the link. I doubt the bots check for that, but that's really what they are after.
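If your forum software builds its own link markup, adding the attribute is a one-liner. This is only an illustration with a made-up helper name; most forum packages have their own place where user links get rendered.

```python
import html


def render_link(url, label):
    """Render a user-supplied link with rel="nofollow", escaping both parts."""
    return '<a href="{}" rel="nofollow">{}</a>'.format(
        html.escape(url, quote=True), html.escape(label)
    )
```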
Here are a couple places to start your search:
I'm just putting the final touches on my own hashcash implementation that doesn't require a server-side database. I'll post a link to my journal when it's publicly available.
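This is not the poster's implementation, just one way a database-free proof-of-work check could be put together: the server signs the issue time into the challenge, so verifying needs only the secret. The secret, difficulty and lifetime values are assumptions.

```python
import hashlib
import hmac

SECRET_KEY = b"change-me"   # hypothetical server secret
DIFFICULTY_BITS = 12        # assumed cost; higher means more client work
MAX_AGE_SECONDS = 600       # assumed lifetime of a challenge


def _sign(stamp):
    return hmac.new(SECRET_KEY, stamp.encode(), hashlib.sha256).hexdigest()


def issue_challenge(now):
    """Sign the issue time so the server does not have to store the challenge."""
    stamp = str(int(now))
    return stamp + ":" + _sign(stamp)


def _has_leading_zero_bits(digest, bits):
    value = int.from_bytes(digest, "big")
    return value >> (len(digest) * 8 - bits) == 0


def _work_hash(challenge, nonce):
    return hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()


def solve(challenge, bits=DIFFICULTY_BITS):
    """Client side: search for a nonce whose hash starts with enough zero bits."""
    nonce = 0
    while not _has_leading_zero_bits(_work_hash(challenge, nonce), bits):
        nonce += 1
    return nonce


def verify(challenge, nonce, now, bits=DIFFICULTY_BITS):
    """Server side: check the signature, the age, then the proof of work."""
    stamp, _, mac = challenge.partition(":")
    if not hmac.compare_digest(mac.encode(), _sign(stamp).encode()):
        return False
    if not stamp.isdigit() or now - int(stamp) > MAX_AGE_SECONDS:
        return False
    return _has_leading_zero_bits(_work_hash(challenge, nonce), bits)
```

Without any stored state, a solved challenge can be replayed until it expires, so the lifetime has to be kept short.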
Bad Behavior ( http://ioerror.us/software/bad-behavior ) is my choice for I think pretty much everything for a few reasons:
1) While it's not made for forum spam, it can still work with it. It comes with drop-in files for a ton of CMS, blogging, and many other web scripts.
2) If there's no file for your software, Podz's comment ( http://www.ioerror.us/software/bad-behavior/#comment-1053 ) (the first actual comment, past all the trackbacks/pingbacks, USE THE LINK I JUST PASTED) details how you can use a .htaccess file (assuming you're in an Apache environment where PHP is compiled as an Apache module NOT AS A CGI BINARY!!! (----- IMPORTANT!!!!)) to protect your entire domain.
Seeing as the site is susceptible to go down, I suppose I'll post the comment/instructions here:
===========
On my domain I currently have about 6 WP installs, and bbpress.
Bad-behaviour is installed into my main blog plugins directory and I have this line in my .htaccess
php_value auto_prepend_file /-full path-/T2/wp-content/plugins/bad-behavior/bad-behavior-generic.php
I'll get no error logs maybe, but I do get site-wide protection.
If you activate the plugin as well as doing this, you WILL get errors. So don't :)
Comment by Podz -- April 25, 2005 @ 12:44 pm
===========
Note: Full path = file system path to wherever you have the bad-behavior-generic.php file.
It can be rewritten as:
php_value auto_prepend_file /path/to/bad-behavior/bad-behavior-generic.php
3) Bad Behavior 2 is going to rock, as it'll fit with the natural progression of web scripts. More modular and flexible for integration into nearly any piece of software for the web.
4) CAPTCHAs (attempt to) prevent automated registration/form submission. However, bots can still roam your site and leech your bandwidth. Bad Behavior is configured so that bots receive a simple error page: sub-1K vs. 10K or even more per page (plus even more for inline images, Flash animations and such). You be the judge.
5) It is well maintained, well supported (all by only one person!) and io_error does in fact work with the community, especially when it comes to new bots or false positives.
Check it out, wontcha?
Nope.
The requirement to do something related to logic and common sense would have an additional benefit: No posts by politicians!
Yours, Christian
Forum spammers want to submit very specific content: hyperlinks (to boost their Google page rank). Our forum gets hammered by spambots hundreds of times per day, yet nothing comes through - we simply filter away any message containing a hyperlink (plain, non-clickable URLs are allowed). Works like a charm - no user registration, no fancy and annoying CAPTCHAs.
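A filter of that kind can be very small. The sketch below treats HTML anchors and BBCode [url] tags as hyperlinks and lets bare URLs through; which markup counts as a "hyperlink" on a given forum is an assumption here, as is the function name.

```python
import re

# HTML anchors and BBCode [url] tags count as hyperlinks; plain URLs do not.
HYPERLINK_PATTERN = re.compile(r"<\s*a\b[^>]*\bhref\s*=|\[\s*url\b", re.IGNORECASE)


def contains_hyperlink(message):
    """Return True if the message should be dropped for containing link markup."""
    return HYPERLINK_PATTERN.search(message) is not None
```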