Preventing Forum Spam-bots?
A concerned reader asks: "Recently it seems that forums have become the new target for spam bots advertising everything from porn to casinos. The forums that I admin are constantly harassed by these bots even though you must enter the visual confirmation code (the picture with letters/numbers) as well as reply to an e-mail in order to register. This only started a few months ago so I'm suspecting that some new spam program was released that somehow gets around these anti-bot measures. How can I get rid of these annoying bots?"
kittens
For the record, those blurred/skewed letters and numbers are called a "Completely Automated Public Turing test to tell Computers and Humans Apart" - Captcha.
To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
Just display a confirmation page with the goatse.cx picture.
Anyone who can still click on the confirm button is not human.
...it's patented. (and Turing is spinning in his grave...)
Add hidden variables to submission forms that change every day. This will force the bot software to do page scraping for your specific web forum, which probably isn't worth their time. They will go to the easier targets first.
But if they are defeating captcha, there is probably someone who just sits there manually spamming forums through anonymous proxies. The amount of money that can be made by doing this spamming is probably enough to pay people with lower standards of living to just do it manually. And if that's so, there's just no way to get around it. I started logging how many bots the captcha and hidden variables were catching, and it was tons. Still, I get spammers. Just not nearly as many.
Good: CAPTCHA
Better: dynamically change the names of form fields ("subject", "message", etc) based on the current time. MD5 hash the current hour with the field name, and have the software only check the current and previous values. Spam bots generally have to be told what field names to look for.
Best: have good moderators who kill spam and block IPs more or less instantly. Not practical for smaller sites, of course.
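For anyone who wants to try the "Better" option, here is a rough Python sketch of it. The function names and the `SECRET` value are made up for illustration; mixing a site secret into the hash is an addition to the recipe above, so that a bot cannot simply compute the names itself.

```python
import hashlib
import time

SECRET = "change-me"  # hypothetical per-site secret (an assumption, not part of the recipe above)


def current_hour(now=None):
    """Return the number of whole hours since the Unix epoch."""
    return int((time.time() if now is None else now) // 3600)


def field_name(base, hour):
    """Return the obfuscated form-field name for `base` during the given hour."""
    digest = hashlib.md5(f"{SECRET}:{hour}:{base}".encode("utf-8")).hexdigest()
    return "f" + digest[:16]  # leading letter keeps the name a valid identifier


def extract_field(form, base, now=None):
    """Read `base` from submitted form data, accepting this hour's or last hour's name."""
    hour = current_hour(now)
    for h in (hour, hour - 1):
        name = field_name(base, h)
        if name in form:
            return form[name]
    return None  # plain "subject"/"message" names are never accepted
```

Render the form with `field_name("subject", current_hour())` and read it back with `extract_field(form, "subject")`. Accepting the previous hour keeps a form that was loaded just before the hour rolled over from failing.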
-b
If I wanted a sig I would have filled in that stupid box.
Don't use phpbb, vbulletin or whichever other forum software everyone uses. Don't name your registration page "register.php" or something similarly easy to guess. Don't give your username and password fields name and id attributes of "username" and "password". Etc, etc. There is no security in obscurity, but there sure as hell is lots of convenience and freedom from automated harassment.
The rewards for writing scripts that can handle the subscription process for all the big software packages are simply too large. Yes, these software packages will now start up the arms race, same as has happened with weblogs and email and referer spammers (does anyone else have the feeling we've won that last one, btw?). You can try and follow along and update your forum software every other day. But it's much more convenient to simply duck under the radar. Chances are no spammer is going to bother figuring out how to register at your custom-built/modified forum.
If they are using something like hotmail, then maybe just disallow hotmail. Nobody with a brain uses it anymore anyway.
If they are using gmail, then maybe google would be nice enough to start a service where you could report addresses that bots are using. The great thing about google requiring invites is that google now has this neat chain of responsibility. If they see a pattern where all of the addresses created by invites from a certain person's account have been used as bots, then they could delete all those accounts and all the accounts they invited. That would seriously screw the spammers.
I'm guessing you're using phpBB. I've actually been hit by these guys on my boards; it wasn't a problem for me until they started to post. It appears to be actual people and not robots. I should also note I didn't have this problem until I added Google AdSense to my boards. After I did that, I started to get two or three of these spammers each week. Another phpBB board I administer hasn't gotten a spam user yet.
What worked for me was checking the registration e-mail addresses of these people and putting in bans for "*@mail.ru" and "*@*.info". On phpBB, you'll have to manually add these to your ban list table in the forum database. Given that a US board isn't likely to have legitimate users coming from Russia or with .info e-mail addresses (.info generally being the Internet equivalent of the sleazy parts of a big city), I don't think I'm really affecting potential new users. I haven't gotten any complaints or new spam users yet, so my technique seems to be working.
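If your forum software has no wildcard e-mail bans, the check is easy to sketch yourself. This is a minimal Python illustration and not how phpBB implements its ban list; `is_banned_email` and `BANNED_PATTERNS` are names invented for the example.

```python
from fnmatch import fnmatchcase

# The two patterns mentioned above; extend the list as needed.
BANNED_PATTERNS = ["*@mail.ru", "*@*.info"]


def is_banned_email(address, patterns=BANNED_PATTERNS):
    """Return True if the address matches any shell-style ban pattern."""
    addr = address.strip().lower()  # e-mail domains are case-insensitive
    return any(fnmatchcase(addr, pattern) for pattern in patterns)
```

Call it at registration time and reject the signup before the confirmation e-mail is ever sent.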
The Freelance Wizard
If a site makes me wait three days, though, I'm likely to forget about it in that time.
Or were you talking about smaller grace periods? Perhaps 10 minutes? That might work well.
Earn a % of cash back from Newegg, Tiger Direct, Walmart.com, and more: http://www.mrrebates.com?refid=458505
There are a number of options you have, depending on how aggressive you want to be. You may have implemented some of these suggestions already, but they may help other forum admins in a similar quandary.
Firstly, disable anonymous posting. What works for slashdot does not necessarily work for phpbb. This may sound obvious, but a forum I check on now and again is slowly haemorrhaging members due to guest bot spam.
Secondly, find yourself a list of public proxy servers. Ban them. Find some more. Ban them too. Also, take note of the IPs the spambots were using to post. Ban them as well (unless they are AOL IPs -- be smart and do an nslookup). Keep this list of banned IPs, and share it with the blacklist groups, or other forum admins you know. You help them, they help you.
Thirdly, augment your signup process. You say you are using CAPTCHAs, but if the bots are getting around or through them, you have to do more. Write a few hundred straightforward questions; you can get your community to help you with this one. Have one or two of those questions displayed at registration time, along with the CAPTCHA. For example:
Which of these is not one of the seven dwarves?
Or would you like another question?
Keep this as simple as possible. "What color is the sky?" is about the level you are looking for. A bot won't be able to answer these unless it is specifically programmed to. Need I say you should serve a random question?
For bonus points on this one, make the questions something to do with the topic of the forums. If the forums were about widgets, you could ask something (really basic) like "What is the most common color of widget?". Or make some of the questions about the TOS. You know, the thing everyone checks the box saying "I agree to abide by the TOS". This may alienate some people, though, which you may or may not want. Also remember to consider non-native English speakers.
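The question check itself is only a few lines. Here is a minimal Python sketch; the `QUESTIONS` table and function names are placeholders, and the cat question is my own filler, so swap in the few hundred your community writes.

```python
import random

# Placeholder question pool: question text -> set of accepted answers (lowercase).
QUESTIONS = {
    "What color is the sky?": {"blue"},
    "How many legs does a cat have?": {"4", "four"},
}


def pick_question(rng=random):
    """Pick a random question to show on the registration form."""
    return rng.choice(list(QUESTIONS))


def check_answer(question, answer):
    """Return True if the answer is accepted, ignoring case and stray spaces."""
    accepted = QUESTIONS.get(question, set())
    return answer.strip().lower() in accepted
```

Store the chosen question in the user's session rather than in a hidden form field, so a bot cannot pick the one question it knows the answer to.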
If you are still getting those darned bots, consider manually approving all registrations. This will obviously depend on how many new signups you get, and what kind of manpower you have (think moderators and "trusted community members"). On the other hand, you should be able to spot and stop bots right off the bat.
But why stop there? Be even more proactive! Set up a honeypot. Disallow a certain directory with robots.txt, and ban all IPs that find their way there. Include an invisible link to the disallowed location and see what falls in the trap. Remember that blacklist you started earlier? Add (and share) these IPs!
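The honeypot logic can be sketched in Python like this. `TRAP_PREFIX` is a hypothetical directory that you would list under Disallow in robots.txt and link to invisibly; the in-memory set stands in for whatever ban list your forum really uses.

```python
TRAP_PREFIX = "/trap/"  # hypothetical path, disallowed in robots.txt

banned_ips = set()  # stand-in for a persistent ban list


def handle_request(ip, path):
    """Return an HTTP-style status code for a request from `ip` to `path`."""
    if ip in banned_ips:
        return 403  # already caught earlier
    if path.startswith(TRAP_PREFIX):
        banned_ips.add(ip)  # only a robot ignoring robots.txt ends up here
        return 403
    return 200
```

Well-behaved crawlers never request the trap directory, and humans never see the link, so anything that lands there is fair game.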
Finally, let your community know what you are doing. They will appreciate the effort (If you have noticed the spam, so have they). Set clear guidelines, and encourage community vigilance.
In the end, remember: spam is beatable.
If all you have is a grenade, pretty soon every problem looks like a foxhole -- MightyYar
"Captcha" techniques aren't bulletproof. If someone can automate all but the "captcha test" part of the posting process, then someone can sit and repeatedly answer the captcha test and still post spam pretty efficiently.
The only truly effective way to stop this crap is to require a certain amount of time to elapse before being able to post another post, like the way Slashdot does it, and to implement some kind of moderation+filtering system so the crap can all be modded down by vigilant users. Combine that with a couple other requirements (you must have a user account to post, and new users can't post for the first 48 hours), and you'll easily squash the spam problem.
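The two timing rules can be sketched in Python as follows. The 48 hours comes from the suggestion above; the two-minute gap between posts is an assumed value, and `may_post` is a name invented for the example.

```python
NEW_ACCOUNT_WAIT = 48 * 3600  # seconds a new account must wait before its first post
POST_INTERVAL = 120           # seconds required between posts (assumed value)


def may_post(registered_at, last_post_at, now):
    """Decide whether a user may post. Arguments are Unix timestamps;
    last_post_at is None if the user has never posted."""
    if now - registered_at < NEW_ACCOUNT_WAIT:
        return False  # account is still in its waiting period
    if last_post_at is not None and now - last_post_at < POST_INTERVAL:
        return False  # posting too quickly
    return True
```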
Moderator hint: a comment is neither "Flamebait" nor "Troll" if it is true.
i won't echo the above (kittens and altering html templates to make a more unique code process - both well worth it) but i will say that on one site i used to run, we made anyone with 1000 posts a member of a screening club .. and every new user had to have their posts screened before being posted .. once an account got to 10 non-spam posts, its group changed to allow normal posting.
.. and odds are, they'll help as well
i do recommend you use your community to help your community
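Here is a rough Python sketch of that screening scheme, assuming a simple in-memory queue. The class and function names are invented for the example; the threshold of 10 is the one mentioned above.

```python
APPROVALS_NEEDED = 10  # non-spam posts before an account posts normally


class Account:
    def __init__(self, name):
        self.name = name
        self.approved_posts = 0

    @property
    def trusted(self):
        return self.approved_posts >= APPROVALS_NEEDED


def submit(account, text, queue, board):
    """Publish directly for trusted accounts, otherwise hold for screening."""
    if account.trusted:
        board.append((account.name, text))
    else:
        queue.append((account, text))


def approve(queue, board, index=0):
    """A screener approves one held post, which publishes it and credits the author."""
    account, text = queue.pop(index)
    account.approved_posts += 1
    board.append((account.name, text))
```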
Robert Anton Wilson
I'm certainly no expert in such things, but here are some suggestions. The idea, of course, is to make life difficult for the spam-bot (or the spam-bot writer I suppose) without making life hell for your users. You seem to already be using a CAPTCHA, but you could switch to a different one. Every time you switch, the bot-writer has to update his code. This is annoying for him but is no big deal for your users, since they are humans and can pass whatever simple visual test you give them. You might also consider making small changes to the HTML of those "make new account" pages. It's likely that the bot is making many assumptions about how your page is organized. Changing the names of forms (or having random names), or changing subtle things about the layout (things that a human wouldn't even notice, but which would break an HTML parsing program that was expecting your page to be organized in a certain way) are also good ways to slow down the bots. Make the HTML obfuscated. Include bogus hidden forms, for instance.
Perhaps the best way to fix your site is to attack it yourself. Try to write a simple bot that automates the login process, and see what happens. You may suddenly notice a subtle hole in your security (maybe the filename for the captcha gives away what it is... or maybe after a successful verification, the same cookie can be used to create another account... or something). In the process of attacking your own site you may uncover something you've missed before.
I saw a forum which required that you post a (non-'shopped) picture of yourself holding a 45 rpm record of the artist the forum was about before getting an account...best signal/noise ratio I ever saw with rec.guns, which seems to be moderated by gods because of the very high flame and spam potential!
Google passes Turing test : see my journal
www.cheapmeds.com
Google, Yahoo and MSN have already done this. Simply insert 'rel="nofollow"' into all the <a> tags that people post in the comments, and although they still show up it makes it pointless for those spammers trying to increase their PageRank.
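A quick-and-dirty way to do that in Python is a regex over the comment HTML. This is only a sketch: `add_nofollow` is a made-up helper, it leaves alone any link that already carries a rel attribute, and a real HTML parser would be more robust against odd markup.

```python
import re

# Matches the start of an <a ...> tag that has no rel attribute yet.
_A_WITHOUT_REL = re.compile(r"<a\b(?![^>]*\brel\s*=)", re.IGNORECASE)


def add_nofollow(html):
    """Insert rel="nofollow" into every <a> tag that lacks a rel attribute."""
    return _A_WITHOUT_REL.sub('<a rel="nofollow"', html)
```

Run it on user-submitted content only, just before the comment is stored or rendered.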
I know this won't help with the unsightly comments on your website, but since this is the slashdot crowd just flag all the comments with URLs in them as 'hidden' and on a daily/whenever basis go through them deleting spam and unhiding legitimate comments. Stick this all in a central control panel and it's unlikely to take up more than 10 minutes of your time.
In addition to that, just stop any client with a useragent string that contains a URL or one of the known spambot names.
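The user-agent check might look like this in Python. The bot names in `KNOWN_BAD_AGENTS` are invented placeholders, so fill in names from your own logs. Be aware that some legitimate search crawlers also put a URL in their user-agent string, so you may want a whitelist in front of this.

```python
import re

KNOWN_BAD_AGENTS = ("examplespambot", "linkblaster")  # made-up placeholder names

_URL_IN_AGENT = re.compile(r"https?://|www\.", re.IGNORECASE)


def is_suspicious_agent(user_agent):
    """Return True if the user-agent contains a URL or a known spambot name."""
    ua = user_agent.lower()
    if _URL_IN_AGENT.search(ua):
        return True
    return any(name in ua for name in KNOWN_BAD_AGENTS)
```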
http://www.kloth.net/internet/bottrap.php - A quick implementation of a bot-trap, which bans bots which don't follow your robots.txt directions.
I run a quiet phpBB for forum support of some websites of mine. For the last few months SPAM has outnumbered real posts by a large margin. I tried a CAPTCHA module (I think it was the built in one) and it did next to nothing - they aren't programs, the posts are from humans who have (low paying) jobs to post links on message boards.
I had reasonable success by limiting posts to people who have verified their email address -- I think that that was also a feature of a recent phpBB update.
But the spam still outnumbered posts, so in the last two weeks I've added these two phpBB mods:
http://www.phpbbhacks.com/download/4878 - this mod checks each registration IP address against the dns blacklists. I think that it improved the situation, but it didn't stop the problem outright, and I still had to clean up the board once in a while.
http://www.phpbbhacks.com/download/6208 - this mod gives a really easy way to delete a user and all of their posts at once. It's not a fix, but it's turned out to be the best solution. It only takes a few seconds to undo the damage from any one individual, no matter how many spam posts that they have made. A person could spend 20 minutes registering and posting 20 messages and I have to spend 20 seconds nuking the account and all its posts. It's a fair trade, and I get some small satisfaction in that!
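For the curious, a DNS blacklist lookup of the kind the first mod performs works roughly like this: reverse the octets of the IP, append the blacklist's zone, and see whether the name resolves. The Python below is a sketch of that general convention, not the mod's actual code, and `dnsbl.example.org` is a placeholder zone.

```python
import ipaddress
import socket

DNSBL_ZONE = "dnsbl.example.org"  # placeholder; substitute a real blacklist zone


def dnsbl_query_name(ipv4, zone=DNSBL_ZONE):
    """Build the lookup name: octets reversed, then the blacklist zone."""
    octets = str(ipaddress.IPv4Address(ipv4)).split(".")  # raises ValueError if invalid
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ipv4, zone=DNSBL_ZONE, resolve=socket.gethostbyname):
    """Return True if the blacklist has an entry for this address."""
    try:
        resolve(dnsbl_query_name(ipv4, zone))
    except socket.gaierror:
        return False  # no such name (or the lookup failed): treat as not listed
    return True
```

Note that a DNS outage also counts as "not listed" here, which fails open; whether that is what you want depends on how much spam you are willing to risk.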
It would work reasonably as well in reverse: Allow the person's posts, but forward them to a moderator. If the moderator determines them to be spam, that poster gets the boot (along with all their posts). Add in some intelligent "Find Similar" logic, and you'd have y'erself a good start at a forum anti-spam system.
Information wants to be free.
Entertainment wants to be paid.
You just want to be cheap.
I've had quite good luck by using Apache mod_security (modsecurity.org) to filter web activity. Yes, all the suggestions people have been giving about CAPTCHAs, blocking people with addresses in high spam domains, etc., are all good and useful, but mod_security lets you cover a base those approaches are missing: it lets you block spammers from posting spam, even if they somehow manage to get through your registration defenses. I use a mod_security ruleset based on one published at http://gotroot.com/tiki-index.php?page=mod_security+rules which watches POST content for URLs and terms commonly used in spam postings, and blocks them--in addition to rules that are more traditional for mod_security, such as blocking phpBB exploits--which I've also found it invaluable for.
I administer several forums and wikis that were having quite bad problems, even with CAPTCHAs, email verification, and so on. . . but the problems pretty much went away once I pulled mod_security into the battle.
I've wondered what would happen if you distorted the CAPTCHA using a site's name or URL instead of a random background. Do you think at least some people would hesitate a moment if you went to some random porn site and had to type a CAPTCHA with slashdot.org watermarked in the background?
"Stick this all in a central control panel and it's unlikely to take up more than 10 minutes of your time."
I basically gave up on blogging because I had to sort through 500 spam comments a day. I know another blogger who had to clean 7,000 (yes, thousand) spams out of his blog every day.
It took both of us longer than 10 minutes.
Forum spammers want to submit very specific content: hyperlinks (to boost their Google page rank). Our forum gets hammered by spambots hundreds of times per day, yet nothing comes through - we simply filter away any message containing a hyperlink (plain, non-clickable URLs are allowed). Works like a charm - no user registration, no fancy and annoying CAPTCHAs.
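A filter like that can be very small. Here is a Python sketch that assumes hyperlinks arrive either as HTML anchors or as BBCode [url] tags; the function name is made up, and plain-text URLs pass through untouched, as described above.

```python
import re

# An HTML anchor with an href, or a BBCode [url] / [url=...] tag.
_HYPERLINK = re.compile(r"<a\s[^>]*href|\[url\b", re.IGNORECASE)


def contains_hyperlink(message):
    """Return True if the message contains clickable-link markup."""
    return _HYPERLINK.search(message) is not None
```

Reject or silently drop any post for which this returns True.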