Smart Spam Filtering For Forums and Blogs?
phorm writes "While filtering for spam on email and other related media seems to be fairly productive, there is a growing issue with spam on forums, message boards, blogs, and other such sites. In many cases, sites use prevention methods such as captchas or question-answer values to try to restrict input to human-only visitors. However, even with such safeguards (and especially with most forms of captcha being cracked fairly often these days), it seems that spammers are becoming an increasing nuisance in this regard. While searching for plugins or extensions to SpamAssassin etc., I have had little luck finding anything not tied into the email framework. Google searches for PHP-based spam filtering tend to come up with mostly commercial and/or more email-related filters. Does anyone know of a good system for filtering spam in general messages? Preferably such a system would be FOSS, and something with a daemon component (accessible by port or socket) to offer quick response times."
Any method you use can be broken. Your only chance is to reduce the likelihood that your site is worth the effort.
Basically, if you use a common solution, whether FOSS or commercial, there will be a thousand other sites that use it too. This attracts attackers because they know that once they break it, they can reuse the same attack on every one of those sites.
However, if you handcode something, no matter how primitive, it likely lasts a lot longer because nobody bothers hacking into your site...
Of course that doesn't work if you have a large site like myspace - there, a single site is worth the effort by itself.
Anyway, two things usually work: a really fast-moving animated GIF, and silly tests where you ask people to identify items.
I help out with a site that picks five random pictures of cats and dogs and asks you to identify which image contains the most kittens. We barely ever get spam through, and with almost 20K attempted submissions by non-humans a day, that makes us pretty happy.
Peter.
I have a series of 4 tests to block spam on my website. So far it has stopped over 30,000 attempts in the last year.
Test one: is the last name the same as the first name? For some reason almost all spammers do this.
Second, do they use a keyword from a list of about 15 words?
Third, do they fill out a hidden input box? This is sort of a reverse captcha.
Finally, do they use "http" more than 4 times in a post? Almost all comment spam is an SEO effort to increase the spammer's PageRank.
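For anyone who wants to try this, here is a rough Python sketch of those four tests. It is my guess at how they might be wired together, not the poster's actual code: the function name, the field names, and the three sample keywords are all made up for illustration (the real list of about 15 words was not given).

```python
import re

# Hypothetical keyword list; the post mentions about 15 words but does not name them.
SPAM_KEYWORDS = {"viagra", "casino", "payday"}
# The post rejects submissions with more than 4 occurrences of "http".
MAX_HTTP = 4


def spam_reasons(first_name, last_name, body, hidden_field):
    """Return the names of the tests a submission fails (empty list = looks clean)."""
    reasons = []

    # Test 1: first name and last name are identical (ignoring case and spacing).
    first = first_name.strip().lower()
    if first and first == last_name.strip().lower():
        reasons.append("same_name")

    # Test 2: the body contains a word from the keyword list.
    words = set(re.findall(r"[a-z0-9']+", body.lower()))
    if words & SPAM_KEYWORDS:
        reasons.append("keyword")

    # Test 3: the hidden input was filled in; a human never sees it.
    if hidden_field.strip():
        reasons.append("hidden_field")

    # Test 4: more than MAX_HTTP occurrences of "http" ("https" counts too).
    if body.lower().count("http") > MAX_HTTP:
        reasons.append("too_many_links")

    return reasons
```

Calling `spam_reasons("Ann", "Lee", "nice post", "")` returns an empty list, while a submission that trips any test gets that test's name back, which makes it easy to log which rule is doing the work.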
"During My Service In The United States Congress, I Took The Initiative In Creating The Internet." -Al Gore
The difficulty in evaluating Akismet - I speak not as a user but as someone who ended up apparently blacklisted and having to try their appeals system - is that everyone I see praising it is by definition the kind of person who pays attention to the filter and therefore will train it effectively. Since your average wordpress.com user more likely lets false positives pile up, I'd love to know how effective it is for people who don't wonder how effective it is.
I have implemented something similar, but I haven't been checking the number of blocked messages. All I know is that I used to get spam, and now I haven't gotten any for years. I use this for the Forums and the Contact Us page.
My rules are:
1) The text boxes for things like name and subject are actually called junk.
2) There are hidden textboxes called name and subject (one hidden by JavaScript and one by CSS); if either is populated, the post is ignored.
3) A third hidden field holds the result of a simple JavaScript math equation, which is checked on the server side. If the value is wrong, the post is thrown out.
As others have said, if your site is small these types of things are good enough to prevent spam because the spammers won't bother to figure it out. These concepts would never work for any of the larger sites or 3rd party forum software.
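The server-side half of the three rules above might look something like this Python sketch. It is only an illustration under assumptions: the field names `junk1`, `junk2`, and `check` are invented, and the server is assumed to already know the answer the page's JavaScript should have computed.

```python
def accept_post(form, expected_answer):
    """Decide whether to keep a submission.

    form is a dict of submitted field names to string values.
    expected_answer is the value the page's JavaScript should have produced.
    """
    # Rule 2: "name" and "subject" are decoys that humans never see.
    # Anything typed into them came from a bot filling in every field.
    for decoy in ("name", "subject"):
        if form.get(decoy, "").strip():
            return False

    # Rule 3: the hidden math field ("check" is a hypothetical name) must
    # match; clients that do not run JavaScript leave it empty or wrong.
    if form.get("check", "").strip() != str(expected_answer):
        return False

    return True


def real_fields(form):
    """Rule 1: the real values arrive under junk-named fields (hypothetical names)."""
    return {
        "name": form.get("junk1", "").strip(),
        "subject": form.get("junk2", "").strip(),
    }
```

A post is only stored when `accept_post` returns `True`, and the actual name and subject are then read with `real_fields`, never from the decoy fields.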