Block Spam Bots With Free CAPTCHA Service
Chirag Mehta writes "I just released a freeware service called BotBlock that lets site owners copy/paste a few lines of PHP code to add a CAPTCHA image-verification system to any web form. Form spamming by bots is on the rise. While remedies exist for MT blogs, a more efficient solution is to use image verification or text identification. Used for a while by sites like Yahoo! and Hotmail, and patented in 2001 by AltaVista, CAPTCHAs are now being used more widely. PARC also came up with two algorithms, BaffleText and Pessimal Print. The technology has always existed, but until now it required site owners to install image libraries and understand how to generate images that cannot be OCR'ed. With BotBlock it is like inserting a page counter."
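For readers wondering what the "few lines of code" on the server side have to do, here is a minimal Python sketch of the challenge/verify flow behind any image CAPTCHA. It is not BotBlock's code. It assumes that rendering the text into a distorted image happens elsewhere, and `issue_challenge`, `verify_answer`, and `SECRET_KEY` are hypothetical names. The expected answer is never sent to the browser; only an HMAC of it travels in a hidden form field.

```python
import hashlib
import hmac
import secrets
import time

# Hypothetical server-side secret; a real site would load this from config.
SECRET_KEY = secrets.token_bytes(32)

# Alphabet without easily confused glyphs (no 0/O, 1/I/L).
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"


def _sign(answer: str, expires: int) -> str:
    """HMAC of the (case-insensitive) answer bound to an expiry time."""
    msg = f"{answer.upper()}|{expires}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def issue_challenge(length: int = 6, ttl: int = 300) -> tuple[str, str, int]:
    """Return (text to render as an image, token for a hidden field, expiry)."""
    text = "".join(secrets.choice(ALPHABET) for _ in range(length))
    expires = int(time.time()) + ttl
    return text, _sign(text, expires), expires


def verify_answer(answer: str, token: str, expires: int) -> bool:
    """Accept the typed answer only if it matches the token and is not expired."""
    if time.time() > expires:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(_sign(answer.strip(), expires), token)
```

As written, a solved token can be replayed until it expires; a real deployment would also record used tokens (or keep the answer in a server-side session) so each challenge works only once.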
A few things to keep in mind:
1) Colorblind people (about 10% of the world's male population). By far the most common form of colorblindness is red/green, so as long as you stick with easily distinguished colors like black, red, and blue, you should be fine. You could probably add yellow and a medium grey to the mix, but yellow can be hard to read even for people with normal color vision, and on some monitors grey can be mistaken for black.
2) Increase the overlap between the characters a bit. Right now the characters can usually be separated by color into three images, at which point a spambot can simply pick the one whose color matches the instruction image.
3) You can make an audio CAPTCHA harder for computers to recognize by adding noise to the sound, or by using recordings of a person with a strong accent (or, better still, a variety of accents).
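To make point 2 concrete, here is a minimal Python sketch of the color-separation attack. It assumes the CAPTCHA image has already been decoded into a grid of RGB tuples (no imaging library is used), that characters are drawn in flat colors on a white background, and that the bot already knows which color the instruction asks for. `split_by_color` and `layer_for_instruction` are hypothetical names.

```python
from collections import defaultdict

# An RGB pixel, e.g. (255, 0, 0) for pure red.
Pixel = tuple[int, int, int]

# Assumed background color of the CAPTCHA image.
BACKGROUND: Pixel = (255, 255, 255)


def split_by_color(image):
    """Group the (x, y) coordinates of non-background pixels by exact RGB color."""
    layers = defaultdict(list)
    for y, row in enumerate(image):
        for x, color in enumerate(row):
            if color != BACKGROUND:
                layers[color].append((x, y))
    return dict(layers)


def layer_for_instruction(image, instruction_color: Pixel):
    """Keep only the pixels in the requested color: all a bot needs to OCR."""
    return split_by_color(image).get(instruction_color, [])
```

Real images with anti-aliasing would need tolerance-based grouping instead of exact matches, but the principle is the same. Overlap is one way to defeat it: where strokes of different colors cross, one color overwrites the other, so each extracted layer comes out with broken strokes.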
"They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
The problem is generating all those sentences. They have to vary; they can't all be "My name is Barney Big Purple Dinosaur. What are my initials?" and "My name is Einstein Mozart Bach Quartet. What are my initials?", because a spammer could just handle that template with a regular expression. Even Java added an easy-to-use regex package (java.util.regex) a few versions ago. Another problem is that you would have to generate literally billions of them, because a spammer could simply hit a service with billions of requests; who's to say which requests are real? And then the ultimate problem: who generates all these questions? A computer, of course, but then how does a computer generate billions of questions that only a human, and not a computer, can interpret? At that point you're approaching true AI. And if we had true AI, forget the spam problem: just have the AI read each and every email.
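To show how little work a fixed question template costs a spammer, here is a minimal Python sketch that answers the "initials" question with a single regular expression. The pattern and the `answer_initials` name are illustrative assumptions matching only the example wording above; a real bot would simply keep one such pattern per template it encounters.

```python
import re
from typing import Optional

# Matches exactly the template "My name is <words>. What are my initials?"
TEMPLATE = re.compile(r"My name is ([A-Za-z ]+)\. What are my initials\?")


def answer_initials(question: str) -> Optional[str]:
    """Return the initials if the question fits the template, else None."""
    m = TEMPLATE.fullmatch(question.strip())
    if m is None:
        return None
    # Take the first letter of each word in the captured name.
    return "".join(word[0].upper() for word in m.group(1).split())
```

For example, `answer_initials("My name is Barney Big Purple Dinosaur. What are my initials?")` returns `"BBPD"`. Every new template forces the spammer to write one more pattern, which is why the questions would need to vary far more than this.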
Even if you had an image that was 0% readable by OCR, image verification only stops "pure bot" spamming. It does not stop someone from writing a helper or proxy app that presents a human with a list of 1000 images to type out very efficiently. That could mean the difference between a million and a thousand spams per hour, but it's still a thousand spams per hour. And if you dismiss this as something nobody would bother to do, you obviously don't know anything about spammers...
This is not the greatest sig in the world, this is just a tribute.