How to Prevent Form Spam Without Captchas
UnderAttack writes "Spam submitted to web contact forms and forums continues to be a huge problem. The standard way out is the use of captchas. However, captchas can be hard to read even for humans. And if implemented wrong, they will be read by the bots. The SANS Internet Storm Center covers a nice set of alternatives to captchas. For example, the use of style sheets to hide certain form fields from humans, but make them 'attractive' to bots. The idea of these methods is to increase the work a spammer has to do to spam the form without inconveniencing regular users."
Ok, so captchas and other email obfuscation mechanisms are used a lot. Fine, a web designer can choose to do this.
Now, let's enter US law: the Americans with Disabilities Act. Target is currently being sued for NOT complying with this federal law. I can understand why businesses would be required to comply, but where will the net-boundaries stop?
For example, I have a US corp. I hire an offshore datacenter to handle web processing. Does my website have the compulsory ADA laws upon it, or do they not apply due to international boundaries? Yipe.
Why is it so hard to make a captcha that a bot can't read but a human can?
The slashdot captchas are among the easiest I have ever seen to read, however I still haven't seen any spam on slashdot. Is there something else going on here? It can't be anything like IP banning or flood controls as those don't stop botnets. Is it that spammers just don't target slashdot? Or is it that captcha-reading bots are not nearly that good at breaking them and we could tone down the level of those horrible twisted-dotted-lined captchas?
Do Or Do Not, There Is No Spoon, There Is Only Zuul. Everything in the above post is probably opinion.
How accessible is this though? Won't it hinder those who use screen readers?
If it doesn't, this honestly isn't a solution in my opinion.
Good. Cheap. Fast. Pick Two.
I hadn't read the article yet, and just the summary, and as soon as they said 'hidden fields' that are attractive to spambots, I thought "Why not hide the fields from the spambot instead?"
It's easy, you just have the javascript create all or part of the form. Or modify the form in some way. It would happen before the user even sees the form, and the spambot would have to implement a javascript parser to get it. (Or a parser, that's unique to your site.)
I would think AJAX would be a huge hamper to them as well.
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
I'm already using the "identify this" / "identify that" approach. I went from 75+ spams a day to zero. Seems no hand-fed spam for my site. I'm very happy.
http://lyricslist.com/lyrics/artist_albums/663/rammstein.php/
This is still somewhat problematic for blind users. If decoy field names are picked up when CSS is turned off, then there will be a lot of users exposed to the bogus fields.
Just shoot 'em on sight.
KFG
Men's and Ladies Prestige Watches For all occasions! Perfect Christmas gifts!
These replicas have all the presence and poise of the originals after whome they were designed at a fraction of the cost. The attention to detail is paramount and they are comparable to the originals in every way.
To view our huge inventory visit our website now at:
http://pwned31337.ku/
: Replicated to the smallest detail
: 98% A+ Accuracy
: Includes all Proper Markings
: Wide selection and fast worldwide shipping
: Authentic Weight
: True-to-original self winding and quartz mechanisms
: Guaranteed worldwide Christmas delivery
Private Key encrypt the randomized field names and have a hidden Public Key field. That way, the fields foo, bar, and abacab have no sense of meaning to the bots, but will decrypt to subject, body, and spammer catcher.
How well do these 'invisible forms' work on browsers that don't make the greatest effort to comply with W3C guidelines concerning style sheets? They might stop spammers, but it might make the contact form difficult to navigate for users of everyone's favorite browser...
Think about it ... the slashdot crowd is technical and informed and "knows better" ... why would someone spambot slashdot? It surely would not be effective...
I like Slashdot's; it uses real words. Google's approach is good too.
...can it be clearly labeled as bogus? Something like:
Subject: _______{-enter your spam topic here if you want me to disregard your email
Can the label/tag telling someone to leave a field blank be hidden from a bot but clearly visible to a live person?
Hiding things seems like a good way to get search engines to not like you.
Well, that's nice then...
!ERR: Signature not found.
My Method is to just disallow posting of html. I have a simple blog, and if they try to do anything like post too many HREFs or something, then I just deny the post. That seemed to work for the most part. The bots usually tried to post URLs on my site, so if they posted something like with < and >. They also try posting [link]...[/link] which also doesn't work on my blog, so I just display an error message and let the user fix it. You can still post straight URLs, but that's not too good for spammers, because they usually want a link. I also stop people from trying to post more than 5 URLs in a single post, since I noticed the bots like to do that. I recently upgraded my blog to use AJAX to submit the comments. Adds an extra layer of protection against the bots, but I really haven't needed any since I added in the filters mentioned above.
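A rough sketch of that kind of filter, in Python rather than whatever the blog above actually runs. The function name and the regular expressions are made up for illustration; only the "no markup, no more than 5 URLs" rules come from the comment.

```python
import re
from typing import Optional

MAX_URLS = 5  # the limit mentioned in the comment; tune for your own site
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)
# Anything shaped like an HTML tag, or a [link] BBCode-style tag.
MARKUP_RE = re.compile(r"</?[a-z][^>]*>|\[/?link\]", re.IGNORECASE)


def rejection_reason(comment: str) -> Optional[str]:
    """Return why a comment should be refused, or None if it may be posted."""
    if MARKUP_RE.search(comment):
        return "markup is not allowed; paste plain URLs instead"
    if len(URL_RE.findall(comment)) > MAX_URLS:
        return "too many URLs"
    return None
```

Returning a reason instead of silently dropping the post matches the "display an error message and let the user fix it" behaviour described above.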
Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
This page took me 7 seconds.. ..oh damn aMule is active.. -.-
Well, I for one think that blind people should be allowed to participate on the web, so why not make "captchas" that'll work for the blind?
For instance:
"Please enter the second word of the following sentence to continue: The dog had a long tail".
I'm a dreamer, the world is my playpen. But hey, I'm a serious person, I can't dream all the time.
Warn them before they post that they can't post spam.
Make it a contract to post there.
If someone posts spam then make them pay 1 or 2 bucks. Money$$
Or even organize other blogs and websites to sue them.
Since the editors didn't see fit to put this in related links:
What Ways Can Sites Handle Spambot Attacks?
Fascism starts when the efficiency of the government becomes more important than the rights of the people.
one method I use to avoid use of captchas is to require javascript (yes, this sucks somewhat) to use my forms. when the form is 'submitted', i dynamically add a new form element into the form and then submit the form. server side, i check for the additional form element. it works pretty well - it seems that bots don't run javascript. i've received very few complaints (2 or so in as many years) from non JS enabled people.
I quickly eyeball the 100+ bot submissions daily for the few *real* submissions. The rest are for "Laboratory Equipment", Viagra, mail-order brides, porn, and other crap.
And before anyone asks, I *have* looked into modding the scripts to add a simple barrier for these bots, but the scripts are in the ugliest perl code I've ever seen in my life (sorry Gossamer, but the code makes my eyes bleed), and while I have written/tweaked perl in the past, I don't have the patience to tackle Links.
I have noticed in the logs that the submission POST is the only hit from the bot, so this package must be well-known to these bots, and not customized for *my* site (or so I assume). Would this be thwarted by generating random form field names each time the page is loaded and processed? If the same CGI page does the initial form *and* processes the POST, this should be feasible, no? Or do these bots actually process the human-readable rendered form to do their work?
Method of processing duck feet
I have 2 blogs set up on Blogger, one with a customized stylesheet and another using one of the standard CSS templates. I am not sure how good a job Blogger 1.0 does of preventing bot spam on blogs that allow anonymous posting, but there seems to be a lot of it around.
However, the one with the customized style sheet receives no bot spam! The 'Comment' link is actually called 'Talk about this', and the whole section of the Blogger posting is set up differently (i.e. left to right rather than top to bottom). The one that uses a standard CSS template has lots and lots of botspam. I think that the bots are programmed to see which template the page has (it's right there in the source) and then they know which links will be the links to the comment area.
So the person that suggested even moving the form field around, well I know this is not dynamic movement, but it sure seemed to have worked. Now if my customized blog was popular enough... that would be a different story.
-b.
It seems like people rediscover the same techniques over and over and over without even bothering to do a simple Google search to find out if things have been done before. I block about 90% of submitted spam using Bad Behavior. I'm working on the other 10%...
How am I supposed to fit a pithy, relevant quote into 120 characters?
I have a small-ish website that allows people to submit sites that they want listed in my directory (think old Yahoo). I review the sites submitted before adding them so I can make sure the sites are relevant. Robo-spam submission was getting pretty horrible so I switched to a simple captcha script and it stopped all the robo-spam. Problem is, spam is still getting through because humans are still submitting things by hand. Somebody in India, for example, is getting paid to manually submit irrelevant sites to my little weight training site. Wish I could stop it but at least it's better than robo-spam.
I live ze unknown. I love ze unknown. I am ze unknown.
One of these things not like the others:
Cat dog fish *car*
Black *stapler* white red
car truck *J-lo* SUV
*Madonna* J-lo K-fed Ja-rule
Obama's legacy: (N)othing (S)ecure (A)nywhere and (T)error (S)imulation (A)dministration
I guess you'd better hope that Braille terminals have a javascript parser.
I can't read the article because it appears to be /.'d, but I have a technique that has foiled a spammer from using my web mail form and it would probably work with discussion forums, too.
In the program run to process form input, I check the HTTP_REFERER header sent by the client. It should exactly match the URL of the form that was being posted; if it doesn't, then you know that someone is accessing the input program illegally, i.e. they aren't using your form. It seems that the spambots out there send a referer that matches my site's main domain, but doesn't include the full URL of the form.
Of course, now that this has been posted, it is only a matter of time before the bots are fixed to send the whole form URL. 'Course, I have a couple of other tricks to separate the bots from the humans.
What does my program do when it detects a bot? It returns a 403 Forbidden error and adds the IP address of the client to .htaccess with a "Deny from" directive.
I'll have to actually RTFA when it becomes available again later.
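For anyone who wants to try the referer check, here is a minimal sketch in Python. FORM_URL and status_for_post are placeholder names, not taken from the poster's program, and the headers are assumed to arrive as a plain dict. Keep in mind that some browsers and proxies strip the Referer header, so an exact-match rule may also turn away a few real people.

```python
# Hypothetical address of the page that serves the form.
FORM_URL = "https://www.example.com/contact/form.html"


def status_for_post(headers: dict) -> int:
    """Return the HTTP status to send for a form POST.

    Only an exact match on the full form URL is accepted; a referer that
    names just the site's domain (what the bots reportedly send) gets 403.
    """
    referer = headers.get("Referer", "")
    return 200 if referer == FORM_URL else 403
```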
Just be sure to wear the gold uniform when you beam down -- you know what happens when you wear the red one.
What if there were instructions on the web page that only a human could interpret? I know that sounds like the captcha. But, I mean something like "What is three times two"? Or, have a drop down list box of colors or patterns (like checked, striped or solid). Then tell the people to choose the color that most closely matches something you present randomly. Make it easy by only offering black, white or red.
Sorry for the ignorance, but where are the /. captchas? I don't run into any when submitting comments...are they somewhere else?
This will prevent 100% of the bots from even entering your page... ... plus a few IE users.
None of the spambots that attack my site fetch the comments page before trying to post. There's never (and I do mean never) a GET before a spambot's POST. So I have a hidden field with a meaningless name ("magic"), and the value is set to the server's current time. Comments with timestamps that are too old are ignored.
To make it less obvious that the value is a timestamp, it's XORed with a random number (which is included in the form value) and eight random, meaningless bytes are thrown in for good measure. The end result is 32 seemingly-random hex digits--it looks just like a session ID.
This technique certainly isn't going to fool a determined attacker, but no spammer is going to waste their time trying to figure it out.
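Something along these lines would produce a token of that shape. This is a guess at the layout (4 bytes of random key, 4 bytes of masked timestamp, 8 filler bytes), not the poster's actual code, and MAX_AGE_SECONDS is an arbitrary value picked for the example. There is no signature on the token, so as noted above it only deters bots that never fetch the form.

```python
import os
import struct
import time

MAX_AGE_SECONDS = 3600  # assumption: how old a form may be before its POST is ignored


def make_token(now=None) -> str:
    """Build the 32-hex-digit value for the hidden field."""
    now = int(time.time()) if now is None else int(now)
    key = struct.unpack(">I", os.urandom(4))[0]
    masked = (now & 0xFFFFFFFF) ^ key  # timestamp XORed with the random key
    # key + masked timestamp + eight meaningless bytes = 16 bytes = 32 hex digits
    return (struct.pack(">II", key, masked) + os.urandom(8)).hex()


def token_is_fresh(token: str, now=None) -> bool:
    """Recover the timestamp and check that it is neither too old nor from the future."""
    now = int(time.time()) if now is None else int(now)
    try:
        raw = bytes.fromhex(token)
    except (TypeError, ValueError):
        return False
    if len(raw) != 16:
        return False
    key, masked = struct.unpack(">II", raw[:8])
    age = now - (key ^ masked)
    return 0 <= age <= MAX_AGE_SECONDS
```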
and install Akismet
Another idea I had (but haven't tried implementing yet) is to work from timings. Have some javascript on the page that notes times like how long it took you to fill out and submit the form, and send that back with the form. If it's way too fast, it's probably spam.
One blog I know has a fill-in form and one of the optional fields almost nobody will fill in. If you do fill it in, you get an error page saying to leave the field blank.
The site owners tell me the bots always fill it in.
Oops.
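The server side of such a trap field can be tiny. A sketch in Python, assuming the submitted form arrives as a dict; the field name "website" is just an example, not what that blog uses.

```python
def looks_like_bot(form: dict, trap_field: str = "website") -> bool:
    """True when the field that humans are told to leave blank has content."""
    return bool((form.get(trap_field) or "").strip())
```

Showing an error page rather than silently discarding the post, as that blog does, gives a human who filled the field by mistake (or via browser auto-fill) a chance to fix it.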
What actually all but eliminated spam sent through my web forms is disallowing newlines in fields where they shouldn't be (like the subject and from address fields).
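The check itself is short. A minimal Python sketch (function names are illustrative): a line break in a field that ends up in a mail header is how extra headers such as Bcc get smuggled in, so any carriage return or line feed in a single-line field is grounds for refusing the post.

```python
def has_line_break(value: str) -> bool:
    """True if the value contains a carriage return or a line feed."""
    return "\r" in value or "\n" in value


def bad_header_fields(fields: dict) -> list:
    """Return the names of single-line fields that contain line breaks."""
    return [name for name, value in fields.items() if has_line_break(value)]
```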
Please correct me if I got my facts wrong.
Instead of introducing images and their associated problems and disadvantages, I would give users a little puzzle to solve; something that requires them to understand human language. Something like "Enter the first letter of every word in this sentence", "What color is a banana?", etc.
Please correct me if I got my facts wrong.
I'm guessing an end user would notice when IE opened up and started filling out forms on websites. Or they could use a hacked firefox, but then the worm payload would be gigantic. Compare that to the current bots which evade detection by running in the background.
I run two largish Vbulletin forums - and we get at least 1-2 spammers a day. I haven't found a way to prevent them yet, but I have found a way to stop em from getting any traffic or money for the unsuspecting idiot that clicks on them.
:)
I use an anti-spam e-mail technique: blacklist.
Vbulletin has a censoring system where words you choose can be replaced with your choice of characters - by default it's an *. www.clickmeforspam.com, where I would use the "clickmeforspam.com" as the censored word, shows up as www.******************
It's quite hilarious to see the humans behind the spam, who have registered, gotten through a human image trap, clicked on a link e-mailed to them, logged in and posted their spam re-post it like 2-3 times only to realize they got owned by my filter. They get all pissed off, and by that time a user has reported the post or we've seen it and banned them. It's very fun to make fun of them in their spam posts filled with ***s.
> > - Neal writes: "[on some site the] submission from .. asks you to enter ...
> > text found in a gif. However, no matter what you enter the first time,
> > it says you entered it wrong"
> Mean and devious. I like it!
Yeah, I came across that one myself one time, and, uh.. huh? If it just refuses anything you answer how can you submit the form?
I personally hate the Slashdot lameness filter. It punishes fast typists who want to get their point across, without being verbose. Not all replies have to be several paragraphs long. I wish the user's karma/posting history would lessen the grip of the lameness filter. I assure you I'm not abusing the comment system. Don't tell me to slow down, and I'm not a cowboy.
So far, the approaches I've heard that I like the best are simple human questions (what is X times/plus/minus X, what is the second word in this sentence, etc). Field obfuscation and embedded public/private keys are pretty useful techniques. I don't like making a form only work when javascript is enabled, but there was a pretty clever little script that didn't apply the "action" of the form until it is submitted, which would probably confuse a lot of spam bots as well.
However, I really haven't heard much mention of using email verification. Unless you are a registered user, then you have to provide an email address that a confirmation email is sent to. Once you click on the link that is sent in an email, then the comment becomes active. That is one method that I am currently using and so far so good (also engadget uses this method).
So, what are the drawbacks of the email verification method other than some people not wanting to give an email address just to post a comment? I think it significantly raises the "cost" of trying to spam since that process can't as easily be automated and would require them to have to check a specific email address (future attempts could be blacklisted by either email address or domain).
When I have a kid, I want to put him in one of those strollers for twins and then run around the mall looking frantic.
http://yro.slashdot.org/yro/04/01/28/1344207.shtml
It's a pretty obvious way to get around it, so I'm not surprised that it has been done.
Do two things...
So, you'll have to change this depending on how many people, on average, sign up for new accounts.
1) Don't allow more than one new account from the same IP address in a single hour.
2) Decide how many people sign up in a single day, on average. Multiply that number by 1.5. Divide 24 hours by that number.
Put a single block of javascript that disables the button for X seconds, depending on what that number was (and for those without js on, make a big bold warning), and when that time period is up, it automatically sends the info.
If the login info is sent before that time period expires, lock out that IP for 24 hours.
So, the bad part, this only works reasonably well for sites with lots of new sign-ups every day (like a few hundred/few thousand. Otherwise you'll just have to pick an arbitrary time limit, like 2 minutes maybe), but the good part, a minute or two is not too long to have a tab open sitting in a window while the timer clicks down, and the best a spammer can do is a few accounts an hour, instead of thousands. This will defeat even the groups that hire real humans to sit and read captchas all day.
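To make the arithmetic in step 2 concrete, here is one possible reading of it in Python. The function name and the 120-second cap are assumptions; the cap stands in for the "arbitrary time limit, like 2 minutes" suggested for low-traffic sites.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def signup_delay_seconds(avg_signups_per_day: float,
                         headroom: float = 1.5,
                         max_delay: float = 120.0) -> float:
    """24 hours divided by (average daily sign-ups x 1.5), capped at max_delay."""
    if avg_signups_per_day <= 0:
        return max_delay
    return min(SECONDS_PER_DAY / (avg_signups_per_day * headroom), max_delay)
```

With 1,000 sign-ups a day this works out to 86400 / 1500 = 57.6 seconds on the timer; with 10 a day the raw figure would be 96 minutes, so the cap applies instead.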
- Scramble field names, so the INPUT named "comments" is actually for the email address, etc.
- Multiple type="submit" buttons, all but the real one hidden using CSS, all in a random order on each page load
- Non-intuitive action= names such as b41gzL924.php which are further generated by Javascript in the client browser from an obfuscated string
- REFERRER games
Of these, singly, I have found obfuscating the name of the submit script for the FORM the most effective. The Javascript code is left as an exercise for the reader.
I'm guessing that I might not be the only person here that does not know what "CAPTCHA" stands for, so here it is: "Completely Automated Public Turing test to tell Computers and Humans Apart". And it is apparently trademarked by Carnegie Mellon University.
http://en.wikipedia.org/wiki/CAPTCHA
Firehed - Unfortunately, thanks to medical breakthroughs, common sense is not as common as it once was.
An easy way to get rid of automatic form spamming is to require some intrinsically difficult computation to be performed before submitting the form. This computation would be performed in a few seconds by Javascript while the user is filling in the form. A robot would need to do the same computation, which spammers probably can't afford (unless they are spamming from botnets).
Example of computation: the server picks a random string S and a random integer I between, say, 0 and 1000, sends S and md5(S+I) to the client and asks it to find I.
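A sketch of that exchange in Python, for illustration only: in practice solve() would be Javascript running in the browser, and all three function names are invented here. Note that with a range of 0 to 1000 the search takes well under a millisecond on any modern machine, so the range would have to be far larger before it costs a spammer anything.

```python
import hashlib
import secrets


def _digest(s: str, i: int) -> str:
    return hashlib.md5(f"{s}{i}".encode()).hexdigest()


def make_challenge(max_i: int = 1000):
    """Server side: pick S and I, and return (S, md5(S+I), I)."""
    s = secrets.token_hex(8)
    i = secrets.randbelow(max_i + 1)
    return s, _digest(s, i), i


def solve(s: str, digest: str, max_i: int = 1000):
    """Client side: brute-force I by trying every candidate in order."""
    for i in range(max_i + 1):
        if _digest(s, i) == digest:
            return i
    return None


def verify(s: str, digest: str, answer: int) -> bool:
    """Server side: one hash is enough to check the submitted answer."""
    return _digest(s, answer) == digest
```

The asymmetry is the point: the server does one hash to verify, while the client has to do up to max_i + 1 of them to find the answer.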
Firstly, the ADA does not require you to make impractical or impossible accommodations for those with disabilities. The actual law uses the language not requiring "undue burden" (open to interpretation). It requires business to make "readily achievable" changes.
From the DOJ website: "The ADA does not require the provision of any auxiliary aid that would result in an undue burden or in a fundamental alteration in the nature of the goods or services provided by a public accommodation." Altering a website is not considered by most courts to be an "undue burden", and making a website accessible is not particularly difficult if it is taken into account when designing the site to begin with. Yes, retrofitting a website can suck, but that does not absolve a business from doing it properly to begin with. For a small website, it doesn't take anything more than making sure it is usable in Lynx.
I would not call blind people a "special interest group", in the sense that they aren't, say, timber industry or oil company lobbyists. It is not as if somebody chooses to be blind. We should not require blind people to be dependent on family help that may not be available. There is "no shame in asking a family member to..."? I have a funny feeling you would not feel the same way if you actually were blind and had to totally rely on others for your daily activities.
SirWired
You know it makes sense!
I implemented something like this on a phpBB forum I run. The 'Register' link is linked to a file called 'register.php' which in turn redirects you to Google. This link is hidden using stylesheets. After that, there's a second Register link which goes to a file called 'logout.php' which in turn redirects to the correct registration page.
I also randomized the form field names.. They're all now md5(fieldname+day-year).. which means they change every day at midnight. This should completely block any bot which searches for fieldnames like 'login' and 'password' to populate.
Sadly, this hasn't completely eliminated the spam bots. I found that preventing them from adding a URL to their profile until they've been a member for a set amount of time has helped, as has automatically purging accounts which don't make posts within 48 hours.. but ultimately, a few still get through. As the article says, however, these have to be done manually. There's no way it's an automated bot.. so we're wasting their time. The more people who do that, the better..
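The daily field names could be generated roughly like this. It is a Python sketch rather than phpBB's PHP, with two additions that are assumptions and not part of the setup described above: a server-side secret mixed into the hash (otherwise a bot could compute the names itself) and acceptance of yesterday's names so a form loaded just before midnight still submits.

```python
import hashlib
from datetime import date, timedelta


def daily_field_name(logical_name: str, secret: str, day: date) -> str:
    """Name to use in the HTML form for one logical field on one day."""
    material = f"{secret}:{logical_name}:{day.isoformat()}"
    return hashlib.md5(material.encode()).hexdigest()


def decode_form(form: dict, logical_names: list, secret: str, today: date):
    """Map a posted form back to logical field names, or return None."""
    for day in (today, today - timedelta(days=1)):
        mapping = {daily_field_name(n, secret, day): n for n in logical_names}
        if all(hashed in form for hashed in mapping):
            return {name: form[hashed] for hashed, name in mapping.items()}
    return None  # plain names like 'login' or stale hashes end up here
```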
As opposed to just adding "display: none" to the style of an input element (as TFA advocates), which the spambot writers can easily detect and ignore, would it be more useful to convert the style information into the equivalent ascii entities?
For example "display: none" -> "display:&#32;none" or, now that I've revealed my clever scheme to the world, some variation thereof. Alternately, the display style could be set in a class somewhere and the class name could be set as entities.
Would this even slow the spambots down, or do they generally have access to the entire DOM?
Use CSS' media types.
Aural, braille, and embossed are all media types that would hide the fields for blind users if done correctly (i.e. used and the reader supports it, which you'd think they would want to). This technique is not the only reason why blind users' tools need to work differently based on media type in CSS.
My Suburban burns less gasoline than your Prius.
In the PHP code for the site, I set it to check for the referrer, and if it's not from my own domain, then bounce the bot back to the front page. The logic behind this: No one bookmarks my "News Submission" page. People always browse there. Once I started checking for a referrer, the spam completely stopped. There was no extra step that my viewers had to take, it happened without anyone outside of my staff noticing a change.
I realize this is a temporary solution, and just posting about it increases the chance that someone will read about it and crack it, but it sure has been a nice reprieve. What's ridiculous is that form doesn't post anywhere. It sends me and about six other people an email, and the post never really sees the light of day in its original form.
I am Leviathant and I approve this message.
Is to use images instead of text. For example, out of three images, two are alike. Third one isn't. User has to pick the mismatching one to activate his/her account. Here is a really good example:
Great Example
P.S. Some image categories can be confusing.. In this case, axe vs mace can make a human fail the test.
I won't stand for that, so the simple fix is to remove the "WEBSITE" input from the form. If "WEBSITE" gets POSTed along with the other data, I know it's a robot and post a message to kindly go away. Genuine users can edit their profile once the account is activated, if they want to plug their website.
Author, Shell Scripting : Expert Re
Here's an example of how you might comply- just make the text visible, then you're providing the same opportunity for both blind and seeing. Check out the comment section on the bottom of this page (Optical Illusion) or this page (Pumpkin Carving).
This is a fun topic for me 'cause I've been experimenting for the past 2 months. I've written a pretty simple filter that rates text and links. Most of the time form-spam has keywords in it because that's what the spammers want- live links that have their keywords. It works for now, I don't expect it to work forever. Anyway I mostly avoid captchas.
Two weeks ago I started collecting statistics about what is getting tagged spam and how it got caught. Yesterday alone 103 attempts were made (by various botnets as is apparent from their IPs) and 103 messages were filtered out immediately. -Ed
Wouldn't an animated captcha be impossible (or very difficult) to OCR?
for sites of enough traffic:
randomly pair users in private chat rooms (ajax, of course) and have them decide on each other if they are human or computer...
It's good to see someone talking about the disabled as human beings for a change, rather than an irritating extra factor to be taken into account.
Plus, anyone who doesn't have the technical ability and plain understanding of the medium to make their sites accessible from the start, really shouldn't be using the web.
Amateurs.
Are you aware that this method discriminates against users of America Online and users in less developed countries whose ISPs generally offer only web access behind a NAT?
This method discriminates against users in countries where all ISPs bill by the minute.
Implementing the same thing for the blind would be simple..
Have a link for a blind captcha that plays sound files.
simply have a recording say the word to type in.. Over time I'm sure the spammers will add voice recognition to the bots but it'll take time and more processing power to spam.
when that stops working move on to animal sounds or songs..
If you think it's expensive to hire a professional to do the job, wait until you hire an amateur. --Red Adair
just use javascript, spam bots can't parse and evaluate javascript like browsers can (unless the spam bot is automating a browser). But since most bots don't eval javascript, you can hide the form inside javascript; javascript is still the best way to kill a spiderbot.
Shameless plug! I developed a plugin for Ruby on Rails that uses DNSBLs to combat form spam. (begin shameless self promotion)
dnsbl_check rails plugin
Basically what the plugin does is check clients against one or more DNSBLs. You might know them from mail servers. You see, it turns out that the forms are almost always abused by bots. These bots are quite well known. sbl-xbl from spamhaus catches 80% in my setup, spamcop catches the rest. You enable the plugin for key controllers and it really does work.
(/end shameless self promotion) mod me down if you wish
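For those not on Rails, the underlying lookup is simple enough to sketch in Python. This is not the plugin's code; the function names are invented, it handles IPv4 only, and the zone is whatever list you have chosen to query. A DNSBL is queried by reversing the address's octets, appending the list's zone, and seeing whether the name resolves.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up, e.g. 192.0.2.99 -> 99.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """True if the list answers with a 127.x.x.x address for this client."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # no such name or lookup failure: treat as not listed
    return answer.startswith("127.")
```

Treating a failed lookup as "not listed" means a DNS outage lets everyone through rather than locking everyone out, which seems the safer default for a comment form.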
- Color + Text: "Type the red word below"
- Simple Math Word Problems: "If Jenny has two apples and Tim gives her one, how many does she have?"
- Pictures: "What is this a picture of?" (use things with only one simple name, like cow, sandwich, hand)
- Trivia: "What color is the sky on a clear day?"
Potential problems - cultural and language barriers, color-blindness
I really like the CSS idea; have extra form elements with style="display:none" and ignore entries that fill them in (watch out for auto-fill programs on legit users!)
Use my userscript to add story images to Slashdot. There's no going back.
On the other hand suing a city because their 100 year old historic court house doesn't have a lift in it is pure bullshit.
And what if someone who's disabled has business there? Say they have to see a judge or testify? Many people don't become disabled simply by their own actions.
FalconShould there be a Law?
First off, SSI is for supplementing low income. SSDI is for disability.
Are you sure? I get SSI not SSDI because of a disability.
One reason people don't get the services they need is that people like you assume that if you can't see the disability, then they probably don't have one.
That's a problem I've run into due to my disability. By looks you can't tell I'm disabled: I'm not paralyzed, have no missing body parts, and am not disfigured. My disability is neurological, I am a TBI, Traumatic Brain Injury survivor.
FalconShould there be a Law?
They don't.
In fact, all the modern Web 2.0 / CSS / Flash stuff is basically lost on people using Braille terminals or screen readers (I think at this point, screen reading software is more popular for blind people than Braille terms are). And in some cases it makes pages nearly impossible to use or navigate.
I think every web designer should be forced to navigate his or her site at least once, by using Lynx with a window height of one line. That's probably the closest easy approximation to using a Braille terminal that you can get.
I used to know a guy (sighted, actually) who had a Braille terminal and showed me how it works. They're fairly neat devices; I can imagine that one would be a big fan of the CLI with one.
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
I think if a company has the means and market it should attempt to go all ADA compliant. I think it's a good thing to get ramps, lifts, braille/etc. I just don't think it's a good idea to FORCE it upon people.
I wonder if you'll feel the same when you become disabled. In college I used to work with disabled people; for a time I was a reader for some blind students; I worked for handicapped services; and I learned ASL, American Sign Language, because I had some hearing impaired friends. I did this despite having a Computer Engineering major, not sociology or another related major. Now however I have a disability and I HATE it!!! After classes one day I was riding my bike when a moving van hit me. The driver was a diabetic and they say he had a seizure while driving. So through no fault of my own I ended up with a disability, and it can happen to anyone else.
FalconShould there be a Law?
Your example is completely idiotic. Germans don't call Germans Germans. They call Germans Deutchlanders, and they don't get all pissy when we say "Germany."
Unfortunately it's been too long and my memory is poor but when I was in Germany and took German in college later we used "Deutsch".
FalconShould there be a Law?
I started a website where users submit information and it gets posted to the site. It hasn't quite taken off, and it has recently been discovered by Spam bots. The thing is, they can't figure out how to submit the form. The form includes a calendar with links to other months. These links act as submit buttons, but only to persist the data from page to page. The spam bot fills it all out, and then goes to the next month. Then quits, because it thinks it's done. It's been a few months now, and not a single bot has ever gotten down to the real submit button. I don't know why, because the month links submit the form via javascript. The real button is just an So try putting in a fake submit button.
In my experience, spambots ignore the result of their posts. So the user clicks "Submit", then is presented with a confirmation page generated by the script. After clicking "Submit" again, the form is processed. I've NEVER gotten a spambot message using this system.
If those became common it would be trivial to write a program that would interpret them. For example, there are a limited number of 'quantity' or 'counting' words. All I'd have to do was look for the word {first,second,third,fourth,...} and then from the second word group, where word groups are delimited by {;,:,.,...}, and count that many words in and insert it. Even if the machine was only right 50% of the time, that would still be acceptable for a botnet that can do it every few seconds.
True text-based CAPTCHAs would require something more complicated. Basically, a reading-comprehension test that's beyond the known ability of natural-language processing AIs. For example (and I'm just assuming that an AI couldn't do this, I'm not involved in AI research), something like this. Note that you'd really have to do the whole test on that page, too; you can't just do one multiple-choice test, because then you'd have a 20% false-pass rate, when an attacker could choose randomly and get it right. (For a 1% false-pass, you need to have at least 3, 5-option multiple choice questions, and you can't allow any retries.) If you used that page's test, you'd have 7 4-option questions, giving you a (if I did my math right) 0.0061% chance of passing using random answers (1 in 16,384).
There are definitely possibilities there, but you'd probably get people complaining that it discriminates against people who don't have the linguistic or cultural background to pass the test, although they're human. That might be fine in an online forum (where knocking out people that don't speak English isn't really a big loss to them or you), but for a government website it would probably not pass muster. At least not unless it was in a country that had a single official language.
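The false-pass figures are easy to check: with n independent questions of k options each and no retries, a random guesser passes with probability (1/k)^n. A small Python helper (the name is made up) using exact fractions:

```python
from fractions import Fraction


def random_pass_probability(questions: int, options: int) -> Fraction:
    """Chance of answering every question correctly by guessing at random."""
    return Fraction(1, options) ** questions
```

One 5-option question gives 1/5 (20%), three give 1/125 (0.8%, just under the 1% target), and seven 4-option questions give 1/16384, which is about 0.0061% when written as a percentage.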
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
Bot operators are not stupid, they learn too. Just like they finally broke through Captchas, they will break through any method that becomes sufficiently popular or pervasive. The way to keep them at bay would be to use as many diverse methods as possible. Many readers described their own simple ideas that (they claim) work. They will only work as long as they don't become popular.
"What is the nth word of this sentence" is an idea that seems to catch everybody's fancy here. I'm sure somebody is already working on breaking that in their basement.
R.. you derive the form field name from a random key (that is included in a hidden field).. so that the receiving page can decipher the field name and associate the correct data.
To make it human readable, randomize the sequence of text labels and position them and the form fields using CSS positioning.
The result should be a very readable nice form for the naked eye.. while the page source would be very hard to interpret.
Yeeeeha!
IANAL but write like a drunk one.