Why the CAPTCHA Approach Is Doomed
TechnoBabble Pro writes "The CAPTCHA idea sounds simple: prevent bots from massively abusing a website (e.g. to get many email or social network accounts, and send spam), by giving users a test which is easy for humans, but impossible for computers. Is there really such a thing as a well-balanced CAPTCHA, easy on human eyes, but tough on bots? TechnoBabble Pro has a piece on 3 CAPTCHA gotchas which show why any puzzle which isn't a nuisance to legitimate users, won't be much hindrance to abusers, either. It looks like we need a different approach to stop the bots."
...is the point going right over the author's head.
A CAPTCHA works well enough for the same reason greylisting works well enough. They may be trivial to bypass (for some definition of 'trivial'), but many applications only need a tiny speed-bump to make a huge difference in undesirable traffic.
That's where the issue is.
I've been a nerd since I was born. Grew up with early computers. Watched them evolve until now. But nothing makes me feel dumber than trying a CAPTCHA 5 or 6 times and failing every time. It's a serious annoyance and I've seen WORSE that I haven't even attempted.
Job? I don't have time to get a job! Who will sit around and bitch about being broke and unemployed then?
Everyone seems to think that the answer to this is to challenge the user somehow. Why isn't a technical solution possible that doesn't require any interaction from a person?
On my own contact forms, I use a really simple obfuscation technique, it doesn't require any user interaction, and I don't get any spam. I've chosen to name my form elements with meaningless names, because obviously automated spammers rely on field names to fill in the blanks. If they see a form like this:
<input type="text" name="email">
<input type="text" name="subject">
<input type="text" name="message">
Obviously it's pretty easy to fill out. If they see this instead:
<input type="text" name="sj38d74j">
<input type="text" name="9sk2i84h">
<input type="text" name="m29s784j">
Then they probably won't even make it past the email validation part, unless they catch the error that my page is printing and try all combinations (or get lucky).
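For anyone who wants to try the meaningless-names idea server-side, here is a minimal sketch in Python. It is not the poster's actual code: the helper names (make_field_map, render_form, decode_submission) are hypothetical, and generating fresh random names per form is an assumption that goes a step beyond the fixed names shown above.

```python
import secrets

# Real field names the application cares about (taken from the example form).
REAL_FIELDS = ("email", "subject", "message")


def make_field_map(real_fields=REAL_FIELDS):
    """Return {random_alias: real_name}, with a fresh random alias per field."""
    field_map = {}
    for real in real_fields:
        alias = secrets.token_hex(4)  # 8 hex characters, e.g. "9f3a01bc"
        while alias in field_map:  # collisions are extremely unlikely
            alias = secrets.token_hex(4)
        field_map[alias] = real
    return field_map


def render_form(field_map):
    """Render one text input per alias; the real names never reach the page."""
    return "\n".join(
        f'<input type="text" name="{alias}">' for alias in field_map
    )


def decode_submission(field_map, submitted):
    """Translate submitted aliases back to real names; unknown keys are dropped."""
    return {
        real: submitted[alias]
        for alias, real in field_map.items()
        if alias in submitted
    }
```

The field map would have to be kept server-side (in the session, say) between rendering the form and receiving the post. A bot that blindly posts name="email" ends up with an empty decoded submission.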
It makes it even more effective when you use fields with good names, but hide them from users with either CSS or Javascript:
<input type="text" name="email" style="display: none;">
That's a honeypot: if it's filled out, then it's a robot. You can use the same CSS or Javascript techniques to also print messages informing users not to fill those fields out, in case their browser doesn't apply the CSS or run the Javascript and shows them anyway.
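The server-side half of the honeypot check is tiny. A rough sketch in Python, assuming the submitted form arrives as a plain dict of strings; the function name is_probably_bot and the HONEYPOT_FIELDS tuple are made up for illustration:

```python
# Decoy field names that are hidden from humans with CSS (assumed list).
HONEYPOT_FIELDS = ("email",)


def is_probably_bot(submitted, honeypots=HONEYPOT_FIELDS):
    """Return True if any hidden decoy field came back non-empty."""
    return any(submitted.get(name, "").strip() for name in honeypots)
```

A real browser leaves the hidden field empty, so a non-empty value is a strong hint that something filled in the form by field name alone.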
Really simple solution, requiring no user interaction, and it is at least as effective as, if not more effective than, a challenge and response type of solution. I don't know why everyone is hung up on a visual challenge when it's a lot easier to distinguish between a real web browser and a scraper that doesn't bother to execute Javascript or apply CSS. I've been saying this for years though, so I don't really expect anyone to start paying attention now.. at least my own inbox is spam-free though.
Because an open ended question would get a million different responses.
And having the user select a radio button would give a blind guess a 1/X chance of being right, where X is the number of choices. And when you have a million bots, 1/X is more than enough to get your spam out.
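To put a number on that, a back-of-the-envelope sketch in Python. The figure of four radio buttons is an assumption picked for illustration; the million attempts comes from the comment above.

```python
def expected_successes(attempts, choices):
    """Expected number of correct blind guesses when each attempt
    picks one of `choices` options uniformly at random."""
    return attempts / choices


# Assumed scenario: one million bot attempts against a 4-option radio button.
passed = expected_successes(1_000_000, 4)
print(passed)  # 250000.0
```

Even with a hundred choices, the same million attempts would still be expected to get through ten thousand times.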
Most posts on this topic have been along the lines of, "Maybe CAPTCHAs as they are implemented now don't work, but here is a method that is trivial for people but hard for computers."
TFA's best argument, in my opinion, was that it is trivially inexpensive for a spammer to simply hire people to break CAPTCHAs. So, a method that doesn't annoy people but is hard for computers still won't work because the spammer will just use people. This is not a topic I know a lot about (not being a spammer I don't know what kind of revenue they generate) but would like to hear a response to this. Is TFA off its gourd, and will better technology really solve this problem? Or is gate-keeping for free services essentially pointless?
Everyone has a great idea for a CAPTCHA, but very few people know what the hell is really going on. Remember that the machine doesn't need to solve the CAPTCHA every time, that machines are infinitely patient and have huge memories, and that another machine needs to make sure the human gave the right answer!
Ideas that won't work:
Really, it's very easy to think you've come up with a very clever CAPTCHA. When you think that, all you've done is stoked your ego and screwed yourself over. It's the same reason why we don't roll our own cryptography: CAPTCHA-making is a very hard problem, mainly because your problem space must be infinite (to avoid an attacking machine simply memorizing answers), the answers verifiable by a machine, but the problems not solvable by a machine.
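The memorization point can be shown in a few lines of Python. This is only a toy sketch: the three-question POOL and the MemorizingBot class are invented for illustration, and the "human" is simulated by looking the answer up directly.

```python
class MemorizingBot:
    """Asks a human once per unseen question, then answers from memory."""

    def __init__(self, ask_human):
        self.ask_human = ask_human  # callable: question -> answer
        self.cache = {}
        self.human_solves = 0

    def answer(self, question):
        if question not in self.cache:
            # First sighting: pay a human once and remember the answer.
            self.cache[question] = self.ask_human(question)
            self.human_solves += 1
        return self.cache[question]


# A finite question pool (hypothetical "clever" CAPTCHA).
POOL = {
    "What is 2 + 3?": "5",
    "What colour is grass?": "green",
    "How many legs does a cat have?": "4",
}

# The human solver is simulated by a direct lookup in the pool.
bot = MemorizingBot(ask_human=POOL.__getitem__)

# 1000 passes over the pool: 3000 challenges in total.
results = [bot.answer(q) == a for _ in range(1000) for q, a in POOL.items()]
```

After the run, every one of the 3000 challenges has been answered correctly, and bot.human_solves is 3: the human cost is bounded by the size of the pool, not by the number of attempts.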
How many questions can be checked by machines but not answered by them?
Not many; fewer every day. There are no questions that can't be answered by a computer (and which can be answered by a human mind). The Church-Turing thesis [wikipedia.org] has some validity: the human mind is no more powerful than a Turing machine, and ultimately, computers and our brains are computationally equivalent. There's nothing a computer can't solve: there are just things we haven't figured out yet.
A CAPTCHA is not a Turing test. A Turing test requires that a person tell a computer and a human apart; the CAPTCHA problem is harder, from a certain point of view, because a computer is required to tell a human and a computer apart.
Limit the email the account can send, and you reduce the desire for the account. Reduce the usefulness of the account, and you reduce the desire to crack the captcha on new account signups, or at least the profitability in doing so.
Doesn't this increase the desire to get more accounts faster?