A Vision For a World Free of CAPTCHAs

Posted by Soulskill on Friday April 24, 2009 @06:12PM from the is-that-an-oh-or-a-zero dept.

An anonymous reader writes "Slate argues that we're going about verifying humans on the Web all wrong: 'As Alan Turing laid out in the 1950 paper that postulated his test, the goal is to determine whether a computer can behave like a human, not perform tasks that a human can. The reason CAPTCHAs have a term limit is that they measure ability, not behavior. ... the random, circuitous way that people interact with Web pages — the scrolling and highlighting and typing and retyping — would be very difficult for a bot to mimic. A system that could capture the way humans interact with forms algorithmically could eventually relieve humans of the need to prove anything altogether.' Seems smart, if an algorithm could actually do that."

Just a Thought... by ryanleary · 2009-04-24 18:14 · Score: 5, Insightful

It seems to me that if you can design an algorithm to verify how humans interact with a computer, it should be relatively trivial to engineer an algorithm that mimics this interaction?
Maybe someone smarter than I could clarify?
1. Re:Just a Thought... by Nazlfrag · 2009-04-24 18:17 · Score: 5, Insightful
  
  Using anything other than a human to judge the behaviour puts it outside of the Turing test. So not only does their proposed solution not match the goal they set, it should indeed be defeatable by another algorithm.
2. Re:Just a Thought... by l3prador · 2009-04-24 18:25 · Score: 4, Insightful
  
  Yep. If you can characterize the behavior pattern enough to automatically determine that it's "human-like," then you can automatically generate "human-like" behavior. The only way around it that I can see is if there is some sort of asymmetrical information involved, such as the invisible form honeypot mentioned in TFA--the website's creator (and thus the bot-detection script) knows that there is an invisible form present, but it's difficult for a script to see without rendering the site in standards compliant CSS.
3. Re:Just a Thought... by Anonymous Coward · 2009-04-24 18:39 · Score: 3, Insightful
  
  So if I have an algorithm that can verify an integer factorization quickly, it means there must be an algorithm that can factor any integer quickly? How would that work?
4. Re:Just a Thought... by RiotingPacifist · 2009-04-24 19:01 · Score: 3, Insightful
  
  If you have a botnet then a single computer probably doesn't need to try a site more often than a human would.
  
  --
  IranAir Flight 655 never forget!
5. Re:Just a Thought... by cjfs · 2009-04-24 19:03 · Score: 2, Funny
  
  It seems to me that if you can design an algorithm to verify how humans interact with a computer, it should be relatively trivial to engineer an algorithm that mimics this interaction?
  Maybe someone smarter than I could clarify?
  You're looking at this all backwards. This isn't the humans attempting to prevent access to the bots. It's the bots getting the humans to speed up their evolutionary arms race.
  Think of it, bots trying to determine bot from non-bot. Bots honing their human-infiltration skills vs the best of the bots. It'll be the greatest leap since spam filtering. We'll^WThey'll be getting +5s again on Slashdot in no time!
6. Re:Just a Thought... by julesh · 2009-04-24 19:16 · Score: 3, Insightful
  
  It seems to me that if you can design an algorithm to verify how humans interact with a computer, it should be relatively trivial to engineer an algorithm that mimics this interaction?
  Maybe someone smarter than I could clarify?
  Sometimes it's easier to write an algorithm that checks that something is correct than to generate that something in the first place. An example: if you have a public key, checking that a message is signed with the matching private key is fairly easy; forging such a signature with only the public key is hard, because (for RSA, as far as anyone knows) it requires you to factor the key.
  I see no evidence that "human behaviour" is such an algorithm. It might be, but we're way too far off understanding it to be able to make any sensible guesses in this field.
  A simplified approach is doomed to failure; simplified human behaviour is much more likely to behave like you suggest than like public keys, I think. Also, because different people interact with their browser in different ways, how do you cope with that? I tend to navigate via keyboard, so would the script reject me because I tabbed to the form field (thus jumping directly to it) rather than scrolling circuitously to reach it? I also make far fewer typos than average and type faster than the average user, so is this going to count against me?
7. Re:Just a Thought... by major_fault · 2009-04-24 19:19 · Score: 5, Insightful
  
  No algorithm will do. Ultimately the question that must be solved is whether the user is malicious or not. The best possibilities so far are the tried and true invitation system and excluding malicious users from the system. Malicious users are also users who keep inviting other malicious users. That is easily detectable with a proper moderation system, which needn't be gotten into right here and now.
8. Re:Just a Thought... by Devout_IPUite · 2009-04-24 19:29 · Score: 2, Informative
  
  Factoring an integer has one answer. Trial and error doesn't work. Scrolling and clicking tempos have many answers, trial and error does work.
9. Re:Just a Thought... by 1+a+bee · 2009-04-24 19:49 · Score: 4, Insightful
  
  So if I have an algorithm that can verify an integer factorization quickly, it means there must be an algorithm that can factor any integer quickly? How would that work?
  The anonymous poster makes a good counter argument against the idea that the algorithm must be easily defeatable: just because you have an algorithm that detects human behavior does not imply you have an algorithm that emulates the human behavior detected by the original algorithm.
  In fact, there are many, so-called, one-way (correct terminology?) algorithms. So, for example, for a given file, it's easy to compute its MD5; harder to compute a file for a given MD5 (though doable). And of course, the AC's better example which is impossibly hard in reverse for composite numbers made from very large prime factors.
  So no. Labeling the idea flawedbydesign is jumping the gun--logically, speaking.
10. Re:Just a Thought... by Joce640k · 2009-04-24 20:10 · Score: 4, Interesting
  
  I disagree. I don't think there's anything terribly un-mimicable about the way humans interact with web pages.
  Besides, have you considered the effect of false positives (which will be many)?
  With a captcha it's a black/white decision and people know why they passed/failed.
  In the world being proposed in the article people will have to sit dejectedly wiggling their mouse while a web page decides if they're human or not based on some unknown criteria. Pass or fail? It's up to the machine.
  After two or three sessions of this people will be running away screaming from your web pages.
  
  --
  No sig today...
11. Re:Just a Thought... by Joce640k · 2009-04-24 23:31 · Score: 2, Insightful
  
  I'd say it's a lot more of a slam-dunk than this:
  "Read heavily distorted text on random patterned backgrounds with added noise and geometric figures drawn across it"
  My real problem with the proposal is with the false positives. There's no clear feedback to let a user know *why* he's not being allowed into the system, it's just that the machine doesn't like the look of him.
  
  --
  No sig today...
12. Re:Just a Thought... by cskrat · 2009-04-25 00:44 · Score: 2, Interesting
  
  The anonymous poster that you're responding to was actually the one to introduce the word "quickly" to the discussion.
  That being said, I think the method proposed at the end of the article is flawed in that the algorithm is reversible and facing the wrong direction.
  Assuming that the website in question only has access to the message information passed to the GUI window of the browser by the OS (I'm sure as hell never installing a browser with ring 0 access to my system), it would be fairly trivial to produce an AI algorithm to replicate that behavior. A few hard-coded target parameters and a bit of randomization would sufficiently emulate a human based on metrics gathered from a small sample, possibly as small as just one human subject. And don't forget that spammers don't need anywhere near a 100% success rate to be viable.
  The checking process, on the other hand, would require a very large, heterogeneous sample of human subjects to determine the limits, distribution, and correlations of tested metrics. A team of statisticians and psychologists would be required to analyze the data so that it can be converted into a working algorithm by software engineers. That's an enormous amount of man hours just to produce the system. Assuming, however, that the system is produced in spite of its high development cost, it would still be computationally expensive to analyze each potential human to see if it's generating a valid combination of metrics.
  Think of it this way: it's trivial for me to write a PHP script to quickly generate valid XML markup to send to a remote system. Parsing a string of potential XML on the other side, however, is more computationally intensive and the algorithms to do it are more complex, especially if you consider the complexity of any prebuilt parsing tools, such as regular expression tools, as being part of the overall algorithm complexity. While, granted, a parser can be reasonably expected to run in linear time, the script to produce XML can be reduced to constant time if optimized for a specific purpose.
  
  --
  My God! It's full of eval()'s.
13. Re:Just a Thought... by jonaskoelker · 2009-04-25 01:28 · Score: 4, Funny
  
  There's no clear feedback to let a user know *why* he's not being allowed into the system, it's just that the machine doesn't like the look of him.
  So it's like dating? ;)
14. Re:Just a Thought... by xaosflux · 2009-04-25 02:40 · Score: 2, Funny
  
  And yes, that must be a capital "5" !
15. Re:Just a Thought... by dcollins · 2009-04-25 03:24 · Score: 2, Informative
  
  The anonymous poster makes a good counter argument against the idea that the algorithm must be easily defeatable: just because you have an algorithm that detects human behavior does not imply you have an algorithm that emulates the human behavior detected by the original algorithm.
  That's vaguely clever, but it doesn't really pass the sniff test. While "one-way" or "trapdoor" functions may or may not exist, they appear to be pretty rare. That's why it's such a big deal when computer scientists identify a new possible trapdoor function. The chance that any randomly chosen process (for example, verifying human mouse gestures on a webpage) happens to be trapdoor is monumentally small.
  
  Trapdoor functions came to prominence in cryptography in the mid-1970s with the publication of asymmetric (or public key) encryption techniques by Diffie, Hellman, and Merkle. Indeed, Diffie and Hellman first coined the term (Diffie and Hellman, 1976). Several function classes have been proposed, and it soon became obvious that trapdoor functions are harder to find than was initially thought.
  
  http://en.wikipedia.org/wiki/Trap_door_function
  
  --
  We know where leadership by an anti-intellectual "strongman" who scapegoats minorities and likes boisterous rallies goes
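
A minimal sketch of the invisible-form honeypot raised earlier in this thread, written in Python. The field name `website_url` and the dict-shaped form data are assumptions made up for illustration; nothing here comes from TFA. The idea is only that the page carries an input hidden by CSS, a human never sees it and leaves it blank, and a script that fills in every field gives itself away.

```python
# Hypothetical honeypot check; the field name is an assumption, not from TFA.
HONEYPOT_FIELD = "website_url"  # present in the page but hidden with CSS


def looks_like_bot(form: dict) -> bool:
    """Return True when the hidden honeypot field came back non-empty."""
    value = form.get(HONEYPOT_FIELD, "")
    return value.strip() != ""


human_post = {"name": "alice", "comment": "hi", "website_url": ""}
bot_post = {"name": "x", "comment": "buy pills", "website_url": "http://spam.example"}

print(looks_like_bot(human_post))
print(looks_like_bot(bot_post))
```

The asymmetry lasts only as long as the bot does not render the CSS or special-case the field, which is the caveat already made above.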
Not so sure by Misanthrope · 2009-04-24 18:19 · Score: 4, Insightful

Assuming you could write an algorithm to determine humanistic behavior, it stands to reason that you could write a bot to fool the initial algorithm.
1. Re:Not so sure by TheRaven64 · 2009-04-24 22:55 · Score: 3, Insightful
  
  Not true. For example, any NP-complete problem can be solved in polynomial time on a nondeterministic Turing machine, but a solution can be verified in polynomial time on a deterministic Turing machine. There are lots of examples of this kind of problem, for example factoring the product of two primes or the travelling salesman problem. In a vast number of cases, it is easier to test whether a solution is correct than it is to produce the solution. Even division is an example of this; it is easier to find c in a*b = c than it is to find a in c/b = a.
  Of course, as the other poster said, there is no evidence that 'seeming human' is in this category, and it's a very wooly description of a problem so it is probably not even possible to prove one way or the other.
  
  --
  I am TheRaven on Soylent News
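
The check-versus-solve gap described above can be sketched with the factoring example, in Python. The function names are made up for illustration, and trial division is just the simplest stand-in for "finding" the factors, not a serious factoring method. The checker also does not test that the claimed factors are prime.

```python
def verify_factorization(n: int, factors: list) -> bool:
    """Checking a claimed factorization takes only a few multiplications."""
    product = 1
    for f in factors:
        if f <= 1:
            return False  # reject trivial or nonsensical factors
        product *= f
    return product == n


def factor_by_trial_division(n: int) -> list:
    """Finding the factors: try every divisor up to sqrt(n)."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


n = 101 * 103
print(verify_factorization(n, [101, 103]))
print(factor_by_trial_division(n))
```

The verifier's work grows with the number of factors handed to it, while the trial-division loop grows with the square root of `n`. Whether "seeming human" has any such gap is exactly what the thread disputes.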
I read something about this by gcnaddict · 2009-04-24 18:21 · Score: 4, Interesting

I remember reading... I can't remember if it was a post about an algorithm already written or a proposal for an algorithm which would run alongside a CAPTCHA through the entire registration process, but the basic premise was just that: measure the entropy and fluidity of human movement and determine whether or not the user is a bot based on whether or not the user fits typical random human usage patterns.

I also remember the writer of the post noting that this kind of system would basically stretch the human-unwittingly-answers-CAPTCHA out such that humans would have to do the entire setup process manually instead of just the CAPTCHA, thus defeating the point of automated setup.

Does anyone have this article? I can remember reading it but I can't find it.

--
Viable Slashdot alternatives: https://pipedot.org/ and http://soylentnews.org/
1. Re:I read something about this by abolitiontheory · 2009-04-24 19:25 · Score: 4, Insightful
  
  In addition to this, what about those humans who just happen to fall into the seemingly 'mechanical pattern' that a computer registrant would? I know some parents of friends who very meticulously and methodically fill out forms, reading every box and explanation to ensure that they're inputting the right data.
  Any computer judgment of what is authentically human is in a way a reverse Turing test. It's a computer judging if humans are behaving enough like humans. The problem here is too many degrees of separation: a very specific type of human [engineer] designs a computer to assess the 'humanness' of other humans' actions. Any such assessment would be based on certain assumptions and biases about how humans act. It sounds like putting a document through Google translator into another language and then back again, before turning it in for a final grade.
2. Re:I read something about this by TheRaven64 · 2009-04-24 23:01 · Score: 2, Insightful
  
  It's a nice idea, but unfortunately it's easy for a computer to work around. How does the client-side JavaScript know how much the page has been scrolled? Because the browser tells it. There is nothing stopping a bot from downloading the page and then submitting the same HTTP requests that the client-side JavaScript would (or even running it in a VM and injecting DOM events into it with some random wait events). Once you know the algorithm used on the server to determine whether something is human, it's easy to work around it. In your simple example, the client just needs to sleep for 30 seconds between downloading and submitting the form - one line of code to program, while the test is likely to need at least four lines. This limits the number of registrations a single bot can do in a single day, but only to one site - the bot can overlap its requests so that it's hitting 30 sites at once, and then it's back up to one spam per second. Or, it may keep using the slow approach, making its traffic harder to spot.
  
  --
  I am TheRaven on Soylent News
3. Re:I read something about this by caramelcarrot · 2009-04-25 00:23 · Score: 3, Interesting
  
  Last time this came up, I suggested the idea of constant Bayesian analysis on HTTP logs to determine the likelihood of the current user being a bot.
  
  It could take things into account like if the user bothered to visit previous pages, request images, the time between requests etc. You could then either just make the webserver kill the connection, or you could add a function to your preferred web language (e.g. PHP) that returned the probability that the current user is a bot, and so redirect them to a more annoying Turing test or block them.
  
  This'd also work pretty effectively if people wanted to stop scrapers and bots in browser games. Of course a bot could mimic all this, but it'd raise the cost of entry significantly - and it might end up being that the bot is no more effective than a human working 24/7, though even then you'd need to be changing IPs constantly.
  
  I was thinking of trying to implement this over the summer, based on comment spam bots on my website, all without any need for client-side spying.
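
A rough sketch of the Bayesian log analysis suggested above, as a naive Bayes score in Python. Every number and feature name here is invented for illustration; a real deployment would estimate the likelihoods from its own labelled logs, and the naive independence assumption is a simplification.

```python
# All probabilities below are made-up illustrative numbers, not measurements.
PRIOR_BOT = 0.5

# feature name -> (P(feature is True | bot), P(feature is True | human))
LIKELIHOODS = {
    "requested_images": (0.1, 0.9),
    "visited_previous_page": (0.2, 0.8),
    "sub_second_requests": (0.7, 0.05),
}


def bot_probability(observed: dict) -> float:
    """Naive Bayes posterior P(bot | observed features)."""
    p_bot = PRIOR_BOT
    p_human = 1.0 - PRIOR_BOT
    for name, (p_given_bot, p_given_human) in LIKELIHOODS.items():
        if name not in observed:
            continue  # feature missing from the log; skip it
        if observed[name]:
            p_bot *= p_given_bot
            p_human *= p_given_human
        else:
            p_bot *= 1.0 - p_given_bot
            p_human *= 1.0 - p_given_human
    return p_bot / (p_bot + p_human)


scraper = {"requested_images": False, "visited_previous_page": False,
           "sub_second_requests": True}
reader = {"requested_images": True, "visited_previous_page": True,
          "sub_second_requests": False}
print(bot_probability(scraper))
print(bot_probability(reader))
```

The returned probability is the value the comment imagines exposing to the web application, which could then pick a threshold for blocking or for falling back to a more annoying test.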
Tech Support by cjfs · 2009-04-24 18:53 · Score: 5, Funny

I can see it now: "have you tried moving your mouse around randomly?", "how about clicking on a few different parts of the page then making coffee?", "still not working? Try slamming the mouse down several times", "okay, as a last resort click on the tabloid pop-up."
What does it mean to be human? by mcrbids · 2009-04-24 19:01 · Score: 4, Insightful

It's a lot tougher to define what a human is than it may seem on the surface, and the difference between man and machine will, by definition, become more and more blurred until there is no effective difference.
It's an idea that I've become familiar with, esp. after reading 'The Singularity is Near' by Ray Kurzweil. As our technology advances, we'll find that our capabilities beyond our technology will diminish. Machines long ago surpassed our running speed (cars/planes/trains) and our ability to farm/grow food (tractors) and our ability to hurl objects (guns) and swim (boats), but we've always had the ability to out-think our machines.
Increasingly, this isn't true.
We've already shown that SPAM filters are good enough to be more accurate than the people who read the messages. Machines have long been better than people for math-related stuff, keeping track of stuff, and the like, but now we're getting close to the threshold for image processing and character recognition. It's already true for voice recognition. Captcha is, therefore, doomed to fall eventually as we approach the singularity, and is already pretty weakened. The next question is, therefore, simple: what does it mean to be human?
Remember Lt. Commander Data on Star Trek, trying to be human? It's quaint largely because he/it was a minority on the show, but in reality the machines will outnumber us by a wide margin - they already do!
So what does it mean to be human?
If you have a prosthetic leg, are you still human?
If the leg has a CPU in it, are you still human?
If the CPU is more powerful than your mind, are you still human?
If the chip is wired into your mind, are you still human?
If you use the CPU as though it were part of your mind, are you still human?
If you have transferred most of your thinking to the CPU, are you still human?
If you transferred all your thinking to the CPU and rarely use your 'wet' brain, are you still human?
If you find th

--
I have no problem with your religion until you decide it's reason to deprive others of the truth.
1. Re:What does it mean to be human? by Devout_IPUite · 2009-04-24 19:32 · Score: 2, Interesting
  
  I might recommend http://en.wikipedia.org/wiki/Homosapien for further reading on this topic. Clearly, you are not a human no matter how smart you are if you're a computer. Are you a person? Well, depends how you define 'person'.
2. Re:What does it mean to be human? by mcrbids · 2009-04-25 01:50 · Score: 2, Interesting
  
  Yes but machines can't sue for damages. If you crash your car, no matter how 'smart' it is, it won't take you to court for driving drunk.
  But what if the car had an intelligence directly derived from a real person, like the logical progression of an amputee to a full machine above?
  That's the point of this discussion. If you develop software that so closely emulates the human mind that it (and anybody else talking to it) can't tell the difference, is it human? If the software is a direct descendant of a 'natural born' human, is it human? Can it sue? Can it get married?
  (kinda makes the whole gay marriage thing pale, huh?)
  
  --
  I have no problem with your religion until you decide it's reason to deprive others of the truth.
Strength in unity-in-diversity by brettz9 · 2009-04-24 19:49 · Score: 2, Insightful

The problem with a lot of sites dealing with spam is that they are using the same software that tries to solve everything at the top. Uniformity doesn't help.
But leaving people to their own devices to create or adapt their own forum/blogging/wiki software is not a good solution either. Uncoordinated diversity leaves a lot of people to fend for themselves.
Having unity-in-diversity (a common strength across systems and organisms), however, might well solve the problem.
If forum/blogging/wiki software creators would give sites the opportunity to make (and be able to change) their own set of questions and answers for first-time users (and not trouble them after that), I think bots would be hard-pressed to be programmed to interpret all such site-specific questions on their own. If bots could actually be programmed to intelligently answer all such human language questions, I think the bot-makers could be making a lot more dough in legitimate business...
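
A small sketch of what such a site-specific question hook might look like, in Python. The questions, answers, and function names are all hypothetical; the point is only that each site keeps its own editable question bank and the shared software just compares normalized answers.

```python
import random

# Hypothetical per-site question bank; each site admin would write and
# rotate their own entries.
SITE_QUESTIONS = {
    "What colour is the site logo?": {"green"},
    "Which editor does this forum argue about, besides emacs?": {"vi", "vim"},
}


def pick_question(rng: random.Random) -> str:
    """Choose one of the site's questions for a first-time user."""
    return rng.choice(sorted(SITE_QUESTIONS))


def check_answer(question: str, answer: str) -> bool:
    """Compare a trimmed, lower-cased answer against the accepted set."""
    accepted = SITE_QUESTIONS.get(question, set())
    return answer.strip().lower() in accepted


question = pick_question(random.Random())
print(question)
print(check_answer("What colour is the site logo?", " Green "))
```

Because the bank lives with the site rather than the software, a bot author cannot solve it once and reuse the answer everywhere, which is the diversity argument made above.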
Spam Karma? by nilbog · 2009-04-24 20:43 · Score: 2, Informative

It seems like the old Spam Karma module for WordPress did this. It calculated how long they were on the page vs. how much they had typed, how fast they typed, and a bunch of other factors before it ever hit a captcha. Back when I used WordPress I remember it being pretty accurate too.

--
or else!
Re:Response Times by Jason+Pollock · 2009-04-24 21:30 · Score: 2, Informative

These guys have botnets, and with networks like Tor, you can't limit access to one IP. Besides, if you've got a captcha that is being attacked, to limit them by IP, you need to send them all through a single location to perform the detection, completely breaking your load balancing. It becomes a DoS target.
Basically, the attacker has more machines, more IP addresses and more time than the target.
Even if I only have one machine, that's fine, I attack 10 or 100 sites instead of just yours. Or, I use a network like Tor and select random out proxies. The only problem? All of my compatriots will be doing the same.
The target won't see any real decrease in attacks, they will only lose all of their corporate customers who are unable to access the network from home (or dorms, or school, or libraries).
voice recording by Ofloo · 2009-04-24 21:35 · Score: 2, Insightful

Think of every behavior as a voice recording: record and replay! And there you go, bots are able to mimic.
Not a great idea by jgoemat · 2009-04-24 22:03 · Score: 3, Interesting

The article did have links to some interesting topics, such as Google experimenting with image orientation as a test. The premise of using how a user interacts with a page is deeply flawed though. There's not even a need for an algorithm or program to 'figure out' the captcha, just record how an actual user interacts once and you can send the same exact thing every time to pass the test. The reason this works is because the 'question' doesn't change. This would be like showing the same text captcha every time. If they ignore identical values being sent, the values can just be fudged a bit.
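
A toy illustration of that last point, in Python. The trace format (a list of event and millisecond pairs) and all names are assumptions for the sketch, not any real site's scheme. The server below rejects exact repeats of a trace it has already seen, and a few milliseconds of jitter is enough to get the same recording accepted again.

```python
import hashlib
import random

seen_digests = set()  # digests of every trace accepted so far


def trace_digest(trace: list) -> str:
    """Hash a list of (event, milliseconds) pairs."""
    return hashlib.sha256(repr(trace).encode("utf-8")).hexdigest()


def accept_trace(trace: list) -> bool:
    """Accept a trace only if this exact trace has not been seen before."""
    digest = trace_digest(trace)
    if digest in seen_digests:
        return False
    seen_digests.add(digest)
    return True


def fudge(trace: list, rng: random.Random) -> list:
    """Shift every timestamp by 1 to 20 ms so the trace is never identical."""
    return [(event, ms + rng.randint(1, 20)) for event, ms in trace]


recorded = [("scroll", 120), ("click", 900), ("key", 1500)]
print(accept_trace(recorded))                            # first submission
print(accept_trace(recorded))                            # exact replay
print(accept_trace(fudge(recorded, random.Random(1))))   # replay with jitter
```

Duplicate detection alone only stops the laziest replay; the check would need something that changes per request, which is the "question doesn't change" problem described above.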
Use Turbo Tax Lately by SunSpot505 · 2009-04-24 22:17 · Score: 2, Interesting

When I posted a question to the Turbo Tax community forum it asked a simple question as a CAPTCHA. Seems like an easy enough solution, and it changes each time to foil a persistent brute force attack.

Of course I'm sure it's only a matter of time before someone has an algorithm smart enough to answer questions. And I suppose that a botnet with enough time would work too. Still an interesting approach I thought.
"Scrolling and typing" by Arancaytar · 2009-04-24 22:21 · Score: 2, Insightful

The user's local behavior before form submission is detectable only via a client-side script. There are therefore two ways this can go.
1.) You maintain accessibility standards and make the client-side script optional. The effectiveness of this approach is comparable to xkcd's "When Littlefoot's mother died in /Land before Time/, did you feel sad? (Bots: NO LYING!)".
2.) You require client-side script execution in order to submit the form. The effect is a lot of pissed-off users with NoScript or non-compatible JavaScript interpreters (IE or the rest, depending on which one you support).
This idea is basically like visual captchas, but instead of the visually impaired, you're screwing everyone without JavaScript.
There is one aspect of user behavior that can be detected, however, and that is the time passed between the user requesting the form and submitting it. From an AI perspective, humans spend an eternity typing, so setting a minimum delay between request and submission will slow the bot right down - especially with a flood control that requires a delay before submitting the next form. Slashdot does both of these things already, by the way.
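
The minimum-delay and flood-control idea can be sketched server-side in Python without any client script. This is not how Slashdot implements it; the secret, the thresholds, and the function names are all assumptions. The form carries a timestamp signed with an HMAC, so the server can tell how long ago the form was requested without trusting the client.

```python
import hashlib
import hmac

SECRET_KEY = b"change-me"   # hypothetical server-side secret
MIN_FILL_SECONDS = 10       # assumed minimum time a human needs to type
FLOOD_SECONDS = 60          # assumed gap required between posts per client

last_post_time = {}         # client id -> time of last accepted post


def issue_token(now: float) -> str:
    """Token embedded in the form: issue time plus an HMAC over it."""
    stamp = str(int(now))
    sig = hmac.new(SECRET_KEY, stamp.encode(), hashlib.sha256).hexdigest()
    return stamp + ":" + sig


def accept_submission(token: str, client: str, now: float) -> bool:
    """Apply the signature check, the minimum delay, then flood control."""
    stamp, _, sig = token.partition(":")
    expected = hmac.new(SECRET_KEY, stamp.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False    # forged or corrupted token
    if now - int(stamp) < MIN_FILL_SECONDS:
        return False    # submitted faster than a human could type
    if now - last_post_time.get(client, float("-inf")) < FLOOD_SECONDS:
        return False    # too soon after this client's previous post
    last_post_time[client] = now
    return True


token = issue_token(1000.0)
print(accept_submission(token, "client-a", 1003.0))
print(accept_submission(token, "client-a", 1015.0))
```

The sketch does not expire tokens or stop one token being reused after the flood window, and keying flood control on a client identifier runs into the botnet and Tor objections raised elsewhere in the thread.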
Here's a test by monoqlith · 2009-04-25 02:37 · Score: 2, Funny

Can Slate stop writing articles about shit it doesn't know about?