A Vision For a World Free of CAPTCHAs
An anonymous reader writes "Slate argues that we're going about verifying humans on the Web all wrong: 'As Alan Turing laid out in the 1950 paper that postulated his test, the goal is to determine whether a computer can behave like a human, not perform tasks that a human can. The reason CAPTCHAs have a term limit is that they measure ability, not behavior. ... the random, circuitous way that people interact with Web pages — the scrolling and highlighting and typing and retyping — would be very difficult for a bot to mimic. A system that could capture the way humans interact with forms algorithmically could eventually relieve humans of the need to prove anything altogether.' Seems smart, if an algorithm could actually do that."
It seems to me that if you can design an algorithm to verify how humans interact with a computer, it should be relatively trivial to engineer an algorithm that mimics this interaction?
Maybe someone smarter than I could clarify?
Assuming you could write an algorithm to determine humanistic behavior, it stands to reason that you could write a bot to fool the initial algorithm.
I remember reading... I can't remember if it was a post about an algorithm already written or a proposal for an algorithm which would run alongside a CAPTCHA through the entire registration process, but the basic premise was just that: measure the entropy and fluidity of human movement and determine whether or not the user is a bot based on whether or not the user fits typical random human usage patterns.
I also remember the writer of the post noting that this kind of system would basically stretch the human-unwittingly-answers-CAPTCHA out such that humans would have to do the entire setup process manually instead of just the CAPTCHA, thus defeating the point of automated setup.
Does anyone have this article? I can remember reading it but I can't find it.
Viable Slashdot alternatives: https://pipedot.org/ and http://soylentnews.org/
If you have algorithms to detect human behavior on a web page, you also have algorithms to simulate it. But it would be a little step for better AI, so go ahead.
It seems to me that if a bot can check whether or not a person is "acting" human, then it must follow that the bot knows what rules are involved with "acting human". If it understands this, then there's nothing stopping someone from telling the computer to obey those rules itself, which means "AI". The main problem with Artificial Intelligence is that we don't have a complete and fully accurate list of rules for what a human can/will do - in other words, we're unpredictable. And it's not like we can't have computers act unpredictably, it's just that we don't know how to make them act unpredictably in the same way a human would act unpredictably.
So, in other words, even if someone could make this test, it would render itself redundant by design.
Commodore64_love: I don't comprehend people who're so frightened of death that they'll bankrupt themselves to stay alive
If there was a way for a computer to determine that the behaviour is human, wouldn't the computer be able to do it anyway? But what about tricks like telling a user to leave a particular field blank and fill it in on the next page instead? This field could be indicated by a captcha which contains a URL. On opening the URL you get another captcha which has a number, and you leave that numbered field empty. If the 2nd captcha is entered wrong, then you have to repeat the process from the beginning and fill in 2 captchas on the 2nd page, and so on. This way most humans would be able to do it in 1-2 attempts, but bots doing it the hit-and-trial way would be stuck with thousands of different captchas. Also, having a central database of all the types of captchas and displaying a mix of 2-3 different types would be effective, as bots are designed for one type of captcha only, aren't they?
This system could also reproduce human interactions. So it's only a matter of time until this behavioural approach stops working.
BTW: I don't want slashdot to check how I scroll the page, nor is my typing and retyping the business of anybody but me. Imagine you can't comment anywhere because you block Google Analytics.
doesn't that just mean a computer can also feed the correct data in, defeating it?
Anyway, the little tests these days are stupid and annoying, and perhaps for some people, getting impossible to do. Perhaps instead of the test being administered at the point of registration, new accounts at places should be automatically monitored for type of activity.
For instance, if the first post at a forum has any links to blacklisted ad sites (could be EasyList USA, whatever), it's probably safe to just kick it out automatically. And things of that nature. Or just the old sign up with a credit card and charge onetime $0.41 trick (or whatever to just cover min fees) to keep bots out of the community's hair.
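A rough sketch of that first-post link check, in Python. The blocklist contents and the function names here are made up for illustration; a real site would load a published domain list instead of the two placeholder entries.

```python
import re
from urllib.parse import urlparse

# Hypothetical blocklist; placeholder domains, not taken from any real list.
BLOCKED_DOMAINS = {"spam-example.test", "ads-example.test"}

# Deliberately simple URL matcher: http(s) links up to the next space or quote.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

def linked_domains(text):
    """Return the set of lowercase host names linked from the post text."""
    domains = set()
    for url in URL_PATTERN.findall(text):
        host = urlparse(url).hostname
        if host:
            domains.add(host.lower().rstrip("."))
    return domains

def is_blocked(host):
    """True if the host or any of its parent domains is on the blocklist."""
    parts = host.split(".")
    return any(".".join(parts[i:]) in BLOCKED_DOMAINS for i in range(len(parts)))

def should_reject_first_post(text):
    """Reject a new account's first post if it links to a blocked domain."""
    return any(is_blocked(host) for host in linked_domains(text))
```

This only covers the "kick it out automatically" part; deciding what goes on the list is the hard bit.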
I'm sure other solutions will have the old How-To-Fix-Email-response "Yes, but your idea won't work because (Mark random amount of 100 checkboxes)"
I can see it now: "have you tried moving your mouse around randomly?", "how about clicking on a few different parts of the page then making coffee?", "still not working? Try slamming the mouse down several times", "okay, as a last resort click on the tabloid pop-up."
The tricky part of an alternative solution seems to be modelling human behaviour - in order to detect if something is human or not you need to have a pretty good model of what humans do. I suspect there would be a lot of variation in the sort of way people interact; if I'm feeling sleepy I would present a very different profile of use to when I'm on task and in flow. A program to do this will probably have to be statistical in nature with some sort of confidence intervals of humanness. Maybe it will need some Cluster Analysis. This all makes for some pretty hard code and I'm not convinced the difference between two humans will be smaller than the difference between human and bot.
There are four sorts of people in the world: fools, lunatics, idiots and morons. - Umberto Eco, Foucault's Pendulum.
You mean I didn't need a new pair of glasses every time I couldn't read one of those CAPTCHAs? I want my money back.
Great. I can just see myself a year from now, getting banned from a website for acting "too much like a robot".
Honeypots are a satisfying solution. Offer actions that the bots will respond to, but that a human would never take.
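For what it's worth, the server side of a honeypot field can be tiny. A minimal sketch in Python, assuming the form includes an extra input that is hidden from people with CSS; the field name is hypothetical.

```python
# Hypothetical field name: plausible-looking to a bot, hidden from people via CSS.
HONEYPOT_FIELD = "website_url"

def looks_like_bot(form):
    """A person never sees the hidden field, so any non-blank value suggests automation."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```

A bot that fills in every input it finds trips this; one that parses the page well enough to skip hidden inputs does not.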
Seems some things should be easy. There's a certain minimum amount of time that it takes a human to tab from one field to another as they fill in data, even if they're pasting info in. Even just slowing down bots to the speed that a human could reasonably do a task would put a dent in the problem =\
"I Don't Have Enough Faith to be an Atheist"
It's a lot tougher to define what a human is than it may seem on the surface, and the difference between man and machine will, by definition, become more and more blurred until there is no effective difference.
It's an idea that I've become familiar with, esp. after reading 'The Singularity is Near' by Ray Kurzweil. As our technology advances, we'll find that our capabilities beyond our technology will diminish. Machines have long ago surpassed our running speed (cars/planes/trains) and our ability to farm/grow food (tractors) and our ability to hurl objects (guns) and swim (boats), but we've always had the ability to out-think our machines.
Increasingly, this isn't true.
We've already shown that SPAM filters are good enough to be more accurate than the people who read the messages. Machines have long been better than people for math-related stuff, keeping track of stuff, and the like, but now we're getting close to the threshold for image processing and character recognition. It's already true for voice recognition. Captcha is, therefore, doomed to fall eventually as we approach the singularity, and is already pretty weakened. The next question is, therefore, simple: what does it mean to be human?
Remember Lt. Commander Data on Star Trek, trying to be human? It's quaint largely because he/it was a minority on the show, but in reality the machines will outnumber us by a wide margin - they already do!
So what does it mean to be human?
If you have a prosthetic leg, are you still human?
If the leg has a CPU in it, are you still human?
If the CPU is more powerful than your mind, are you still human?
If the chip is wired into your mind, are you still human?
If you use the CPU as though it were part of your mind, are you still human?
If you have transferred most of your thinking to the CPU, are you still human?
If you transferred all your thinking to the CPU and rarely use your 'wet' brain, are you still human?
If you find th
I have no problem with your religion until you decide it's reason to deprive others of the truth.
Capture those "random" interactions of people with some page of your own (or where you can inject script), replay on target.
A system that can determine whether or not a user is human would have built-in characteristics as to what a human would do in such a situation. What's keeping someone from taking that same algorithm and adapting it for means other than their intended purpose?
If a machine knows what to do, another machine can take advantage of that.
Obligatory: import skynet; blah
The game.
If the judge of the test is a computer, then the test will always be passable by a computer.
Everyone has been focusing on how easy/difficult it would be to reverse this hypothetical algorithm that would determine, based on your use of a webpage, if you're human or not... I see a more fundamental problem. This is on the internet, so they have basically 3 options on how to implement this.
1) server side. The only variable you could track is time between page requests. Don't see how that could possibly be enough information
2) Client side JS. Simple, just modify the JS to return &isHuman=true
3) Client side JS acting as a keylogger, sending back for server side verification. Harder to defeat, but you'll lose my business, the business of all of my friends, and have a horde of angry nerds picket your offices.
Also, this doesn't take into account any edge cases, for example if I've already been to your site, surf straight to /contact.html and paste in an email I previously wrote in Word (err, excuse me, OOo).
The opinions in this post are fictitious. Any similarity to actual opinions, real or imagined, is purely coincidental.
Problem solved. How hard is it to record human mouse and keyboard input and then play it back to "break" the security. Not very. How many seconds did they actually spend thinking about this awful scheme?
I think there might be so much variation in human usage patterns, all of which need to be accepted by the algorithm, that it should be easy to simulate a behaviour that stays within those bounds.
On the other hand, if the algorithm doesn't allow much deviation, it will annoy a lot of people who get falsely detected as bots. It might hit handicapped people or old people first then.
Just use Javascript, watch for either some mouse movements or onBlur/onFocus.. and if those are present, then isHuman will == 1, and you pass that to the server side. Actually, you'll want to have some obscure variable name to make it less obvious.
The problem with a lot of sites dealing with spam is that they are using the same software that tries to solve everything at the top. Uniformity doesn't help.
But leaving people to their own devices to create or adapt their own forum/blogging/wiki software is not a good solution either. Uncoordinated diversity leaves a lot of people to fend for themselves.
Having unity-in-diversity (a common strength across systems and organisms), however, might well solve the problem.
If forum/blogging/wiki software creators would give sites the opportunity to make (and be able to change) their own set of question and answers for first-time-users (and not trouble them after that), I think bots would be hard-pressed to be programmed to interpret all such site-specific questions on their own. If bots could actually be programmed to intelligently answer all such human language questions, I think the bot-makers could be making a lot more dough in legitimate business...
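A minimal sketch of such a site-specific question set, in Python. The two questions, their accepted answers, and the function names are placeholders invented for the example; the whole point of the suggestion is that each site would write and rotate its own.

```python
import random
import unicodedata

# Hypothetical questions; each site admin would replace these with their own.
QUESTIONS = {
    "q1": ("What colour is the sky on a clear day?", {"blue"}),
    "q2": ("How many legs does a cat have?", {"4", "four"}),
}

def normalise(answer):
    """Ignore case, surrounding whitespace and Unicode width variants."""
    return unicodedata.normalize("NFKC", answer).strip().lower()

def pick_question(rng=random):
    """Return (question id, question text) for a first-time user."""
    qid = rng.choice(sorted(QUESTIONS))
    return qid, QUESTIONS[qid][0]

def check_answer(qid, answer):
    """True only if the id is known and the answer is one of the accepted ones."""
    entry = QUESTIONS.get(qid)
    return entry is not None and normalise(answer) in entry[1]
```

The answers stay on the server; only the question id and text go out with the form.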
It takes a human to know one.
The idea that behavior was a better judge of identity than "biometrics" is old old. I wish I could remember the name of the program, but there was a Gnu / Unix utility that measured word frequency, letter frequency, the amount of delay between pressing any two letter combinations on the keyboard, and more... all put together to verify identity. And it worked quite well. I think that program is close to 20 years old.
Biometrics fails for the same reason it always has... as soon as someone comes up with a halfway reliable way to identify somebody, others come up with a fairly reliable way to fake the system. But micro-delays on the keyboard, etc. make for a pretty individual signature.
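To make the idea concrete, here is a rough Python sketch of a signature built from the delay between two-letter combinations. This is not the old utility mentioned above; the function names and the distance measure are assumptions picked for illustration.

```python
from statistics import mean

def digraph_latencies(keystrokes):
    """keystrokes: list of (char, press_time_seconds). Returns {digraph: [delays]}."""
    latencies = {}
    for (a, t1), (b, t2) in zip(keystrokes, keystrokes[1:]):
        latencies.setdefault(a + b, []).append(t2 - t1)
    return latencies

def build_profile(keystrokes):
    """Average delay for each two-letter combination that was typed."""
    return {dg: mean(delays) for dg, delays in digraph_latencies(keystrokes).items()}

def profile_distance(profile, sample):
    """Mean absolute difference over shared digraphs; None if there is no overlap."""
    shared = profile.keys() & sample.keys()
    if not shared:
        return None
    return mean(abs(profile[dg] - sample[dg]) for dg in shared)
```

A smaller distance means the sample looks more like the enrolled typist; where to put the cut-off is left open here.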
Simple enough
``Whoever is against captcha (or claims that it has been broken) is someone who would like the web to be something like facebook where every user has a login-id on their database''.
and at the same time is very pissed off because the captcha breaking programs are not really working.
follow the links to the profit...
Wouldn't the ability to collect biometric information require a fairly potent piece of spyware to be loaded on the client system? How would a user, or even a security professional, easily tell the difference between a keylogger that reads our actual strokes, and one that is just timing the key presses?
Sounds like a kernel mode device that would have to be part of the input drivers. It's an attack surface, IMO. I would think it's safer to have a separate input device for biometric authentication only than to attempt to extract biometric metadata from highly sensitive input devs like keyboards and mice.
I did enjoy the 'honeypot field' example (in TFA). I suspect it is probably easily defeated, unfortunately. If the field is hidden on the page, can't we write a bot to detect that physical fact, or any source code (javascript?) that hides it. How do you obfuscate something like that without serving it with the page?
Sounds to me like CAPTCHA still wins. Oh well, I didn't expect much. ;^)
--
Toro
It seems like the old Spam Karma module for Wordpress did this. It calculated how long they were on the page vs. how much they had typed, how fast they typed, and a bunch of other factors before it ever hit a captcha. Back when I used Wordpress I remember it being pretty accurate too.
or else!
Simply show 2 pictures of women and ask which one is hotter. Make sure one is ugly and the other fuckable.
Think of every behavior as a voice recording: record and replay! And there you go, bots are able to mimic.
Measuring micro-delays is just another way of authentication based on something you are (as opposed to something you have and something you know). Just another form of biometrics, with similar pitfalls, as others have already pointed out.
Captchas etc. won't work perfectly. Ever. There are always bot(net)s that are able to defeat them. If you use software to make the lettering difficult to read, you can still write software to read it. Like the algorithms, we detect the order in the chaos.
So let's just face it:
The internet needs a unified authentication system if we are to kill spam. If there was a unified authentication system, you wouldn't need to store your passwords around the internet, and your mails would be traceable to you.
So, let those who need anonymity create their own solutions for interacting anonymously.
Stop the brainwash
The article did have links to some interesting topics, such as google experimenting with image orientation as a test. The premise of using how a user interacts with a page is deeply flawed though. There's not even a need for an algorithm or program to 'figure out' the captcha, just record how an actual user interacts once and you can send the same exact thing every time to pass the test. The reason this works is because the 'question' doesn't change. This would be like showing the same text captcha every time. If they ignore identical values being sent, the values can just be fudged a bit.
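The "ignore identical values" defence and its weakness are easy to show. A small Python sketch, assuming the client sends its interaction trace as a list of events; the event layout and function names are invented for the example.

```python
import hashlib
import json

_seen_fingerprints = set()   # fingerprints of traces already accepted

def trace_fingerprint(events):
    """Hash of the exact recorded trace (list of [time, action, x, y] entries)."""
    encoded = json.dumps(events, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()

def is_replay(events):
    """True if this exact trace was submitted before; otherwise remember it."""
    fingerprint = trace_fingerprint(events)
    if fingerprint in _seen_fingerprints:
        return True
    _seen_fingerprints.add(fingerprint)
    return False
```

An exact resend is caught, but moving a single coordinate by one pixel gives a new fingerprint, which is exactly the "fudged a bit" problem.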
When I posted a question to the Turbo Tax community forum it asked a simple question as a CAPTCHA. Seems like an easy enough solution, and it changes each time to foil a persistent brute force attack.
Of course I'm sure it's only a matter of time before someone has an algorithm smart enough to answer questions. And I suppose that a botnet with enough time would work too. Still an interesting approach I thought.
The user's local behavior before form submission is detectable only via a client-side script. There are therefore two ways this can go.
1.) You maintain accessibility standards and make the client-side script optional. The effectiveness of this approach is comparable to xkcd's "When Littlefoot's mother died in /Land before Time/, did you feel sad? (Bots: NO LYING!)".
2.) You require client-side script execution in order to submit the form. The effect is a lot of pissed-off users with NoScript or non-compatible Javascript interpreters (IE or the rest, depending on which one you support).
This idea is basically like visual captchas, but instead of the visually impaired, you're screwing everyone without Javascript.
There is one aspect of user behavior that can be detected, however, and that is the time passed between the user requesting the form and submitting it. From an AI perspective, humans spend an eternity typing, so setting a minimum delay between request and submission will slow the bot right down - especially with a flood control that requires a delay before submitting the next form. Slashdot does both of these things already, by the way.
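A sketch of how the minimum delay and the flood control could be checked entirely on the server, in Python. This is not how Slashdot implements it; the secret, the thresholds and the function names are all assumptions. The form carries a signed issue time, so the client cannot simply claim it waited.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"   # hypothetical server-side secret
MIN_FILL_SECONDS = 5        # assumed minimum time a person needs to fill the form
FLOOD_SECONDS = 60          # assumed minimum gap between posts from one client

_last_post = {}             # client id -> time of last accepted post

def _sign(issued):
    return hmac.new(SECRET_KEY, issued.encode(), hashlib.sha256).hexdigest()

def issue_token(now=None):
    """Put this in a hidden field when serving the form: issue time plus its HMAC."""
    issued = str(int(time.time() if now is None else now))
    return f"{issued}:{_sign(issued)}"

def accept_submission(token, client_id, now=None):
    now = time.time() if now is None else now
    issued, _, signature = token.partition(":")
    if not hmac.compare_digest(signature.encode(), _sign(issued).encode()):
        return False    # missing or tampered token
    if now - int(issued) < MIN_FILL_SECONDS:
        return False    # submitted faster than a person could type
    if now - _last_post.get(client_id, float("-inf")) < FLOOD_SECONDS:
        return False    # flood control: too soon after the previous post
    _last_post[client_id] = now
    return True
```

Nothing here stops a token from being reused later, so this only slows bots down to human speed, which is all the comment claims.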
Some time ago I already noticed that Google Groups has implemented a bot detection based on behaviour.
However, often when I browse through a google group in an efficient way, google thinks I'm a bot and blocks me for quite a while. The only way around is to work inefficiently on purpose, by making my clicks as random as possible with as random as possible time intervals. This costs me at least five times as much time as it would cost me the efficient way.
This is very annoying, so I think it would be better for them to ditch the behaviour detection and just rely on properly designed captchas.
The captcha is entered into a field and submitted to the web server. However our random highlights, backspacing, scrolling etc. all happen in the browser on our system. The web server (thank ______ ) doesn't know about any of that, it just sees the end result. So it doesn't have access to any of that data to make any kind of determination. Currently only malware would be collecting this data and sending it somewhere. So the proposal here is to be human verified by malware.
There are other flaws that others have pointed out.
Think Deeply.
First, ask yourself this simple question: Is CAPTCHA popular because no one has thought of anything else, like the alternatives in the article? I doubt it. I'd suggest that CAPTCHA is popular because it is a better solution than those simple alternatives. The only criticism I hear of CAPTCHA in all this debate is that it is inconvenient. The other solutions, while perhaps more convenient for the user, do not solve the problem of sorting bots from humans nearly as well.
To drive this point home, consider the simple fact that CAPTCHA is so effective at sorting out bots from humans that the spammers have taken to paying humans to solve them. Could any of these proposed alternatives be more effective? How will you sort out the humans-paid-by-spammers from the rest of the humans? And if your alternative is no more effective than CAPTCHA, just more convenient, then you have made the humans-paid-by-spammers' jobs easier.
Second, I propose a REAL criticism of CAPTCHA: accessibility. I don't mind that CAPTCHA is inconvenient for 999 out of every 1000 people. I mind that CAPTCHA is impossible for 1 out of every 1000 people. CAPTCHA doesn't just sort bots from humans, it is stronger than that. CAPTCHA sorts fully functioning and healthy humans from everything else, including handicapped humans. Yes, CAPTCHA puts people with disabilities into the bot category, and that is the REAL reason we should move on from CAPTCHA.
Generate a textual representation of user's action on the site, including also timing between clicks, scrolls and so on (but not just as plain numbers, use some words to *describe* relationship between time of actions).
Whenever user posts content, feed the report, perhaps including also the post, to a spam filter (like CRM114?), to check whether the description matches human, or mechanical behavior. Train the filter on posts it got wrong.
The tricky part is how to describe the action in a meaningful way.
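One possible way to do the describing, sketched in Python. The bucket boundaries and the words are guesses, not anything tuned; the idea is only that a text classifier gets tokens it can learn from instead of raw numbers.

```python
def describe_gap(seconds):
    """Map a time gap to a word so a text filter sees tokens, not raw numbers."""
    if seconds < 0.05:
        return "instantly"
    if seconds < 1:
        return "quickly"
    if seconds < 10:
        return "after_a_pause"
    return "after_a_long_wait"

def describe_session(events):
    """events: list of (timestamp_seconds, action_name) in chronological order."""
    words = []
    previous = None
    for timestamp, action in events:
        if previous is not None:
            words.append(describe_gap(timestamp - previous))
        words.append(action)
        previous = timestamp
    return " ".join(words)
```

For example, a page load followed by a click 10 ms later comes out as "load instantly click", which is the sort of phrase a trained filter could learn to associate with mechanical behavior.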
Not really seeing a difference between behavior and ability.
Any action that you perform is behavior, and, obviously, if you perform an action you are also capable of performing it. A behavior is therefore an ability. Any algorithm that tries to distinguish between human behavior and computer behavior is still a reverse Turing test.
Given that, testing the quirky way humans navigate through the web is arguably an even flimsier test than the captcha. There is a certain degree of randomness, but nothing that rand() can't imitate to fool what would have to be an algorithm based, somewhat, on measuring randomness, with a limit to its sensitivity so that false positives can be reduced to a reasonable level.
Can Slate stop writing articles about shit it doesn't know about?
Like many laws intended to prevent undesirable behavior (e.g. gun control, sale of illegal drugs, etc.), CAPTCHAs only block the casual (law abiding) user. It is regrettable that there are so many stupid people in this world. If stupid people didn't respond and make SPAM profitable, SPAM would have died out years ago. Too bad we can't outlaw stupid people!!
Some notes in no particular order. . .
1. I kind of like winning the Turing Test. It makes me feel human. Some days, before the coffee kicks in, this is a plus.
2. It's funny when I can't read the secret warped word. It throws me in existential questioning for about half a second.
3. I like the new idea of having to describe a randomly rotated 3D image. That's a cool system which I'd like to see implemented, though I can't imagine it will be very long before it too is solved.
4. I find it funny that proving one's "Human-ness" is easier to do with a basic kindergarten reading or shape-recognition test than with the old Star Trek method of demonstrating an understanding of Love or being able to write an opera or such. --Especially since you can fairly easily program a computer to compose random Haiku.
5. An interesting test would be to write a short paragraph and ask the potential human how they feel about it. You could probably weed out trolls as well as computers that way. Or potentially learn something disturbing about the head-space of the webmaster and/or feel like a total outsider when you fail at multiple choice emotions.
6. Whatever the case, I think it's pretty sci-fi that we've gotten to the point where major effort is being spent to out-smart AI's. William Gibson and Neal Stephenson keep getting closer to having described our Now.
-FL
Can Slate stop writing articles about shit it doesn't know about?
Right.
First, most of the things Slate suggests have been tried. Timing human input behavior is in use already, and attacks already do some randomization there.
Second, despite what the Slate article quotes, the CAPTCHA for Gmail has been cracked. The success rate is only 20%, but because the cracker is embedded in a botnet, that's good enough to survive IP blacklisting. MessageLabs says Gmail spam went from 1.3 percent of all spam e-mail in January to 2.6 percent in February.
All the proposed tasks - recognizing people, cats vs dogs, etc. - can be done by computers at the 20% accuracy level or better. So that's not going to work.
ReCAPTCHA isn't very good in practice. You get two words, one of which was recognized by an OCR program and one of which wasn't. You only have to re-recognize the one which some OCR program already got to pass the CAPTCHA. If you can do that, you have a 50% chance of success.
Then there are the outsourcing services. "We are 35 seater call center located in Hyderabad, we would be interested." The going rate is US$0.001 to US$0.003 per CAPTCHA solved successfully. There are always ads on GetAFreelancer for CAPTCHA solving. Read Black Hat World for sources.
What stops someone from recording a human looking at the page, and then replaying that behavior from a bot?
Also, will humans actually want to send the information needed for this to remote websites? I don't really want a website to know what part of the page I'm looking at.
Regardless of the Turing Test aspects of this, forms are filled on the client. This hypothetical algorithm would also be running on the client. The server can't trust any "Yes this is a human" that comes from the client. So even if you could make this algorithm it would not solve the intended problem.
They hadn't pointed it out by the time I posted this.
Nevertheless, microdelays and such are not "biometrics". They are behavior-based. The fact is, though, that people generally find this kind of behavior-based approach harder to fake or mask than actual biometrics. That is where the difference lies: in the difficulty of obscuring who you are.
This whole thing is a moving target.
Anything your algorithm can do, my algorithm can do too.
Might work for a while, though, but then again, so did CAPTCHAs.
Wait, did I just say "so did CAPTCHAs"? What I meant was, so are CAPTCHAs, because everyone is still using them, even though they don't work.
Which is the real problem ... not only is the whole thing a moving target, but tackling the problem only works when everyone actually moves.
Remember, it's measure --> countermeasure.
All this really means is now everyone gets to live like we really are in a 1960's spy movie. Sure hope that's what everyone wanted.
A bot will crack it regardless. There is nothing that can be done, other than remove the bot creators.
Honeypots are the Answer! You simply have pages and options which are just distasteful to humans, the reasons for which are not comprehensible to machines! The machines will give themselves away because they cannot distinguish the distasteful options.
Example: A page of Markov-chain nonsense in an otherwise informative website.
This page would be generated using the same technology that spammers use to get past spam filters. Only a real human being or an AI that can achieve some sort of comprehension will be able to tell that it's full of nonsense. Programs that are trying to simulate human browsing behavior will "dwell" on this page, even though it's junk, and give themselves away.
I think this sort of "spam inoculation" can be done in a way that it doesn't detract too much from the website's quality as a whole, much as vaccines incorporate bits of pathogens without harming the patient.
Though the CAPTCHA problem is interesting, I think we will see other ways to skip these showstoppers in the near future.
Sooner or later google or somebody else will provide a service that will return information on the likeliness that you are human and that your account has not been taken over by malware. Perhaps a kind of an expanded OpenID which may return information on your behavior on several other websites or in the physical world.
Certain actions could provide "human" credits (or some similar or detailed concept)
Go ahead make a list of actions that will make it unfeasible for anybody to automate.
Once you have your credits you may use your ID to bypass captchas. I'm sure there are clever ways of solving issues about being anonymous when using the ID.
Thinking about new ways of designing exotic captcha puzzles is just a plain waste of time.
20 or more of the top-level posts on this page are all "Well yeah, but if a computer can test it, then a computer can emulate it." I'd ask if anybody bothered reading other comments before they posted, but I already know the answer (this /is/ slashdot after all).
On to the topic at hand: this is impractical for another, less complex reason. From what I've been seeing, most of the "bot" registrations these days are not bots, they're people. If those who wish to can pay someone a couple dollars a day to spam registrations and comments, there's really not much defense against it.
In fact, there are many, so-called, one-way (correct terminology?) algorithms.
Background: I'm doing my phd in crypto. I use terms like one-way function (and one-way {,trapdoor} permutation).
If user clicks on boobies popup ad then user = human
The proverbial elephant in this metaphorical room is the "illicit solving by humans". Allow me to clarify this problem, and maybe a solution will be apparent. Specifically, spam barons are paying underprivileged, non-English-literate folks to crack CAPTCHAs in order to spam an English-literate audience.
Why not design a CAPTCHA that requires English literacy to solve? Ok, that's a rhetorical question: it looks like peoplesign.com is at least trying to do just that. They provide a free 3-D CAPTCHA service that challenges a visitor to pick the correct English phrase for 3 curiously colored pictures of familiar objects. When I tried it one of my pictures was "animal shape on a sign" and it was a picture of just that.
On their FAQ they claim they can generate their images and labels faster than their CAPTCHA service can spend them. If that's true then they may have something special.
I have tried the keystroke dynamics authentication systems, for example, and my personal opinion is that they don't work. In my opinion, if one human can implement a solution, another human will be able to implement a bot to bypass it. The only way you will be able to defeat bots is to create something that constantly permutates and advances, making development of bots that can defeat it in its current form if not impossible, then at least inefficient. Anything more permanent will eventually be defeated, as we can see from the examples of CAPTCHA, DVD and BluRay (the latter actually might have something going for it).
Bow before me, for I am root.