A Vision For a World Free of CAPTCHAs
An anonymous reader writes "Slate argues that we're going about verifying humans on the Web all wrong: 'As Alan Turing laid out in the 1950 paper that postulated his test, the goal is to determine whether a computer can behave like a human, not perform tasks that a human can. The reason CAPTCHAs have a term limit is that they measure ability, not behavior. ... the random, circuitous way that people interact with Web pages — the scrolling and highlighting and typing and retyping — would be very difficult for a bot to mimic. A system that could capture the way humans interact with forms algorithmically could eventually relieve humans of the need to prove anything altogether.' Seems smart, if an algorithm could actually do that."
It seems to me that if you can design an algorithm to verify how humans interact with a computer, it should be relatively trivial to engineer an algorithm that mimics this interaction?
Maybe someone smarter than I could clarify?
Assuming you could write an algorithm to determine humanistic behavior, it stands to reason that you could write a bot to fool the initial algorithm.
I remember reading... I can't remember if it was a post about an algorithm already written or a proposal for an algorithm which would run alongside a CAPTCHA through the entire registration process, but the basic premise was just that: measure the entropy and fluidity of human movement and determine whether or not the user is a bot based on whether or not the user fits typical random human usage patterns.
I also remember the writer of the post noting that this kind of system would basically stretch the human-unwittingly-answers-CAPTCHA out such that humans would have to do the entire setup process manually instead of just the CAPTCHA, thus defeating the point of automated setup.
Does anyone have this article? I can remember reading it but I can't find it.
Viable Slashdot alternatives: https://pipedot.org/ and http://soylentnews.org/
If there were a way for a computer to determine that behaviour is human, wouldn't a computer be able to reproduce it anyway? But what about tricks like telling the user to leave a particular field blank and fill it in on the next page instead? The field could be indicated by a CAPTCHA containing a URL; opening the URL gives you another CAPTCHA containing a number, and you leave the field with that number empty. If the second CAPTCHA is entered wrong, you have to repeat the process from the beginning and fill in two CAPTCHAs on the second page, and so on. This way most humans would be able to do it in 1-2 attempts, but bots using trial and error would be stuck with thousands of different CAPTCHAs. Also, keeping a central database of all the types of CAPTCHA and displaying a mix of 2-3 different types would be effective, as bots are designed for one type of CAPTCHA only, aren't they?
This system could also reproduce human interactions. So it's only a matter of time until this behavioural approach stops working.
BTW: I don't want Slashdot to check how I scroll the page, nor is my typing and retyping the business of anybody but me. Imagine you can't comment anywhere because you block Google Analytics.
Doesn't that just mean a computer can also feed the correct data in, defeating it?
Anyway, the little tests these days are stupid and annoying, and perhaps for some people, getting impossible to do. Perhaps instead of the test being administered at the point of registration, new accounts at places should be automatically monitored for type of activity.
For instance, if the first post at a forum has any links to blacklisted ad sites (the list could be EasyList USA, whatever), it's probably safe to just kick it out automatically. And things of that nature. Or just use the old trick of signing up with a credit card and charging a one-time $0.41 (or whatever just covers the minimum fees) to keep bots out of the community's hair.
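The first-post link check described above could look roughly like this. It is only a sketch: the blacklist entries, the function names and the URL pattern are all made up for illustration, and a real site would presumably load its domains from a maintained list.

```python
import re
from urllib.parse import urlparse

# Hypothetical blacklist; a real deployment would load this from a maintained list.
BLACKLISTED_DOMAINS = {"spam-ads.example", "cheap-pills.example"}

# Deliberately simple pattern: grabs http(s) URLs up to the next whitespace or quote.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def linked_domains(text):
    """Return the set of lowercase hostnames linked from the post text."""
    hosts = set()
    for url in URL_PATTERN.findall(text):
        host = urlparse(url).hostname
        if host:
            hosts.add(host.lower())
    return hosts


def is_blacklisted(host, blacklist=BLACKLISTED_DOMAINS):
    """True if the host or any of its parent domains is on the blacklist."""
    parts = host.split(".")
    return any(".".join(parts[i:]) in blacklist for i in range(len(parts)))


def should_reject_first_post(text, blacklist=BLACKLISTED_DOMAINS):
    """Reject a first post that links to any blacklisted domain."""
    return any(is_blacklisted(h, blacklist) for h in linked_domains(text))
```

Checking parent domains as well means a link to www.spam-ads.example is caught by a blacklist entry for spam-ads.example.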
I'm sure other solutions will have the old How-To-Fix-Email-response "Yes, but your idea won't work because (Mark random amount of 100 checkboxes)"
I can see it now: "have you tried moving your mouse around randomly?", "how about clicking on a few different parts of the page then making coffee?", "still not working? Try slamming the mouse down several times", "okay, as a last resort click on the tabloid pop-up."
The tricky part of an alternative solution seems to be modelling human behaviour: in order to detect whether something is human or not, you need to have a pretty good model of what humans do. I suspect there would be a lot of variation in the way people interact; if I'm feeling sleepy I would present a very different usage profile from when I'm on task and in flow. A program to do this will probably have to be statistical in nature, with some sort of confidence intervals of humanness. Maybe it will need some cluster analysis. This all makes for some pretty hard code, and I'm not convinced the difference between two humans will be smaller than the difference between a human and a bot.
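To make the "statistical in nature" idea concrete, here is a deliberately naive sketch. It assumes a single feature (say, seconds taken to fill in a form), assumes the reference samples are roughly normally distributed, and accepts anything within about two standard deviations of the mean. The function names and the numbers are invented for illustration; nothing here comes from a real detector.

```python
from statistics import mean, stdev


def human_interval(samples, z=1.96):
    """Return (low, high) covering roughly 95% of the reference samples,
    under the assumption that they are normally distributed."""
    m, s = mean(samples), stdev(samples)
    return (m - z * s, m + z * s)


def looks_human(value, samples, z=1.96):
    """True if the observed value falls inside the acceptance interval."""
    low, high = human_interval(samples, z)
    return low <= value <= high


# Hypothetical form-fill times, in seconds, from sessions known to be human.
reference_times = [20, 25, 30, 35, 40]
```

With these made-up reference times, a half-second submission falls outside the interval, but so does a sleepy human who takes a full minute, which is exactly the variation problem raised above.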
There are four sorts of people in the world: fools, lunatics, idiots and morons. - Umberto Eco, Foucault's Pendulum.
You mean I didn't need a new pair of glasses every time I couldn't read one of those CAPTCHAs? I want my money back.
Seems some things should be easy. There's a certain minimum amount of time that it takes a human to tab from one field to another as they fill in data, even if they're pasting info in. Even just slowing down bots to the speed that a human could reasonably do a task would put a dent in the problem =\
"I Don't Have Enough Faith to be an Atheist"
It's a lot tougher to define what a human is than it may seem on the surface, and the difference between man and machine will, by definition, become more and more blurred until there is no effective difference.
It's an idea that I've become familiar with, esp. after reading 'The Singularity is Near' by Ray Kurzweil. As our technology advances, we'll find that our capabilities beyond our technology will diminish. Machines long ago surpassed our running speed (cars/planes/trains), our ability to farm/grow food (tractors), our ability to hurl objects (guns) and our ability to swim (boats), but we've always had the ability to out-think our machines.
Increasingly, this isn't true.
We've already shown that SPAM filters are good enough to be more accurate than the people who read the messages. Machines have long been better than people at math-related stuff, keeping track of stuff, and the like, but now we're getting close to the threshold for image processing and character recognition. It's already true for voice recognition. Captcha is, therefore, doomed to fall eventually as we approach the singularity, and is already pretty weakened. The next question is, therefore, simple: what does it mean to be human?
Remember Lt. Commander Data on Star Trek, trying to be human? It's quaint largely because he/it was a minority on the show, but in reality the machines will outnumber us by a wide margin - they already do!
So what does it mean to be human?
If you have a prosthetic leg, are you still human?
If the leg has a CPU in it, are you still human?
If the CPU is more powerful than your mind, are you still human?
If the chip is wired into your mind, are you still human?
If you use the CPU as though it were part of your mind, are you still human?
If you have transferred most of your thinking to the CPU, are you still human?
If you transferred all your thinking to the CPU and rarely use your 'wet' brain, are you still human?
If you find th
I have no problem with your religion until you decide it's reason to deprive others of the truth.
A system that can determine whether or not a user is human would have built-in characteristics as to what a human would do in such a situation. What's keeping someone from taking that same algorithm and adapting it for means other than their intended purpose?
If a machine knows what to do, another machine can take advantage of that.
Obligatory: import skynet; blah
The game.
If the judge of the test is a computer, then the test will always be passable by a computer.
Everyone has been focusing on how easy/difficult it would be to reverse this hypothetical algorithm that would determine, based on your use of a webpage, whether you're human or not... I see a more fundamental problem. This is on the internet, so they have basically 3 options for how to implement this.
1) server side. The only variable you could track is time between page requests. Don't see how that could possibly be enough information
2) Client side JS. Simple, just modify the JS to return &isHuman=true
3) Client side JS acting as a keylogger, sending back for server side verification. Harder to defeat, but you'll lose my business, the business of all of my friends, and have a horde of angry nerds picket your offices.
Also, this doesn't take into account any edge cases, for example if I've already been to your site, surf straight to /contact.html and paste in an email I previously wrote in Word (err, excuse me, OOo).
The opinions in this post are fictitious. Any similarity to actual opinions, real or imagined, is purely coincidental.
I think there might be so much variation in human usage patterns, all of which need to be accepted by the algorithm, that it should be easy to simulate a behaviour that stays within those bounds.
On the other hand, if the algorithm doesn't allow much deviation, it will annoy a lot of people who get falsely detected as bots. It might hit handicapped people or old people first.
Just use Javascript, watch for either some mouse movements or onBlur/onFocus.. and if those are present, then isHuman will == 1, and you pass that to the server side. Actually, you'll want to have some obscure variable name to make it less obvious.
The problem with a lot of sites dealing with spam is that they are using the same software that tries to solve everything at the top. Uniformity doesn't help.
But leaving people to their own devices to create or adapt their own forum/blogging/wiki software is not a good solution either. Uncoordinated diversity leaves a lot of people to fend for themselves.
Having unity-in-diversity (a common strength across systems and organisms), however, might well solve the problem.
If forum/blogging/wiki software creators would give sites the opportunity to make (and be able to change) their own set of question and answers for first-time-users (and not trouble them after that), I think bots would be hard-pressed to be programmed to interpret all such site-specific questions on their own. If bots could actually be programmed to intelligently answer all such human language questions, I think the bot-makers could be making a lot more dough in legitimate business...
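A per-site question set along these lines might be as small as the following sketch. The questions, the accepted answers and the function names are placeholders invented here; the point is only that each site owner edits their own table.

```python
import random

# Hypothetical, site-specific questions that the site owner can change at will.
QUESTIONS = {
    "What colour is the sky on a clear day?": {"blue"},
    "What animal is this forum's mascot? (hint: look at the logo)": {"penguin", "a penguin"},
}


def normalize(answer):
    """Lowercase the answer and collapse whitespace so trivial variations still match."""
    return " ".join(answer.lower().split())


def pick_question(rng=random):
    """Choose one of the site's questions for a first-time user."""
    return rng.choice(sorted(QUESTIONS))


def check_answer(question, answer):
    """True if the normalized answer is one of the accepted answers for the question."""
    return normalize(answer) in QUESTIONS.get(question, set())
```

Because the table differs from site to site, a bot author would have to handle each site's questions separately or actually understand them.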
The idea that behavior is a better judge of identity than "biometrics" is an old, old one. I wish I could remember the name of the program, but there was a GNU/Unix utility that measured word frequency, letter frequency, the amount of delay between pressing any two-letter combination on the keyboard, and more... all put together to verify identity. And it worked quite well. I think that program is close to 20 years old.
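The utility itself isn't named, so the following is not its algorithm, just a minimal sketch of the key-pair delay part of the idea: build a profile of the average delay for each pair of consecutive keys, then compare two profiles. The function names and the distance measure are assumptions chosen for illustration.

```python
from collections import defaultdict


def digraph_profile(keystrokes):
    """keystrokes: list of (key, timestamp_ms) tuples in typing order.
    Returns {(previous_key, key): mean delay in ms} for each consecutive pair."""
    delays = defaultdict(list)
    for (k1, t1), (k2, t2) in zip(keystrokes, keystrokes[1:]):
        delays[(k1, k2)].append(t2 - t1)
    return {pair: sum(values) / len(values) for pair, values in delays.items()}


def profile_distance(a, b):
    """Mean absolute difference in delay over the key pairs both profiles share.
    Returns None when the profiles have no key pairs in common."""
    shared = a.keys() & b.keys()
    if not shared:
        return None
    return sum(abs(a[p] - b[p]) for p in shared) / len(shared)
```

A small distance would suggest the same typist; where to put the threshold is the hard part and is left open here.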
Biometrics fails for the same reason it always has... as soon as someone comes up with a halfway reliable way to identify somebody, others come up with a fairly reliable way to fake the system. But micro-delays on the keyboard, etc. make for a pretty individual signature.
Simple enough
Wouldn't the ability to collect biometric information require a fairly potent piece of spyware to be loaded on the client system? How would a user, or even a security professional, easily tell the difference between a keylogger that reads our actual strokes, and one that is just timing the key presses?
Sounds like a kernel-mode device that would have to be part of the input drivers. It's an attack surface, IMO. I would think it's safer to have a separate input device for biometric authentication only than to attempt to extract biometric metadata from highly sensitive input devices like keyboards and mice.
I did enjoy the 'honeypot field' example (in TFA). I suspect it is probably easily defeated, unfortunately. If the field is hidden on the page, can't we write a bot to detect that fact, or any source code (JavaScript?) that hides it? How do you obfuscate something like that without serving it with the page?
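For reference, the server side of a honeypot field is tiny. This sketch assumes the form includes an extra input (the field name "website" is made up here) that is hidden from human visitors with CSS, so only a script that fills in every field will populate it.

```python
# Hypothetical name of the decoy field, hidden from humans with CSS on the page.
HONEYPOT_FIELD = "website"


def is_probable_bot(form):
    """form: dict mapping submitted field names to string values.
    A non-empty decoy field suggests the form was filled in by a script."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```

As the objection above says, nothing stops a bot from parsing the same CSS and skipping the decoy, so this only filters out scripts that never bother to check.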
Sounds to me like CAPTCHA still wins. Oh well, I didn't expect much. ;^)
--
Toro
It seems like the old Spam Karma module for WordPress did this. It calculated how long they were on the page vs. how much they had typed, how fast they typed, and a bunch of other factors before it ever hit a captcha. Back when I used WordPress I remember it being pretty accurate too.
or else!
Think of every behavior as a voice recording: record and replay! And there you go, bots are able to mimic it.
CAPTCHAs etc. won't work perfectly. Ever. There are always bot(net)s that are able to defeat them. If you use software to make the lettering difficult to read, you can still write software to read it. Like the algorithms, we detect the order in the chaos.
So let's just face it:
The internet needs a unified authentication system if we are to kill spam. If there were a unified authentication system, you wouldn't need to store your passwords around the internet, and your mails would be traceable to you.
So, let those who need anonymity create their own solutions for interacting anonymously.
Stop the brainwash
The article did have links to some interesting topics, such as Google experimenting with image orientation as a test. The premise of using how a user interacts with a page is deeply flawed though. There's not even a need for an algorithm or program to 'figure out' the captcha: just record how an actual user interacts once and you can send the exact same thing every time to pass the test. The reason this works is that the 'question' doesn't change. This would be like showing the same text captcha every time. If they ignore identical values being sent, the values can just be fudged a bit.
When I posted a question to the Turbo Tax community forum it asked a simple question as a CAPTCHA. Seems like an easy enough solution, and it changes each time to foil a persistent brute force attack.
Of course I'm sure it's only a matter of time before someone has an algorithm smart enough to answer questions. And I suppose that a botnet with enough time would work too. Still an interesting approach, I thought.
The user's local behavior before form submission is detectable only via a client-side script. There are therefore two ways this can go.
1.) You maintain accessibility standards and make the client-side script optional. The effectiveness of this approach is comparable to xkcd's "When Littlefoot's mother died in /Land Before Time/, did you feel sad? (Bots: NO LYING!)"
2.) You require client-side script execution in order to submit the form. The effect is a lot of pissed-off users with NoScript or non-compatible Javascript interpreters (IE or the rest, depending on which one you support).
This idea is basically like visual captchas, but instead of the visually impaired, you're screwing everyone without Javascript.
There is one aspect of user behavior that can be detected, however, and that is the time passed between the user requesting the form and submitting it. From an AI perspective, humans spend an eternity typing, so setting a minimum delay between request and submission will slow the bot right down - especially with a flood control that requires a delay before submitting the next form. Slashdot does both of these things already, by the way.
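A rough sketch of the minimum-delay and flood-control checks, entirely server side. It assumes the form carries a hidden, HMAC-signed timestamp so the client cannot backdate it; the secret, the thresholds and the function names are all invented for illustration and say nothing about how Slashdot actually does it.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side secret
MIN_FILL_SECONDS = 5    # assumed minimum time a human needs to fill in the form
FLOOD_SECONDS = 60      # assumed minimum gap between two posts from one client

_last_post = {}  # client id -> time of the last accepted submission


def _sign(issued):
    return hmac.new(SECRET_KEY, issued.encode(), hashlib.sha256).hexdigest()


def issue_token(now=None):
    """Token to embed in the form as a hidden field: 'timestamp:signature'."""
    issued = str(int(time.time() if now is None else now))
    return f"{issued}:{_sign(issued)}"


def accept_submission(token, client_id, now=None):
    """Accept only if the token is genuine, the form was open long enough,
    and the client has not posted too recently."""
    now = time.time() if now is None else now
    issued, _, signature = token.partition(":")
    if not hmac.compare_digest(signature, _sign(issued)):
        return False  # forged or corrupted token
    if now - int(issued) < MIN_FILL_SECONDS:
        return False  # submitted faster than a human plausibly could
    if now - _last_post.get(client_id, float("-inf")) < FLOOD_SECONDS:
        return False  # flood control
    _last_post[client_id] = now
    return True
```

This sketch does not stop the same token from being reused later; it only guarantees that each post costs a bot at least as much waiting as it costs a human.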
Some time ago I already noticed that Google Groups has implemented a bot detection based on behaviour.
However, often when I browse through a Google group in an efficient way, Google thinks I'm a bot and blocks me for quite a while. The only way around it is to work inefficiently on purpose, by making my clicks as random as possible, with time intervals as random as possible. This costs me at least five times as much time as the efficient way would.
This is very annoying, so I think it would be better for them to ditch the behaviour detection and just rely on properly designed captchas.
The captcha is entered into a field and submitted to the web server. However, our random highlights, backspacing, scrolling etc. all happen in the browser on our system. The web server (thank ______ ) doesn't know about any of that; it just sees the end result. So it doesn't have access to any of that data to make any kind of determination. Currently only malware would be collecting this data and sending it somewhere. So the proposal here is to be human-verified by malware.
There are other flaws that others have pointed out.
Think Deeply.
Generate a textual representation of the user's actions on the site, including the timing between clicks, scrolls and so on (but not just as plain numbers; use some words to *describe* the relationship between the times of actions).
Whenever a user posts content, feed the report, perhaps along with the post itself, to a spam filter (like CRM114?) to check whether the description matches human or mechanical behavior. Train the filter on the posts it got wrong.
The tricky part is how to describe the action in a meaningful way.
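One possible way to describe the actions, sketched under heavy assumptions: each gap between events is mapped to a word, and the words are interleaved with the action names so a text classifier can treat the session like a message. The bucket boundaries, the vocabulary and the function names are arbitrary choices made here, and the spam filter itself is left out.

```python
def describe_gap(seconds):
    """Map the time between two actions to a descriptive word (arbitrary buckets)."""
    if seconds < 0.05:
        return "instant"
    if seconds < 1:
        return "quick"
    if seconds < 10:
        return "unhurried"
    return "long_pause"


def describe_session(events):
    """events: list of (action, timestamp_seconds) tuples sorted by time.
    Returns one string of tokens suitable for feeding to a text classifier."""
    tokens = []
    previous_time = None
    for action, timestamp in events:
        if previous_time is not None:
            tokens.append(describe_gap(timestamp - previous_time))
        tokens.append(action)
        previous_time = timestamp
    return " ".join(tokens)
```

A filter trained on such strings could then learn, for example, that "click instant click instant submit" tends to come from scripts.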
Not really seeing a difference between behavior and ability.
Any action that you perform is behavior, and, obviously, if you perform an action you are also capable of performing it. A behavior is therefore an ability. Any algorithm that tries to distinguish between human behavior and computer behavior is still a reverse Turing test.
Given that, testing the quirky way humans navigate through the web is arguably an even flimsier test than the captcha. There is a certain degree of randomness, but nothing that rand() can't imitate to fool what would have to be an algorithm based, somewhat, on measuring randomness, with a limit to its sensitivity so that false positives can be reduced to a reasonable level.
Can Slate stop writing articles about shit it doesn't know about?
This is an insightful post. Too bad it was posted AC. A waste of mod points.
-FL
Like many laws intended to prevent undesirable behavior (e.g. gun control, sale of illegal drugs, etc.), CAPTCHAs only block the casual (law abiding) user. It is regrettable that there are so many stupid people in this world. If stupid people didn't respond and make SPAM profitable, SPAM would have died out years ago. Too bad we can't outlaw stupid people!!
Some notes in no particular order. . .
1. I kind of like winning the Turing Test. It makes me feel human. Some days, before the coffee kicks in, this is a plus.
2. It's funny when I can't read the secret warped word. It throws me in existential questioning for about half a second.
3. I like the new idea of having to describe a randomly rotated 3D image. That's a cool system which I'd like to see implemented, though I can't imagine it will be very long before it too is solved.
4. I find it funny that proving one's "Human-ness" is easier to do with a basic kindergarten reading or shape-recognition test than with the old Star Trek method of demonstrating an understanding of Love or being able to write an opera or such. --Especially since you can fairly easily program a computer to compose random Haiku.
5. An interesting test would be to write a short paragraph and ask the potential human how they feel about it. You could probably weed out trolls as well as computers that way. Or potentially learn something disturbing about the head-space of the webmaster and/or feel like a total outsider when you fail at multiple choice emotions.
6. Whatever the case, I think it's pretty sci-fi that we've gotten to the point where major effort is being spent to out-smart AIs. William Gibson and Neal Stephenson keep getting closer to having described our Now.
-FL
Can Slate stop writing articles about shit it doesn't know about?
Right.
First, most of the things Slate suggests have been tried. Timing human input behavior is in use already, and attacks already do some randomization there.
Second, despite what the Slate article quotes, the CAPTCHA for Gmail has been cracked. The success rate is only 20%, but because the cracker is embedded in a botnet, that's good enough to survive IP blacklisting. MessageLabs says Gmail spam went from 1.3 percent of all spam e-mail in January to 2.6 percent in February.
All the proposed tasks - recognizing people, cats vs dogs, etc. - can be done by computers at the 20% accuracy level or better. So that's not going to work.
ReCAPTCHA isn't very good in practice. You get two words, one of which was recognized by an OCR program and one of which wasn't. You only have to re-recognize the one which some OCR program already got to pass the CAPTCHA. If you can do that, you have a 50% chance of success.
Then there are the outsourcing services. "We are 35 seater call center located in Hyderabad, we would be interested." The going rate is US$0.001 to US$0.003 per CAPTCHA solved successfully. There are always ads on GetAFreelancer for CAPTCHA solving. Read Black Hat World for sources.
What stops someone from recording a human looking at the page, and then replaying that behavior from a bot?
Also, will humans actually want to send the information needed for this to remote websites? I don't really want a website to know what part of the page I'm looking at.
Regardless of the Turing Test aspects of this, forms are filled in on the client. This hypothetical algorithm would also be running on the client. The server can't trust any "Yes, this is a human" that comes from the client. So even if you could make this algorithm, it would not solve the intended problem.
They hadn't pointed it out by the time I posted this.
Nevertheless, microdelays and such are not "biometrics". They are behavior-based. The fact is, though, that people generally find this kind of behavior-based approach harder to fake or mask than actual biometrics. That is where the difference lies: in the difficulty of obscuring who you are.
This whole thing is a moving target.
Anything your algorithm can do, my algorithm can do too.
Might work for a while, though, but then again, so did CAPTCHAs.
Wait, did I just say "so did CAPTCHAs"? What I meant was, so are CAPTCHAs, because everyone is still using them, even though they don't work.
Which is the real problem ... not only is the whole thing a moving target, but tackling the problem only works when everyone actually moves.
Remember, it's measure --> countermeasure.
All this really means is now everyone gets to live like we really are in a 1960's spy movie. Sure hope that's what everyone wanted.
A bot will crack it regardless. There is nothing that can be done, other than remove the bot creators.
Honeypots are the Answer! You simply have pages and options which are just distasteful to humans, the reasons for which are not comprehensible to machines! The machines will give themselves away because they cannot distinguish the distasteful options.
Example: A page of Markov-chain nonsense in an otherwise informative website.
This page would be generated using the same technology that spammers use to get past spam filters. Only a real human being or an AI that can achieve some sort of comprehension will be able to tell that it's full of nonsense. Programs that are trying to simulate human browsing behavior will "dwell" on this page, even though it's junk, and give themselves away.
I think this sort of "spam inoculation" can be done in a way that it doesn't detract too much from the website's quality as a whole, much as vaccines incorporate bits of pathogens without harming the patient.
Though the CAPTCHA problem is interesting, I think we will see other ways to skip these showstoppers in the near future.
Sooner or later Google or somebody else will provide a service that returns information on the likelihood that you are human and that your account has not been taken over by malware. Perhaps a kind of expanded OpenID which may return information on your behavior on several other websites or in the physical world.
Certain actions could provide "human" credits (or some similar or more detailed concept).
Go ahead, make a list of actions that would be unfeasible for anybody to automate.
Once you have your credits you may use your ID to bypass captchas. I'm sure there are clever ways of solving issues about being anonymous when using the ID.
Thinking about new ways of designing exotic captcha puzzles is just a plain waste of time.
20 or more of the top-level posts on this page are all "Well yeah, but if a computer can test it, then a computer can emulate it." I'd ask if anybody bothered reading other comments before they posted, but I already know the answer (this /is/ slashdot after all).
On to the topic at hand: this is impractical for another, less complex reason. From what I've been seeing, most of the "bot" registrations these days are not bots, they're people. If those who wish to can pay someone a couple dollars a day to spam registrations and comments, there's really not much defense against it.
In fact, there are many, so-called, one-way (correct terminology?) algorithms.
Background: I'm doing my PhD in crypto. I use terms like one-way function (and one-way {,trapdoor} permutation).
I have tried the keystroke dynamics authentication systems, for example, and my personal opinion is that they don't work. In my opinion, if one human can implement a solution, another human will be able to implement a bot to bypass it. The only way you will be able to defeat bots is to create something that constantly permutes and advances, making development of bots that can defeat it in its current form, if not impossible, then at least inefficient. Anything more permanent will eventually be defeated, as we can see from the examples of CAPTCHA, DVD and BluRay (the latter actually might have something going for it).
Bow before me, for I am root.