Building a Better CAPTCHA
jcatcw writes "Steven J. Vaughan-Nichols reports that CAPTCHA cracking isn't that difficult these days. It has even become a business. For example, DeCaptcher.com will solve CAPTCHAs for your spamming needs at a rate of $2 per 1,000 successfully cracked CAPTCHAs. In response, newer systems are in development. Both Carnegie Mellon and Penn State (is there something about the water in PA?) are working on image-based systems. ESP-PIX and SQ-PIX both require the viewer to interpret pictures. Imagination CAPTCHA from Penn has the user find the center of an image. The idea is that humans are better at image recognition than computers, but humans can legitimately disagree on their interpretations and some humans are color blind. Problems remain. For now, sites would be well advised to look at reCAPTCHA (the system that works with Google Books and the Internet Archive to digitize printed texts), which comes with a wide variety of application and programming plug-ins and an open API."
I know _I_ often have trouble seeing those... Maybe some sort of an animated .gif would be better?
I speak for everyone. Captchas SUCK.
Get rid of them.
C.A.P.T.C.H.A - Completely Automated Public Turing test to tell Computers and Humans Apart.
This is a dying technology.
1) Computers and synthetic systems in general are ONLY going to get better at doing anything a human can do. I mean anything.
2) Humans are a substitute for our lack of a synthetic system to solve a CAPTCHA.
A CAPTCHA gives its owner one of two answers: this is a human, or this is a computer. Humans can be hired to solve CAPTCHAs at economically viable rates, so supply meets demand. Computers are also catching up at solving various CAPTCHAs, creating an "arms race" between developers and those who need to crack CAPTCHAs automatically at high throughput.
The window in which this technology is effective is shrinking rapidly, and it is only a matter of time before it is nearly impossible to tell a synthetic response from an actual human one without physical inspection.
Even if they had a perfect system that could tell a person from a computer, how can they prevent a CAPTCHA for porn system?
(You make a website offering porn for entering the solution to a CAPTCHA from a 2nd site, and then use that solution on that 2nd site)
If I have nothing to hide, don't search me
Instead of one little captcha at the end of a web form, the whole site will be a captcha.
All the form labels will be jumbled images, and there will be 9 form submit buttons, 8 with dogs and 1 with a cat.
All textual content can be a mangled image to stop scrapers as a bonus.
Oh and please don't actually build this.
Let me tell you a little secret about the water here in Pittsburgh...
Please decode the text in the image below to continue reading this comment.
5t33L3r5 t4k3 C4rd1n4l5 1713
Colin Dean Go a year without DRM
I thought the ideal captcha would be worded questions presented in the same image-like format as current captchas, e.g. "Two and Two makes?" or "The opposite of day is..?" Whilst the image recognition is now feasible, making a general system to solve this problem would be somewhat more difficult than just improved single-word captchas.
Annoyingly, however, the system to create such captchas cannot really be automated (in terms of creating the questions). So I suppose as long as the captchas are computer created / can be made automatically, they will also be computer crackable/solvable
As the summary notes, reCAPTCHA uses text that has already failed a text-recognition process and helps digitize books. Why go to the effort of creating a custom CAPTCHA when there's already one that's not broken *and* does something useful?
Any CAPTCHA system can easily be cracked by building a large database of inputs and outputs that were actually solved by humans and then saved for lookup later. The inputs don't need to be text; they can contain images (or hash codes representing images), or CSS, or whatever is needed to define the input data. The only feasible way to stop this kind of caching of answers is to have no duplicate tests. For example, show a large field of randomly colored circles that all vary in size and position and move slowly around, then tell the user to hover the mouse over the largest blue circle, next have them move the mouse over the green triangle, etc. Then base their "pass or fail" on whether they could move the mouse fast enough. And change the test often: put the mouse over the shape that looks like a bunny, etc.
Have the text/image animated, each frame by itself doesn't contain all the information needed to decipher the text/image.
Interlaced CAPTCHAs are the thing!
So how about a system of paying captcha-creators $2/1000 captchas created? ;)
On a serious note, though, it seems that general knowledge is a better way to do it than simple word recognition...
Or, on the more imaginative side, what about classical music recognition? I don't know how good computers are at not just recognizing "Beethoven's 5th" but recognizing it amidst numerous recordings which would all have significantly different waveforms. Unfortunately, music is neither universal (it'd have to be country specific, I suppose) nor quite as close to infinite in possibilities as word or image based captchas...
Will this detect Cylons?
No one could ever predict that it would be spammers and porn merchants who would solve the hardest problems in AI.
We could use national celebrities or historic figures instead of text CAPTCHAs. Say you wanted to make a new gmail account and your IP looks like it comes from the US, Google could make you identify either Coolio, Benjamin Franklin, or Evel Knievel before you proceed.
If you didn't come to party don't bother knocking on my door. Prince '1999'
I wish I was FUCKING DEAD!
Necrophile?
Enough with the annoying captchas. Stop comment spam by just analyzing the content.
Free and works well:
http://defensio.com/
I really hate image-based CAPTCHAS, because they discriminate against lynx users. I seriously remember at least one occasion where I was using lynx for whatever obscure reason, and I came upon "enter the text shown in the box at the left". Fail. I like the math problem ones better.
Ok, I will happily admit that I know bugger all about cracking CAPTCHAs, but one thing I have noticed is that most sites use their own version of a CAPTCHA, probably to make it harder to crack.
This must mean that sites are specifically targeted by the crackers, specific routines are probably made to maximise the chances of a successful "crack" against that site. So rather than just making them harder and more obscure (Thus making them harder for humans to read), why not just vary them by a great deal?
If an algorithm has a 50% chance of cracking any given CAPTCHA (and 50% is pretty good; as far as I know it's more like 5, 10 or 15% for a "good" crack), but you have 10 variations of CAPTCHA to crack and the routine only works on one of them, then its overall success rate drops from 50% to 5%. A 5% crack likewise only works on 1 variation out of every 10, so it drops to 0.5%. Just by being different, not harder.
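The arithmetic above can be checked with a few lines of Python. This is only a sketch of the comment's own model: it assumes the site picks uniformly among unrelated CAPTCHA types and that a solver built for one type scores zero on all the others. The function name `overall_success_rate` is made up for illustration.

```python
def overall_success_rate(per_type_rate: float, num_types: int) -> float:
    """Chance that a solver tuned for ONE CAPTCHA type passes a random challenge.

    Assumption: challenges are drawn uniformly from num_types unrelated types,
    and the solver always fails on the types it was not built for.
    """
    if num_types < 1:
        raise ValueError("num_types must be at least 1")
    return per_type_rate / num_types


# The two cases from the comment: a 50% solver and a 5% solver against 10 types.
for rate in (0.50, 0.05):
    print(f"{rate:.0%} solver, 10 types -> {overall_success_rate(rate, 10):.1%}")
```

If a cracker writes a separate routine for every type, the benefit shrinks accordingly, so the model only holds while the variety outpaces the attackers.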
And by different, I don't just mean using different colours and symbols, I mean being completely different, but still ultimately simple. Some may be "please input the 5 characters below", others may be "click on the kitty", another one might be "pick the blue pill", it doesn't have to be complicated, just varied. Better yet, vary the possible algorithms that you can use in any given period, rotate them say every 15 or 20mins, making life much harder for them to detect which particular algorithms are in use at any given time (so for example, have about 20 or 30 algorithms, but only use 10 at any given point, then randomly pick 10 new ones after so long).
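A rough sketch of that rotation idea, assuming a pool of named challenge types and a server-side secret (both hypothetical). The function `active_challenge_types` is invented for illustration; it derives the active subset from the current time window, so every server in a cluster would agree without sharing state. Python's `random` module is not cryptographically secure, so a real deployment would want a stronger way to pick the subset.

```python
import random


def active_challenge_types(pool, active_count, period_seconds, now, secret):
    """Return the challenge types in use during the time window containing `now`.

    pool           -- list of all available challenge type names (hypothetical)
    active_count   -- how many types are live at once (10 in the comment)
    period_seconds -- rotation period (15 to 20 minutes in the comment)
    now            -- current time in seconds, e.g. time.time()
    secret         -- server-side string so outsiders cannot predict the subset
    """
    window = int(now // period_seconds)
    # Seed a private generator from the secret and the window number so the
    # choice is stable within a window and changes when the window rolls over.
    rng = random.Random(f"{secret}:{window}")
    return sorted(rng.sample(pool, active_count))


if __name__ == "__main__":
    import time

    pool = [f"challenge_{i}" for i in range(30)]
    print(active_challenge_types(pool, 10, 15 * 60, time.time(), "change-me"))
```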
Then again, maybe I'm talking out of my rear end, but it makes sense to me. Perhaps someone with more foresight could tell me why that wouldn't work?
+1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill
The idea is that humans are better at image recognition than computers, but humans can legitimately disagree on their interpretations and some humans are color blind.
COLOR blind? Some humans are BLIND blind. Others have various vision or vision processing impairments that would make meatware-visual-coprocessor-test CAPTCHAs reject them.
IMHO most CAPTCHAs are already and obviously violating of the Americans with Disabilities Act. So now, in the info-war between weapons and armor (which weapons always win anyhow), even more of us less-than-Aryan-Supermen become collateral damage.
Dogs are (allegedly) color blind and "... on the Internet nobody can tell you're a dog!". Well, maybe PEOPLE can't. But now the web applications can. B-(
The solution to being attacked by better weapons is not better armor. That's only a stopgap. The solution is to hunt down those who misuse weapons and make them incapable of or unwilling to continue.
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
...even though CraigsList uses reCAPTCHA and the article talks about a utility that helps spammers automatically post on CL.
Besides, it's fairly easy to set up a Mechanical Turk HIT for users to solve CAPTCHAs for a penny a piece. Assuming you make more than a penny per captcha solved, you're set. If not, make someone successfully solve more than one CAPTCHA per HIT submission.
I claim first use of "Error No. 0B" - or "No. 0B error." It'll be the new ID 10T!
and have their sites taken down. As long as hosting providers are allowed to harbour spammers (yes, USA, I look at you) and nobody gives a big F, visitors and site owners pay the price.
Filtering DOES NOT work. Did it stop email spam? No, see: spam year. What did? Kicking McColo off the Internet. And McColo is not alone in providing services to spammers (Netvision.net.il I look at you).
Ok so I read the article...
The article focuses on OCR as the main problem. CAPTCHA can be broken by OCR, so reCAPTCHA uses text that OCR has already had trouble reading. Ok got it.
So why are they stuck on ASCII characters? Why not use obfuscated animal pictures? "Type one word that best describes the picture above." Answer: Zebra (Moose, Dog, whatever)
Why do they keep putting the right answer in the CAPTCHA? How about obfuscating "__ cups in a pint?" or "A Bakers Dozen is __".
I'm no CMU whiz, but it seems to me that if the problem is OCR then stop putting the correct answer in ASCII characters right in the CAPTCHA.
It's not necessary to make them impossible to crack, it's only necessary to make it too economically infeasible for spammers to bother.
Operator, give me the number for 911!
Jesus Christ. If they make CAPTCHAs any more difficult I'm going to be effectively banned from the internet. I'm sure I'm not the only one.
"Bread and Circuses is the cancer of democracy, the fatal disease for which there is no cure." --Robert Heinlein
DeCaptcher CAPTCHA solving is done by humans, so the accuracy is far better than that of automated CAPTCHA solvers.
How will a different format solve anything?
Captchas aside, aren't there other ways of preventing bots from registering multiple accounts? Instead of focusing on humans, how about focusing on the behavior of the bots. Do they change their IP address every time? Do they fill forms faster than humanly possible? Does any human register more than one account on your site? Do they enter random text or put in URLs where they shouldn't?
I still do not see any attempts to weed out the bots.
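The questions above translate fairly directly into a scoring function. The following is a minimal sketch, not a tested spam filter: the thresholds, the field names and the function `suspicion_reasons` are all assumptions chosen for illustration.

```python
import re

# Crude URL detector; an assumption, real URL matching is messier.
URL_PATTERN = re.compile(r"https?://|www\.", re.IGNORECASE)


def suspicion_reasons(fill_seconds, fields, accounts_from_ip,
                      min_fill_seconds=3.0, max_accounts_per_ip=1,
                      url_free_fields=("name", "city")):
    """Return a list of reasons why a registration looks automated.

    fill_seconds     -- time between serving the form and receiving it
    fields           -- dict of submitted form values
    accounts_from_ip -- accounts already registered from this address
    """
    reasons = []
    if fill_seconds < min_fill_seconds:
        reasons.append("form filled faster than a person could type")
    if accounts_from_ip > max_accounts_per_ip:
        reasons.append("too many accounts from one address")
    for name in url_free_fields:
        # URLs in fields that should never contain one are a common bot tell.
        if URL_PATTERN.search(fields.get(name, "")):
            reasons.append(f"URL in field {name!r}")
    return reasons
```

An empty list means nothing looked odd; a site could then reserve the CAPTCHA (or moderation) for submissions that trip one or more checks.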
http://xkcd.com/233/ The real question is: What can humans do that computers cannot? The only problem with "which of these images is George W Bush?"-type tests is that spammers could easily use a database and just compare an image against a photo database. Granted, it wouldn't be as easy as regular CAPTCHAs, but it's still easy enough to crack.
I have seen a number of CAPTCHAs that include a link to a wave file containing the word. If you're blind, you download the sound bit and listen instead.
hate it. hate it hate it hate it.
I have to set up gmail accounts periodically for users here and it takes me some fighting every time to make the account. The "wheelchair" icon makes it read it to you, and the idea of course is in case you are having problems with the picture you can listen to it. But it's like trying to make out what your friend is saying to you from the other end of a dance floor. I have yet to figure out what they're saying by the recording.
And if you miss the captcha too many times, it stops letting your IP address try for awhile. Woooonderful.
I work for the Department of Redundancy Department.
The ReCAPTCHA website for cracking CAPTCHAs has a CAPTCHA to register for their service.
Give me the frames of such an animation and I can trivially write a program that simulates persistence of vision by smearing the pixels over time, thus making it solvable by a computer.
In the long run, CAPTCHAs are doomed.
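To make the "smearing" point concrete, here is a small sketch in which each frame is a grid of 0/1 pixels (1 meaning ink) and the frames are merged by taking the per-pixel maximum. The frame representation and the name `smear_frames` are assumptions for illustration; a real attack would decode the GIF first and might average pixels instead.

```python
def smear_frames(frames):
    """Merge animation frames into one image by keeping any pixel that was
    ever inked, a crude stand-in for persistence of vision.

    frames -- non-empty list of equally sized grids (lists of rows of 0/1)
    """
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    combined = [[0] * width for _ in range(height)]
    for frame in frames:
        for y in range(height):
            for x in range(width):
                combined[y][x] = max(combined[y][x], frame[y][x])
    return combined
```

The merged grid can then be handed to whatever OCR would have been used on a static CAPTCHA, which is why splitting the information across frames adds little.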
I stole this sig from someone cleverer than me.
So here's the issue: Computers are getting to the point where they can solve CAPTCHAs better than humans, so why don't we flip the tables? Why not build a CAPTCHA that takes human weaknesses into account? For example, use optical illusions and ask the human what it _appears_ to be doing, not what it actually is doing. A computer would perfectly interpret the illusion and output what it is doing, whereas the human would look at it, be fooled, and say what it appears to be doing.
...Had this been an actual emergency, we would have fled in terror, and you would not have been informed.
How about an audio clip where the user has to identify the nth word of a sentence, or get even more complicated and have the user identify an adverb or something. Not as universal as number or letter sequences, but it could work for web pages that serve a specific language demographic.
The summary mentions a service at decaptcher.com where you can pay $2 per 1000 CAPTCHAs solved. If you visit the site, they make it quite clear that the solving is being done by humans. The technology of the CAPTCHA has not been 'cracked' by this site; the concept of a CAPTCHA itself was proven ineffective. There is no 'more difficult for a computer to figure out' technology that can solve this problem... anything that a legitimate user is able to solve will be able to be solved by the people working at decaptcher... the only thing you might accomplish is making it harder for the people who work there to solve the puzzle, but anything that works that way will also make it more difficult for an end user. The whole discussion is moot after this.
The solution to being attacked by better weapons is not better armor. That's only a stopgap. The solution is to hunt down those who misuse weapons and make them incapable of or unwilling to continue.
But, Obama said we were not going to use torture, anymore.
now we need to go OSS in diesel cars
There are some people that are both blind and deaf [gratuitous meme], you insensitive clod.[/meme]
I listen to both RIAA and non-RIAA stuff if I like the music, tangential business/politics notwithstanding.
I'm not sure how, yet, but I want people to start thinking about it this way.
Just like DRM.
See, with DRM, start with the assumption that all DRM can and will be cracked, and that all software and media can and will be pirated. Your challenge, then, is to make the legitimate product provide at least the quality and value of the pirated copy (something most DRM'd solutions fail miserably at), and ideally make it desirable enough that your price starts to seem reasonable, even when the alternative is "free".
So, the same applies to CAPTCHAs. Start with the assumption that all CAPTCHAs can and will be cracked, even if "cracking" means "using Mechanical Turk and/or a real sweatshop to have humans crack it". Now, start thinking in terms of economics. Build a system which doesn't have sufficiently good payoff for cracking it for anyone to bother -- a system which, by its very nature, can't be spammed.
If you can at least get it to where the only waste is bandwidth and disk space, you're doing pretty good. That's about my current spam situation -- it's a statistical filter which operates on the entire message, but it works incredibly well.
Until then, an automated hack that seems to work well, at least to stop blog spam, is to require AJAX, and send a bit of programmatically generated (but always different) JavaScript, and verify that it was executed. This will stop most automated systems until they start specifically targeting you with embedded Javascript engines. Next: Make it computationally expensive, so that they have to use a botnet if they're to get any real results.
Don't thank God, thank a doctor!
(see http://it.slashdot.org/comments.pl?sid=1102967&cid=26584721)
in all seriousness, being deaf and blind is a small enough corner case overall, even if deafness and blindness aren't always caused independently of one another.
specific statistics are evidently not available in the relevant WP articles. Trying a general Google search:
http://gri.gallaudet.edu/Demographics/deaf-US.php Deafness @ 0.1% to 0.2%-0.4%
http://www.cde.state.co.us/cdesped/SD-Deafblind.asp
Lists deafblindness as 0.003% at birth
I listen to both RIAA and non-RIAA stuff if I like the music, tangential business/politics notwithstanding.
I can't find the post where it was discussed, but codinghorror.com has one CAPTCHA, or a very small set of them, and it seems to work.
I just read the blog so I have no idea how heavily the site gets hit, or how much cleanup the author does, but with that one never changing CAPTCHA there isn't any comment spam.
So CAPTCHAs are another example of a classic security trade off, just needs to be enough to get the malicious entities to go somewhere else.
Should be discussed in one of these articles: http://www.google.com/search?hl=en&q=captcha+site%3Acodinghorror.com&btnG=Google+Search&aq=f&oq=
Some humans are BLIND blind. Others have various vision or vision processing impairments that would make meatware-visual-coprocessor-test CAPTCHAs reject them.
IMHO most CAPTCHAs are already and obviously violating of the Americans with Disabilities Act.
If a vision impaired person wants to sign up and explains in an email why he or she cannot solve image based CAPTCHAs, any sysop would surely grant access. If not, that might be an ADA violation. Now if he got thousands of such requests every day...
Fifty years of Yippie! 1968-2018
On a related note, at my forum, I just have a system that doesn't let you post links or images in your first n posts (currently 5). Haven't had a single piece of spam since I put that in. Sure, plenty of fake accounts, but I filter out those with less than 5 posts from the member listing. Comment spammers don't tend to reuse accounts. :)
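A sketch of that rule, assuming the forum stores a post count per member. The link pattern and the names `may_post` and `visible_members` are made up for illustration; the threshold of 5 is the one mentioned above.

```python
import re

# Rough detector for links and images in plain text, BBCode or HTML (an assumption).
LINK_OR_IMAGE = re.compile(r"https?://|www\.|\[img\]|<img\b|<a\b", re.IGNORECASE)


def may_post(post_count: int, body: str, probation_posts: int = 5) -> bool:
    """Allow links and images only once a member has enough earlier posts."""
    if post_count >= probation_posts:
        return True
    return LINK_OR_IMAGE.search(body) is None


def visible_members(members: dict, probation_posts: int = 5) -> list:
    """Hide accounts still on probation from the public member listing.

    members -- mapping of member name to post count
    """
    return [name for name, count in members.items() if count >= probation_posts]
```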
No matter what type of hard-for-computers-to-crack system is used, it will be vulnerable to the mechanical-turk type service of decaptcher.com.
I'm seriously tired of all the media articles claiming CAPTCHAs are useless. There is a reason no serious Web site has stopped using them (that includes Slashdot): if they stopped using CAPTCHAs, all hell would break loose.
Yes, spammers can pay a human to type a few CAPTCHAs for them. But arguing that this implies CAPTCHAs are useless is like arguing that door locks are useless because anybody can hire a locksmith to break them.
Find a way to pay third world people $2 to verify that 1000 website visitors are human (to replace the captchas, not defeat them). Then, it becomes a war of money-attrition: whoever is willing to spend the most money wins.
Some humans are BLIND blind.
I always thought when someone acted like they couldn't see blind people they were just being insensitive clods. I never knew it was an actual condition!
Now that I think about it, I'm pretty sure everything I just said is completely wrong.
Speaking for myself (I do not have 20/20 vision, but with glasses I get by OK), I often have to struggle to read captchas, and I have got to the stage where I will sometimes only persist with that website if they have something I really want. At this point, where captchas are almost easier for machines to read than for us, they become self-defeating, and it is time to find a different means to filter out spammers.
I've seen an idea somewhere on the web, maybe it was linked by slashdot...
Basically, you display some number of icon-sized pictures randomly selected from a larger set. ask the user what they are pictures of. pictures and answers could be stuff like: cat, dog, mouse, house, telephone
i think that system would have a very long life-span. the time needed to crack would depend on the complexity of the pictures and the size of the set. (you're basically creating a pictograph-alphabet) when it looks like it's been cracked, you just change your icon set and the answers to match.
but then again, it seems to me these same principles could be applied to current captcha. (changing image sets and answers when it looks to be cracked).
the only real solutions would require so much big-brother type scenarios that i'd rather have spam.
That oughtn't rule out painless amputation, lobotomy, or castration.
-b
myselfmusic
Wouldn't IPv6 adoption solve this problem? The whole reason that you have to use CAPTCHAS, I thought, was to guard against machine generated registrations. If you have a high number of registrations per IP address, then you could probably rule that out as a bot. But... you can't do that now because of NATs. In an IPv6, un-NATed world, you could. Even more, you could create a world wide database of suspected BOT computers and simply block them altogether. Perhaps if companies doing business online began pushing for IPv6 adoption themselves, the process might be moved along a bit more rapidly.
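The per-address check could look something like this sliding-window counter. It is a sketch under the comment's own assumption that one address means one machine; in practice an IPv6 host can use many addresses within its prefix, so keying on a prefix instead may be necessary. The class name and limits are hypothetical.

```python
from collections import defaultdict, deque


class RegistrationLimiter:
    """Refuse registrations once an address exceeds a quota within a time window."""

    def __init__(self, max_per_window: int = 3, window_seconds: float = 86400):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # address -> timestamps of accepted signups

    def allow(self, address: str, now: float) -> bool:
        recent = self._seen[address]
        # Forget signups that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_per_window:
            return False
        recent.append(now)
        return True
```

Usage would be a single call such as `limiter.allow(client_address, time.time())` in the registration handler, with the CAPTCHA or a manual review reserved for addresses that get refused.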
This is my sig.
I was going to show you how easy that was to crack by submitting those strings to google, but the answers I got were "five" and "a topsy". :(
FRA: STFU GTFO
Then it's a historical artifact.
Seriously the guy has the systemic perception of a really slow thing.
Help stamp out iliturcy.
Since robots don't readily suffer illusions ...
As a failing peculiar to animate visual systems, visual illusions might be used to distinguish humans from "computer bots", or any other artificial intelligence empowered with a visual capacity. Any such entity is unlikely to suffer the same illusions as our own, unless, of course, it has been specifically engineered to do so. This approach inverts, and complements, the logic of the Turing test: not requiring evidence of an intelligent capacity equivalent to that of human beings, but rather that of a characteristic human failing.
Artificial intelligence is the study of how to make real computers act like the ones in the movies.
I kind of look at this from a different point where Captcha is only a tool in a logical sequence of events.
If you have 100% open system and all users need do to post is solve a captcha, you will eventually get spam. (You could solve this with a spam filter and moderation)
But if you force users to have an account and then solve a captcha, you now have an access control point as well as a moderation filter. (Remove the account, remove all posts.)
If you add limits to the number of messages certain accounts can post, you limit spam even further. This could even eliminate the need for spam filters, as it's easier to track down, say, 5-15 spam comments from a user than 500-5000.
If you further add random time limits and hidden punishments, like being logged out, or read only, or IP banned. Automated scripts can start to be blocked. Surely some will still get around this, with proxies, fake accounts, and cron jobs. But you now have much tighter security.
Add in some modsec2 rules and you might be able to knock out individual scripts.
You could manually approve all user accounts. And pretty much control everything right there. Too many users? Don't whine to me about that. If you have too many users and you can't be bothered to pop your head in as a sanity check, you won't be around long anyway.
I think what I am trying to say here is CAPTCHA is only a tool, and how you use that tool in conjunction with the other tools at your disposal, will determine how well your results will be.
Reading logs and analyzing IP addresses can also help cut the crap using iptables.
This is one of the reasons why governments need to invest in Internet detectives. At some point, we need to capture the bad guys, like we do in "real life", instead of trying to find technological solutions that annoy the hell out of everyone.
Language. Not everyone has English as their native tongue. Americans for one.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
>>The solution is to hunt down those who misuse weapons and make them incapable of or unwilling to continue.
Given that those spammers can be in a different country, your alternative solution isn't very feasible: even if you caught all the ones who are in countries with anti-spam laws, this would only mean that they would use a country without anti-spam laws as a proxy.
And besides, what are you going to do in the meantime?
'Penn' is the University of Pennsylvania.
'Penn State' is the Pennsylvania State University and is never called 'Penn'.
Dogs are (allegedly) color blind
For what it's worth: dogs are not color blind. Not even allegedly. However, they don't see colors as vividly as humans do.
I know that KittenAuth is an old idea, but can anyone tell me why isn't this system ideal to replace current captchas?
The whole point of those systems is to differentiate humans from bots.
However, unless you believe we have a soul of some kind, there is simply no way to do that, as we are a machine ourselves.
CAPTCHAs are thus doomed to fail.
Get rid of captchas, and DON'T use reCAPTCHA by any means - the letter/number combo is incoherent, even getting it right I've gotten error messages. It sucks, captchas suck.
If you believe in privacy, and believe you have "nothing to hide" at the same time, you're a goddammed idiot
...most CAPTCHAs are already and obviously violating of the Americans with Disabilities Act...
The main problem here is that a hell of a lot of websites are not within American territory, thus NOT required to follow the Americans with Disabilities Act. Not saying that this shouldn't be fixed, but not a lot of people are that considerate, especially if it means more work for them. I guess if enough people request for the feature, they'd do something about it.
The solution is to hunt down those who misuse weapons and make them incapable of or unwilling to continue.
How do you propose that this be done? There are a lot of different problems to consider:
Of course I understand that we can't let the spammers win; what we have to do is apply ingenuity and creativity to try to solve this problem. Sure, they'll devise ways to circumvent defenses, but they'd attempt to fight head-on, too.
Anyway, our developing defenses and them answering in kind would spur technological evolution.
This is why CAPTCHAS on most major sites also have an audio version.
Hi
I changed a guestbook script that has been used thousands of times so that the URL field now says: "if you are a spammer fill out this:" and of course the bots do that and get dumped.
But even human spammers fill that out! Dumb...
So that humans don't notice that they failed, after submitting I show them their added spam in a secondary, spammers-only guestbook. If you're not careful you don't notice that your spam did not make it onto the real site.
I didn't need to put a captcha on the site to kill the spam.
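The trap-field trick described above might be sketched like this, assuming the submitted form arrives as a dict and the two guestbooks are plain lists. The field name `url` and the function `route_entry` are assumptions, not the poster's actual script.

```python
def route_entry(form, real_guestbook, decoy_guestbook, trap_field="url"):
    """File a guestbook submission in the real book or the spammers-only decoy.

    Anything that fills in the trap field (labelled "if you are a spammer
    fill out this:") is silently stored in the decoy book, so the sender
    still sees their entry and has no reason to retry.
    """
    entry = {"name": form.get("name", ""), "message": form.get("message", "")}
    if form.get(trap_field, "").strip():
        decoy_guestbook.append(entry)
        return "decoy"
    real_guestbook.append(entry)
    return "real"
```

The caller would render whichever book the return value names, so flagged posters are shown the decoy page.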
I'm probably just insensitive, but if I am creating an online service for others to use, I should be under no obligation to make the service usable by every single human being on the planet. So long as you are not paying me to use my website, you should have no right to tell me how to run my service.
I'm providing an OPTIONAL service that NOBODY is being FORCED to use. I can see the need for enforcement of such laws for government websites (this even falls under the category of the disabled paying taxes that support such sites). Other than that, buzz off.
Haha, no shit, my captcha is "retard". How's that for... interesting.
way could be to choose seven figures from a very large pool, then combine them in another that shows as a layered landscape. Then you could ask: "The first layer is 'monday'. Enter the name of the image that corresponds with today".
Randomizing the images, mildly distorting them and hashing the name of the generated landscape can add security.
I think that CAPTCHAs aren't going anywhere; it will be another race like the one the virus/antivirus writers have been keeping up all these years.
Don't the inventors realise that most CAPTCHAs are solved by employing sweat-shop labour in developing countries? Using things like CAPTCHA doesn't help solve the problem of spam; it just annoys legitimate users.
'He who has to break a thing to find out what it is, has left the path of wisdom.' -- Gandalf to Saruman