Now Even Photo CAPTCHAs Have Been Cracked
MoonUnit writes "Technology Review has an interesting article about the way CAPTCHAs are fueling AI research. Following recent news about various textual CAPTCHAs being cracked, the article notes that a researcher at Palo Alto Research Center has now found a way to crack photo-based CAPTCHAs too. Most approaches are based on statistical learning, however, so Luis von Ahn (one of the inventors of the CAPTCHA) says it is usually possible to make a CAPTCHA more difficult to break by making a few simple changes."
They're already hard to read. Why do I feel that soon I won't be able to read ANY of them!?
PS: I don't reply to ACs.
I'm sure I read a short story somewhere that featured the spam-bot arms-race triggering the singularity...
To detect humans, wouldn't it be easier and less costly, and perhaps even more effective, to hold a large database of questions that are readable and solvable only by humans?
Asking simple math or site-relevant questions (I'm talking about "What's 5 - 3") is not only easier for humans to read, but also harder for automated parsing software to crack.
ilovegeorgebush
Instead of asking someone to type in the letters, numbers or how many cats there are in the photo, just randomly generate some scenario:
"Jim and Sue go to the park on Sunday. Billy the dog goes too."
Then you can ask random questions like:
"What is the name of the dog?"
"What day did they go to the park?"
"Where did they go?"
That might work OK for a while...
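A minimal sketch of the random-scenario idea above, assuming small hand-written word lists. The names `NAMES`, `DOGS`, `PLACES`, `DAYS`, `make_challenge` and `check` are made up for illustration and do not come from any real CAPTCHA library.

```python
import random

# Hypothetical word lists; a real site would need far larger ones.
NAMES = ["Jim", "Sue", "Ann", "Raj"]
DOGS = ["Billy", "Rex", "Fido"]
PLACES = ["park", "beach", "market"]
DAYS = ["Sunday", "Monday", "Friday"]


def make_challenge(rng):
    """Build a two-sentence story and pick one question about it."""
    first, second = rng.sample(NAMES, 2)
    dog = rng.choice(DOGS)
    place = rng.choice(PLACES)
    day = rng.choice(DAYS)
    story = (f"{first} and {second} go to the {place} on {day}. "
             f"{dog} the dog goes too.")
    questions = [
        ("What is the name of the dog?", dog),
        ("What day did they go?", day),
        ("Where did they go?", place),
    ]
    question, answer = rng.choice(questions)
    return story, question, answer


def check(answer, reply):
    """Compare case-insensitively and tolerate a leading 'the'."""
    reply = reply.strip().lower()
    if reply.startswith("the "):
        reply = reply[4:]
    return reply == answer.lower()


story, question, answer = make_challenge(random.Random())
print(story)
print(question)
```

With fixed templates like these, the answer is always one of a handful of words that appear in the story, which is probably why this would only work OK for a while.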
Summation 2
...will we learn that, if there's a fundamental flaw in a protocol, there's no way we can prevent it from being abused. Every measure will sooner or later have its counterpart and fail.
Is it that people just can't be arsed to submit stories, or is there a clique at work here?
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
CAPTCHA is not a security feature. It's a way to help avoid robots pretending to be humans. Anyone using it as a security feature is just giving more reasons for people to find ways to break them.
All in all, it's time to get rid of CAPTCHA and move on to some more logical system that would be more difficult, such as a system where users are asked to answer a simple question that contains the answer, such as:
If you were born in 1973 and JFK was shot in 1963, were you alive when he was shot?
How many liters of water fit into a five-liter bottle?
Oh wait...
Even though the software can recognise the cats 87% of the time, you need to input 12 pictures, so the chance of the attack succeeding drops to 10%.
You could probably make this even harder by putting a cat and a dog in a photo and telling the user to pick photos that ONLY have cats in them.
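The 12-picture arithmetic above can be checked in a couple of lines, assuming each picture is classified independently (an assumption made for this sketch). The function name `attack_success` is made up.

```python
def attack_success(per_image_accuracy, images=12):
    """Chance that every one of `images` pictures is classified correctly,
    assuming independent errors."""
    return per_image_accuracy ** images


for p in (0.83, 0.87):
    print(f"per-image {p:.2f} -> whole challenge {attack_success(p):.3f}")
```

Under that assumption, 87% per picture works out to about 19% for 12 pictures; a figure near 10% corresponds to a per-picture accuracy of roughly 83%.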
Summation 2
"...says it is usually possible to make a CAPTCHA more difficult to break by making a few simple changes."
Yes, it's possible: But keep in mind that you also have to serve the USER. When the captcha is getting so hard I can't even decipher it anymore (let alone someone with a visual handicap), it's of no use.
I stopped using Rapidshare because of its ultra annoying 'mark the cats'-captcha: I found it near-impossible to get that right (though the other day I noticed they changed that back to ordinary letters).
When you shoot a mime, do you use a silencer?
If humans cannot design a CAPTCHA that computers can't break, but it's trivial to design a CAPTCHA that's easy for computers but impossible for humans to do in the time limit (simple arithmetic with really big numbers), then surely computers are smarter than humans, right?
All of this scientific research has caused one thing... making it potentially easier for spammers to successfully pass through the captcha checks. Now when do researchers finally start on doing the reverse - figuring out a scheme that holds them off?
I mean, fuck the motherfuckers! I hate captchas, and the better they are breaking them the better for me, with some luck we'll stop having these silly things... Really, they are even using captchas as an excuse to force you to enable javascript on sites, not to mention how difficult to read these things are and how much of a waste of time they are...
Copyright infringement is "piracy" in the same way DRM is "consumer rape"
It's probably more like 30-cents in the 3rd world. I don't think it would be possible for even a machine to significantly beat that rate. The energy to "run" a human is roughly comparable to that of a computer running AI-ware. Plus, the cost of the cat-and-mouse AI software adjustments that a human-based approach doesn't need.
One may say that 3rd-world IP addresses can be filtered or better monitored, but it's easy to mask them via remote screen control etc.
Table-ized A.I.
How about asking every nth person successfully logging in to generate a question? Apply a lameness filter and then perhaps ask another randomly chosen user to verify that the question is reasonable. Reject duplicates and questions that too many people can't answer.
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
One password and authentication repository for all, handled by a single entity. Or, to paraphrase:
"Nuke the site from orbit. It's the only way to be sure."
Well, it seems to me that spammers ARE humans. So trying to detect if the creator of the account is human or not doesn't separate the spammers from the non-spammers.
Think about it: the authenticating machines are designed by humans, and the perpetrating machines are also designed by humans, and the legitimate users are humans too.
Perhaps the problem itself needs to be restated: Allow accounts to legitimate users, deny accounts to spammers. Whether or not there is a human involved on either end seems irrelevant.
- Wyck
I think CAPTCHAs should show images from goatse, tubgirl, 2g1c, etc..
Surely the human reaction to these images would be unique.
What we need for fraud-resistant voting and fraud-resistant registration is a national, if not world-wide identity certificate that we can present at the polling booth or interface with our computers for registrations, age checks, and online purchases. Get over the fact that proving who you are is going to result in the downfall of freedom as you know it and accept the fact that this identity card/document will remain under your personal control on when to present it (when you need to positively identify yourself) and when you don't (sorry Officer, but I left it at home because I'm not required by law to carry it at all times). Do you really want some snot-nosed college kid who hasn't paid a dime of taxes in his entire life undoing your vote and dozens of your neighbor's votes because he registered 73 times and now intends to vote for every one of those registrations -- and thinks he's doing a great thing by it?!
Fair elections are the very foundation of a democratic society and everything that preserves One (Wo)Man One Vote Only(!) is a step in the only right direction. It's a shame that voter ID laws only exist in a couple of states and look who cries out against them every time. (Clue: people who benefit by massive voter fraud.)
This can be worked out, folks, and we'll be better for it, whether in actually fair elections, or in the decrease in spam and other crapware that captchas and other methods try to prevent by authenticating users. Anonymity in all circumstances Is Not a Right. (Neither is Health Care a "Right" as one candidate has very incorrectly proclaimed. Rights are delineated in the Constitution for the United States, and other governing documents in other countries, and free national Health Care is not on that list.) If you have an over the top determination to preserve your anonymity then there are simply some places you cannot go (e.g. legally cross an international border) and some things you cannot do (e.g. fly on an airline these days). Once we get over it and realize that a person needs to be able to prove who they are, and that other people and institutions are not out of line in demanding to know who they're dealing with so that they can make the informed decision on whether or not to continue dealing with that person, then a lot of the problems, spam, identity theft, terrorism (which thrives on anonymity) will be much reduced to the full benefit of the majority of us who don't actively profit from preying on our fellow humans.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
or Skynet!
(Of course if Skynet can give us intelligent self-willed robots like Cameron, that might not be such a bad thing.)
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
So, why then, don't we think out some learning phases we need to build a really good AI and stepwise implement them as CAPTCHAs?
Of course they will be cracked eventually, so why not use the challenge constructively?
Each time a new captcha algorithm is cracked, we could use a next phase and end up with a true AI, in a collaborative effort with "the evil crackers". Each time utilizing an aspect of "human intelligence" which we cannot teach a computer yet, and have someone desperate solve a captcha challenge, solving the problem of emulating a cognitive ability, one at a time?
I think we can keep recursing like this until someone returns 1
African or European water?
If your site has non-English speakers, they are going to have more difficulty grokking the nuance of your challenge than a computer will.
They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
How much wood _W_ould a woodchuck chuck if a woodchuck could chuck wood?
Remember Dan Simmons' sci-fi series "Hyperion"? AIs emerged from viruses there. We will likely have AIs emerge from spam-bots. Not a bad guess, I suppose.
Well, I understand that cursive recognition is still weak, so why not solve 2 problems? 1) Use cursive as the CAPTCHA, which is an easy implementation, and 2) when they finally crack the CAPTCHA, we'll have a good cursive recognizer.
Wow. I have seen raccoon baculum for $10 to $15. I don't know if giants have a baculum, but I guess it would be worth quite a sum.
Why, without your clothes, you're naked, Miss Dudley!
Ah. So you appreciate Cameron for her intelligence huh?
Me too. Exactly.
(Model T-6969 I think right?)
"Strangers have the best candy" -Me
Maybe we should turn the tables around. Instead of DARPA funding cutting edge pattern recognition for military apps, we could just present their problems as CAPTCHAs.
Find the tank hidden in the photograph to sign up for a new GMail account.
Have gnu, will travel.
Couldn't it be done much the same way I have my email posted on my homepage... break the image file containing the captcha into multiple images. For example, my captcha is 'Starve'... inside a .jpg image 128 pixels across. Couldn't that same image be cut into say... 8 or whatever pieces across, maybe a few down as well if you want. Then you just need to write the html code for the page to have the images lined up. Maybe even put them in a table so all of the IMG SRC tags aren't all right beside each other.
And while we're at it, surely a script can be written to randomize the filenames of the pieces of the image, and insert them into the .html file in a server side include or something.
So is that idea just easily broken, or why is no one doing that?
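A rough sketch of the slicing idea above, using only the standard library. It computes the crop boxes and writes the table markup with random filenames; the actual cropping and saving of each tile would be left to an image library, and the names `tile_boxes` and `build_table` are made up for illustration.

```python
import secrets


def tile_boxes(width, height, cols, rows):
    """Return (left, top, right, bottom) crop boxes, row by row."""
    boxes = []
    for r in range(rows):
        for c in range(cols):
            boxes.append((c * width // cols, r * height // rows,
                          (c + 1) * width // cols, (r + 1) * height // rows))
    return boxes


def build_table(width, height, cols, rows):
    """Map random filenames to crop boxes and emit the HTML table."""
    boxes = tile_boxes(width, height, cols, rows)
    names = [secrets.token_hex(8) + ".jpg" for _ in boxes]
    lines = ['<table cellspacing="0" cellpadding="0">']
    for r in range(rows):
        cells = "".join(f'<td><img src="{names[r * cols + c]}"></td>'
                        for c in range(cols))
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return dict(zip(names, boxes)), "\n".join(lines)


mapping, html = build_table(128, 32, cols=8, rows=1)
print(html)
```

One likely weakness: the table order still tells a bot how the tiles fit together, so it can fetch them, paste them back into one image, and attack that image as usual.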
Planet Zebeth - Metroid with a twist
Is it going to come down to people needing these programs to read the CAPTCHAs? Is it coming down to a war between computers and computers and the humans are getting in the middle of it? Good God Man!!! :-(
Uncle Mantis
How about putting two pictures of animals next to each other and writing "Which animal in real life is larger?"
It's official. Spam is the new porn.
I would assume that these algorithms are equally adept, with minor tweaking, at identifying pretty much anything that a human could. I'm sure the British and Chinese governments are already planning to deploy said software in the near future.
I have no good reply to that. If you want every idiot to be able to enter your site, prepare for spam, because most bots can be smarter than the average idiot. If you want no spam, train your users or accept that you will have less traffic.
ics
What is the air speed velocity of an unladen swallow?
Why don't they try to use animated GIFs?
Seems like an automated one could not tell which frame was the actual one with a code or it would screw up the parser.
Another possibility would be to use Flash and make you uncover portions with a mouse to see the code?
Show particularly hot or awful pictures, with low std-dev. People agreed, computers have no clue.
Of course, remove obvious pictures with lots of skin area giving artificially high marks .. :)
(And get your lawyers ready, the OrNot's won't like being publicly tagged so on GMail...)
I had a vague idea that I thought to share. Someone with more time please expand on it. Simply get the email spammers to fight against the CAPTCHA breakers! Email spammers are bots that are constantly trying to not be filtered by filter programs and yet still be understandable by humans. CAPTCHA breakers are bots that are constantly trying to not be filtered by filter programs but they are *not* trying to be understandable to any human.
If we use the understandability of email spam as a CAPTCHA that also feeds back to email filters we will, eventually, either eliminate spam or CAPTCHA breakers or come up with some totally ass kicking AI that rules us all.
Maybe this isn't such a good idea after all. ^_^
All your attention are belong to my old internet meme.
Hashcash-type solution with Flash or Java where the applet computes an n-bit collision while the user fills in the form. JS would be too slow because the spammer would not be using a browser and could compute the collisions much faster.
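A minimal sketch of that kind of proof of work, reading "n-bit collision" as a hash with n leading zero bits. SHA-256 and the `challenge:counter` string format are assumptions made for this sketch (Hashcash itself used SHA-1 and its own stamp format), and the function names are made up.

```python
import hashlib


def leading_zero_bits(digest):
    """Count the zero bits at the front of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge, n_bits):
    """Client side: try counters until the hash has n_bits leading zeros."""
    counter = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{counter}".encode()).digest()
        if leading_zero_bits(digest) >= n_bits:
            return counter
        counter += 1


def verify(challenge, counter, n_bits):
    """Server side: a single hash is enough to check the work."""
    digest = hashlib.sha256(f"{challenge}:{counter}".encode()).digest()
    return leading_zero_bits(digest) >= n_bits


counter = solve("form-token-123", 16)
print(counter, verify("form-token-123", counter, 16))
```

Each extra bit doubles the expected work for the client while the server still checks with one hash, so n can be tuned to roughly the time it takes to fill in the form.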
Rapidshare captchas were defeated by download programs, that's why they abolished them. The cats and dogs were solved within days. I use Cryptload, it circumvents most DDL sites' captchas and redirects. It's just a matter of pasting the links into it and have it download in the background. So captcha solving isn't all about spam, there are some nice applications too.
What I just don't get is:
Luis von Ahn, a computer scientist at Carnegie Mellon University, who helped coin the term CAPTCHA, says that it's not clear that any common CAPTCHAs have been broken by machine attack in the real world. "I don't know of anybody who's thinking of getting rid of the CAPTCHA because it doesn't work," he says.
What galaxy did he move to after inventing them?
It seems to me that we need a system with entirely unpredictable text, that requires minimal admin time for creation and maintenance. Further, it needs to direct the user to some action that a human could perform but that a bot either couldn't understand or would be unable to do.
Here is my idea, someone tell me why this wouldn't work.
In place of the captcha you have instructions directing the user to go to a certain website and copy / paste a certain bit of information into the field. Before the system does this it goes to the page and captures that information.
So it would look something like this:
"Before you can use this system please provide the following information: Go to $designated_website and copy and paste the $requested_information into the following field."
You then randomize $designated_website and $requested_information. It could be any website, including sub page, and any information.
Now a bot could be designed to read your text and try to interpret it so that it would know where to go and what to fetch IF your request was simple, but if the request was more complicated then it wouldn't know how to respond.
For instance:
Go to news.google.com and give the title of the second story under Top Stories.
Go to http://www.linux.com/articles/feature/ and give me the name of the author of the third article on the left.
Dynamic non-admin generated data, a ruleset that's easy to write, and instructions that are hard to follow for a bot because they change.
What am I missing?
"To submit this form successfully, make yourself look like Scarlett Johansson, and give the webmaster a fantastic blowjob."
Once we have bots that can do that, they can HAVE the internet for all I care.
Just send the confirmation request codes to a mobile phone.
There were ideas to make sending email "expensive", would it be possible to apply this here? Use a calculation that is expensive to solve but where the solution is easy to test, such as factoring a large number. The biggest problem with the scheme is that a solver has to be added to the browser somehow.
A web site that allows a user to post messages would send a random number to be factored as soon as possible. The browser would then work on this in the background, before the user even decides to post something. When the user decides to post something, the form contains a hidden field that the browser fills in with the factored value (the browser would pop up a message if the calculation is not yet done when the user tries to submit the form, and offer the user the ability to wait or cancel the submission). The web site would then immediately send a new random number for the next submission.
Any bot would have to continuously solve these things and thus would not be able to post very fast. Also hopefully the fans will turn on and make lots of noise so the user might get an idea that their machine is infected.
Another idea that would not use up your computer's battery is to have a third-party service that provides a random key, but only after a long delay. This third-party service would refuse to process more than one key at a time per host, so a bot could not do many requests in parallel.
Does any of this sound at all useful or possible?
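A toy version of the factoring scheme above: the site sends the product of two primes, the browser factors it, and the site checks the answer with one multiplication. Trial division and the two small primes (104729 and 1299709) are stand-ins chosen for this sketch; real challenges would need much larger numbers.

```python
def factor(n):
    """Client side: trial division, deliberately the slow part."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 1
    return 1, n  # n is prime


def verify(n, p, q):
    """Server side: reject trivial factors, then multiply."""
    return 1 < p < n and 1 < q < n and p * q == n


challenge = 104729 * 1299709   # the server keeps the primes secret
p, q = factor(challenge)       # would run in the background in the browser
print(p, q, verify(challenge, p, q))
```

The asymmetry is the point: the search loop runs about a hundred thousand times here, while the check is a single multiplication.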
"hot shit"...
Previously: "Linux... Toward the Sunrise..." Now: "Linux... Toward the-- No, now, part of Every Sunrise"
as in, make it a law that all computers sold from now on must have a genetic sequencer attached to them. Any time you want to open your email, the server will show you a, uh, suggestive jpeg and you, uh, express your, um, genetic material, into the genetic sequencer. It's totally foolproof and pleasurable as well, even if you have someone pointing a gun to your head. Crap...I just realised this won't work for women. Back to the drawing board.
I have an idea. Take an NP-complete problem, give it to the user to solve and if they get it right, then cool. Since there is no way for a computer to solve an NP-complete problem except by brute force, and some are relatively simple by human standards, shouldn't that be a contender?
For example, the coloring problem. Given a map and three colors, color in all the countries so that no two bordering countries share a color.
Closed Tour (TOUR). Given n cities and an integer k, is there a tour, of length less than k, of the cities which begins and ends at the same city?
OR
Knapsack. Given n items, each with a weight and a value, and two integers k and m where m is less than k, is there a collection of items with total weight less than k, which has a total value greater than m?
OR
Examination Scheduling (EXAM). Given a list of courses, a list of conflicts between them, and an integer k; is there an exam schedule consisting of k dates such that there are no conflicts between courses which have examinations on the same date?
There are lots of problems that a human can just guess and check a couple of times (using some reason) to answer, but a computer brute forcing has to actually spend the time determining the answer through blind guessing and checking, or worst case scenario with some heuristics. Even simple problems can make a computer stand on its head... Is this not a good solution? This would satisfy the "expensive" suggestion and not require additional user-side software (which can be abused and users tricked into installing for a bad website because Gmail made them do it first).
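To make the coloring example concrete, here is a small sketch with a cheap checker and the blind brute-force search described above. The four-country map and the names `is_valid` and `brute_force` are made up for illustration.

```python
from itertools import product


def is_valid(borders, coloring):
    """Checking an answer is cheap: one comparison per border."""
    return all(coloring[a] != coloring[b] for a, b in borders)


def brute_force(countries, borders, colors=("red", "green", "blue")):
    """Blind search: try every assignment, up to 3**n of them."""
    for combo in product(colors, repeat=len(countries)):
        coloring = dict(zip(countries, combo))
        if is_valid(borders, coloring):
            return coloring
    return None


countries = ["A", "B", "C", "D"]
borders = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
print(brute_force(countries, borders))
```

The search space is 3**n, so a map small enough for a person to color at a glance (say 10 countries, about 59,000 assignments) is also small for a machine; the cost only bites once the map grows well beyond that.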
Okay okay... So it's authentication we need that can only be completed by a human. Why not make a simple template of questions, with random subjects; math, English, very very general knowledge (intelligence overlap), and display the question as a captcha? Even the most advanced bots aren't fully reliable in deciphering the horrible mangled letters, and a full sentence will almost guarantee it will fail some of it. Thus a human can work out the sentence (most of the time), answer the question, and hey presto we're in and the bots aren't.
What about this?
Instead of failing on a failed photo captcha, why not just report a success but dump the registration?
You might get a few false positives but if someone signing up for my site can't tell the difference between a kitten and an elephant, I probably don't want them using it...
The bots would have no genuine way of figuring out if the attempt they made was successful unless they kept the data and later compared it with a login. As well, when the images are generated you could add a tiny bit of random noise and alter the file size/signature to make comparing different images impossible.
Okay, okay... So what we need is a form of authentication that can be completed by humans, but not computers. Why not create a large template of questions, using different subjects such as math, English and very easy general knowledge (due to the intelligence crossover)? What's more, we can have the questions assembled by a simple randomizing engine, such as a question made up of three parts: 1. "Jenny has" 2. "a red ball" 3. "at the park". Text question... "Where was she?". Finally, these questions, although crackable, can be displayed as a CAPTCHA image. Bots aren't 100% accurate with them, so the chances of one reading the question, let alone answering it, are not good at all. So... Hey presto, Human entry, Bots not included :)
01001001 00100000 01101100 01101111 01110110 01100101 00100000 01001111 01101100 01101001 01110110 01100101 01110011
So,
Research to fight Spam is what creates Skynet?
Porn and cheap pills made of baking powder really are the doom of us?
That's really depressing.
The ultimate solution is to give up, and just have a turing test solution.
Everyone joining has to chat to one other member for a minute, and that member flags them real or bot.
That's why I think we need to move to pictures that reflect scenarios from which future or past scenarios can be inferred by a human. Add to that some use of slang.
For example, show (amongst others) a picture of a gay-looking guy with a Michael Jackson glove in a country-western bar and have the server ask "Click on the picture with a dude about to get his ass kicked".
Or, show (amongst others) a picture of a prison inmate behind bars and ask "Click on the picture of a dude who had somebody drop a dime on him".
Of course, it's probably not feasible to generate these automatically, so you'd need a human to prepare each one... which limits the variety, which is a vulnerability. But still... my point is that humans can infer ancillary information about the scenario in the picture, which could prove very difficult for a computer to overcome.
For example, you could display separate images of a tire, a bumper, and a hood, and ask what all of the images are used to build. If the captcha systems used a collection of otherwise random images that collectively determined the correct answer, not only would it keep the test simple for humans, but it would make the cipher that much harder for the bots.
Seems the spammers are hiring boat loads of people to train their CAPTCHA-breaking software. Google and the like could do the same and hire call centers to screen applications for an email account. You want a gmail account, call a 1-800 number that connects you to some vast call center in India.
They should use emotions instead, like what kind of emotion does this kid express (picture of a crying kid) or is this picture beautiful?
The underlying problem is that we're running out of things that are easy for people but hard for computers.
How about solving a small traveling salesman problem -- at least to within 10% of minimal path length? Maybe 15-20 points to quickly connect -- most humans can see the best path at a glance. Let's see a computer solve that.
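A sketch of how a site might grade such an answer: accept any tour within 10% of the shortest one. The reference tour is brute-forced here, which is only practical for very small point sets (7 points below); the 15-20 points suggested above would need a real solver, and the function names are made up for illustration.

```python
import math
import random
from itertools import permutations


def tour_length(points, order):
    """Length of the closed tour that visits points in the given order."""
    return sum(math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))


def best_tour(points):
    """Brute-force reference tour, fixing point 0 as the start."""
    rest = range(1, len(points))
    return min(((0,) + p for p in permutations(rest)),
               key=lambda order: tour_length(points, order))


def accept(points, order, slack=0.10):
    """Accept a tour that visits every point once and is within `slack`."""
    if sorted(order) != list(range(len(points))):
        return False
    limit = (1 + slack) * tour_length(points, best_tour(points))
    return tour_length(points, order) <= limit


rng = random.Random()
points = [(rng.random(), rng.random()) for _ in range(7)]
print(accept(points, best_tour(points)))
```

Note that the grader itself has to know a near-optimal tour, so whatever the site uses to produce the reference answer is also available to a bot.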
DNA is a Turing machine. You, however, being dynamic and emergent, are not.
The holy grail of course is to find something that humans can do easily, but is impossible (or very very unlikely statistically) for a program to be able to do.
Solve a small version (15-20 points) of the traveling salesman problem. Most humans can just look at it and solve it. Or you could ask questions: "Here is a partial path. Point 7 should now connect to 1) point 4, 2) point 9, 3) point 16, 4) point 5, 5) point 19."
Let's see a computer just do that.
DNA is a Turing machine. You, however, being dynamic and emergent, are not.
The problem, though, is that you need a better quality of AI to generate arbitrary easy-but-obscure questions than you do to solve them... Keep in mind you need questions that anyone with a 3rd-grade education could read and solve, which limits you to simple grammar, small words, concrete ideas, and no math harder than addition, subtraction, and inequality.
Why not exploit humans' visual pattern matching ability? For example, a small version of the Traveling Salesman problem (say 15-20 points) is easy to generate, and a typical human could easily solve it at a glance, or answer questions about a partial path: (Of the remaining open points, which one would be the best one for point number 7 to connect to?)
DNA is a Turing machine. You, however, being dynamic and emergent, are not.
I'm working on a website in which I have very strict spam filters, but when you trigger them, you can enter a CAPTCHA to continue. This allows for words such as "[Ff][Rr][Ee][Ee]" to be filtered, and still allows people (albeit, ironically) to say "Freedom".
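A small sketch of why that filter trips on "Freedom", assuming the pattern is applied as an unanchored regular expression search. The `needs_captcha` helper and the whole-word variant are assumptions added for comparison, not the poster's actual code.

```python
import re

# The pattern from the comment: matches "free" in any letter case,
# anywhere in the text, including inside longer words.
NAIVE = re.compile(r"[Ff][Rr][Ee][Ee]")

# Hypothetical stricter variant that only matches the whole word.
WHOLE_WORD = re.compile(r"\bfree\b", re.IGNORECASE)


def needs_captcha(text, pattern=NAIVE):
    """True if the post trips the filter and should get a CAPTCHA."""
    return pattern.search(text) is not None


print(needs_captcha("Freedom of speech"))              # trips the naive filter
print(needs_captcha("Freedom of speech", WHOLE_WORD))  # passes the strict one
print(needs_captcha("FREE pills!!!", WHOLE_WORD))      # still caught
```

Falling back to a CAPTCHA instead of rejecting the post is what lets the naive pattern stay in place; the whole-word variant would simply trigger it less often.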
nonconformity at work