Audio CAPTCHAs Cracked; ReCAPTCHA Remains Strong
Falkkin writes "Ars Technica reports that audio CAPTCHAs consisting of only distorted digits or letters can be easy to crack using machine learning techniques. This includes most of the audio CAPTCHAs currently in use on the Web. The reCAPTCHA team has discussed their new audio CAPTCHA, which is resistant to this attack."
It was okay at first, but now it's reached the point where it takes me 3 or 4 tries to finally guess the letters.
It's become more hassle than it's worth. Isn't there a better way to stop bots from getting accounts?
FOX NEWS.com should be BANNED from television and internet. Have the Congress take it over and give us Truespeak.
I'm half afraid to admit this publicly, but did anyone else try clicking the "play" button on the screenshot of the audio CAPTCHA player in the first article? It took me a few tries before I realized it was only an image.
Better known as 318230.
I'm a human being and I can't break audio captcha. Sounds like gibberish to me.
If you can make it take longer for a human to crack it, that would increase the cost. Double the time, double the cost.
But, say, if it now takes 10 seconds to crack a captcha, it would need to take more than an hour to cost $1 per captcha :-).
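The arithmetic above can be checked with a short sketch. The $1/hour wage is an assumption, inferred from the statement that a solve would have to take an hour to cost $1; the function names are made up for illustration.

```python
HOURLY_WAGE = 1.0  # assumption: $1 per hour of human solving time


def cost_per_captcha(seconds_per_solve: float, hourly_wage: float) -> float:
    """Labor cost of one human solve, in the same currency as hourly_wage."""
    return hourly_wage * seconds_per_solve / 3600.0


def seconds_needed_for_cost(target_cost: float, hourly_wage: float) -> float:
    """How long one solve must take before it costs target_cost."""
    return 3600.0 * target_cost / hourly_wage


# 10 seconds per solve is well under a cent at this wage (about $0.0028).
print(cost_per_captcha(10, HOURLY_WAGE))
# Doubling the solve time doubles the cost.
print(cost_per_captcha(20, HOURLY_WAGE))
# Seconds per solve needed to reach $1 per CAPTCHA: 3600, i.e. one hour.
print(seconds_needed_for_cost(1.0, HOURLY_WAGE))
```

At a higher assumed wage the numbers scale linearly, but the point stands: solve time has to grow by orders of magnitude before the per-CAPTCHA cost becomes noticeable.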
I wonder how a web-of-trust system combined with more difficult captchas (more trust -> easier captchas) would work; if a branch of the web is a spammer, it's easier to cut off. But this must've been suggested even in this context already, so hit me with the "your spam protection idea doesn't work, because..." form ;-).
I thought reCAPTCHA was susceptible, since if enough bots guess the same answer on an image, they will make that a valid answer. Does this not work, or has nobody bothered?
IranAir Flight 655 never forget!
They should just make a CAPTCHA that requires strong AI to crack; we could make a great leap ahead in AI by letting the spammers solve all the problems for us!
Isn't this just an advertisement for ReCAPTCHA disguised as a news item?
People crack CAPTCHAs for profit. They either sell the algorithms to spammers or spam themselves.
The thing is, if you managed to reliably crack reCAPTCHA, then you've succeeded where all the best OCR software on the market has failed (all reCAPTCHAs are words that couldn't be deciphered by existing software). At which point there's big bucks to be made legally selling the software.
You feel sleepy. Close your eyes. The opinions stated above are yours. You cannot imagine why you ever felt otherwise.
Banning that way doesn't work real well when you consider dynamic IPs, distributed attacks (bot nets), proxies, etc.
Unless you're willing to ban at least a third of the world, you're not going to get much out of that.
upon the advice of my lawyer, i have no sig at this time
In my crystal ball I see some fool who does not turn off the sound on the PC in an office.
By law, offices of companies over a certain size must accommodate people whose disability requires sound to do their jobs.
Unfortunately, history has shown that many people also still have digital cameras that make the *click* noise.
By law, camera phones must make the click noise when operated within some countries to help fight voyeurism.
Captchas are user unfriendly and relatively ineffective.
A more effective route is to require a new user to submit their postal address and a phone number. Then the service mails a post card containing a verification code to the postal address and/or calls the phone number. Google does this for AdSense publishers.
Ron
Only until someone finds a way to make cracking the captcha more efficient and suddenly it is back to the original cost to crack the same captcha again. This is what that machine learning is all about.
Meanwhile, the problem is that this back and forth with captchas is essentially causing programmers who wish to break them to come up with very complex AI.
At some point, if the AI is smarter than the person, as mentioned above people won't be able to crack the captcha.
On this very article the only reason this "captcha has yet to be cracked" is because they just brought it out. Once it gets attention, it'll be cracked like all the rest.
I don't really understand how translating from speech into text is equal to translating from speech to text in a different language.
I could listen to every word you say and write it down no problem, but ask me to translate it into Japanese or something and I wouldn't have a clue.
You only have to look at games like Endwar to see how good speech recognition has gotten. It requires no calibration (well, maybe a word or two at the start), has yet to fail me once, and seems to work for people with many different accents.
That said, Endwar does use specific commands, so I suppose it could be a somewhat simplified scenario: if the command words are selected sensibly, no two commands sound nearly alike. Regardless, even much of the voice recognition software out there for dictating documents etc. does a great job with little to no training now.
The tricky bit with CAPTCHA is not just asking questions that are easy for humans and hard for AI. There is a huge field of well known stuff, common sense, basic knowledge, etc, etc. that would work. The problem is asking questions that are easy for AI to ask, easy for humans to answer and hard for AI to answer.
If you have to manually populate your CAPTCHA, you have a problem. It costs just about as much (in money and time) to manually document a set of CAPTCHA questions as it would to build the set. If you can't generate questions automatically, your CAPTCHA will be expensive, or useless, or both. reCAPTCHA is interesting in that it is something of a hybrid. It makes use of real world complexity, from scanned documents, but largely automates the conversion of real world complexity into CAPTCHAs, which makes it fairly practical to use at a large scale.
One of the requirements is that there will be an extremely large number of possible questions (and answers) to keep attackers from making a small database for every question or simply brute forcing it too quickly. As a result it is preferable not to need human interaction to create the question/answer sets. Varying pictures of animals/etc are not something computers can generate on their own, but would require human beings to collect. The amount of additional manpower needed using such a method over what we use today is substantial... too much.
"A witty saying proves nothing." - Voltaire
Is this why handwriting won't work? Fancy elderly handwriting is especially hard to read. OCR software is rather helpless against it. (I propose hiring retired people to write words sloppily and scan them!)
The government can't save you.
And if the posts were held before becoming visible, there wouldn't even have been one.
The community you are a member of seems to be near this level of completeness.
Having a few trusted reviewers who read all posts before letting them pass would be the last step.
People often complain about schemes like this that their messages need to be seen immediately so people can respond immediately, but I say having two or three moderators would make the whole process pretty quick anyway.
Remember when you used to mail things? THAT took time and the world STILL progressed.
I don't know the meaning of the word 'don't' - J
Oh, the other thing, that I forgot: certain sorts of natural language questions would actually be trivially easy to answer, and thus would have to be avoided. Consider your "how many?" examples.
Obviously there can't be fewer than 0 of something in a picture, and you can assume that (for the sake of not pissing people off) you won't make your customers count more than 20 of something. Thus, if I am trying to crack your CAPTCHA and my script sees "how many...?", it will just pick a number between 0 and 20, inclusive. That is ~5% accuracy (1 in 21) without anything cleverer than one line of regex. Since you can tell whether or not you solved a given CAPTCHA, your script could even, with some additional logic, choose future guesses based on past success.
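A minimal sketch of the guessing strategy described above, in Python. The regex, the class name, and the success-weighting scheme are illustrative assumptions, not taken from any real attack; the 0 to 20 range is the one assumed in the comment.

```python
import random
import re
from collections import Counter

HOW_MANY = re.compile(r"\bhow many\b", re.IGNORECASE)  # the "one line of regex"
MAX_COUNT = 20  # assumption from the comment: nobody is asked to count past 20


def expected_blind_accuracy() -> float:
    """Chance that a uniform guess over 0..MAX_COUNT is right: 1 in 21."""
    return 1 / (MAX_COUNT + 1)


def blind_guess(question: str, rng: random.Random):
    """Uniform guess for 'how many' questions; None for anything else."""
    if HOW_MANY.search(question):
        return rng.randint(0, MAX_COUNT)  # randint is inclusive on both ends
    return None


class AdaptiveGuesser:
    """Weights future guesses by how often each number was accepted before."""

    def __init__(self, rng: random.Random):
        self.rng = rng
        self.hits = Counter()

    def guess(self, question: str):
        if not HOW_MANY.search(question):
            return None
        values = list(range(MAX_COUNT + 1))
        # Every value starts with weight 1 so untried numbers stay possible.
        weights = [1 + self.hits[v] for v in values]
        return self.rng.choices(values, weights=weights, k=1)[0]

    def feedback(self, guess: int, solved: bool) -> None:
        """Record whether the site accepted the guess."""
        if solved:
            self.hits[guess] += 1


print(round(expected_blind_accuracy(), 4))  # 0.0476, i.e. roughly 5%
```

If the real answers cluster around small numbers, the adaptive version should drift toward them over time and do better than the flat 1-in-21 baseline.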
Questions about colors and animals and things have some similar vulnerabilities. How many colors can you reasonably expect your average viewer to verbally distinguish between? Maybe 30, tops? A fairly basic image processing heuristic (say, have a human identify a bunch of visually distinct color groups and name them, then have your script identify all color groups that make up more than 10% of the target image, and make a guess from among those) could thus achieve decent success on any "what color?" questions. Animals are trickier, because you start to get into nontrivial identification of shape; but there also aren't that many plausible choices. I suspect that you couldn't presume the ability to distinguish more than 100 or so animals, which makes even naive guessing a functional strategy, with basic image processing tightening up considerably from there.
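The color heuristic above could look roughly like this. It is only a sketch: the palette stands in for the human-named color groups and its RGB values are made up, the image is assumed to be already decoded into a list of (r, g, b) tuples, and nearest-color matching by squared RGB distance is a simplification I picked, not something from the comment.

```python
import random
from collections import Counter

# Hypothetical palette: color groups a human named in advance.
PALETTE = {
    "red": (220, 30, 30),
    "green": (30, 160, 60),
    "blue": (40, 70, 200),
    "yellow": (240, 220, 40),
    "black": (10, 10, 10),
    "white": (245, 245, 245),
}


def nearest_color(pixel):
    """Name of the palette color closest to pixel (squared RGB distance)."""
    return min(
        PALETTE,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(pixel, PALETTE[name])),
    )


def candidate_colors(pixels, min_share=0.10):
    """Color groups that cover more than min_share of the image."""
    counts = Counter(nearest_color(p) for p in pixels)
    total = len(pixels)
    return sorted(name for name, n in counts.items() if n / total > min_share)


def guess_color(pixels, rng: random.Random):
    """Pick one of the dominant color groups at random as the answer."""
    candidates = candidate_colors(pixels)
    return rng.choice(candidates) if candidates else None


# Toy image: 70% near-white, 25% reddish, 5% blue.
image = [(250, 250, 250)] * 70 + [(200, 20, 20)] * 25 + [(0, 0, 255)] * 5
print(candidate_colors(image))  # ['red', 'white']
```

With only two candidates left for this toy image, a blind pick between them is right half the time, which is the sort of "decent success" the comment is talking about.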
Captcha is really security by obscurity. Readily identifiable information is obscured in such a way as the computers (supposedly) can't find it.
Real security requires a secret. It's as simple as that. So long as the secret can be identified without knowing the secret, your security system is a joke.
Computers are getting better, faster, smarter, cheaper. Moore's wall gets higher every single year, and soon, it will be routine for computers to match or exceed human intelligence. (It can be argued that they already do, particularly in the case of a certain US President)
Therefore, anything that relies on human intelligence to "weed out" machine intelligence will eventually fail. Captcha is the testing ground for the passing of the Turing Test!
I have no problem with your religion until you decide it's reason to deprive others of the truth.
What if the applicant for access submits a facial photograph along with his/her application information?
(1) Use facial recognition software to decide whether a human picture has been submitted. Deny access to those not submitting a picture of a human. Store the picture. Keep refining the algorithm.
(2) Determine whether the pictured person has been used in a previous attempt to obtain access. If access has been obtained, don't let them create another account unless their present account is terminated. If access has been rejected, then you have a presumptively bad applicant.
(3) Websites could share database information about the rejected pictured-people. This would bring in more data (like time and volume of a single facial picture's use, for example). That additional information could be used to help refine the algorithm.