Google's Audio CAPTCHA Falls To Automated Attack
SkiifGeek writes "Early in March, Wintercore Labs published proof of a generic approach to defeating audio CAPTCHAs, using Google's as the case study for their demonstration. With claims of over 90% success rate and expectations that this can be significantly improved with the right mix of filtering algorithms, the in-house tool remains unreleased. But it shouldn't take long for other developers to create their own tools and start targeting not only Google, but other sites that use audio CAPTCHAs for the vision-impaired. It isn't the first time that major sites (significantly major webmail providers) have had their CAPTCHAs broken, but it is the first reporting of defeating an audio CAPTCHA using a generic software approach. News about the discovery is slowly starting to spread."
Some of the advanced IVR solutions (Interactive Voice Response, used for things like customer support or paying bills over the phone) can pick out numbers and words pretty well even under some noise conditions, so I am not totally surprised that this cracked the audio CAPTCHA.
Right from the start it was clear that audio captchas were theoretically easier to break than visual ones.
An image captcha is designed to require a mixture of perception and thought, but an audio one has to rely on pure perception, because it's temporary. You hear it, then it's gone: you can't analyse it. This makes it infinitely less complicated than a visual one.
It's only because of low uptake that it's taken so long for a true proof-of-concept attack.
HAL.
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
I hardly ever failed CAPTCHAs before, but ever since RapidShare implemented their new CAPTCHAs, I've realized how many people must suffer through this annoyance. Kind of ironic, though: it was supposed to weed out non-humans. Reminds me of the Dilbert strip where the PHB is considered the first human to fail the Turing Test.
"News about the discovery is slowly starting to spread."
And, thanks to Slashdot, news about the discovery is now RAPIDLY spreading.
Do something else. Show me a picture of an object and ask me (in a multiple-choice test?) what it is... a tree, a car, a house, a flower, whatever.
And for the sight-impaired, how about a spoken description or definition of something? "This thing is the entrance to a house or a room" => door
Come on, web designers, it's not that hard to abandon those old and, above all, ANNOYING captchas.
So given that (I assume) all audio CAPTCHAs have the same problem (i.e., the numbers and clearer voices can easily be found using audio analysis), does that mean that all audio-based CAPTCHAs are bound to fail?
Apart from OCRing books, I can't think of anything else that is not a total waste of human time. How about meta-moderating as a CAPTCHA activity; probably too fuzzy to work to a reasonable degree of accuracy.
Basically I think the arms race is already over, and a new paradigm is needed.
CAPTCHA technology is going to have a very difficult time over the next few years. Finding tasks (which can be implemented on standard computer systems and transmitted over the internet) that are trivial for humans but exceedingly difficult for computers is going to be rough.
This is especially true because the computer doesn't need a 100% success rate to effectively "break" the CAPTCHA. Heck, if the CAPTCHA gives you 3 tries before rejecting you, then a 30% success rate = fully broken.
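To put rough numbers on the retry argument, here is a minimal sketch. It assumes every attempt is independent and succeeds with the same probability, which is a simplification, and the function name is made up for illustration.

```python
def pass_probability(per_try_success: float, tries: int) -> float:
    """Chance of at least one success in `tries` independent attempts.

    Assumes each attempt succeeds with the same probability and that
    attempts do not influence each other (a simplifying assumption).
    """
    if not 0.0 <= per_try_success <= 1.0:
        raise ValueError("per_try_success must be between 0 and 1")
    if tries < 0:
        raise ValueError("tries must be non-negative")
    # Failing every attempt has probability (1 - p) ** tries;
    # passing is the complement of that.
    return 1.0 - (1.0 - per_try_success) ** tries


if __name__ == "__main__":
    for tries in (1, 3, 10):
        print(tries, round(pass_probability(0.30, tries), 3))
```

Under those assumptions, a 30% per-try rate with 3 tries passes about 65.7% of sessions, and nothing stops a bot from simply starting another session after a rejection.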
For right now, they are still working their way through tasks that CAN be easy for computers, but no one has bothered with yet. This means that breaking the CAPTCHA is simply a matter of writing and tuning some algorithms.
I think the next step (but not the be-all/end-all of CAPTCHAs) will be a parallel approach. Give the person 4 visual or auditory CAPTCHAs, and require them to successfully solve 3 out of 4 to pass, preferably with some kind of relational puzzle regarding the answers, or at least a simple question...
EXAMPLE:
A typical obfuscated-word type CAPTCHA in 4-way parallel: the four words are KITTEN, PIGLET, PUPPY and TOASTER, and then you are asked, "Which of these is NOT a baby animal?"
Obviously this technique requires either a complete solution from the user (4/4 words correct), or requires the system to reveal the answers, which could lead to an attack based upon a dictionary-building system, which would require a massive database size (and/or a frequently updated database) to prevent.
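How much a parallel scheme like this costs an automated solver can be estimated with a binomial calculation. This is only a sketch: it assumes the four CAPTCHAs are solved independently with the same per-CAPTCHA success rate, it ignores the follow-up question, and the function name is hypothetical.

```python
from math import comb


def at_least_k_of_n(p: float, k: int, n: int) -> float:
    """Probability of at least k successes in n independent trials,
    each succeeding with probability p (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    for p in (0.3, 0.6, 0.9):
        three_of_four = at_least_k_of_n(p, 3, 4)
        four_of_four = at_least_k_of_n(p, 4, 4)
        print(f"per-CAPTCHA rate {p:.0%}: "
              f"3-of-4 {three_of_four:.3f}, 4-of-4 {four_of_four:.3f}")
```

Under these assumptions, a solver at the 90% rate claimed in the summary would still pass a 3-of-4 requirement about 95% of the time, while a 30% solver drops to roughly 8%. The parallel approach mostly hurts weak solvers, not strong ones.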
There is room for some really innovative work in this field, as the battle will probably continue for quite a while, with ever-increasing computational speed making it more difficult.
In the end, it comes down to this:
There is nothing non-biological that every human can do but no computer can do.
Paying 3rd-world human beings usually gets past captchas.
A partial solution is to limit the services you offer based on how well you know them. Anonymous? Offer very limited services.
Anonymous but tied to an existing email address? Offer a bit more.
Authenticated by credit card, which could be stolen? Offer a bit more.
Authenticated by PO box? Offer more.
Authenticated by street address, driver's license number, and a notary? Assume they are legit, you can always sue the notary if they aren't.
Authenticated against an email address that you know has X degree of authentication? Treat them like they have X degree of authentication.
For email, USENET, and IM services, offer a relatively low limit on outgoing data for free services, charge $1/year to a credit card or checking account OR require a copy of a state-issued ID to remove the limit. Watch for multiple free accounts from the same person and give them a collective limit the same as a single free account.
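A rough sketch of how the tiers and the shared free-account limit could be wired together. The tier names follow the list above, but every number, name and function here is a hypothetical placeholder, and how you decide that several accounts belong to the same person is left out entirely.

```python
from enum import IntEnum
from typing import Optional


class Trust(IntEnum):
    """Authentication tiers, weakest to strongest (names are illustrative)."""
    ANONYMOUS = 0
    EMAIL = 1
    CREDIT_CARD = 2
    PO_BOX = 3
    NOTARIZED = 4


# Hypothetical daily outgoing-message limits per tier; None means no limit.
DAILY_LIMIT: dict[Trust, Optional[int]] = {
    Trust.ANONYMOUS: 5,
    Trust.EMAIL: 50,
    Trust.CREDIT_CARD: 500,
    Trust.PO_BOX: 5000,
    Trust.NOTARIZED: None,
}


def person_total(sent_by_account: dict[str, int], accounts: list[str]) -> int:
    """Sum today's messages over all accounts believed to belong to one
    person, so several free accounts share a single collective limit."""
    return sum(sent_by_account.get(name, 0) for name in accounts)


def may_send(trust: Trust, sent_today_by_person: int) -> bool:
    """True if the person is still under the limit for their trust tier."""
    limit = DAILY_LIMIT[trust]
    return limit is None or sent_today_by_person < limit


if __name__ == "__main__":
    sent = {"alice1": 3, "alice2": 2}
    total = person_total(sent, ["alice1", "alice2"])
    print(may_send(Trust.ANONYMOUS, total))  # both free accounts count together
```

The point of the design is that a bot which solves the CAPTCHA only earns the lowest tier, so the damage it can do per account stays small.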
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
In the case of a high-profile target like Gmail, they're doing it from thousands of IPs in a botnet.
This space intentionally left blank
If only somebody could distribute their bots into a kind of network? Then you'd get traffic arriving from all over the place, that would be significantly more difficult to detect!
Quick, mod this post down, in case a ne'er-do-well were to get any ideas.