How Hackers Listened Their Way Around Google's Recaptcha
An anonymous reader writes with this story at Ars Technica: "Three self-taught hackers from the DC949 hacker collective managed to use a combination of techniques to beat ReCaptcha with 99.1% accuracy (better than most humans!)" In short, the hackers skipped the visual part of the Recaptcha system entirely, focusing on the audio alternative, which gave them a few convenient angles of attack. Google responded with changes to the system, but that doesn't minimize their accomplishment.
They wisely chose the weakest link to attack.
Since they beat the Turing Test, this means we've reached the AI singularity... right?
There's no -1 for "I don't get it."
I realized there's an interesting aspect to this, in that gVoice transcription is actively trying to do basically the same thing these guys did* (albeit in a far more general way). Wonder how gVoice would do transcribing google's own recaptcha audio. Someone go try that. Either way though, it's an interesting dilemma if they ever got automatic transcription good enough to defeat these audio recaptchas.
* Well, after RTFA, I realize that a fair bit of what they did was actually more related to hashing (and the pseudo-random generator) vs actually trying to parse the audio, but still.
Most of the spammers who circumvent captchas use real people to fill in their captchas for them. How they do it:
1) A pay-per-filled-in-captcha site (where members solve captchas, not really getting paid even though they think they will be) OR a high traffic site (false/scam sites, hacked sites, etc)
2) Mirror the image from the site you want to spam to your own site
3) A person visits your own site with the mirrored image and solves the captcha
4) Mirror the answer back to the site you want to spam
5) ???
6) Profit! (literally)
That's it! Make all users do a SERIES of incredibly hard recaptchas. Those who get too many correct are machines! Brilliant!
I had one of these the other day that was beyond absurd. The visual was a complete scrambled mess, with nearly every letter seemingly equally likely to be 2 or 3 different letters. The audio was even worse: loud gibberish in the foreground with what sounded like someone whispering the actual text in the background. It wasn't until 2 reloads later that I was lucky enough to get a recaptcha that was only slightly ambiguous, and I was able to get it on the 2nd guess. I was far more annoyed at this than I ever have been at a spambot. I'm not sure this is a step in the right direction. Time to move away from garbled text.
It EXACTLY minimizes their accomplishment. Everyone knew that the day the audio option was easily exploited, Google would get a little less accessible to the disabled. Everyone knew it was the weakest attack point. (jerks!)
Google's captchas are the worst I've ever seen. They're almost always unreadable and need to be refreshed all the time. I like Recaptcha (which isn't what Google uses on their sites despite owning it): its challenges are generally pretty clear, and it is a free service for anyone who wants to use it. I have no clue why Google sticks with their awful in-house captchas for Gmail, Youtube, etc.
I bet Siri could solve it.
All the voice tools out there could be harnessed to this sad end.
Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn't. Mark Twain.
Quote summary:
Google responded with changes to the system, but that doesn't minimize their accomplishment.
On the contrary, yes, it does minimize their accomplishment. It makes it all for nothing, a technical exercise, with no near term or long term payback.
Recaptcha is a huge con, no more secure than the original captcha. The second (or first) portion is there only to serve some other purpose, and any answer will do.
Adding the audio option (probably forced by ADA) did nothing for security. At best this demonstrates that adding multiple different keys to the same lock makes things worse, not better.
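A rough way to put numbers on the "multiple keys to the same lock" point: if a bot may choose whichever challenge type it likes, its bypass rate is the best of its per-channel rates, not an average of them. The sketch below is only an illustration. The 0.991 audio figure is the accuracy quoted in the summary; the 0.30 visual figure and the function name are made-up placeholders, not measurements.

```python
def attacker_success_rate(channel_rates):
    """Bypass rate when the attacker may pick whichever challenge type is easiest.

    channel_rates maps a channel name to the attacker's per-attempt success
    probability on that channel (hypothetical inputs for illustration).
    """
    if not channel_rates:
        raise ValueError("at least one channel is required")
    # The attacker simply uses the weakest channel, so only the maximum matters.
    return max(channel_rates.values())


# 0.991 comes from the story summary; 0.30 is an assumed placeholder value.
visual_only = {"visual": 0.30}
visual_plus_audio = {"visual": 0.30, "audio": 0.991}

print(attacker_success_rate(visual_only))
print(attacker_success_rate(visual_plus_audio))
```

Under these assumptions, adding the audio channel raises the bypass rate to that of the weaker channel, no matter how strong the visual challenge is.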
Captcha's original intent was to slow down bots, by making the user prove they were human. They are seldom used to protect anything of value, simply to keep the nuisance bots to a dull roar.
Now it appears that machines can beat captcha and recaptcha very easily. So WHY do we still see these schemes in use?
Sig Battery depleted. Reverting to safe mode.
Because even a very "high" accuracy machine system is still going to add a significant barrier to automatically cracking the results, especially if Google continues altering reCAPTCHA like they do. While you won't eliminate 100% of attackers, you can eliminate the vast majority, and slow down the attackers that do get through. The alternative is to use nothing, and believe me: you absolutely do not want that. The Internet would be 99.99999999% spam almost overnight if that happened.
"None can love freedom heartily, but good men; the rest love not freedom, but license." --John Milton
intelligence on /. bravo dear sir...
Now it appears that machines can beat captcha and recaptcha very easily. So WHY do we still see these schemes in use?
Could you give me your address, and let me know when you won't be home? (I presume you no longer lock your house.)
Re:How far behind were the criminals/spammers?
At about 75%, from what I read on the black hat forums.
There's a whole social spam ecosystem out there now, with tools and services for spamming Facebook, Twitter, Instagram, Google+, Yelp, Tumblr, Youtube, random blogs, and for retro types, Myspace. It's not just a few people doing this. It's an industry with a supply chain. Read my "Social is bad for search, and search is bad for social" paper for an overview. If it feeds into Google search rankings, it's being spammed.
On the contrary, yes, it does minimize their accomplishment. It makes it all for nothing, a technical exercise, with no near term or long term payback. Recaptcha is a huge con, no more secure than the original captcha. The second (or first) portion is there only to serve some other purpose, and any answer will do.
It's funny that you'd complain about a waste of effort and then bemoan Recaptcha, which was developed to prevent all those man-years of solving CAPTCHAs from going to waste.
BTW, the founder of Recaptcha has expressed that he will be happy when it can be defeated trivially because at that point the other job it's trying to do can be completely automated, which is still a win.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
Now *that's* impressive. The closest approximation I've heard to the audio captchas I've encountered would be the few recordings I've heard that John Lennon used to give out as gifts: he'd record multiple radios playing different stations.
I did once get an audio captcha that was almost solvable -- AFAICT, it was a conversation between Cthulhu in his native tongue and Tom Waits responding in Aramaic, recorded in a crowded airport terminal that had lots of loudspeaker announcements.
Ah but click on the "accessible" option and lookie lookie, an mp3 audio file with gibberish and a background voice. "enter the words you hear".
So this exploit would at least prevent using that option.
The game concept is pretty good though, they just need to make an accessible version.
A fool throws a stone into a well and a thousand sages can not remove it.
When idiots spam every thread with worthless "First!" posts, how could any one of these posts not be redundant?