Google ReCAPTCHA Cracked
stormdesign writes "Despite denials from Google, a security researcher continues to assert that the Search King's reCAPTCHA system for protecting Web sites from spammers can be successfully exploited by Internet junk mail panderers."
FTA:
Researcher Jonathan Wilkins published a paper recently that included an analysis of reCAPTCHA’s security. In automated attacks he conducted against the system, he reported he had an alarming success rate of 17.5 percent.
Well, last year someone showed at DEFCON that he could already solve the reCAPTCHA CAPTCHAs with a success rate of 30%.
So how is this news? Am I missing something?
...last year.
Google reCAPTCHA cracked
Written by John P Mello Jr on January 5, 2010
As much as it's nice to know reCAPTCHA is working towards a good cause (digitising old books, if you live under a rock or something), the number of times I've got incomprehensible gibberish from it makes me rather unsympathetic towards their cause. It'd be nice to think there was some better way of keeping spam out, but I guess that with developer laziness and Google's endless crusade to rule the Internet, we'll be stuck trying to decipher nonsense from the 1900s for a good while yet.
The trouble with this (and less funny image suggestions) is that the "CA" in "CAPTCHA" stands for "Completely Automated".
CAPTCHAs work as a sort of AI hash function: it's easy for a computer to generate, but hard for one to solve. Using images for tests like "what position is this", or, more realistically, "is this a cat or dog" violates that principle: creating the CAPTCHA is just as much work as solving it! On top of that, the finite availability of images allows for a database attack. Even having 5-10% of the images known makes the CAPTCHA fairly useless.
One possible future, though, is rendered images. So, for example, have a creature creator generate a dog and a cat, then ask which one's bigger. There are a few discussions/papers on the topic (e.g. at least one suggests determining which object is in front of another). The point, though, is that using photos is a dead end. There are too few and/or it's too difficult to determine the correct answer.
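To put a rough number on the database attack, here is a back-of-the-envelope sketch in Python. The model is my own assumption, not anything from the article: each image is a multiple-choice question (cat or dog means 2 choices), the bot answers correctly whenever the image is already in its database, and it guesses uniformly at random otherwise. The function name and parameters are made up for illustration.

```python
def guess_success_rate(known_fraction: float,
                       choices: int = 2,
                       images_per_challenge: int = 1) -> float:
    """Chance that a bot passes one challenge under the assumed model.

    known_fraction: share of the image pool the bot has already labelled.
    choices: number of possible answers per image (2 for cat/dog).
    images_per_challenge: images that must ALL be answered correctly.
    """
    if not 0.0 <= known_fraction <= 1.0:
        raise ValueError("known_fraction must be between 0 and 1")
    if choices < 1 or images_per_challenge < 1:
        raise ValueError("choices and images_per_challenge must be >= 1")
    # Known images are always right; unknown ones are a blind guess.
    per_image = known_fraction + (1.0 - known_fraction) / choices
    return per_image ** images_per_challenge


if __name__ == "__main__":
    # Compare a bot with no database against one that knows 5% or 10%.
    for known in (0.0, 0.05, 0.10):
        rate = guess_success_rate(known, choices=2, images_per_challenge=4)
        print(f"known={known:.0%} -> pass rate {rate:.3f}")
```

A spammer only needs a small pass rate to make an automated attack worthwhile, which is why a partly known image pool hurts so much.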
With reCaptcha, you don't have to successfully OCR the scanned word, just the control word. Usually they are indistinguishable by sight (you don't know which one is the control word), but I've seen reCaptcha instances where one word is clear and the other one is unreadable. In these cases, you can type the control word correctly and just write some gibberish for the other, and you'll beat the captcha.
Which means that the spammer won't have to OCR the harder of the two words... just the simpler one. Run the OCR on the full text, post both words, and if the simpler one matches, you've broken the captcha.
(I make it sound so easy! It really isn't! I'm amazed that they did break it! I just wanted to point out that the task isn't "OCR words that haven't been OCRd before", but rather "OCR words that have been OCRd previously and are now a bit distorted".)
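A toy sketch of the check described above, in Python. This is NOT Google's implementation; the function name, the exact-match rule, and the idea that the server stores the control word's position are all assumptions made for illustration. It only shows why gibberish for the unknown word still passes.

```python
def passes_challenge(typed: str, control_word: str, control_position: int) -> bool:
    """Hypothetical server-side check: only the control word is verified.

    typed: the two words the visitor entered, separated by whitespace.
    control_word: the word whose correct reading is already known.
    control_position: 0 if the control word is shown first, 1 if second.
    """
    words = typed.split()
    if len(words) != 2 or control_position not in (0, 1):
        return False
    # The other word is the unknown scan, so it cannot be checked here.
    return words[control_position].lower() == control_word.lower()


if __name__ == "__main__":
    # Control word typed correctly, second word is gibberish.
    print(passes_challenge("overlooks xqzv", "overlooks", 0))
    # Same guess, but the control word was actually the second one.
    print(passes_challenge("overlooks xqzv", "inquiry", 1))
```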
Another fun trick is how easy it is to catch spambots by using "invisible" form fields. Bots are too "stupid" to negotiate around these traps. They fill in those fields just like they do the visible ones, allowing you, the site operator, to instantly bin their nonsense to /dev/null with scripts and ban their IP addresses.
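For anyone who wants to try the invisible-field trick, a minimal sketch in Python. The field name "website" and the function name are hypothetical; it assumes your form includes an extra input hidden from humans with CSS, and that your framework hands you the submitted fields as a dict.

```python
def looks_like_bot(form_fields: dict, trap_field: str = "website") -> bool:
    """Return True if the hidden trap field was filled in.

    A human never sees the field, so it should arrive empty or missing.
    A naive bot fills in every field it finds, including this one.
    """
    value = form_fields.get(trap_field) or ""
    return bool(str(value).strip())


if __name__ == "__main__":
    human = {"name": "Alice", "comment": "Nice article", "website": ""}
    bot = {"name": "x", "comment": "cheap pills", "website": "http://spam.example"}
    for submission in (human, bot):
        verdict = "reject" if looks_like_bot(submission) else "accept"
        print(submission["name"], "->", verdict)
```

What you do with a rejected submission (drop it silently, log the IP, ban it) is up to you; the check itself is just the one line.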
@Mindless Drivel: 100% of Twitter posts ever Tweeted.