ReCAPTCHA.net Now Vulnerable to Algorithmic Attack
n3ond4x writes "Algorithms have been developed that solve the current reCAPTCHA.net CAPTCHA at a success rate of 30%. The algorithms were disclosed at DEFCON 18 over the weekend and have since been made available online. Also available is a video demonstration of random reCAPTCHA.net CAPTCHAs being subjected to the algorithms." There's probably an excellent Firefox plugin to render this page's color scheme more bearable. Note: the linked PowerPoint presentation opens fine in OpenOffice, and the video speaks for itself.
So what is the average human success rate? I think mine is only about 50%.
Why would anyone want to do this? It's like attacking the UN peacekeeping troops or the Red Cross. reCAPTCHA is doing good work, digitizing scanned printed books so that the text can be made available for online searching. Breaking reCAPTCHA is like defecating in the village well, ensuring that everyone suffers. No one benefits from reCAPTCHA being broken. No one.
When it is claimed to be 30% accurate, I'd expect some 30% of all captchas to be correctly guessed. Watching the video, I noticed the algorithm gives itself 30-40% scores for getting just one of the two words right, or sometimes even for getting the right length and a few correct letters. I didn't watch it to the end, but in the few minutes I watched, ZERO entire captchas were solved. So that's ZERO% accurate in my book. For instance, actual captcha text "ware readiness", guessed captcha "votarry rehabbed", reported accuracy 38.24%... how the hell is that over 38% accurate? If you had that level of accuracy when trying to get past a captcha (which is pretty much the definition of it being vulnerable, right?), you wouldn't get past a single captcha. It's 30% accurate if it correctly guesses about 3 out of every 10 captchas, not if it fails every single captcha.
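To make the difference between the two ways of counting concrete, here is a rough Python sketch. It is not the scoring used in the video, which isn't documented here; `similarity` is a hypothetical stand-in based on the standard library's `difflib.SequenceMatcher`, and the second attempt in the list is made up for illustration. The point is only that a partial-credit score can be well above zero while the number of captchas actually passed is zero.

```python
from difflib import SequenceMatcher


def similarity(actual: str, guess: str) -> float:
    """Character-level similarity in [0, 1].

    Assumption: a generic partial-credit measure, NOT the metric
    used in the DEFCON demonstration.
    """
    return SequenceMatcher(None, actual, guess).ratio()


def solve_rate(pairs) -> float:
    """Fraction of captchas where the whole guess matches exactly."""
    solved = sum(1 for actual, guess in pairs if actual == guess)
    return solved / len(pairs)


attempts = [
    # Pair quoted from the video in the comment above.
    ("ware readiness", "votarry rehabbed"),
    # Hypothetical near miss, invented for this example.
    ("ware readiness", "ware readings"),
]

mean_similarity = sum(similarity(a, g) for a, g in attempts) / len(attempts)

print(f"mean partial-credit score: {mean_similarity:.2%}")
print(f"captchas actually solved:  {solve_rate(attempts):.2%}")
```

Only the second number says anything about getting past the captcha, since a submission is either accepted or rejected as a whole.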
The problem is that since you are *probably* solving the verification words with higher accuracy to begin with, you are actually poisoning the data being gathered regarding the book words. So, while a book word becoming a verification word based on your "solutions" will keep your solution rate constant, it actually damages the system when it comes time for humans to solve the CAPTCHA, or worse when the solutions are used as OCR corrections.
To clarify, given a classically OCR-able "foo" and a non-OCR-able-but-human-readable "bar", a human is expected to recognize the slightly-deformed-by-reCAPTCHA "foo" and is trusted to get "bar" right more often than OCR would. This attack only defeats the deformation applied by reCAPTCHA; it doesn't actually improve the OCR on the non-deformed words, which means you are going to submit an answer of "foo ban" every time this pair is encountered (or "blah ban" for a different scenario), and the reCAPTCHA system is eventually going to decide that the book word really is "ban".
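A toy simulation of that poisoning, in Python. Everything here is an assumption made for illustration: the real reCAPTCHA aggregation rules aren't described in this thread, so `submit`, `accepted_reading`, the majority-vote rule, and the `min_votes` threshold are all hypothetical. The sketch only assumes that an answer for the book word is counted when the verification word is right, and that the most common counted answer wins.

```python
from collections import Counter


def submit(votes, control_truth, control_answer, book_answer):
    """Count the book-word answer only if the verification word was right.

    Assumption: a simplified model of trust, not the actual reCAPTCHA logic.
    """
    if control_answer == control_truth:
        votes.append(book_answer)


def accepted_reading(votes, min_votes=3):
    """Return the most common counted answer once enough votes exist, else None."""
    if len(votes) < min_votes:
        return None
    answer, _count = Counter(votes).most_common(1)[0]
    return answer


votes = []

# A few humans read the pair correctly as "foo bar".
for _ in range(3):
    submit(votes, "foo", "foo", "bar")

# The bot defeats the distortion on "foo" but misreads the book word every time.
for _ in range(10):
    submit(votes, "foo", "foo", "ban")

print(accepted_reading(votes))
```

Under these assumptions the bot's consistent wrong answer outvotes the humans, so the book word is recorded as "ban" even though every human got it right.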