ReCAPTCHA.net Now Vulnerable to Algorithmic Attack

Posted by timothy on Thursday August 5, 2010 @08:59AM from the bless-you! dept.

n3ond4x writes "Algorithms have been developed that solve the current reCAPTCHA.net CAPTCHA with an efficacy of 30%. The algorithms were disclosed at DEFCON 18 over the weekend and have since been made available online. Also available is a video demonstration of random reCAPTCHA.net CAPTCHAs being subjected to the algorithms." There's probably an excellent Firefox plugin to render this page's color scheme more bearable. Note: the PowerPoint presentation linked opens fine in OpenOffice, and the video speaks for itself.

29 of 251 comments

Human Success? by Anonymous Coward · 2010-08-05 09:03 · Score: 5, Insightful

So what is the average human success rate? I think mine is only about 50%.
1. Re:Human Success? by Kalriath · 2010-08-05 15:41 · Score: 3, Insightful
  
  Yeah, I agree with this. reCAPTCHA is one of the easiest out there.
  Admittedly, though, I have about a 3% success rate with vBulletin captchas. Hear that, forum owners? I'm not joining your forum because I can't read your captcha!
  
  --
  For a site about things like basic rights, Slashdot users sure do like to censor "dissent".
OCR improvements? by Anonymous Coward · 2010-08-05 09:05 · Score: 3, Interesting

Can these attack algorithms actually increase the accuracy of normal OCR programs?
1. Re:OCR improvements? by Sparr0 · 2010-08-05 19:44 · Score: 3, Insightful
  
  The problem is that since you are *probably* solving the verification words with higher accuracy to begin with, you are actually poisoning the data being gathered regarding the book words. So, while a book word becoming a verification word based on your "solutions" will keep your solution rate constant, it actually damages the system when it comes time for humans to solve the CAPTCHA, or worse when the solutions are used as OCR corrections.
  To clarify, given a classically OCR-able "foo" and a non-OCR-able-but-human-readable "bar", a human is expected to recognize the slightly-deformed-by-reCAPTCHA "foo" and is trusted to get "bar" right more often than OCR would. This attack only defeats the deformation applied by reCAPTCHA; it doesn't actually improve the OCR on the non-deformed words. So if OCR misreads "bar" as "ban", you are going to submit an answer of "foo ban" every time this pair is encountered (or "blah ban" in a different pairing), and the reCAPTCHA system is eventually going to decide that the book word really is "ban".
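To make the poisoning argument concrete, here is a minimal sketch assuming the system settles an unknown book word by simple majority vote once some minimum number of votes is reached. The function `consensus` and the `min_votes` threshold are hypothetical; the thread does not describe reCAPTCHA's actual aggregation policy.

```python
from collections import Counter
from typing import List, Optional


def consensus(answers: List[str], min_votes: int = 3) -> Optional[str]:
    """Return the most common answer once it has at least min_votes votes.

    Hypothetical aggregation rule for illustration only.
    """
    if not answers:
        return None
    word, votes = Counter(answers).most_common(1)[0]
    return word if votes >= min_votes else None


# Two humans read the book word correctly as "bar"; a bot that always
# misreads it as "ban" answers more often and outvotes them.
human_answers = ["bar", "bar"]
bot_answers = ["ban"] * 4
print(consensus(human_answers + bot_answers))  # ban
```

Because a bot can submit far more answers than any single human, a consistent misreading only has to outnumber the correct readings to become the "digitized" text.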
Speaking about re-captcha by imsabbel · 2010-08-05 09:08 · Score: 3, Informative

I recently went to their homepage and looked _really_ hard for any statistics about which books are transcribed. I read their Science paper. I tried all sections.
It's all about the captcha part, and _nothing_ about the RE.
The way they state how it works ("We are using 100,000 unique words") sounds like they gave up on that part long ago and just recycle their old database again and again...

--
HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
1. Re:Speaking about re-captcha by icebraining · 2010-08-05 09:14 · Score: 4, Informative
  
  Currently, we are helping to digitize old editions of the New York Times and books from Google Books.
  http://www.google.com/recaptcha/learnmore
  
  --
  Dilbert RSS feed
2. Re:Speaking about re-captcha by imsabbel · 2010-08-05 09:49 · Score: 4, Interesting
  
  Hm.
  So it's for-profit work for the biggest advertising firm in the world.
  I sort of expected Project Gutenberg or something.
  Too bad.
  
  --
  HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
far from it by MagicM · 2010-08-05 09:12 · Score: 3, Informative

I'm watching the video, and the end result is "b:1/78 1.28% s:27/78 34.62%", indicating that out of 78 two-word tests, it got a single word right 35% of the time and both words right only once, or 1% of the time.
Since both words need to be correct "solve the current CAPTCHA at an efficacy of 1%" would be closer to the truth.
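The percentages in that end-of-run line follow directly from the counts. A small sketch, assuming (as MagicM reads it) that `b` counts runs with both words right and `s` counts runs with a single word right; `parse_score` is a hypothetical helper, not part of the released tool.

```python
from typing import Tuple


def parse_score(field: str) -> Tuple[int, int]:
    """Parse a 'hits/total' field such as '27/78' into (hits, total)."""
    hits, total = field.split("/")
    return int(hits), int(total)


both_hits, runs = parse_score("1/78")    # both words right (assumed meaning of "b")
single_hits, _ = parse_score("27/78")    # one word right (assumed meaning of "s")

print(f"both words: {both_hits / runs:.2%}")     # both words: 1.28%
print(f"single word: {single_hits / runs:.2%}")  # single word: 34.62%
```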
1. Re:far from it by hydrofix · 2010-08-05 09:53 · Score: 5, Informative
  
  Since both words need to be correct "solve the current CAPTCHA at an efficacy of 1%" would be closer to the truth.
  Actually, that is incorrect. One word is already positively known by the OCR and serves as a control, while the other is the one the OCR could not read. The system will of course only check the one it knows and assume the other one is correct as well. So if you get one of the words right AND it is the same word their OCR identified correctly (which is very likely the case), you pass, but most of the time (99%) you give a bad answer for the harder, non-OCR word. Sadly, this leads to pollution of their database in the long run.
2. Re:far from it by Jorl17 · 2010-08-05 13:20 · Score: 4, Informative
  
  This is not informative, as many have said. If you read http://www.google.com/recaptcha/learnmore , you'll get it.
  
  Here is the deal: reCAPTCHA presents two words. One is picked by it and is previously known. The other one is a word from a book that has been scanned. Said word is unknown to the reCAPTCHA system. When the user enters both words, reCAPTCHA checks to see if the known word has been properly recognized. If that is the case, then reCAPTCHA can assume that a human is answering. Given that a human is answering, the second, unknown word given by the human is most likely correct, because he/she will be able to recognize it as well. Using this system, reCAPTCHA works as a CAPTCHA (spam prevention) mechanism and also helps transform old books and papers, such as the New York Times, into digital format.
  
  So, in practice, only one word has to be correct -- the word that reCAPTCHA knows. What's sad is that bots may contribute incorrect second words...
  
  Next time, get informed before going all crazy.
  
  And here is the relevant info, quoted from the aforementioned website:
  
  reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA. This is possible because most OCR programs alert you when a word cannot be read correctly. But if a computer can't read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here's how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
  
  --
  Have you heard about SoylentNews?
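The two-word scheme Jorl17 describes (and quotes from the reCAPTCHA site) can be sketched as follows. This is a minimal illustration under stated assumptions: `Challenge`, `check`, and the `votes` store are hypothetical names, and the case-insensitive, whitespace-trimmed comparison is a guess rather than reCAPTCHA's documented behaviour.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Challenge:
    control_answer: str  # word whose correct reading is already known
    unknown_id: str      # hypothetical ID of the scanned word being digitized


def check(challenge: Challenge, typed_control: str, typed_unknown: str,
          votes: Dict[str, List[str]]) -> bool:
    """Pass the user if the control word matches; log their reading of the unknown word.

    Case-insensitive, whitespace-trimmed comparison is an assumption.
    """
    if typed_control.strip().lower() != challenge.control_answer.lower():
        return False  # failed the known word: treat as a bot, discard the other guess
    # Only a passing user's guess for the unknown word is recorded as a vote.
    votes.setdefault(challenge.unknown_id, []).append(typed_unknown.strip().lower())
    return True


votes: Dict[str, List[str]] = {}
c = Challenge(control_answer="morning", unknown_id="scan-0042")
print(check(c, "Morning", "overlooked", votes))  # True
print(check(c, "mourning", "anything", votes))   # False
print(votes)  # {'scan-0042': ['overlooked']}
```

Note that nothing in `check` inspects the unknown word: a bot that reads only the control word passes and still gets its guess recorded, which is exactly the pollution several posters worry about.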
Plugin not needed... by knarf · 2010-08-05 09:13 · Score: 3, Informative

There's probably an excellent Firefox plugin to render this page's color scheme more bearable
No plugin needed:
View->Use Style->None
That is the menu path in SeaMonkey; Firefox will be similar. This more or less always works.

--
--frank[at]unternet.org
Hmm by Tailhook · 2010-08-05 09:15 · Score: 5, Funny

Should I run the DEFCON presenter's giant SWF or not?
o_O

--
Maw! Fire up the karma burner!
Bad Hacking by pz · 2010-08-05 09:16 · Score: 4, Insightful

Why would anyone want to do this? It's like attacking the UN peacekeeping troops or the Red Cross. reCAPTCHA is doing good work, digitizing scanned printed books so that the text can be made available for online searching. Breaking reCAPTCHA is like defecating in the village well, ensuring that everyone suffers. No one benefits from reCAPTCHA being broken. No one.

--

Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
1. Re:Bad Hacking by Dhalka226 · 2010-08-05 09:31 · Score: 5, Insightful
  
  No one benefits from reCAPTCHA being broken. No one.
  Spammers.
2. Re:Bad Hacking by maxume · 2010-08-05 09:32 · Score: 5, Insightful
  
  Actually, it could be of use to reCAPTCHA: they can just pass their test words through this system before making them public and then use the output to help prevent similar attacks.
  
  --
  Nerd rage is the funniest rage.
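maxume's suggestion amounts to a pre-filter: run each candidate through the published attack and only serve the ones it fails on. A minimal sketch, assuming candidates are (image reference, known answer) pairs and that the attack can be wrapped as a function from image to guessed text; `filter_hard_words` and the toy solver are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[str, str]  # (image reference, known answer); hypothetical representation


def filter_hard_words(candidates: Sequence[Candidate],
                      solver: Callable[[str], str]) -> List[Candidate]:
    """Keep only candidates the automated solver gets wrong.

    `solver` stands in for an attack pipeline such as the DEFCON one:
    it takes an image reference and returns its best guess at the text.
    """
    return [(image, answer) for image, answer in candidates
            if solver(image) != answer]


# Toy solver: pretends it can read img1 but misreads img2.
toy_guesses = {"img1": "foo", "img2": "votarry"}
pairs = [("img1", "foo"), ("img2", "ware")]
print(filter_hard_words(pairs, lambda img: toy_guesses.get(img, "")))
# [('img2', 'ware')]
```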
3. Re:Bad Hacking by Flyne · 2010-08-05 09:42 · Score: 4, Insightful
  
  The problem of breaking reCAPTCHA is precisely the same problem as increasing computer OCR abilities, since reCAPTCHA by design uses words that current OCR cannot read. This is a good thing for AI, computer vision, and text digitization.
4. Re:Bad Hacking by sbayless · 2010-08-05 09:58 · Score: 5, Insightful
  
  No one benefits from reCAPTCHA being broken. No one
  You couldn't be more wrong. Sure, breaking reCAPTCHA would create a headache for website admins (including me, for example), but in order to break reCAPTCHA someone has to devise a better text recognition program. And that's great news! This is an example of a general side effect of the cat-and-mouse game of captchas. Captchas are a simple form of Turing test, where website admins are trying to determine who is a computer and who is a real human being. Every time a captcha gets broken, we get a sophisticated new algorithm for doing something that previously only humans could do (or only humans could do well, at least).
5. Re:Bad Hacking by Timmmm · 2010-08-05 11:10 · Score: 3, Insightful
  
  The problem of breaking reCAPTCHA is precisely the same problem as increasing computer OCR abilities
  No, it isn't. Well, not unless you read books with wavy crossed-out words and don't mind 30% accuracy.
Re:colours by electrostatic · 2010-08-05 09:17 · Score: 4, Informative

"...an excellent Firefox plugin to render this page's color scheme more bearable."

Yep. Color Toggle

https://addons.mozilla.org/en-US/firefox/addon/9408/

I have it set so Ctrl-Shift-Z sets a light yellow background, black text, and blue links.
Re:Offtopic by Anonymous Coward · 2010-08-05 09:29 · Score: 4, Informative

No, Firefox addons used to be called extensions; plugins are still plugins.
Re:My eyes! by SomeJoel · 2010-08-05 09:40 · Score: 4, Funny

Did you not learn when I explained this yesterday? The quote is: "My eyes! The goggles do nothing!". There is no "they", nor is there any bad pronunciation. Indeed, it is correctly articulated and enunciated, with an accent.
Easy there champ, nobody appreciates a Family Guy nerd correcting everyone's quotes.

--
<Complete your profile by adding a signature!>
Is this related? by Khyber · 2010-08-05 09:48 · Score: 4, Interesting

Anybody who has been paying attention to 4chan recently knows they had to implement a captcha due to a massive spam flood from infected morons. reCAPTCHA got busted thanks to someone in /g/ who leaked a vulnerability in reCAPTCHA's audio challenge, and the whole site was again inundated with spam, though not to the degree of the original spam attack.

--
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
Re:So many better ways than recaptcha by JesseMcDonald · 2010-08-05 09:49 · Score: 3, Informative

There is ZERO reason to use worthless tests like these as opposed to using real identification. That is, instead of using computer-generated difficult text, use actual pictures of actual 'difficult text' that an OCR agent failed to identify. Each person is given one already tested sample and one unknown sample. If you get the already tested sample right, then your answer is accepted as 'probably' correct for the unknown sample.
Congratulations, you've just described ReCAPTCHA! This is exactly how the current system works.

--
"The state is that great fiction by which everyone tries to live at the expense of everyone else." - Bastiat
Re:Offtopic by Cougar+Town · 2010-08-05 09:59 · Score: 3, Informative

Wrong. Plugins have been around since Netscape and are still called plugins. They have a different function than an extension (and an extension is what we would want in this case to fix the site's colours).
Both plugins and extensions, along with themes, are collectively referred to as "addons." "Plugin" is the wrong word in the summary. "Extension" or "addon" would have been acceptable.
How is this 30% accurate??? by mwvdlee · 2010-08-05 10:02 · Score: 3, Insightful

When it is claimed to be 30% accurate, I'd expect about 30% of all captchas to be correctly guessed. Watching the video, I noticed the algorithm gives itself 30-40% scores for getting just one of the two words right, or sometimes even for getting the right length and a few correct letters. I didn't watch it to the end, but in the few minutes I watched, ZERO entire captchas were solved. So that's ZERO% accurate in my book. For instance: actual captcha text "ware readiness", guessed captcha "votarry rehabbed", reported accuracy 38.24%... how the hell is that over 38% accurate? If you had that level of accuracy when trying to get past a captcha (which is pretty much the definition of it being vulnerable, right?), you wouldn't get past a single captcha. It's 30% accurate if it correctly guesses about 3 out of every 10 captchas, not if it fails every single one.

--
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
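The disagreement is really about which metric is being reported. The video's scoring formula isn't given in the thread, so the sketch below does not reproduce its 38.24%; it uses Python's `difflib.SequenceMatcher` ratio as a stand-in partial-credit score and contrasts it with exact-match accuracy, the number that actually decides whether a captcha is passed.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Pair = Tuple[str, str]  # (actual captcha text, solver's guess)


def exact_match_rate(pairs: List[Pair]) -> float:
    """Fraction of captchas where the whole guess equals the answer."""
    return sum(guess == answer for answer, guess in pairs) / len(pairs)


def mean_similarity(pairs: List[Pair]) -> float:
    """Stand-in partial-credit score: average difflib similarity ratio (0..1)."""
    return sum(SequenceMatcher(None, answer, guess).ratio()
               for answer, guess in pairs) / len(pairs)


pairs = [("ware readiness", "votarry rehabbed")]
print(exact_match_rate(pairs))     # 0.0
print(mean_similarity(pairs) > 0)  # True
```

Any character-level score will give partial credit to a wrong answer; only the exact-match rate corresponds to "captchas solved".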
Re:My eyes! by SomeJoel · 2010-08-05 11:53 · Score: 4, Funny

Judging from the other replies, meta-humor is a little hard for you guys...

It works wonders though. For instance, the next time someone is talking about "the force" or jedis and such, tell them "Get a life, Star Trek sucks!". You'll find the reaction much more interesting than if you correctly identify the franchise.

--
<Complete your profile by adding a signature!>
New Human Verification Scheme by BlueMonk · 2010-08-05 12:39 · Score: 3, Interesting

Seeing this article gave me an idea to come up with a new human verification process. I created a C# program in about an hour that loads images from Google images based on searching for 3 of 2000+ nouns. It shows 3 examples of each noun and asks the user to pick the correct noun from a list of 6. This program is just a proof of concept of course. Could this become useful? (Binary and source code included.)
http://enigmadream.com/misc/HumanVerification.zip
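A rough sketch of how such a challenge could be assembled, plus the odds of a bot passing by blind guessing. It assumes three rounds (one per noun), each offering six options with exactly one correct; the image-fetching step is omitted, and `make_challenge` and `guess_pass_probability` are hypothetical names, not taken from BlueMonk's C# code.

```python
import random
from fractions import Fraction
from typing import List, Optional, Sequence, Tuple


def make_challenge(nouns: Sequence[str], rounds: int = 3, choices: int = 6,
                   rng: Optional[random.Random] = None) -> List[Tuple[str, List[str]]]:
    """Hypothetical generator: each round has one target noun plus decoys.

    Fetching the example images for each target (e.g. from an image search)
    is left out; only the answer/options bookkeeping is shown.
    """
    rng = rng if rng is not None else random.Random()
    challenge = []
    for target in rng.sample(list(nouns), rounds):
        decoys = rng.sample([n for n in nouns if n != target], choices - 1)
        options = decoys + [target]
        rng.shuffle(options)  # hide the position of the correct answer
        challenge.append((target, options))
    return challenge


def guess_pass_probability(rounds: int = 3, choices: int = 6) -> Fraction:
    """Chance a bot passes every round by guessing uniformly at random."""
    return Fraction(1, choices) ** rounds


p = guess_pass_probability()
print(p, f"{float(p):.2%}")  # 1/216 0.46%
```

Under these assumptions, blind guessing passes well under 1% of the time; the real question is how well image-recognition software could match the pictures to the nouns.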
Re:My eye's... by Peach+Rings · 2010-08-05 12:52 · Score: 4, Funny

You know a hacker is hard core when his site is monochrome in a monospace font, and he saves his files as straight up docx.
Re:My eye's... by hairyfeet · 2010-08-05 13:41 · Score: 5, Funny

You young ones and your complaining. "Ohhh the colors suck" SO WHAT! You don't remember when the Internet was invaded by those dual demons from hell, Geocities and Comet Cursors! Now THAT was torture buddy! YOU try dealing with a page that looks like it was designed by Unicorns on a crack binge, while having a fricking pocketwatch suddenly appear and hang from your cursor like a ball of snot on a string, all while having your shotgunned modems drug down to 300 baud land thanks to a bazillion puke inspiring GIFs spinning all out of time!
Now THAT is real suffering kid! /wanders off muttering/

--
ACs don't waste your time replying, your posts are never seen by me.