ReCAPTCHA.net Now Vulnerable to Algorithmic Attack
n3ond4x writes "reCAPTCHA.net algorithms have been developed to solve the current CAPTCHA at an efficacy of 30%. The algorithms were disclosed at DEFCON 18 over the weekend and have since been made available online. Also available is a video demonstration of random reCAPTCHA.net CAPTCHAs being subjected to the algorithms." There's probably an excellent Firefox plugin to render this page's color scheme more bearable. Note: the PowerPoint presentation linked opens fine in OpenOffice, and the video speaks for itself.
"There's probably an excellent Firefox plugin to render this page's color scheme more bearable."
Just select the whole page; it's better.
So what is the average human success rate? I think mine is only about 50%
The goggles, they do nothing!
Can these attack algorithms actually increase the accuracy of normal OCR programs?
But that just means more spambots, right?
I recently went to their homepage and looked _really_ hard for any statistics about which books are transcribed. I read their Science paper. Tried all sections.
It's all about the captcha part, and _nothing_ about the RE.
The way they state how it works ("We are using 100.000 unique words") sounds like they have given up on that part long ago and just recycle their old database again and again...
HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
If not, then the captcha should only be visible when the mouse cursor is over it.
The key to a successful captcha is to make it accessible only by a user sitting in front of the screen.
It looks like that tool is better at deciphering the captchas than I am.
I'm watching the video, and the end result is "b:1/78 1.28% s:27/78 34.62%" indicating that out of 78 tests of two words per test it got a single word right 35% of the time, and both words right only once or 1% of the time.
Since both words need to be correct "solve the current CAPTCHA at an efficacy of 1%" would be closer to the truth.
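A quick way to check that reading of the counter: a minimal sketch that parses the status line quoted above and recomputes the percentages from the raw counts. The function name `parse_counter` is made up for illustration; it is not part of the presenter's tool.

```python
import re

def parse_counter(line):
    """Parse a status line like 'b:1/78 1.28% s:27/78 34.62%'
    into {label: (hits, total)}. Hypothetical helper, not from the tool."""
    return {
        label: (int(hits), int(total))
        for label, hits, total in re.findall(r"(\w):(\d+)/(\d+)", line)
    }

counts = parse_counter("b:1/78 1.28% s:27/78 34.62%")
# Recompute each percentage from its own hits/total pair.
rates = {label: round(100 * hits / total, 2) for label, (hits, total) in counts.items()}
print(rates)  # {'b': 1.28, 's': 34.62}
```

So the displayed percentages are just hits divided by tests; which of the two counts as "solved" depends on what reCAPTCHA actually checks.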
No plugin needed:
View->Use Style->None
That is what it looks like in Seamonkey; Firefox will be similar. This more or less always works.
--frank[at]unternet.org
Should I run the DEFCON presenter's giant SWF or not?
o_O
Maw! Fire up the karma burner!
Why would anyone want to do this? It's like attacking the UN peacekeeping troops or the Red Cross. reCAPTCHA is doing good work, digitizing scanned printed books so that the text can be made available for online searching. Breaking reCAPTCHA is like defecating in the village well, ensuring that everyone suffers. No one benefits from reCAPTCHA being broken. No one.
Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
There's probably an excellent Firefox plugin to render this page's color scheme more bearable.
I like using a Readability bookmarklet in my bookmarks bar: Readability - An Arc90 Lab Experiment
No, Firefox addons used to be called extensions, plugins are still plugins.
You wrote, "There is more than enough written and audio samples that the world would love to see OCR'ed." -- Where do you get those?
http://stephan.sugarmotor.org
Anybody who has paid attention to 4chan recently knows they had to implement captcha due to a massive spamflood from infected morons. reCAPTCHA got busted thanks to someone in /g/ who leaked the vulnerability in the sound system for reCAPTCHA, and the whole site was again inundated with spam, though not to the same degree as the original spam attack.
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
There is ZERO reason to use worthless tests like these as opposed to using real identification. That is, instead of using a computer-generated difficult test, use actual pictures of actual 'difficult text' that an OCR agent failed to identify. Each person is given one already-tested sample and one unknown sample. If you get the already-tested sample right, then your answer for the unknown sample is accepted as 'probably' correct.
Congratulations, you've just described ReCAPTCHA! This is exactly how the current system works.
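For anyone who hasn't seen it spelled out, the scheme the parent describes can be sketched roughly like this. It is only an illustration: the names (`check`, `consensus`, `votes`) and the three-vote threshold are assumptions for the example, not reCAPTCHA's actual implementation.

```python
from collections import Counter, defaultdict

# unknown-word id -> tally of what people typed for it (hypothetical store)
votes = defaultdict(Counter)

def check(control_answer, typed_control, unknown_id, typed_unknown):
    """Pass or fail on the already-tested (control) word only."""
    if typed_control.strip().lower() != control_answer.lower():
        return False  # failed the known word: discard the other answer too
    # Passed the control word: count the other answer as a 'probable' reading.
    votes[unknown_id][typed_unknown.strip().lower()] += 1
    return True

def consensus(unknown_id, min_votes=3):
    """Return the most common reading once enough people agree (threshold assumed)."""
    if not votes[unknown_id]:
        return None
    word, n = votes[unknown_id].most_common(1)[0]
    return word if n >= min_votes else None
```

The point of the design is that the user never learns which word is the control, so the only reliable strategy is to try to read both.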
"The state is that great fiction by which everyone tries to live at the expense of everyone else." - Bastiat
The percentages shown are a running total of all the captchas tested against in that run.
b is the % of cases where BOTH words were correctly recognized
s is the % of cases where AT LEAST ONE word was correctly recognized
You only need to know ONE word to pass a recaptcha captcha. Though it has to be the CORRECT word, and I don't know if the developers of this program knew which word was known, or if they took that into account when displaying the percentages.
The worst case scenario is that they can solve it about 1/6th of the time (getting one right 1/3 of the time, and having it be the right one 1/2 of those times). It stands to reason, however, that the "known" captchas (the ones recaptcha tests against) are the ones that are easier to solve, and thus, the actual success rate is indeed about 33%.
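The two ends of that range fall out of one multiplication. A rough sketch, assuming the solver's single-word hit rate is the 1/3 mentioned above and treating "how often the word it read correctly is the known one" as a free parameter (that parameter is my assumption; the video doesn't report it):

```python
from fractions import Fraction

def pass_rate(p_one_word, p_hit_is_known):
    """Chance of passing when only the known word is checked."""
    return p_one_word * p_hit_is_known

p_one_word = Fraction(1, 3)  # one word right about a third of the time

worst = pass_rate(p_one_word, Fraction(1, 2))  # the hit is the known word half the time
best = pass_rate(p_one_word, Fraction(1, 1))   # the hit is always the (easier) known word
print(worst, best)  # 1/6 1/3
```

Anything in between just moves the second factor between 1/2 and 1.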
From TFS:
There's probably an excellent Firefox plugin to render this page's color scheme more bearable.
Halfway through this sentence I realized someone will now implement a nice little extension such that I never again have to answer these recaptchas. Pretty sure they would break this extension shortly with cunning, though. Anyway, at 30% accuracy now, it's easier to <F5> or click refresh 3 or 4 times than to get my hands off the mouse to type 2 word captchas that sometimes are eye-straining.
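The "3 or 4 refreshes" figure is just the mean of a geometric distribution. A back-of-the-envelope sketch, assuming each attempt is independent and really does pass 30% of the time (the claimed figure, which others in this thread dispute):

```python
def expected_attempts(p):
    """Mean number of tries until the first success (geometric distribution)."""
    return 1 / p

def p_within(p, n):
    """Probability of at least one success in n independent tries."""
    return 1 - (1 - p) ** n

p = 0.30
print(f"{expected_attempts(p):.2f}")  # 3.33
print(f"{p_within(p, 3):.2f}")        # 0.66
print(f"{p_within(p, 4):.2f}")        # 0.76
```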
You don't have to reply here if you don't want to lose karma with such guilty-pleasure extension, brave spammers^Wcoders! :) I'll be googling the currently "virgin" string "captcha this fox" to find your work posted wherever.
Wrong. Plugins have been around since Netscape and are still called plugins. They have a different function than an extension (and an extension is what we would want in this case to fix the site's colours).
Both plugins and extensions, along with themes, are collectively referred to as "addons." "Plugin" is the wrong word in the summary. "Extension" or "addon" would have been acceptable.
When it is claimed to be 30% accurate, I'd expect some 30% of all captchas to be correctly guessed. Watching the video, I noticed the algorithm gives itself 30-40% scores for getting just one of the two words right, or sometimes even for getting the right length and a few correct letters. I didn't watch it to the end, but in the few minutes I watched, ZERO entire captchas were solved. So that's ZERO% accurate in my book. For instance, actual captcha text "ware readiness", guessed captcha "votarry rehabbed", reported accuracy 38.24%... how the hell is that over 38% accurate? If you had that level of accuracy when trying to get past a captcha (which is pretty much the definition of it being vulnerable, right?), you wouldn't get past a single captcha. It's 30% accurate if it correctly guesses about 3 out of every 10 captchas, not if it fails every single captcha.
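To make the distinction concrete, here is a small sketch contrasting an all-or-nothing check with a partial-credit character score. The similarity measure is Python's `difflib`, picked purely for illustration; I have no idea what formula the tool in the video uses, so don't expect it to reproduce the 38.24%.

```python
from difflib import SequenceMatcher

def exact_match(truth, guess):
    """What a captcha actually cares about: every word right, or fail."""
    return truth.split() == guess.split()

def char_similarity(truth, guess):
    """Partial credit for shared characters (illustrative metric only)."""
    return SequenceMatcher(None, truth, guess).ratio()

truth, guess = "ware readiness", "votarry rehabbed"
print(exact_match(truth, guess))      # False
print(char_similarity(truth, guess))  # some nonzero score despite the total miss
```

A guess can earn a respectable-looking similarity score while still being a 100% failure as far as getting past the captcha goes.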
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
In other words.... use reCAPTCHA?
It's a bit like watching scary women fight over who was married to some guy first on Ricky Lake.
Maybe this hack can be used to improve book scanning.
Requiem for the American Dream
A Firefox extension is not the same thing as a plugin.
B.F.D.
"I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)
If reCAPTCHA's too easily breakable, then Bad Guys will figure out how, and will start exploiting sites that use reCAPTCHA for protection.
So we need to know how vulnerable it is, and the reCAPTCHA folks need to figure out how to fix it. It's an arms race, always has been, probably always will be.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Since that's about the accuracy of a human
I think you underestimate just how much I just don't care.
Seeing this article gave me an idea to come up with a new human verification process. I created a C# program in about an hour that loads images from Google images based on searching for 3 of 2000+ nouns. It shows 3 examples of each noun and asks the user to pick the correct noun from a list of 6. This program is just a proof of concept of course. Could this become useful? (Binary and source code included.)
http://enigmadream.com/misc/HumanVerification.zip
reCAPTCHA isn't bad, but Google's captchas are so hard they're probably more easily solved by learning algorithms than actual human beings.
You know a hacker is hard core when his site is monochrome in a monospace font, and he saves his files as straight up docx.
Then we can just put reCAPTCHA on all pages being used for spam, and get transcription services for free.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
By the way, that wasn't just a facetious comment. TFA isn't a serious paper. It's not even typeset, just typed into Microsoft Word. And god knows why I'm being warned about VBScript macros when I try to open it.
And this isn't a case where the little guy is making real scientific progress right under the nose of the obsolete establishment. The author doesn't even have a freshman understanding of big-O notation; it's completely juvenile.
reCAPTCHA is already on the road to beating this. When your images are on the verge of being discovered algorithmically, use Hebrew.
You young ones and your complaining. "Ohhh the colors suck" SO WHAT! You don't remember when the Internet was invaded by those dual demons from hell, Geocities and Comet Cursors! Now THAT was torture buddy! YOU try dealing with a page that looks like it was designed by Unicorns on a crack binge, while having a fricking pocketwatch suddenly appear and hang from your cursor like a ball of snot on a string, all while having your shotgunned modems drug down to 300 baud land thanks to a bazillion puke inspiring GIFs spinning all out of time!
Now THAT is real suffering kid! /wanders off muttering/
ACs don't waste your time replying, your posts are never seen by me.
When OCR gets so good that recaptcha becomes pointless, my idea for the next step of harder-for-AI captchas is to stop using line art and start using gradients. That is, currently, they use text, which is line art, and then warp it, chop it up, and run miscellaneous clutter through it. It's getting harder and harder for people to read, and machines are still catching up.
I propose that if you start with a photograph, make a selection that's block text, feather the edges, then shift the colors in the selection (hue, saturation, inversion, remapping, whatever), it's going to be easier for humans and harder for computers than some of the stuff we've got now. But generating it can be automated just as easily; I scripted Photoshop to make these in a few minutes.
Here's an example
Can anyone tell me how to set my sig on Slashdot?
...and how in the hell do you OCR an audio sample???
The spammers can just choose a random option until they get in. All that will do is slow them down a bit.
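To put a number on "slow them down a bit": a quick sketch of the blind-guessing odds for a pick-from-6 test. Whether the proof of concept asks for one pick or one pick per noun isn't clear from the description, so both cases are shown; the `questions` parameter is my assumption.

```python
from fractions import Fraction

def random_pass_chance(options, questions=1):
    """Chance that uniformly random picks get every question right."""
    return Fraction(1, options) ** questions

def expected_tries(chance):
    """Average number of attempts before a random guesser gets in."""
    return 1 / chance

one_pick = random_pass_chance(6)          # a single pick from 6 nouns
three_picks = random_pass_chance(6, 3)    # if all 3 nouns must be matched
print(one_pick, expected_tries(one_pick))        # 1/6 6
print(three_picks, expected_tries(three_picks))  # 1/216 216
```

Either way a bot gets through eventually without recognizing a single image, which is cheap unless failed attempts are rate-limited.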
Try taking a picture of the CAPTCHA with your phone using the google goggles app. It works... remarkably well!
I downloaded it now and there are no macros, not to mention this is a .docx not a .docm.
I had such a site on Geocities. Flashing gifs, blink tags, a horrible MIDI playing in the background with no way of turning it of. Blue text on pink background and, naturally, CometCursor.
And of course no meaningful content what so ever.
Innocent times...
The problem is that because they're serving up words that a computer failed to recognise as part of their OCR project, those same words are often impossible for humans to identify too (maybe they're smudged in the original source, for instance). This does result in some incredibly difficult words to read. According to the PowerPoint, you only have to get one word right. I tried this: sometimes it worked, other times it gave me an incorrect result. I think the truth is probably more like this (and I'm sure I read once that this is how it works): they serve one recognised word and one unrecognised word, the only requirement for success is getting the recognised word right, and they just compile the results for the unrecognised word to advance their OCR projects. Usually the recognised word is more readable (we know it at least started out readable, whereas we can't make the same assumption for the unrecognised word), so in the majority of cases, as long as you type the word you can read and then make a best guess at the other, you will still successfully solve the captcha. Of course, it might still be easier to hit refresh a few times until you get a more readable pair.
If you get one of the answers right, and it's not the known, then you're still stuck, though. So its success rate is closer to 18%: it identifies one word correctly 35% of the time, and on 50% of those occasions, it's the known word.
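The same estimate can be done straight from the counts in the video (1 of 78 with both words right, 27 of 78 with at least one right). A sketch under two assumptions that the video does not confirm: getting both words right always passes, and when exactly one word is right it is the known word half the time.

```python
from fractions import Fraction

total = 78
both_right = 1          # "b" counter from the video
at_least_one = 27       # "s" counter from the video
exactly_one = at_least_one - both_right

# Both right always passes; a single hit passes only if it is the known word
# (assumed 50/50).
passes = both_right + Fraction(1, 2) * exactly_one
estimate = passes / total
print(estimate, f"{float(estimate):.2%}")  # 7/39 17.95%
```

That lands on the same roughly 18% as multiplying 35% by 50%.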
No kidding!!! What do you say at this point?
Yes that's how it works, and everyone already knows that. I'm not even talking about recaptcha though, I'm talking about the impossible captchas you get when failing too many times to log into a Google account.
D4t b t3h stuff h4x0rz 1z m4d3 0f.
Step 1: open it in notepad...
Judging by the way you spell "off" or "whatsoever", the grammar mistakes (perhaps "I used to have"?) and how you place your commas, you are still as innocent as they come.