ReCAPTCHA.net Now Vulnerable to Algorithmic Attack
n3ond4x writes "reCAPTCHA.net algorithms have been developed to solve the current CAPTCHA at an efficacy of 30%. The algorithms were disclosed at DEFCON 18 over the weekend and have since been made available online. Also available is a video demonstration of random reCAPTCHA.net CAPTCHAs being subjected to the algorithms." There's probably an excellent Firefox plugin to render this page's color scheme more bearable. Note: the PowerPoint presentation linked opens fine in OpenOffice, and the video speaks for itself.
... They bleed nope wait just a shitty color scheme
Does the PowerPoint open fine in Keynote?
"There's probably an excellent Firefox plugin to render this page's color scheme more bearable."
Just select the whole page; it's better.
So what is the average human success rate? I think mine is only about 50%
The goggles, they do nothing!
Can these attack algorithms actually increase the accuracy of normal OCR programs?
But that just means more spambots, right?
I recently went to their homepage and looked _really_ hard for any statistics about which books are transcribed. I read their Science paper. Tried all sections.
It's all about the captcha part, and _nothing_ about the RE.
The way they state how it works ("We are using 100.000 unique words") sounds like they have given up on that part long ago and just recycle their old database again and again...
HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
If not, then the captcha should only be visible when the mouse cursor is over it.
The key to a successful captcha is to make it accessible only by a user sitting in front of the screen.
It looks like that tool is better at deciphering the captchas than I am.
I'm watching the video, and the end result is "b:1/78 1.28% s:27/78 34.62%" indicating that out of 78 tests of two words per test it got a single word right 35% of the time, and both words right only once or 1% of the time.
Since both words need to be correct "solve the current CAPTCHA at an efficacy of 1%" would be closer to the truth.
It isn't sufficient to get 30% of the characters right. "im bailiwick" is recognized as "iffy ballboy" and that result gets a 32.73% rating. Doesn't look broken to me.
Now 30% of the captchas, that would be something.
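For anyone who wants to check the arithmetic, here is a minimal sketch. The tallies are the ones quoted above from the video's final status line; reading "b" as "both words" and "s" as "single word" is the interpretation given above, and the names in the code are made up for illustration.

```javascript
// Tallies quoted from the end of the video: "b:1/78 ... s:27/78 ...".
const tests = 78;
const bothWordsRight = 1;   // "b" counter, assumed to mean both words correct
const singleWordRight = 27; // "s" counter, assumed to mean one word correct

// Format a ratio as a percentage with two decimals.
function percent(hits, total) {
  return (100 * hits / total).toFixed(2) + "%";
}

console.log("both words right: ", percent(bothWordsRight, tests));
console.log("single word right:", percent(singleWordRight, tests));
```

The two printed percentages match the figures on screen in the video, so the "30%" headline appears to be the single-word rate.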
No plugin needed:
View->Use Style->None
That is what it looks like in Seamonkey, Firefox will be similar. This more or less always works.
--frank[at]unternet.org
Should I run the DEFCON presenter's giant SWF or not?
o_O
Maw! Fire up the karma burner!
Why would anyone want to do this? It's like attacking the UN peacekeeping troops or the Red Cross. reCAPTCHA is doing good work, digitizing scanned printed books so that the text can be made available for online searching. Breaking reCAPTCHA is like defecating in the village well, ensuring that everyone suffers. No one benefits from reCAPTCHA being broken. No one.
Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
There's probably an excellent Firefox plugin to render this page's color scheme more bearable.
I like using a Readability bookmarklet in my bookmarks bar: Readability - An Arc90 Lab Experiment
A Firefox extension is not the same thing as a plugin.
Firefox plugins ***used*** to be called Firefox extensions. You must just be too young to know this.
GETCHA
Try hitting ctrl-a.
And that, timothy, is the difference between a dork and a geek. You failed the Twit Filter at reCAPTCHA.
No, Firefox addons used to be called extensions, plugins are still plugins.
There is ZERO reason to use worthless tests like these as opposed to using real identification. That is, instead of using computer-generated difficult tests, use actual pictures of actual 'difficult text' that an OCR agent failed to identify. Each person is given one already tested sample and one unknown sample. If you get the already tested sample right, then your answer is accepted as 'probably' correct for the unknown sample. Three matching 'probably correct' answers = confirmed as correct, and the unknown sample moves to the "already tested" section.
There is more than enough written and audio samples that the world would love to see OCR'ed. We don't have to generate fake ones.
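The scheme described above can be sketched roughly like this. It is only an illustration under stated assumptions: the in-memory maps, the function names, and the sample ids are all hypothetical, and this is not how any real service is known to be implemented.

```javascript
const VOTES_NEEDED = 3; // threshold taken from the description above

// Build a small in-memory verifier (hypothetical helper).
// `known` maps a sample id to its already verified text.
function createVerifier(known) {
  const votes = new Map(); // unknown sample id -> Map(answer -> count)

  return function submit(knownId, knownAnswer, unknownId, unknownAnswer) {
    // The user passes only if the already tested sample is answered correctly.
    if (known.get(knownId) !== knownAnswer) return false;

    // Record the answer for the unknown sample as "probably correct".
    const tally = votes.get(unknownId) || new Map();
    const count = (tally.get(unknownAnswer) || 0) + 1;
    tally.set(unknownAnswer, count);
    votes.set(unknownId, tally);

    // Enough matching answers: promote the sample to the known set.
    if (count >= VOTES_NEEDED) {
      known.set(unknownId, unknownAnswer);
      votes.delete(unknownId);
    }
    return true;
  };
}
```

Note that only the known sample decides pass or fail; the unknown sample just collects votes until three agree.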
excitingthingstodo.blogspot.com
Anybody that pays attention to 4chan recently knows they had to implement captcha due to a massive spamflood of infected morons. recaptcha got busted thanks to someone in /g/ who leaked the vulnerability in the sound system for reCAPTCHA, and the whole site was again inundated with spam, though not to the degree as the original spam attack.
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
From TFS:
There's probably an excellent Firefox plugin to render this page's color scheme more bearable.
Halfway through this sentence I realized someone will now implement a nice little extension such that I never again have to answer these recaptchas. Pretty sure they would break this extension shortly with cunning, though. Anyway, at 30% accuracy now, it's easier to <F5> or click refresh 3 or 4 times than to get my hands off the mouse to type 2 word captchas that sometimes are eye-straining.
You don't have to reply here if you don't want to lose karma with such guilty-pleasure extension, brave spammers^Wcoders! :) I'll be googling the currently "virgin" string "captcha this fox" to find your work posted wherever.
Wrong. Plugins have been around since Netscape and are still called plugins. They have a different function than an extension (and an extension is what we would want in this case to fix the site's colours).
Both plugins and extensions, along with themes, are collectively referred to as "addons." "Plugin" is the wrong word in the summary. "Extension" or "addon" would have been acceptable.
When it is claimed to be 30% accurate, I'd expect some 30% of all captchas being correctly guessed. Watching the video, I noticed the algorithm gives itself 30-40% scores for getting just one of the two words right or sometimes even for getting the right length and a few correct letters. Didn't watch it to the end, but in the few minutes I watched, ZERO entire captchas were solved. So that's ZERO% accurate in my book. For instance, actual captcha text "ware readiness", guessed captcha "votarry rehabbed", reported accuracy 38.24%... how the hell is that over 38% accurate? If you had that level of accuracy when trying to get past a captcha (which is pretty much the definition of it being vulnerable, right?), you wouldn't get past a single captcha. It's 30% accurate if it correctly guessed about 3 out of every 10 captchas, not if it fails every single captcha.
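To make the difference concrete, here is a rough sketch comparing a per-character score with the pass/fail check a captcha actually applies. The `charScore` function is a made-up stand-in (position-by-position matching); the metric used in the video is not documented here, so this will not reproduce its percentages.

```javascript
// Hypothetical per-character score: positions where guess and answer agree,
// divided by the longer length. NOT the presenters' metric.
function charScore(answer, guess) {
  const longer = Math.max(answer.length, guess.length);
  if (longer === 0) return 1;
  const shorter = Math.min(answer.length, guess.length);
  let same = 0;
  for (let i = 0; i < shorter; i++) {
    if (answer[i] === guess[i]) same++;
  }
  return same / longer;
}

// What a captcha actually checks: the whole answer matches or it does not.
function passes(answer, guess) {
  return answer === guess;
}

const answer = "ware readiness";
const guess = "votarry rehabbed";
console.log("character score:", charScore(answer, guess));
console.log("passes:", passes(answer, guess));
```

Any partial-credit score like this can be above zero for a guess that still fails, which is why it says little about how many captchas would be solved.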
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Talking about 4chan, there's currently a hilarious thread about impossible captchas: http://boards.4chan.org/r9k/res/10509296 (note, it is of course 4chan, so be careful there, although this is r9k, not the worst of boards...)
It's a bit like watching scary women fight over who was married to some guy first on Ricky Lake.
Maybe this hack can be used to improve book scanning.
Requiem for the American Dream
A Firefox extension is not the same thing as a plugin.
B.F.D.
"I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)
where's the /. story on wikileaks "insurance" file? come on come on......
If reCAPTCHA's too easily breakable, then Bad Guys will figure out how, and will start exploiting sites that use reCAPTCHA for protection.
So we need to know how vulnerable it is, and the reCAPTCHA folks need to figure out how to fix it. It's an arms race, always has been, probably always will be.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
since that's about the accuracy of a human
I think you underestimate just how much I just don't care.
Seeing this article gave me an idea to come up with a new human verification process. I created a C# program in about an hour that loads images from Google images based on searching for 3 of 2000+ nouns. It shows 3 examples of each noun and asks the user to pick the correct noun from a list of 6. This program is just a proof of concept of course. Could this become useful? (Binary and source code included.)
http://enigmadream.com/misc/HumanVerification.zip
reCAPTCHA isn't bad, but Google's captchas are so hard they're probably more easily solved by learning algorithms than actual human beings.
Then we can just put reCAPTCHA on all pages being used for spam, and get transcription services for free.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
reCAPTCHA is already on the road to beating this. When your images are on the verge of being discovered algorithmically, use Hebrew.
Bookmarklets for Zapping Annoyances
https://www.squarefree.com/bookmarklets/zap.html
Try the "zap colors" bookmarklet. There are a few other useful bookmarklets on that page too.
When OCR gets so good that recaptcha becomes pointless, my idea for the next step of harder-for-AI captchas is to stop using line art and start using gradients. That is, currently, they use text, which is line art, and then warp it, chop it up, and run miscellaneous clutter through it. It's getting harder and harder for people to read, and machines are still catching up.
I propose that if you start with a photograph, make a selection that's block text, feather the edges, then shift the colors in the selection (hue, saturation, inversion, remapping, whatever), it's going to be easier for humans and harder for computers than some of the stuff we've got now. And generating it can be automated just as easily: I scripted Photoshop to make these in a few minutes.
Here's an example
Can anyone tell me how to set my sig on Slashdot?
The spammers can just choose a random option until they get in. All that will do is slow them down a bit.
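A quick sketch of how little that slows them down, assuming the six-option scheme proposed earlier in the thread, one question per challenge, and no limit on retries. Those are assumptions, and the function name is hypothetical.

```javascript
const OPTIONS = 6; // assumption: one question with six choices

// Chance that pure random guessing succeeds at least once in `attempts` tries.
function chanceWithin(attempts, options = OPTIONS) {
  return 1 - Math.pow(1 - 1 / options, attempts);
}

for (const n of [1, 5, 10, 20]) {
  console.log(n + " attempts:", (100 * chanceWithin(n)).toFixed(1) + "%");
}
```

On average a blind guesser needs about as many attempts as there are options, so rate limiting or more questions per challenge would have to do the real work.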
Try taking a picture of the CAPTCHA with your phone using the google goggles app. It works... remarkably well!
Think of a blind person using an image captcha. Ever tried understanding the audio versions!?
Best alternative... http://textcaptcha.com/
(actually, the audio one on here is not bad)
The problem is, because they're serving up words that a computer has failed to recognise as part of their OCR project, those same words are often impossible for humans to identify also (maybe they're smudged on the original source for instance) - this does result in some incredibly difficult words to read. According to the powerpoint, you only have to get one word right, I tried this and sometimes it worked, other times it gave me an incorrect result - I think the truth is probably more like (and I'm sure I read once this is how it works) they serve one recognised word and one unrecognised word - the requirement for success is only getting the recognised word right, they just compile the results of the unrecognised word to advance their OCR projects. Usually the recognised word is more readable (because we know it at least started out readable whereas we can't make the same assumption for the unrecognised word), so in the majority of cases so long as you type the word you can read and then make a best guess at the other you will still successfully solve the captcha. Of course, it might still be easier to hit refresh a few times until you get a more readable pair.
Mod parent down.
RTFA.
Note: the PowerPoint presentation linked opens fine in OpenOffice, and the video speaks for itself.
That's OpenOffice.ORG for you and no IT DOES NOT OPEN FINE. Text is bleeding through all over the slides. And there's no video, just an "swf" file that can't be opened.
If you get one of the answers right, and it's not the known, then you're still stuck, though. So its success rate is closer to 18%: it identifies one word correctly 35% of the time, and on 50% of those occasions, it's the known word.
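Spelled out as a small sketch: the 35% and 50% figures are the ones from this comment, the function name is made up, and the rare case where both words are right is ignored for simplicity.

```javascript
// Probability of passing when exactly one word is read correctly and it
// has to be the known (checked) word. Inputs are assumptions from the thread.
function passRate(pOneWordRight, pIsKnownWord) {
  return pOneWordRight * pIsKnownWord;
}

const p = passRate(0.35, 0.5);
console.log("pass rate per captcha:", (100 * p).toFixed(1) + "%");
console.log("expected attempts per success:", (1 / p).toFixed(1));
```

Under those assumptions the bot passes a bit more often than one captcha in six.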
No kidding!!! What do you say at this point?
Yes that's how it works, and everyone already knows that. I'm not even talking about recaptcha though, I'm talking about the impossible captchas you get when failing too many times to log into a Google account.
Here's the code:
There's an app spreading around and posting itself around /b/. The interesting thing about it is that the app presents itself as an image requesting itself to be pasted to mspaint and saved as *.hta and ran, then starts posting itself again. Somehow the code survives image compression intact.
The recaptcha breaking code is:
// CAPTCHA
var threadurl = "http://boards.4chan.org/" + dir[board] + "/";
if (thread != "") threadurl += "res/" + thread;
get("http://www.google.com/recaptcha/api/challenge?k=6Ldp2bsSAAAAAAJ5uyx_lx34lJeEpTLVkP5k04qc", 1);
var challenge1 = request.responseText.match(/challenge : '([^']+)'/)[1];
get("http://www.google.com/recaptcha/api/reload?c=" + challenge1 + "&k=6Ldp2bsSAAAAAAJ5uyx_lx34lJeEpTLVkP5k04qc&reason=a&type=audio&lang=en&new_audio_default=1", 1);
var challenge2 = request.responseText.match(/finish_reload\('([^']+)'/)[1];
var nwords = 10 + Math.floor(3*Math.random());
response = "";
for (var i = 0; i < nwords; i++) {
  if (i > 0) response += " ";
  response += randomchoice(wordlist);
}
There's a quite large random list of common words in english-
I had always thought a similar attack was possible, since I once tried the audio reCAPTCHA, couldn't understand a thing in the audio, and was still granted access.
I guess that "accuracy" is calculated by how many letters were OCRd correctly. Sorry, that measure might make sense for playing hangman, but not for solving a captcha. It's pass or fail.
According to their Powerpoint,
So taking into account that you only need to recognize one word correctly but don't know which one, the bot would have to try about 30 times before getting it right. So it would get banned all the time.