ReCAPTCHA.net Now Vulnerable to Algorithmic Attack
n3ond4x writes "reCAPTCHA.net algorithms have been developed to solve the current CAPTCHA at an efficacy of 30%. The algorithms were disclosed at DEFCON 18 over the weekend and have since been made available online. Also available is a video demonstration of random reCAPTCHA.net CAPTCHAs being subjected to the algorithms." There's probably an excellent Firefox plugin to render this page's color scheme more bearable. Note: the PowerPoint presentation linked opens fine in OpenOffice, and the video speaks for itself.
Obvious technical errors in summaries bother me more than spelling/grammar errors.
A Firefox extension is not the same thing as a plugin.
# cat
Damn, my RAM is full of llamas.
... They bleed nope wait just a shitty color scheme
Does the PowerPoint open fine in Keynote?
"There's probably an excellent Firefox plugin to render this page's color scheme more bearable."
Just select the whole page; it's better.
So what is the average human success rate? I think mine is only about 50%
The goggles, they do nothing!
Can these attack algorithms actually increase the accuracy of normal OCR programs?
But that just means more spambots, right?
I recently went to their homepage and looked _really_ hard for any statistics about which books are transcribed. I read their Science paper. Tried all sections.
It's all about the captcha part, and _nothing_ about the RE.
The way they state how it works ("We are using 100.000 unique words") sounds like they have given up on that part long ago and just recycle their old database again and again...
HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
If not, then the captcha should only be visible when the mouse cursor is over it.
The key to a successful captcha is to make it accessible only by a user sitting in front of the screen.
It looks like that tool is better at deciphering the captchas than I am.
I'm watching the video, and the end result is "b:1/78 1.28% s:27/78 34.62%" indicating that out of 78 tests of two words per test it got a single word right 35% of the time, and both words right only once or 1% of the time.
Since both words need to be correct "solve the current CAPTCHA at an efficacy of 1%" would be closer to the truth.
It isn't sufficient to get 30% of the characters right. "im bailiwick" is recognized as "iffy ballboy" and that result gets a 32.73% rating. Doesn't look broken to me.
Now 30% of the captchas, that would be something.
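To make the arithmetic above concrete, here is a small sketch that recomputes the percentages from the counter string quoted from the video. Reading "b" as "both words right" and "s" as "a single word right" is this comment's interpretation, not something the presenters confirm in this thread, and the helper names are made up for illustration.

```javascript
// Final counter shown in the video, as quoted in the comment above.
const counter = "b:1/78 1.28% s:27/78 34.62%";

// Pull every "label:hits/total" pair out of the counter string.
function parseCounter(text) {
  const rates = {};
  for (const m of text.matchAll(/(\w):(\d+)\/(\d+)/g)) {
    rates[m[1]] = { hits: Number(m[2]), total: Number(m[3]) };
  }
  return rates;
}

// Format hits/total as a percentage with two decimals.
function percent(hits, total) {
  return (100 * hits / total).toFixed(2);
}

const rates = parseCounter(counter);
// Assumption: "b" = both words correct, "s" = a single word correct.
console.log("both words: " + percent(rates.b.hits, rates.b.total) + "%");
console.log("single word: " + percent(rates.s.hits, rates.s.total) + "%");
```

Under that reading, only the "b" figure measures fully solved CAPTCHAs.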
No plugin needed:
View->Use Style->None
That is what it looks like in Seamonkey, Firefox will be similar. This more or less always works.
--frank[at]unternet.org
Should I run the DEFCON presenter's giant SWF or not?
o_O
Maw! Fire up the karma burner!
Why would anyone want to do this? It's like attacking the UN peacekeeping troops or the Red Cross. reCAPTCHA is doing good work, digitizing scanned printed books so that the text can be made available for online searching. Breaking reCAPTCHA is like defecating in the village well, ensuring that everyone suffers. No one benefits from reCAPTCHA being broken. No one.
Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
There's probably an excellent Firefox plugin to render this page's color scheme more bearable.
I like using a Readability bookmarklet in my bookmarks bar: Readability - An Arc90 Lab Experiment
GETCHA
Try hitting ctrl-a.
And that, timothy, is the difference between a dork and a geek. You failed the Twit Filter at reCAPTCHA.
There is ZERO reason to use worthless tests like these as opposed to using real identification. That is, instead of using computer-generated difficult text, use actual pictures of actual 'difficult text' that an OCR agent failed to identify. Each person is given one already tested sample and one unknown sample. If you get the already tested sample right, then your answer is accepted as 'probably correct' for the unknown sample. Three matching 'probably correct' answers = confirmed as correct, and the unknown sample moves to the "already tested" section.
There are more than enough written and audio samples that the world would love to see OCR'ed. We don't have to generate fake ones.
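The scheme in this comment can be sketched roughly as follows. This is only an illustration of the idea as described here, not reCAPTCHA's actual implementation; the function names, the sample IDs, and the case-insensitive comparison are all assumptions made for the example.

```javascript
// Hypothetical sketch of the two-sample scheme described above.
const CONFIRMATIONS_NEEDED = 3; // "three matching probably correct"

function createPool() {
  return {
    known: new Map(), // sampleId -> confirmed text ("already tested")
    votes: new Map(), // sampleId -> Map(answer -> number of matching votes)
  };
}

// Assumption: answers are compared case-insensitively, ignoring outer spaces.
function normalize(text) {
  return text.trim().toLowerCase();
}

// Returns true if the user passes, i.e. got the already tested sample right.
function submit(pool, knownId, knownAnswer, unknownId, unknownAnswer) {
  const expected = pool.known.get(knownId);
  if (expected === undefined || normalize(knownAnswer) !== expected) {
    return false; // failed the control sample: ignore the other answer
  }
  const answer = normalize(unknownAnswer);
  const tally = pool.votes.get(unknownId) || new Map();
  const count = (tally.get(answer) || 0) + 1;
  tally.set(answer, count);
  pool.votes.set(unknownId, tally);
  if (count >= CONFIRMATIONS_NEEDED) {
    // Enough matching votes: promote the unknown sample to "already tested".
    pool.known.set(unknownId, answer);
    pool.votes.delete(unknownId);
  }
  return true;
}

// Usage: three users agree on the unknown sample "img-2".
const pool = createPool();
pool.known.set("img-1", "bailiwick");
for (let i = 0; i < 3; i++) {
  submit(pool, "img-1", "Bailiwick", "img-2", "readiness");
}
console.log(pool.known.get("img-2"));
```

Note that a user who fails the control sample contributes no vote at all, which is what keeps random guesses out of the tally.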
excitingthingstodo.blogspot.com
Anybody that pays attention to 4chan recently knows they had to implement captcha due to a massive spamflood of infected morons. reCAPTCHA got busted thanks to someone in /g/ who leaked the vulnerability in the sound system for reCAPTCHA, and the whole site was again inundated with spam, though not to the same degree as the original spam attack.
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
When it is claimed to be 30% accurate, I'd expect some 30% of all captchas to be correctly guessed. Watching the video, I noticed the algorithm gives itself 30-40% scores for getting just one of the two words right, or sometimes even for getting the right length and a few correct letters. I didn't watch it to the end, but in the few minutes I watched, ZERO entire captchas were solved. So that's ZERO% accurate in my book. For instance, actual captcha text "ware readiness", guessed captcha "votarry rehabbed", reported accuracy 38.24%... how the hell is that over 38% accurate? If you had that level of accuracy when trying to get past a captcha (which is pretty much the definition of it being vulnerable, right?), you wouldn't get past a single captcha. It's 30% accurate if it correctly guesses about 3 out of every 10 captchas, not if it fails every single captcha.
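The gap between a per-character score and a pass/fail result can be shown with a short sketch. The presenters' actual scoring formula is not given in this thread, so the edit-distance similarity below is an assumed stand-in, and it will not reproduce the 38.24% figure. The point is only that a partial-credit score can be well above zero for a guess that would never get past the CAPTCHA.

```javascript
// Classic Levenshtein edit distance, computed row by row.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed partial-credit score: 1 for identical strings, lower as edits grow.
function similarity(actual, guess) {
  const longest = Math.max(actual.length, guess.length);
  return longest === 0 ? 1 : 1 - editDistance(actual, guess) / longest;
}

// What a CAPTCHA check cares about: the whole answer is right or it is not.
function solved(actual, guess) {
  return actual === guess;
}

const actual = "ware readiness";
const guess = "votarry rehabbed";
console.log("partial-credit score:", similarity(actual, guess).toFixed(2));
console.log("solved:", solved(actual, guess));
```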
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Talking about 4chan, there's currently a hilarious thread about impossible captchas: http://boards.4chan.org/r9k/res/10509296 (note, it is of course 4chan, be careful there, although this is r9k, not the worst of boards.. )
Maybe this hack can be used to improve book scanning.
where's the /. story on wikileaks "insurance" file? come on come on......
If reCAPTCHA's too easily breakable, then Bad Guys will figure out how, and will start exploiting sites that use reCAPTCHA for protection.
So we need to know how vulnerable it is, and the reCAPTCHA folks need to figure out how to fix it. It's an arms race, always has been, probably always will be.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
since that's about the accuracy of a human
I think you underestimate just how much I just don't care.
Seeing this article gave me an idea to come up with a new human verification process. I created a C# program in about an hour that loads images from Google images based on searching for 3 of 2000+ nouns. It shows 3 examples of each noun and asks the user to pick the correct noun from a list of 6. This program is just a proof of concept of course. Could this become useful? (Binary and source code included.)
http://enigmadream.com/misc/HumanVerification.zip
Then we can just put reCAPTCHA on all pages being used for spam, and get transcription services for free.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
reCAPTCHA is already on the road to beating this. When your images are on the verge of being discovered algorithmically, use Hebrew.
Bookmarklets for Zapping Annoyances
https://www.squarefree.com/bookmarklets/zap.html
Try the "zap colors" bookmarklet. There are a few other useful bookmarklets on that page too.
When OCR gets so good that recaptcha becomes pointless, my idea for the next step of harder-for-AI captchas is to stop using line art and start using gradients. That is, currently, they use text, which is line art, and then warp it, chop it up, and run miscellaneous clutter through it. It's getting harder and harder for people to read, and machines are still catching up.
I propose that if you start with a photograph, make a selection that's block text, feather the edges, then shift the colors in the selection (hue, saturation, inversion, remapping, whatever), it's going to be easier for humans and harder for computers than some of the stuff we've got now. And generating it can be automated just as easily; I scripted Photoshop to make these in a few minutes.
Here's an example
Can anyone tell me how to set my sig on Slashdot?
The spammers can just choose a random option until they get in. All that will do is slow them down a bit.
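A quick back-of-the-envelope sketch of that objection, assuming a pick-one-of-6 test like the one proposed earlier in the thread and a bot that guesses uniformly at random. The function names are made up for illustration, and asking several questions in a row is an assumption added here to show how the odds change.

```javascript
// Chance that blind guessing passes `rounds` independent questions,
// each offering `options` equally likely choices.
function guessPassRate(options, rounds) {
  return Math.pow(1 / options, rounds);
}

// Average number of tries before a blind guesser gets through.
function expectedAttempts(options, rounds) {
  return Math.pow(options, rounds);
}

console.log(expectedAttempts(6, 1)); // a single question with 6 options
console.log(expectedAttempts(6, 3)); // three such questions in a row
```

With a single question the bot needs only a handful of tries on average, which is the "slow them down a bit" case; each extra question multiplies the cost by the number of options.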
Try taking a picture of the CAPTCHA with your phone using the google goggles app. It works... remarkably well!
Think of a blind person using an image captcha. Ever tried understanding the audio versions!?
Best alternative... http://textcaptcha.com/
(actually, the audio one on here is not bad)
Mod parent down.
RTFA.
Note: the PowerPoint presentation linked opens fine in OpenOffice, and the video speaks for itself.
That's OpenOffice.ORG for you and no IT DOES NOT OPEN FINE. Text is bleeding through all over the slides. And there's no video, just an "swf" file that can't be opened.
If you get one of the answers right, and it's not the known, then you're still stuck, though. So its success rate is closer to 18%: it identifies one word correctly 35% of the time, and on 50% of those occasions, it's the known word.
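The estimate in this comment is a single multiplication, sketched below. Both inputs are the commenter's assumptions: the single-word rate comes from the 27/78 counter in the video, and the 50% share assumes the recognized word is equally likely to be the known word or the unknown one. The function name is made up for illustration.

```javascript
// Estimated pass rate = P(one word recognized) * P(that word is the known one).
function estimatedPassRate(singleWordRate, knownWordShare) {
  return singleWordRate * knownWordShare;
}

const rate = estimatedPassRate(27 / 78, 0.5);
console.log((100 * rate).toFixed(1) + "%");
```

Depending on whether 34.62% or a rounded 35% is used, the result lands between 17% and 18%.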
No kidding!!! What do you say at this point?
Here's the code:
There's an app spreading around and posting itself around /b/. The interesting thing about it is that the app presents itself as an image requesting itself to be pasted to mspaint and saved as *.hta and ran, then starts posting itself again. Somehow the code survives image compression intact.
The recaptcha breaking code is:
// CAPTCHA
var threadurl = "http://boards.4chan.org/" + dir[board] + "/";
if (thread != "") threadurl += "res/" + thread;
// Fetch a fresh challenge for the site key.
get("http://www.google.com/recaptcha/api/challenge?k=6Ldp2bsSAAAAAAJ5uyx_lx34lJeEpTLVkP5k04qc", 1);
var challenge1 = request.responseText.match(/challenge : '([^']+)'/)[1];
// Reload the challenge as an audio CAPTCHA.
get("http://www.google.com/recaptcha/api/reload?c=" + challenge1 + "&k=6Ldp2bsSAAAAAAJ5uyx_lx34lJeEpTLVkP5k04qc&reason=a&type=audio&lang=en&new_audio_default=1", 1);
var challenge2 = request.responseText.match(/finish_reload\('([^']+)'/)[1];
// Answer with 10 to 12 random words from the word list.
var nwords = 10 + Math.floor(3*Math.random());
response = "";
for (var i = 0; i < nwords; i++) {
    if (i > 0) response += " ";
    response += randomchoice(wordlist);
}
There's a quite large random list of common words in english-
I had always thought a similar attack was possible, ever since I once tried the audio reCAPTCHA, couldn't understand a thing in the audio, and was still granted access.
I guess that "accuracy" is calculated by how many letters were OCRd correctly. Sorry, that measure might make sense for playing hangman, but not for solving a captcha. It's pass or fail.
According to their Powerpoint,
So taking into account that you only need to recognize one word correctly but don't know which one, the bot would have to try about 30 times before getting it right. So it would get banned all the time.