Yahoo CAPTCHA Hacked
Hell Yeah! reminds us of a 2-week-old development that somehow escaped notice here. A team of Russian hackers has found a way to decipher a Yahoo CAPTCHA, thought to be one of the most difficult, with 35% accuracy. The Russian group's notice, posted by one "John Wane," is dated January 16. This site hosts a rapidshare link to what looks to be demonstration software for Windows, and quotes the Russian researchers: "It's not necessary to achieve high degree of accuracy when designing automated recognition software. The accuracy of 15% is enough when attacker is able to run 100,000 tries per day, taking into the consideration the price of not automated recognition — one cent per one CAPTCHA."
by having a teenage boy do it in exchange for letting him see porn.
They're used to seeing Cyrillic, the captcha has got to be easier to read!
A few months ago Yahoo introduced a CAPTCHA to prevent bots entering their chatrooms. Within a few days every room on yahoo was filled with bots once more, and still are to this day.
Given the current situation of the chat rooms on yahoo, it comes as no suprise at all that the other parts of the Yahoo system are inadequately protected from bots either.
Natural language processing etc:
To register, answer these questions and click the button on the right
What colour are buses in London?
What is three times three?
[Red] [Green] [Blue]
I've found Yahoo's CAPTCHA to be really annoying. I probably get it wrong about 20% of the time because the picture is so distorted (and I've been surprised that I got it right a lot of the time). I even considered writing them an email complaining about it, but then I realized they probably don't give a crap.
33% of Yahoo capitchas isn't really impressive - you still get a large quantity of negative hits, and unless you have an array of IP addresses (most people don't), there will still be a large quantity of addresses registered from a given IP. Also, a large quantity of negatives would cast doubt on any positive matches from the same IP.
Also, Yahoo captchas aren't that "hard" - they are black text from known font pools on a white background that get slightly warped and have black lines drawn on some characters. This is hardly strong since it doesn't hit all letters within the word (which is done by reCAPTCHA) or use a large font-pool variety.
Even the Slashdot Captcha is harder - it hits the whole image and uses different fonts within the word.
That reminds me of the age check for Leisure Suit Larry back in the day... Who knew that the desire of a horny teen to see pixellated boobs would lead to history research?
What doesn't kill you only delays the inevitable
without a chord is fine... ...it's when you're missing a cord that you need to worry
To register, answer these questions and click the button on the right
What colour are buses in London?
What is three times three?
[Red] [Green] [Blue]
Yes, those are undoubtedly hard questions for a computer. How, exactly, do you plan to generate billions of these questions? For a CAPTCHA to work, it must still be hard even if the generation algorithm is public knowledge.
The character outlines are nicely distinct, which means that even basic OCR software should be able to break the CAPTCHA. Since it's so easy to break, you want to hide it from any bots that come by: remove all references to "captcha" from the page source, and you might want to move the HTML for the image away from the HTML for the entry box.
"They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
What about introducing spelling and grammatical errors? This would be difficult for a computer to interpret, but doable for a human.
Paul Anderson
"I drank WHAT?!" -- Socrates
I'm impressed. That's better than I can do. Some CAPTCHAs take me five or six tries to get right.
-William Brendel
Did anyone notice that the image recognition code is imported from a binary DLL? I was under the impression that the Russian hackers would provide the source for the recognition code as well. But then, the people who released this are only interested in generating as much spam. Why should you trust them? You would be foolish enough to _not_ execute your test program that imports this dll in a vmware instance instead of your actual machine. Anybody done a comprehensive strace to determine sockets/descriptors opened by using this dll?
Hence all good modern captchas have moved away from character recognition captchas (such as yours) to segmentation based captchas. You only need to read the wikipedia article on CAPTCHAs to see some examples: http://en.wikipedia.org/wiki/Captcha.
Yahoo!'s captcha has been hacked, perhaps not as well, in the past. I've seen open http proxies pounding away at Yahoo to the tune of 100,000 per hour and more. Hotmail's is broken, so are others. The real shame is that the Storm Worm controllers are being protected by a national government and law enforecement system.
So what's the answer?
I'm sure I don't know. I do know that the wild west theory of accepting any kind of behaviour isn't acceptable. I know that some minimum standard of what's allowed and what isn't is going to have to take place. Where these limits are placed is a thing for a global conversation, and there will be differances of opinion.
Is cracking a captcha acceptalbe? Is phishing and identity theft acceptable? Is fraud and uncontrolled spam acceptable? What limits, and on what actions?
I'm just not that smart. But I think we can agree on a few things. Let's start to find out what those things are... and acting in concert with other network operators to enforce those standards. Fail to meet them, and your network routing gets dropped...
Necessity is the plea for every infringement of human freedom. It is the argument of tyrants; it is the creed of slaves.
Segmentation and intersecting arcs can be difficult for automated attacks: http://portal.acm.org/citation.cfm?id=1054972.1055070
You know those annoying flash advertisement games (shoot the monkey for a free iPod)? Well, they could potentially be adapted for CAPTCHAs as well: http://cups.cs.cmu.edu/soups/2006/posters/misra-poster_abstract.pdf
Not really. After a couple of (thousand) runs through, the attacker would have a reasonably accurate database of the questions. They can then analyze the text to find the nearest match to one of the questions in its database.
That's true. I've found, however, that introducing custom spam blocking methods, such as this, no matter how easy to break, often does a better job at stopping spam bots than more robust publicly available methods. For a target as big as Yahoo, this probably won't work, but I've found on PHPbb for instance, instead of using any of the publicly available captchas, which are easily defeated by bots, creating a simple question of this sort does wonders for bot-blocking. Even if it's just one question. If your site isn't big enough to be specifically targeted by bot farmers, sometimes a simple solution is better than a more complex one that everybody else is using.
ZuluPad, the wiki notepad on crack
That has been my experience, too. I admin a small bb and was having horrible problems with spam sign ups. CAPTCHAs didn't slow the spammers down at all. I went to a simple question that will be easily known by all of my target audience but probably won't be known by someone half way around the world entering CAPTCHAs for a penny a piece and allowed any spelling that is even close. I haven't had any spammers sign up for a couple years now. That obviously won't work for a major target like YAHOO though.
I say that a lot of people are color blind.
Just put some hard to read perl code in there and ask the user to say what it does. If the answer is correct it's a bot, if the answer is wrong it's probably a human ;)