Yahoo CAPTCHA Hacked
Hell Yeah! reminds us of a 2-week-old development that somehow escaped notice here. A team of Russian hackers has found a way to decipher a Yahoo CAPTCHA, thought to be one of the most difficult, with 35% accuracy. The Russian group's notice, posted by one "John Wane," is dated January 16. This site hosts a rapidshare link to what looks to be demonstration software for Windows, and quotes the Russian researchers: "It's not necessary to achieve high degree of accuracy when designing automated recognition software. The accuracy of 15% is enough when attacker is able to run 100,000 tries per day, taking into the consideration the price of not automated recognition — one cent per one CAPTCHA."
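To put the quoted figures in perspective, here is a rough back-of-the-envelope sketch using only the numbers in the summary (100,000 tries per day, one cent per manually solved CAPTCHA, and the 15% and 35% accuracy figures). The function names are made up for illustration, and it assumes every attempt is independent and costs nothing to run, which is a simplification.

```python
# Back-of-the-envelope sketch; function names are hypothetical.
# Integers (percent, cents) are used to avoid floating-point rounding.

def solved_per_day(attempts_per_day, accuracy_percent):
    """Number of CAPTCHAs an automated solver gets right per day."""
    return attempts_per_day * accuracy_percent // 100

def manual_cost_usd(solved, cents_per_captcha=1):
    """What the same number of solves would cost at the quoted manual rate."""
    return solved * cents_per_captcha / 100

attempts = 100_000  # tries per day, from the quoted statement
for accuracy in (15, 35):  # percent: the "enough" threshold and the claimed rate
    solved = solved_per_day(attempts, accuracy)
    print(f"{accuracy}% accuracy: {solved} solved/day, "
          f"worth about ${manual_cost_usd(solved):.2f} of manual solving")
```

Under those assumptions, even the 15% figure replaces roughly $150 a day of paid human solving, which is presumably the point the researchers were making.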
Solving 33% of Yahoo captchas isn't really impressive - you still get a large quantity of negative hits, and unless you have an array of IP addresses (most people don't), there will still be a large quantity of addresses registered from a given IP. Also, a large quantity of negatives would cast doubt on any positive matches from the same IP.
Also, Yahoo captchas aren't that "hard" - they are black text from known font pools on a white background that get slightly warped and have black lines drawn on some characters. This is hardly strong since it doesn't hit all letters within the word (which is done by reCAPTCHA) or use a large font-pool variety.
Even the Slashdot Captcha is harder - it hits the whole image and uses different fonts within the word.
The letters are too far away from each other - that makes it easy to separate them for processing. In fact, the only challenging aspect for OCRs in your captcha is the letter rotation/skewing. However, I don't think anyone will bother to write a captcha OCR for your site, unless it's Yahoo sized.
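To illustrate why widely spaced letters are easy to separate, here is a minimal sketch of column-projection segmentation: any column with no ink is treated as a gap between characters. It assumes the image has already been thresholded to 0/1 pixels (1 = ink); the function name and the toy image are invented for the example, and this is not a description of any particular OCR tool.

```python
# Minimal sketch of column-projection segmentation (hypothetical helper).
# Assumes a binarized image: a list of rows, each a list of 0/1 (1 = ink).

def segment_columns(image):
    """Return (start, end) column spans that contain ink, end exclusive."""
    if not image:
        return []
    width = len(image[0])
    # A column "has ink" if any row has a 1 in that column.
    has_ink = [any(row[x] for row in image) for x in range(width)]
    spans = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                  # a character begins
        elif not ink and start is not None:
            spans.append((start, x))   # an empty column ends the character
            start = None
    if start is not None:
        spans.append((start, width))   # character runs to the right edge
    return spans

# Toy 3-row "image" with three blobs separated by blank columns.
toy = [
    [1, 1, 0, 0, 1, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 0, 0, 1, 1, 1],
]
print(segment_columns(toy))  # [(0, 2), (4, 5), (7, 10)]
```

Once the letters touch or overlap, there are no empty columns left and this trick stops working, which is why the spacing matters.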
The character outlines are nicely distinct, which means that even basic OCR software should be able to break the CAPTCHA. Since it's so easy to break, you want to hide it from any bots that come by: remove all references to "captcha" from the page source, and you might want to move the HTML for the image away from the HTML for the entry box.
"They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
I'm impressed. That's better than I can do. Some CAPTCHAs take me five or six tries to get right.
-William Brendel
Hence all good modern captchas have moved away from character-recognition captchas (such as yours) to segmentation-based captchas. You only need to read the Wikipedia article on CAPTCHAs to see some examples: http://en.wikipedia.org/wiki/Captcha.
http://news.bbc.co.uk/2/hi/technology/7067962.stm
Here is a link to a BBC article about something like that. It's a Windows program that rewards typing in captchas by showing a woman that takes off progressively more and more clothes.
OK here is Cory Doctorow discussing it.
If you've ever tried the Yahoo chatrooms, you know they're overrun by spam bots. The problem wasn't with the captcha, it was that it challenged users only once and at the beginning of the session. So as long as your spam bot didn't appear idle or lose connection, it could stay on indefinitely. Now with the captcha broken, spammers don't even have to do captchas manually.
The topic of "are you human" was covered on Security Now a while back and someone brought up a great point. Tools to deter bots also make things difficult for accessibility software, since it uses many of the same concepts as bots. Even audio captchas are no longer a strong bot deterrent.
With advocacy groups like the National Federation of the Blind suing Target over its inaccessible website, it'll be a very tough challenge to develop good new captchas while maintaining accessibility for everyone.
On another note, could an organization representing the mathematically challenged sue companies using math captchas?
You want fun, go home and buy a monkey!
I have a little site, only really intended to share stuff with family and friends, served with custom scripts. I couldn't believe it when it was targeted by spammers. I could even see the test posts they made, checking to see if HTML was allowed etc., before unleashing the bot to post dozens of links a day.
Worst BBC News Stories
I don't know exactly how large porn images are, never having looked at them, but if you guess a round number of 0.1 MB per picture, it's only about $0.0001, or 0.01 cent per captcha. I suppose it's better than nothing, but it's not yet very cost-prohibitive.
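For anyone checking the arithmetic, here is a small sketch that works backwards from the two figures above (0.1 MB per picture, $0.0001 per captcha). The bandwidth price is not stated anywhere in the comment; it is only inferred here, assuming 1 GB = 1000 MB, and the function names are made up.

```python
# Sketch of the arithmetic only; function names are hypothetical.
# The per-GB price is inferred from the comment's figures, not quoted from it.

def implied_price_per_gb(cost_per_image_usd, image_size_mb, mb_per_gb=1000):
    """Bandwidth price that would make one image cost the stated amount."""
    return cost_per_image_usd / image_size_mb * mb_per_gb

def usd_to_cents(usd):
    """Convert dollars to cents."""
    return usd * 100

price = implied_price_per_gb(0.0001, 0.1)
print(f"implied bandwidth price: about ${price:.2f} per GB")
print(f"cost per captcha: about {usd_to_cents(0.0001):.2f} cent")
```

So the $0.0001 figure corresponds to roughly a dollar per gigabyte of bandwidth, and 0.01 cent per captcha is about a hundredth of the one-cent manual rate quoted in the story.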
As these CAPTCHAs get more complicated, it becomes more difficult for non-speakers of the language to interpret them.
The saddest poem