Defeating Captcha
An anonymous reader pointed us at PWNtcha, a package that breaks various online captcha algorithms. The site provides numerous examples of easy (PayPal and an older version of Slashdot make the list) and hard captchas. It also links to various sources explaining why Captcha is a bad idea.
A while ago, I remember hearing about how some spammers would post Captchas from Yahoo Mail (or other free email services) on the registration forms of pr0n sites. The pr0n registrants would have to fill out the Captcha, and their answer would then be used by the spammer to get around the Captcha without any fancy software.
I doubt it. I'm willing to give him the benefit of the doubt and assume he's just trying to be responsible by releasing the code. And what he's doing at this site is mainly pointing out the weaknesses in some common captchas.
Helium balloons want to be free.
Interesting that an article talking about (among other things) why Captcha is a bad idea is submitted by an anonymous reader, who is forced to validate their human status every time they attempt to post.
(And yes, I'm aware that the submitter may be a member who has merely chosen to submit the story anonymously, but where would the joke be then?)
____
~ |rip/\/\aster /\/\onkey
While it is an interesting project from a hobbyist and academic standpoint, I'm not really sure what practical value it holds (unless the intent is to sell a mature algorithm to spammers, which is not the case since the project is being published). This is nothing more than a personal scripting project: no foray into new concepts of computer science or pattern recognition, no breakthroughs in computer-based heuristics.
Rex is 09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
It's a cheap and scalable method to defeat such algorithms. There will always be enough humans willing to do this for very little reward (some free pics).
Stop worrying about the risks of nuclear power and start worrying about the risks of not using nuclear power.
Audio captchas?
"Remember, there never were pineapple-almond cookies here."
Instead of an image-based Turing test like Captcha, I just have the last question on a login screen or form be a randomly selected super easy question. For example, "Spell the number 7" or "What is the next logical number in the sequence 1, 3, 5, 7, ...?"
Check it out here: http://www.donnyspi.com/contact.php
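The question-based check described above could be sketched roughly like this in Python. This is only an illustration, not the code behind the linked form: the QUESTIONS list and the pick_question and check_answer names are made up for the example, and the accepted answers are an assumption.

```python
import random

# Hypothetical question bank: each entry pairs a prompt with its accepted answers.
QUESTIONS = [
    ("Spell the number 7", {"seven"}),
    ("What is the next logical number in the sequence 1, 3, 5, 7, ...?", {"9", "nine"}),
]

def pick_question(rng=random):
    """Return a random (index, prompt) pair to show on the form."""
    index = rng.randrange(len(QUESTIONS))
    return index, QUESTIONS[index][0]

def check_answer(index, reply):
    """Compare the reply case-insensitively, ignoring surrounding whitespace."""
    return reply.strip().lower() in QUESTIONS[index][1]
```

With a bank this small a bot can simply memorise the answers, so in practice the list would need to be large or generated.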
I just saw a great flash-based Captcha designed to combat just this sort of attack. The test was composed of white text on a white background. Colored shapes of various sizes swirled in the background behind the text in a pseudo-random pattern, and the text was visible or obfuscated depending on whether there was a shape behind it at the moment. After watching it for a few minutes to see if there were any obvious flaws, I noticed that the entire phrase was never visible all at once.
A little patience was required, but I was able to verify in less than 10 seconds. Animation seems to be very useful for this kind of application.
Even Jesus hates listening to Creed.
LiveJournal has an "If you can't read the text, type "AUDIO" and take a sound test instead." thing, and other sites have other ways around the visual test.
Unfortunately, not all sites have non-visual humanity tests.
Having to wade through 60+ spam comments a day on a WordPress blog (with all the stock antispam options enabled) just sucked . . . and the blog didn't even get much traffic (PageRank of 4). I installed the AuthImage plugin and used it on its stock settings, and for a while didn't get a single bit of spam. Then, magically, it started up again. It seems some industrious little script kiddies have written a crawler to massively bombard AuthImage-enabled blogs with words from the stock word list. I switched from the wordlist file to randomly generated strings and increased the size of the image for readability, and I never had another piece of comment spam in that blog again.
As for blind folks, I suppose every webmaster has to make that decision based on their target demographic, but I've seen a few text-only captchas that work well enough ("What color is an orange?") but will inevitably have the same limitation as the AuthImage word list above.
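For anyone wondering what the switch from a word list to random strings amounts to, here is a minimal Python sketch. It is not AuthImage's code (that plugin is PHP); the ALPHABET constant and the random_challenge name are assumptions made for the example.

```python
import secrets

# Assumed alphabet: look-alike characters (0/o, 1/l/i) are left out for readability.
ALPHABET = "abcdefghjkmnpqrstuvwxyz23456789"

def random_challenge(length=6):
    """Build a challenge string from a cryptographically strong random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

With 31 characters and a length of 6 there are 31**6 possible strings (roughly 887 million), so replaying a fixed list of guesses stops being practical.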
Captchas are next to useless, and for the visually impaired very frustrating. One more example of a technology which annoys everyone and yet doesn't really stop the determined miscreant. <cough>airport shoe inspections</cough>
-- "Most people prefer a popular myth to an unpopular truth"
This is a real problem for the visually impaired, and not only the blind.
The distorted fonts, noisy lines, random dots and low contrast used in such pictures make them very hard or impossible to read.
Accessibility recommendations and W3C standards would require such important content to be backed up with alternate formats like an audio recording.
I believe these rules should apply not only to government sites.
But I know of no site providing an alternative audio captcha for now. My husband and I actually need a third person to read most captchas.
Léa Gris
The sites with really good captchas should run anti-captchas... to filter out the *really* hard to read ones. =P
But there are still a lot of ways that haven't been used yet to make the image hard to read for the computer but less hard than the extreme distortions, such as overlapping letters and words. For example, if only 25% of a word is covered up by another word on top of it, it should still be very easy for a normal person to read both words. Or use different colors and transparency. Or chain captchas together, for example one captcha that says "green" or "small" and another full of letters of various color/size/whatever. Then ask the user to enter the right code (i.e., so they have to use reasoning instead of just pattern recognition).
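The chained idea (one captcha names an attribute, the other supplies letters to filter by it) might look something like this on the server side. This is a rough Python sketch under assumptions: rendering the glyphs as an image is left out, and COLORS, LETTERS, make_challenge, expected_answer and check are names invented for the illustration.

```python
import random

COLORS = ["red", "green", "blue"]          # assumed palette
LETTERS = "ABCDEFGHJKLMNPQRSTUVWXYZ"       # I and O left out as look-alikes

def make_challenge(n=8, rng=random):
    """Return (target_color, glyphs); glyphs are (letter, color) pairs to render."""
    glyphs = [(rng.choice(LETTERS), rng.choice(COLORS)) for _ in range(n)]
    # Pick the target from colors that actually occur, so the answer is never empty.
    target = rng.choice(sorted({color for _, color in glyphs}))
    return target, glyphs

def expected_answer(target, glyphs):
    """The letters drawn in the target color, in display order."""
    return "".join(letter for letter, color in glyphs if color == target)

def check(target, glyphs, reply):
    return reply.strip().upper() == expected_answer(target, glyphs)
```

The target word ("green") would be shown in its own distorted image, so a bot has to solve both images and then apply the rule.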
all captchas should timeout after, oh, say 10 minutes?
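One way to get that kind of timeout without storing anything server-side is to sign the issue time together with the answer. A minimal Python sketch, assuming a 10-minute window; SECRET, issue_token and verify are hypothetical names, and the secret shown is a placeholder.

```python
import hashlib
import hmac
import time

SECRET = b"replace-with-a-real-server-secret"  # placeholder, not a real key
MAX_AGE_SECONDS = 10 * 60                      # the 10 minutes suggested above

def _mac(issued_text, answer):
    message = f"{issued_text}:{answer.strip().lower()}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()

def issue_token(answer, now=None):
    """Return 'issued:mac' to embed in the form next to the captcha image."""
    issued = str(int(time.time() if now is None else now))
    return f"{issued}:{_mac(issued, answer)}"

def verify(token, reply, now=None):
    """Accept the reply only if it matches and the token is not too old."""
    now = time.time() if now is None else now
    issued_text, _, mac = token.partition(":")
    if not issued_text.isdigit():
        return False
    if now - int(issued_text) > MAX_AGE_SECONDS:
        return False
    return hmac.compare_digest(mac, _mac(issued_text, reply))
```

This only bounds how long a solved captcha stays valid. It does not stop the same token being replayed inside the window, which would need a used-token list.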
In all honesty, do you really think you're going to get that many people to regularly visit a pr0n site? The sector is extremely cut-throat and vastly bigger than the market can justifiably support (hence why many pr0n sites close each month).
The only way to get to the top of the engines in the first few months would be to use PPC advertising (costs money). After that, even if you get to the top of the SERPs by using nefarious means, you'll need to give people a viable reason to sign up to your service, i.e. you'll need content, which costs money (unless you want to steal it, at which point you can probably expect some real mean types to track you down and kill you, them porn businesspeople are crazy).
I am NaN
The problem with blending images and so on is that blind people still cannot see them.
This slide demonstrates the problem beautifully, I think.
With regard to the simple questions, that is indeed what I do, some simple trivia, and some basic maths, and the library is called SimpleQuestions.
"What colour is the sky?" is actually one of the questions, and the maths questions do indeed vary in form, from expression to natural language.
The problem with the drawing requirement is that you're now blocking people who cannot draw.
"A goldfish was his muse, eternally amused"
God bless your monks.
Literally.
Small potatoes make the steak look bigger.
No fun redoing a web form over and over because you can't figure out what the hell the verification box says.
Yahoo! does this and it's asinine. I hit a captcha today that clearly had a ` character in it, but apparently it was a 'confuser' line, not a `. The rules for what character sets are valid are not given, so you don't know if punctuation is valid or not. Apparently it's invalid. How about case? A c and a C are pretty hard to discriminate when they're rendered along a Bezier curve.
Clearing the web form is no hindrance at all to a robot, but makes life difficult for humans. There's no excuse for pissing off users unnecessarily.
The Yahoo! web team is going downhill. The Groups code used to be able to register e-mail addresses with a '+' in them, but that broke recently. You can't get an e-mail into their bug support system. I've tried. I've argued with the helpbots. I lost.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
I thought about this problem on a recent trip to the urinal and here's what I got.
1) Get (or construct) a large database of nouns of well-known objects (car, orange, bottle, phone, pencil, brick, cup, etc. etc.)
2) Retrieve image references from a (safesearch-enabled) Google image search for a random noun from your database. Pick randomly from the result set.
3) Present images to the user. "These are pictures of a..."
4) My next strategy was to figure out a combinatorial way to increase the number of possible replies so that an attacker couldn't simply create a database of knowns (such as a hash database of images)
What do you smart fellers think? Other than Google being pissed about the scraping of their site.