Defeating Captcha
An anonymous reader pointed us at PWNtcha, a package that breaks various on-line captcha algorithms. The site provides numerous examples of easy (PayPal and an older version of Slashdot make the list) and hard captchas. It also links to various sources explaining why captchas are a bad idea.
Entrepreneur : (noun), French for "unemployed"
here
Whew, I had never even heard of Captcha before...
A captcha is a type of challenge-response test used in computing to determine whether or not the user is human.
A while ago, I remember hearing about how some spammers would post the Yahoo Mail (or other free email services) Captchas on the registration forms on pr0n sites. The pr0n registrants would have to fill out the Captcha, but this would then be used by the spammer to get around the Captcha without any fancy software.
captcha stops bots
pwntcha breaks captcha
slashdot cremates pwntcha
"Win treats sysadmins better than users. Mac treats users better than sysadmins. Linux treats everyone like sysadmins."
While it is an interesting project from a hobbyist and academic standpoint, I'm not really sure what practical value it holds (unless the intent is to sell a mature algorithm to spammers, which is not the case since the project is being published). This is nothing more than a personal scripting project - no foray into new concepts of computer science or pattern recognition; no breakthroughs in computer-based heuristics.
Rex is 09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
Having a legally blind mother that uses the web, I wonder how captcha complies with the Americans With Disabilities Act (when used by American companies of course)?
Is it compatible with BLINUX? I think by definition it is not.
Perhaps I should ask, what alternate method of identification do sites employ to take into account blind users and the ADA?
The problem is that people are using robots to work in an autonomous manner to find ways around typical human limitations (we can only send several hundred emails a day, robots are not so limited). So people want to stop these "cheaters" by making the user prove that they are a human rather than a robot.
Is this really a good thing, though? Even on a site like Slashdot, in a story about defeating bots, the very first comment in this story is posted by a bot. How ironic is that? What is accomplished by banning users who can't read these "captchas" (what a horrendous fake word)? Nothing, apparently, as the story says. It only serves to annoy legitimate users and does nothing to hamper illegitimate robots.
The solution is not this sort of halfway measure. The solution is to make it simply not worth the effort to be a nuisance on a discussion forum. I suppose that requires a glut of intelligent posters, but with the entire citizenry of the Internet available, that can't be so hard.
Jesus saved me from my past. He can save you as well.
It's a cheap and scalable method to defeat such algorithms. There will always be enough humans willing to do this for very little reward (some free pics).
Stop worrying about the risks of nuclear power and start worrying about the risks of not using nuclear power.
Uh, that game doesn't work unless, say, bots stop Slashdot. Otherwise everyone just picks Slashdot and it's fifth grade all over again.
This is a good study of how hard it is to design secure systems. It's just like a non-cryptographer trying to create their own cipher, only in the visual processing world. Sadly, the article does not touch on non-visual captchas, which are alternatives for the blind. It would also be interesting to see what Jakob Nielsen might have to say on this technology from a usability perspective.
Of course, one of the primary bad things is that the concept of a captcha is patented, and the patent language is very broad. US Patent# 6,195,698
Also see the Wikipedia article for more information.
Well I'm glad someone is writing code to solve those "prove you aren't a script" images, because a lot of times I can't quite figure them out myself.
Chief among them is that sometimes you can't tell what the fucking words are. Within the last few months on more than one occasion I simply could not read the letters because they were so distorted and the lines overlapped the letters too much. No fun redoing a web form over and over because you can't figure out what the hell the verification box says.
I can't imagine how people with difficulties cope with this.
If you wanna get rich, you know that payback is a bitch
And then again, maybe he isn't. It doesn't really matter which library he uses for image import, does it? I mean, the interesting part would be the data structures and algorithms used in the "reverse-mapping" from image data to text. It's doubtful that the rudimentary processing methods provided by ImageMagick (although often a god-send of convenience and compatibility) would help here.
Not that this would stop you from plugging some random open-source software package. Even though your plug will probably do more Good-For-The-World than the rest of the discussion in this thread combined, your motives are still strange to me.
Once all these new algorithms get integrated into OCR software... OCR software might just work.
I just saw a great flash-based Captcha designed to combat just this sort of attack. The test was composed of white text on a white background. Colored shapes of various sizes swirled in the background behind the text in a pseudo-random pattern, and the text was visible or obfuscated depending on whether there was a shape behind it at the moment. After watching it for a few minutes to see if there were any obvious flaws, I noticed that the entire phrase was never visible all at once.
A little patience was required, but I was able to verify in less than 10 seconds. Animation seems to be very useful for this kind of application.
Even Jesus hates listening to Creed.
> It doesn't really matter which library he
> uses for image import, does it?
I'd be interested in knowing what it is... but I may well be the only person on the planet that is interested.
> your motives are still strange to me
Most of the time I don't understand them myself!
The Army reading list
Having to wade through 60+ spam comments a day on a WordPress blog (with all the stock antispam options enabled) just sucked . . . and the blog didn't even get much traffic (PageRank of 4). I installed the AuthImage plugin and used it on its stock settings, and for a while didn't get a single bit of spam. Then, magically, it started up again. It seems some industrious little script kiddies have written a crawler to massively bombard AuthImage-enabled blogs with words from the stock word list. I switched from the wordlist file to randomly-generated strings and increased the size of the image for readability, and I never had another piece of comment spam in that blog again.
As for blind folks, I suppose every webmaster has to make that decision based on their target demographic, but I've seen a few text-only captchas that work well enough ("What color is an orange?") but will inevitably have the same limitation as the AuthImage word list above.
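To make the word list versus random string point concrete, here is a minimal sketch. It is not AuthImage's code; the alphabet, the length and the function names are my own assumptions. The idea is only that a fixed word list gives a crawler a small set of answers to try, while a random string gives it a far larger one.

```python
import secrets

# Hypothetical alphabet (an assumption, not from any plugin): look-alike
# characters such as 0/o and 1/l/i are left out so the rendered image
# stays readable for humans.
ALPHABET = "abcdefghjkmnpqrstuvwxyz23456789"


def random_challenge(length: int = 6) -> str:
    """Return a fresh random string to render into the captcha image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def guess_space(length: int = 6) -> int:
    """Number of equally likely answers a blind guesser has to cover."""
    return len(ALPHABET) ** length
```

A stock word list of a few thousand entries can be replayed in full by a script; `guess_space()` shows how much larger the space gets with random strings. This does nothing against an attacker who actually reads the image, it only removes the dictionary shortcut.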
Captchas are next to useless and, for the visually impaired, very frustrating. One more example of a technology which annoys everyone and yet doesn't really stop the determined miscreant. <cough>airport shoe inspections</cough>
-- "Most people prefer a popular myth to an unpopular truth"
As with the Turing test, the entire purpose of a captcha is to distinguish humans from machines. As captcha-defeaters improve, the captchas will need to become more and more sophisticated and require more and more human or human-like intelligence to process. This arms race will culminate in a Turing test-like approach for discerning natural intelligences from artificial ones.
The ultimate irony may occur when the first human-intelligent computer is created by a spammer for the purpose of assaulting our collective intelligences with their commercial drivel. Given the increasing value of online commerce and Google page ranking, there's probably more money in AI for captchas than AI for academic research.
But before captchas get that sophisticated, the system will become self-defeating as the number of real humans defeated by captchas exceeds the number of AIs repelled by them.
Two wrongs don't make a right, but three lefts do.
The main article refers to Inaccessibility of Visually-Oriented Anti-Robot Tests, which deserves a read and commentary.
Among the claims:
- captchas are inaccessible to the blind - true
- a horde of human beings can decode the entire library over time - only true if the images are recycled, not if they are created on-demand or for one-time use.
It also discusses some of the side-effects of making access to real humans harder, or harder for a class of users such as the visually impaired. For example, I've seen sites that say "If you cannot read this, call this phone number for access." Too bad for you if you don't have a phone.
As alternatives, it offers
- logic puzzles
- sound output
- credit-card validation
- live operators
- limited use of unverified accounts, such as throttling for email
- behavior and heuristic analysis
- already-established credentials, such as single-sign-on systems or public-key-based systems
- biometrics
The article briefly discusses the pros and cons of each.
I rate its conclusion
"Visual verification alone is known to create problems with users. It is imperative that site designers take the needs of users with disabilities into account, and it is likewise hoped that one or more of these potential solutions can make that process easier."
as: insightful +5 obvious -1.
The article as a whole gets an "informative +5."
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
The W3C proposed in 2003 a number of Solutions for the Inaccessibility of Visually-Oriented Anti-Robot Tests, including logic puzzles, audio captchas, credit card validation, etc. It is interesting that they also show how a federated identity system can help users with disabilities.
http://www.gh-sts.com/captcha.txt
This is what slashdot's previous iteration of a captcha looked like in an in-memory associative array after the intersecting lines had been removed and a de-skewing algorithm applied. There was actually a version of the code after that which properly picked out where the lines actually intersected the letters and didn't erase the intersecting section to create those gaps.
Before they switched to the newest CAPTCHA system, I was breaking their CAPTCHAs with a modified SS.pl script with almost 100% accuracy (it had a little trouble properly splitting up the text when a j or other similar character wrapped partially under another letter).
Of course, the new CAPTCHAs are much harder. I can't even read some of them myself, but the point is that breaking CAPTCHA that people can easily read usually isn't really that hard.
Yes, I used ImageMagick's Perlmagick library.
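For anyone wondering what "splitting up the text" looks like once the lines are gone, here is a rough sketch of the simplest approach I know of, column projection. This is not the SS.pl script and not PerlMagick; the bitmap format (a list of rows of 0/1) and the function name are assumptions for illustration.

```python
def split_columns(bitmap):
    """Split a binary bitmap into (start, end) column ranges.

    bitmap is a list of equal-length rows of 0/1 values. Each returned
    range covers one unbroken run of columns that contain any ink, with
    end being exclusive.
    """
    if not bitmap:
        return []
    width = len(bitmap[0])
    # A column "has ink" if any row has a 1 in that column.
    ink = [any(row[x] for row in bitmap) for x in range(width)]

    spans, start = [], None
    for x, filled in enumerate(ink):
        if filled and start is None:
            start = x                 # a new glyph begins here
        elif not filled and start is not None:
            spans.append((start, x))  # blank column closes the glyph
            start = None
    if start is not None:
        spans.append((start, width))  # glyph runs to the right edge
    return spans
```

This also shows why a j tucked under a neighbouring letter is a problem: the two glyphs share columns, so there is no blank column between them and they come out as one span.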
Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!
In the table for "Cwazymail", I was trying to figure out what the pictures were. One's an elephant, one's an owl, and one is a man pulling apart his anus. Great!
Nice, the site owner probably added it when he added the notice to slashdot readers.
all captchas should timeout after, oh, say 10 minutes?
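A minimal sketch of that idea, combined with making each challenge single-use. The class and method names are made up for illustration, and the in-memory dict stands in for whatever session storage a real site would use.

```python
import secrets
import time

TTL_SECONDS = 600  # the "say 10 minutes" suggested above


class ChallengeStore:
    """Hypothetical in-memory store of outstanding captcha challenges."""

    def __init__(self, ttl=TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._pending = {}  # token -> (expected answer, expiry time)

    def issue(self, answer):
        """Register a new challenge and return its opaque token."""
        token = secrets.token_urlsafe(16)
        self._pending[token] = (answer, self._clock() + self._ttl)
        return token

    def verify(self, token, response):
        """Check a response; every token can be tried exactly once."""
        # pop() removes the entry, so a replayed token always fails.
        entry = self._pending.pop(token, None)
        if entry is None:
            return False
        answer, expires_at = entry
        if self._clock() > expires_at:
            return False
        # Constant-time comparison of the two byte strings.
        return secrets.compare_digest(response.encode(), answer.encode())
```

A timeout narrows the window for farming challenges out to humans on another site, but it does not close it: a relay that gets an answer back within the window still passes.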
In all honesty, do you really think you're going to get that many people to regularly visit a pr0n site? The sector is extremely cut-throat and vastly bigger than the market can justifiably support (hence why many pr0n sites close each month).
The only way to get to the top of the engines in the first few months would be to use PPC advertising (costs money). After that, even if you get to the top of the SERPS by using nefarious means, you'll need to give people a viable reason to sign-up to your service, i.e. you'll need content which costs money (unless you want to steal it, at which point you can probably expect some real mean types to track you down and kill you, them porn businesspeople are crazy).
I am NaN
I'm not sure what Hashcash does, but it sounds like I've already got a great idea for a counter-program: Hashcache.
I Browse at +4 Flamebait
Open Source Sysadmin
Editors -
Please don't link to the goatse man without at least some warning.
Thanks.
that would be a draft beer yes?
The world according to SComps
Thanks for linking the Goatse Man image in the article. Oh how I've missed being tricked into viewing thee.
The link is not work safe.
This post contains benzene, nitrosamines, formaldehyde and hydrogen cyanide.
I'm from Holland. Isn't that veird?
Gamingmuseum.com: Give your 3D accelerator a rest.
THIS IS ONE GIANT TROLL ARTICLE! LOL!
About 3/4ths down the page there is a goatse picture, and the caption at the top thanks the GNAA. Wake up slashdot.
This is my sig. There are many like it, but this one is mine.
I thought about this problem on a recent trip to the urinal and here's what I got.
1) Get (or construct) a large database of nouns of well-known objects (car, orange, bottle, phone, pencil, brick, cup, etc. etc.)
2) Retrieve image references from a (safesearch-enabled) Google image search for a random noun from your database. Pick randomly from the result set.
3) Present images to the user. "These are pictures of a..."
4) My next strategy was to figure out a combinatorial way to increase the number of possible replies so that an attacker couldn't simply create a database of knowns (such as a hash database of images)
What do you smart fellers think? other than google being pissed for scraping their site
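Steps 1 to 3 could be sketched roughly like this. To sidestep the scraping issue the sketch uses a small local catalogue instead of a live image search; the catalogue contents, file names and function names are all placeholders, not a real API.

```python
import secrets

# Hypothetical catalogue: noun -> image references. In the proposal these
# would come from an image search; here they are placeholder file names.
CATALOGUE = {
    "car": ["car1.jpg", "car2.jpg", "car3.jpg"],
    "orange": ["orange1.jpg", "orange2.jpg", "orange3.jpg"],
    "cup": ["cup1.jpg", "cup2.jpg", "cup3.jpg"],
}


def make_challenge(catalogue, images_per_challenge=2):
    """Pick a random noun and a random subset of its images."""
    rng = secrets.SystemRandom()
    noun = rng.choice(sorted(catalogue))
    images = rng.sample(catalogue[noun], images_per_challenge)
    return noun, images


def check_answer(noun, reply):
    """Accept the reply if it names the noun, ignoring case and spaces."""
    return reply.strip().lower() == noun
```

The weakness step 4 worries about is visible here: with a fixed catalogue an attacker only has to label each image once, and with only a few nouns a blind guess succeeds often. The scheme needs a large, changing pool of both nouns and images to hold up.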
This article is a fraud. No source is presented, and goatse.cx is displayed in the examples. This whole thing was contrived just to get goatse.cx in a legitimate front page post. Best troll in years.
this sig limit is too small to put anything good h
1... is... not... a... prime...
For info on why, see the mathworld prime number entry.
Interestingly, it says that, at one time, 1 was considered prime and 2 was not. Pretty amazing, considering the importance of the Fundamental Theorem of Arithmetic.
"It's overkill, of course. But you can never have too much overkill." - Anonymous Slashdot Coward