Web Users Angered by Anti-Spam 'Captcha'
Carl Bialik from WSJ writes "Captchas -- the jumbles of letters that users must type to gain access to some websites -- are a growing irritation, the Wall Street Journal reports. But programmers hope to make new variations that are both easier to decipher and harder to crack. From the article: 'Some captchas have been solved with more than 90% accuracy by scientists specializing in computer vision research at the University of California, Berkeley, and elsewhere. Hobbyists also regularly write code to solve captchas on commercial sites with a high degree of accuracy. ... Henry Baird, a professor of computer science at Lehigh University who studies PC users' responses to the codes, has been working with colleagues to develop new generations of captchas that are designed to be easier on humans but baffling for computers.'"
I couldn't read the article. They wanted me to type CapTcha. Or was it Cap7cha? Oh well?
And All I Ask is a Tall Ship And a Star to Steer Her By
HOT GRITS
I prefer kitten auth.
liqbase
I had heard once of a very cunning strategy around captchas. I'm not sure if this is true but there is a story of a p0rn site making large sums of cash by selling key sets to the images. Certain sites would not dynamically generate images but instead rely on sets of images with protected keys as a captcha.
In order to use the p0rn site he ran, you had to either pay money or spend time identifying captchas. He would then store them in a database and match it up with a checksum of the image. When he had completed a site's captcha key set, he would sell these lookup tables to anyone with money.
All they then had to do was write their program to do a checksum of the image (or the image itself if he had stored it) and then plug the word from the database into the page for verification.
With the introduction of splashers that spatter the statically stored images with lines or dots, the image is stored and something like an edit distance is applied to it to find the closest match. Once that is accomplished, it references the keyword out of the database. You turn up the splasher and you risk the user not being able to figure out the word.
It seems that evil always finds a way. This is why captchas should always be dynamically generated on the fly from a very large dictionary! Check out Securimage for PHP.
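A toy sketch of why a fixed image set is weak, assuming tiny 8-byte "images" and made-up answers (none of this is taken from a real site or from Securimage). An exact checksum lookup stops working once a splasher flips a pixel, but a nearest-match comparison still recovers the stored word, which is the argument for generating every challenge fresh:

```python
import hashlib

# Hypothetical static captcha set: raw "image" bytes -> answer word.
STATIC_SET = {
    b"\x00\x01\x01\x00\x01\x01\x00\x00": "apple",
    b"\x01\x01\x00\x00\x00\x01\x01\x00": "brick",
}


def checksum(image: bytes) -> str:
    """Exact fingerprint: any changed byte gives a different digest."""
    return hashlib.sha256(image).hexdigest()


# The lookup table described above: checksum -> answer.
LOOKUP = {checksum(img): word for img, word in STATIC_SET.items()}


def exact_lookup(image: bytes):
    """Return the stored answer only if the image is byte-for-byte known."""
    return LOOKUP.get(checksum(image))


def hamming(a: bytes, b: bytes) -> int:
    """Count differing positions, a crude stand-in for an edit distance."""
    return sum(x != y for x, y in zip(a, b))


def nearest_lookup(image: bytes, max_distance: int = 2):
    """Return the answer of the closest stored image, if it is close enough."""
    best = min(STATIC_SET, key=lambda stored: hamming(stored, image))
    return STATIC_SET[best] if hamming(best, image) <= max_distance else None
```

With a single speckle added to the first image, `exact_lookup` misses but `nearest_lookup` still answers "apple". Turning up the noise far enough to defeat the nearest match is exactly the point where humans start failing too.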
My work here is dung.
I have a patent on it, of course...
Running Windows^H^H^H^H^H^H^H OSX and Linux in the home. (I don't have time for Solitaire any more.)
"Some captchas have been solved with more than 90% accuracy by scientists specializing in computer vision research at the University of California, Berkeley, and elsewhere."
Hell, that's better than my average. They are getting so cryptic, it seems I get them wrong about 25% of the time these days.
-josh
..a script might do better.
Swedish plasma phys. PhD student; MSc EE; knows maths, programming, electronics; finance interest; seeks opportunities
http://sam.zoy.org/pwntcha/
Just throwing this out, but maybe there should be a very basic question asked instead? Since these already presume literacy, maybe something like:
Which of these is a number: A 2 R P?
Seems that regardless of what they come up with there's going to be some part of the population that won't figure it out anyway, and if the whole point is to confuse auto-registerers, then I'd think it'd be harder for those to account for every possible question and answer set.
(Yea, it's in TFA, but mentioned like an aside...)
The captcha concept breaks down if the user can't see the image, either through the limitations of their browser (links) or the limitations of their eyes. A US government site would have a hard time justifying captcha in light of their legal and moral responsibilities to the disabled citizenry.
There's a crapflooder here on the trolltalk SID who has proven quite nicely that captchas don't, and can't, work.
Boycott everything - they're all trying to fuck you one way or another
...unless you are blind. Some sites have alternate audio versions for the vision-impaired, but it's still a problem.
And even if you aren't blind, I've run into many a captcha that I couldn't decipher. Poorly designed sites may delete the entire content of your post if you fail the captcha, but I guess that's a design issue for another topic.
Something got me thinking about captchas ... what was it? ... oh yes it was that article on automated Spamcop submissions the other day.
No wonder they're a growing irritation. But websites need to know at least something about you. This site is letting me post now because: 1) I'm not going through a proxy 2) I've enabled cookies 3) I have a login. Now most sites I visit, I can't tick any of those boxes. And yes I'll venture over to bugmenot occasionally as well.
So sites need them. Especially for those functions where they're at risk of DDoSing someone or some such nefarious misuse.
Captchas are a great anti-bot measure, but they're also just maddening sometimes. Ticketmaster's are the worst. Sometimes it takes me 3 or 4 tries to figure out what the hell it says. I'm technologically savvy, and have good vision. This is one of those things that I can't imagine my mother trying to figure out. There has to be a better way.
- There are things called 'Captchas'
- People don't like them
- Computers are getting better at cracking them
- Some boffins are trying to make new ones which people like and computers don't
Really, that's all there is.
init 11 - for when you need that edge.
As usual, the problem is approached from the wrong direction. When the dam bursts and the floodwaters cover the town, it's a waste of time to develop bigger and better waders. The correct thing to do is repair the dam. So instead of developing ever more elaborate ways to handle the spam flood, just shoot spammers. Put a cash bounty on them, dead or... dead. Problem quickly solved.
we will end no whine before its time
Most of the proposed solutions rely heavily on command of the English language.
granted, captchas still rely on you knowing the western alphabet and numbers.
Exercise caution when modding this message up: the author acts like a jerk when his karma is excellent.
I like the example images from TFA. The only one I have a difficult time making out is the Hotmail one. Scattering things around the captcha that closely resemble letters only causes confusion. For instance, should you include the character that looks like an 'L' under the '8'? And is that 'T' sitting on top of a slightly distorted 'J'?
This guy's the limit!
Not sure if cryptic is the right word
Sig cannot be found.
Just as the point of DRM isn't to be completely bullet proof (there's always the analog hole), the point of a captcha is to be enough of a nuisance that someone doesn't spend the time to crack it. Obviously, for a site like Yahoo and its zillions of sites, it pays to spend time breaking the captcha. But for your average site, the captcha just has to be "good enough" such that someone won't bother to write a crack to spam a small fish.
Sometimes it's best to just let stupid people be stupid.
If I wanted to be really sadistic, I could instead present site readers with a sentence, in which they have to fill in either "their," "there," or "they're."
Slashdot Burying Stories About Slashdot Media Owned
Are you listening slashdot?
I am trolling
is Captcha's that people can't read either. digg.com is especially bad for that: it usually takes me 3 or 4 tries to get it "right", even when it looks obvious. Slashdot's are harder to read but I generally always get them the first time.
This is mentioned in the article, as are audio captchas.
Don't thank God, thank a doctor!
there could be words 345y or |\|07 50 345y
i bet megatokyo fans would pass it with 100% accuracy!
* lon3st4r *
Instead of making users type words they see, have them describe a picture.
Example: What animal is in this picture?
If it's a picture of a baboon you could have the script accept a number of responses, like 'baboon', 'a baboon', 'monkey', 'silly willy monkey', etc. to make it easier on humans.
In some cases, you could have say, a silhouette of a cat, in another, a picture of a cat's face. I dare someone to try to write a script that can make that distinction.
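A minimal sketch of the "accept a number of responses" part, assuming the answer set is typed in by hand for each picture (the set and the function names here are made up for illustration). Case, punctuation, extra spaces and a leading article are ignored so that 'a baboon' and 'Baboon!' both pass:

```python
import re

# Hypothetical accepted answers for one picture (here, the baboon).
ACCEPTED = {"baboon", "monkey", "silly willy monkey"}
ARTICLES = ("a ", "an ", "the ")


def normalize(answer: str) -> str:
    """Lowercase, drop punctuation, collapse spaces, strip a leading article."""
    text = re.sub(r"[^a-z ]", "", answer.lower())
    text = " ".join(text.split())
    for article in ARTICLES:
        if text.startswith(article):
            text = text[len(article):]
            break
    return text


def is_correct(answer: str, accepted=ACCEPTED) -> bool:
    """True if the normalized answer is one of the accepted descriptions."""
    return normalize(answer) in accepted
```

The hard part is not this check but building a large enough picture set with sensible answer lists, since every picture needs its own `ACCEPTED` set.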
Let's see if I can get this message submitted (btw, quite funny captcha -> cryptic ;) 90% accuracy is better than I can do. These frigging puzzles are discriminating against the visually impaired.
One of the things that I'm watching in the error logs of SpamOrHam (web site where volunteers sort messages into spam and ham) is the error rate on the CAPTCHA used. Ignoring what appear to be automated attempts to brute-force the CAPTCHA, I see an error rate of around 20% of 100,000s of CAPTCHAs.
That's amazingly high. 1 in 5 CAPTCHAs are incorrectly entered by humans doing their best to do the right thing.
No wonder people get mad at them.
John.
- There are tons of pictures of these things floating around
- they're easy to modify (blur, detour, cell-shade, rotate, mirror,
- Getting computers to guess the difference between a dog and cat, while feasible (don't care to fish the link to the program that does just that) is not easy and guessing that a spoon is a spoon (with reflections in it), is not going to be easy either.
I thus wonder why they haven't implemented that...
My 0,02
One shall speak only if what one has to say is more beautiful than silence
Captchas are not hard to crack, now that someone has produced my favorite crack strategy. A "man in the middle" attack server hits pages with captcha challenges. That server advertises a "free porn" website, presenting to its human audience the captchas it hit. The porn seeking humans decode and enter the captchas, get the porn (or not), the server sends their entries to the original captcha page, and gets past them as often as humans seeking porn would. There's so many humans seeking porn that the middleman transactions happen in realtime, indistinguishable from direct human responses to the original captcha.
This is v1.0 of the Matrix, where human brains are harnessed to solve problems by a more powerful and wise, though less "intelligent" computer network.
--
make install -not war
Seriously -- think how the quality of users/poster would improve if we replaced captchas with some sort of basic test.
:-)
Maybe like the one they give as an entrance exam for the Marines:
The door is:
A) Open
B) Closed
C) Not enough information
Hey, as an ex-Army guy, I'm allowed to give those gyrenes a hard time
Interested in a Flash-based MAME front end? Visit mame.danzbb.com
Well, note that they're scientists specializing in that kind of stuff. And even they are getting just 90% accuracy.
A naive reader could misunderstand you and think that it's a program written by those scientists that gets 90%, but this is obviously not the case. I'm not an idiot (I hope), and I keep getting captchas wrong like half of the time.
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
Say you dynamically create a checkbox the user has to check before they can submit the form. I would think tools that register on sites wouldn't be able to break this system, say if you were randomly naming the checkbox and having some sort of validation check to see if it is checked.
Man, I need one of those, I'm usually only about 50% accurate.
Chances are any discussion on Slashdot will degrade into a flamewar about ID/Christianity within 14 posts.
Easiest way to defeat any captcha: put up a free porn site that requires users to fill out captchas to get in.
Now, come up with a better way of preventing spam than simply proving that someone is human.
Don't thank God, thank a doctor!
poorly designed captcha implementations can be circumvented 100% of the time, without having to use OCR. more info regarding this is available here http://puremango.co.uk/cm_breaking_captcha_115.php (shameless self promotion - it's my site..)
also, it's no wonder that people are annoyed by CAPTCHAs - half the time they don't explain why the user has to enter the text, and almost all CAPTCHAs are developed around making the text hard to read. At the moment, it's only a few geeks who have managed to bulk-OCR CAPTCHA scripts. Generally even the presence of a totally insecure captcha is enough to stop spam dead in its tracks - spammers just use a set script and fire it at a bunch of blogs, guestbooks etc; they are not currently targeting scripts at specific websites, and they're certainly not smart enough to perform bulk OCRing.
I paid the going retail price for a Windows screen reader and got a free Unix computer!
need a firefox extension to take care of it...
btw... slashdot also uses "Captcha"s for Anons
... it is annoying for users. Sometimes I get it wrong because I can't tell if the captcha technique they are using is case sensitive and I can't always tell the case of the character! Sometimes a lower-case L can be confused for a number 1 or vice-versa. So yeah, it's REALLY annoying.
HOWEVER. A short and simple multiple-choice or true-false quiz might determine with some level of accuracy if the poster is a person or not. Simple stuff like a random image of a sheep, a lion, a bear or a whale with a radio button selection below it. It's easy to run through, it shouldn't require much skill from the user and has the potential to confuse interpreting software a lot more.
This approach could also even be ENTERTAINING to the user in that funny pictures could be used in the image interpretation drill. Such questions could be "Is this person having a good day?" and you can put all manner of interesting images in there for a true-false scenario. Being an entertaining method will definitely win fans. Being tedious, stressful and mistakable will lose fans.
Sorry, but the CAPTCHA plug-ins I've used with Word Press etc. are *highly* effective. Where people typically screw up in their implementation is to use the default dictionary word list which ships with them. The majority of CAPTCHA-defeating scripts out there today use a dictionary attack rather than successfully deciphering the CAPTCHA image. If one sets the CAPTCHA to generate a string of random letters rather than a word from the stock word list, the amount of comment spam posted drops dramatically.
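A rough sketch of the random-string idea, not the code of any particular Word Press plug-in. The alphabet, the default length and the choice to leave out look-alike characters (O/0, I/1/L) are all assumptions made for illustration:

```python
import secrets
import string

# Assumed alphabet: uppercase letters and digits, minus easily confused glyphs.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "O0I1L"
)


def random_challenge(length: int = 6) -> str:
    """Pick each character independently with a cryptographic RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def search_space(length: int = 6) -> int:
    """Number of equally likely challenges a guessing script would face."""
    return len(ALPHABET) ** length
```

With 31 symbols and 6 characters that is 31**6, roughly 887 million possible strings, against the few thousand entries of a typical stock word list. The rendering of the string into a distorted image is a separate step and is not shown here.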
Until someone comes up with a better alternative to defeating comment spam, I will continue to use CAPTCHAs, as will many, many others -- I just don't have time to sift through and delete hundreds / thousands of comments per day.
An interesting thought -- Slashdot seems to be highly resistant to comment spam, but I suspect this is due to a relatively high percentage of logged-in users and an aggressive subnet blacklist policy.
90% accurate! When will they release software that can read doctor's prescriptions?
My Sig indicates the end of the comment I posted.
In my Word Press I avoid Captcha because it is implemented badly in most cases. People shouldn't have to type more than 3 characters, there must be a way to conceal the appearance, and subsequently approve an IP address to always post.
I use Akismet spam filter instead, and it's blocked 780 so far, and has false positived 4 comments, and missed about 4.
Oh You POS
What if instead of using a word/letter-test an image test was used? The user would be presented with an image, like a ball floating in a pool or three apples next to a brick. The user would have to describe the image using the basic words, ala "ball in pool", "three apples and brick."
The image could be cropped differently, slight color changes, rotations, and other slight changes to prevent programmatic recognition.
Perhaps it wouldn't work, but it seems to me that a computer would have a harder time deciphering an image than a series of letters of a known set of 26 + 10 (A-Z, 0-9)...
"It isn't necessary to completely suppress the news; it is sufficient to delay the news until it no longer matters." - N
The Worst part about getting captchas wrong with web forms is the crafty bastards method after refresh.
They usually clear your password fields, and occasionally reset the "Share your address with the devil" ticks.
I noticed last night whilst signing up for something and getting the captcha wrong it did this.
I only noticed the ticks were still in place after submitting, and then it's too late to go back.
So folks, be warned...
liqbase
I plan on making a script that prints an image with text on it that asks a very simple question the user must answer. I am concerned about this working across different languages, though, so I'm thinking of making it simple math problems. Do you think that would work?
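A minimal sketch of the simple-math variant, assuming single-digit addition and leaving out the step that draws the question onto an image. The function names are made up for illustration:

```python
import random


def make_question(rng=random):
    """Return (question text, expected answer) for a one-digit addition."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"{a} + {b} = ?", a + b


def check_answer(expected: int, submitted: str) -> bool:
    """Accept the submission only if it parses as the expected integer."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        return False
```

Digits and a plus sign avoid most of the language problem, but note the small answer space: the sum is always between 2 and 18, so a script that guesses blindly will get through a noticeable fraction of the time. Larger operands or a limit on attempts would be needed on top of this.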
I like Slashdot's patented Mind Reading Capchas.
-mcgrew (MRC="slants" or "shants" or "sluts"... I think. Hmm... "shants" and "sluts" are off-topic, it must be "slants." Good job, slashdot!)
Some (not all) implementations of captcha use a voice synthesizer to speak the letters in question. As a non-blind person, I find this easier than reading some of the more obscure ones.
In the end, captchas are obnoxious for legitimate end users, while only providing temporary relief from spammers. The spammers can and will find ways around the captchas, which may include more sophisticated OCR algorithms, but also other solutions such as the manually created lookup tables that were mentioned earlier.
Other ways need to be found to distinguish humans from spammers' bots.
Meldroc, Waster of Electrons
The people too lazy to protect other people from spam should have their machines taken from them. The machines should then be replaced in the first locatable, preferably from behind, orifice.
Having to work for a living is the root of all evil.
but has anyone tried using animated gifs? Or is that pointless?
There's a project that does "captcha" with text questions, which makes it usable by the blind, and probably less likely to accidentally deny access to humans. Of course, spammers might be able to attack this as well, but if they can get 90% on an image captcha, then maybe this is worth trying.
http://freshmeat.net/projects/textthacaa/
I did my undergrad thesis on reverse Turing tests (a family which CAPTCHAs are part of). Here are the main categories I could identify which can be utilized to effectively and (hopefully) easily prevent automation:
1. Text based passwords
Pro: People are used to them, quick-n-easy
Con: Subject to brute force attacks, trivial to automate a login once you have the password
2. Graphical passwords
Pro: Can use a larger set of images than characters, easy to remember
Con: time consuming, can only present a small set of images at once, variable screensizes (pdas to big screen TVs), not good for accessibility, no native support in basically ANY application, not easily scalable
3. Text based questions (eg. which word in this sentence is underlined? "Mary had a little ___")
Pro: quick-n-easy, not necessarily subject to brute force attacks,
Con: Does not cross over the language barrier well, broken with google queries and sophisticated algorithms, not easy to build a whole set and even harder to do it automatically, requires a large set with no repetition - not easily scalable at all
4. Graphical based questions (eg. How many people are in this photo? What animal is this?)
Pro: quick-n-easy, extremely difficult to automate
Con: Does not cross over the language barrier well, not easy to build a whole set and even harder to do automatically, accessibility issues, requires a large set with no repetition - not easily scalable at all
5. Puzzles (eg. Put (ie. click-n-drag) the basketball into the basket, do a "virtual" jigsaw puzzle)
Pro: Effective (requires some thought and control of the mouse)
Con: Can be time consuming, unfamiliar, not trivial to create or automatically create, no native support in basically any application, can be difficult for children, elderly, or those of lower intelligence, accessibility issues, device input issues (does it require a mouse?). Not scalable at all.
6. Games (eg. miniature-pacman)
Pro: Effective (requires a little intelligence to beat the game), can be fun
Con: time consuming, unfamiliar, almost impossible to automatically create, no native support anywhere, device input issues, can be difficult for those of lesser intelligence or slow reflexes, accessibility issues. Not scalable at all.
7. CAPTCHAs
Pro: Some are effective, easy to deploy, starting to become familiar to users
Con: Many are or can be broken, some are too hard for humans, sometimes there are language issues, some accessibility issues
8. Biometrics
Pro: Most perfect form (how can an automated program provide, say, its own fingerprint?)
Con: Unfamiliar to most users, Uncomfortable to many users, no guarantees of live data (record and playback techniques would be effective), not well-deployed, some techniques are not effective for some users (eg. voice recognition for anyone who cannot speak)
Out of these, some of the best techniques for deployment might be to automatically wrap mailto tags with some javascript (say via server-side scripting) which won't display the email address until the user passes the above.
Use trusted reverse turing test authorities like certificate authorities to provide and verify reverse turing tests such as CAPTCHA images.
Include native support in software which will prevent automatically tampering with key areas (eg. registry, startup areas)
Bottom line, there are plenty of effective techniques, but they are not all easy to deploy, and they are not all perfect at their job. I truly believe there will never be a perfect solution until biometric devices can somehow guarantee that the biometric data being received is live and not replayed (perhaps through an encrypted timestamp or something)
What the hell does pi have to do with grapes???
Oh, I get it. Oink off you bastard.
I'm not not licking toads.
Why not present the user with a "concentration" type puzzle?
I prefer kitten auth.
OMG PONIEs!!!!!!!!
How do computers fare against Ishihara colorblindness tests? Besides helping prevent unauthorized intrusion, with certain layered test images, you can help the color vision impaired by accepting the values for both the impaired and unimpaired versions. See page 4 of the above link for how they are constructed.
"Beware of he who would deny you access to information, for in his heart, he dreams himself your master."
It's hard enough for some of us, as it is, with dyslexia, now we have these damned "captchas" (What a stupid term). It's extremely irritating and I wish they'd go the freak away!
http://wrexallen.blogspot.com/
In response to the people asking about animated gifs, I think they could be algorithmically defeated. However, what about something requiring mouse movement? For example, using a mouse gesture as an unlocking code. A text (or audio) cue to the user to do something with the mouse. The above wasn't my first thought after answering the animated gif question, but it followed from the first thought: instead of animated gifs, what about the Apple Quicktime things that allowed you to move the mouse to view a 3d scene? The entire scene wouldn't be visible and would require mouse movement to view the scene enough to answer the question. Obvious problems- hard to generate. But a mouse gesture based unlocking? Isn't that doable?
You know what really sucks, is I've run into Chinese sites that have them with Chinese characters, for example the popular QQ chat client account signup form. To get an account, you need to fill in a 5 (Chinese) character captcha, and unfortunately for Chinese learners, some fairly uncommon characters sometimes come up and then it's impossible to know how to type them. With no knowledge of characters, it is impossible, as I assume that Chinese OCR isn't up to the task.
Adventures in Shaanxi
couldn't this be solved with a bit of javascript? just see if there's any mouse movement on the screen.
Yes, people who are smarter than 99.9999% of the rest of the world can do it but what are those sub-100 IQ folks (mostly in the South) going to do? These people can barely open unrequested email attachments (but somehow manage every time) or find their sliding cup holder.
Yeah, Berkeley PhDs versus everyone else? I, for one, welcome our new super intelligent overlords and as a trusted ...
Almost exactly what you describe: http://www.kittenauth.com/
Let's say you have your super-duper captcha generator where no two are ever alike, and thus can't be indexed. Let's say I also want to crap-flood you with automated posts linking to my product, or just site I want brought forward on Google's index. Think you're safe?
Hell, let's use Slashdot as an example, since everyone has seen the captchas here.
It works like this: I'll set up a porn site all right. Gets people's interest easier than anything else. I promise some free porn, or heck, even some links to other thumbnail galleries, but make people go through a captcha each time. Except it's _your_ captcha. Consider the following sequence:
1. Random J Hornyguy wants to see the porn. He makes the request that'll give him the captcha page.
2. My server automatically makes a request for a message posting form on your site. (Think simulating clicking on a "Reply To This" link on Slashdot as Anonymous Coward.) Your server gives me the form, complete with session cookie, etc, which I store, and also a captcha. Ah-ha. Guess what I do with that captcha...
3. Random J Hornyguy finally gets his login page, complete with the captcha I just got at step 2. Which he mutters a bit about and finally fills in as plaintext and submits.
4. I now finally submit my post to your site, complete with the captcha text that Random J Hornyguy dutifully filled in for me.
5. If that doesn't go through, I'll make another request and politely ask Random J Hornyguy to try again. (I'm userfriendly, eh?) If it went through, I'll also let him see the porn. After all, I'll want him to come again later and do some more free work for me, so no use annoying the hell out of him. But if I'm an evil SOB and have an endless supply of suckers (e.g., a spamming or phishing operation reeling in the suckers), I might tell him that he typed wrong anyway, and see how many I can get him to solve before it dawns upon him that there's no reward and he can't ever get past the captcha.
Note that at no point did this rely on you having a repeating set of images. My site just acted as a captcha proxy between yours and a human sucker, in real time.
Sure, it needs a bit more work coding it like that, but not much more. (I'd have to store the session, recognize links, simulate form responses, etc, anyway if I want to automatically crap-flood your site.) And it'll keep working no matter how you alter your captcha generator, as long as it's still readable at all by a human. And if I have enough users, I can add modules to automate that captcha proxying for several sites: each user randomly gets to break the captcha for another site, so the crapflooding is more distributed instead of swamping one site solid.
Also note that a lot of sites, Slashdot included, only make you use the captcha once, when you create or log in a user. If you choose to get a permanent cookie, you can post thousands of posts without ever seeing a captcha again. So I don't have to rely on you reusing captchas, if I create a new user for each one my users solved for me and store the permanent cookie. Since each such user can post more than once before the site admins catch on to it and ban the user id, I can generate a lot more crapflood posts than I get users solving captchas for me.
(Or maybe I won't crapflood some message board, but generate ids to free mail accounts and send spam from that. Again, that escalates quite nicely. As long as you don't require a captcha for every single bloody email a legit user sends, I can send thousands of emails per captcha solved by some Random J Hornyguy. And when that user gets banned, some other Random J Hornyguy will solve the next captcha for me.)
So, to wrap this long rant up, TFA just made me go "didn't it ever occur to these people that they're doing a brilliant technical solution, but it solves the _wrong_ problem?" It's such a tunnel view of the problem, it ranks up with MPAA's being surprised that people tell their friend whether a movie was good or bad. It's the typical "idiot savant"
A polar bear is a cartesian bear after a coordinate transform.
Just link your message/comment submit tool through a BotOrNot site where real people vote on the validity of messages.
Of course, to avoid bots voting on BotOrNot apply recursivity.
I found this post by Dr. Dave, maker of Spam Karma for Word Press, on the State of Spam interesting reading:
http://unknowngenius.com/blog/archives/2006/01/30/the-state-of-spam-karma/
My interest in CAPTCHA relates directly to comment spam so I may be overly narrowing the problem. I had a couple ideas that I plan to implement at some point for dealing with this outside of CAPTCHA:
1. Require poster to give email address (as with most registration systems). Post comment for a limited period of time (say 15 min), but then have it expire if not verified by clicking link emailed to poster. (Impose a 1-3 comment per session max on posters and periodically purge database of unverified comments.)
2. When posting a comment, run a js script that imposes a 1 second delay of some sort on poster -- to thwart automated attacks. Is there a way to do this effectively? Any implementations of an idea like this?
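One possible way to get the delay from item 2 without trusting client-side js is a signed timestamp embedded in the comment form and checked on the server. This swaps the js delay for a server-side check; the key, the limits and the function names below are placeholders, not taken from any existing plug-in:

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"  # hypothetical server-side secret
MIN_DELAY = 1.0            # seconds a human is assumed to need (from item 2)
MAX_AGE = 15 * 60          # assumed lifetime of a form, in seconds


def _sign(issued: str) -> str:
    """HMAC-SHA256 of the timestamp so a bot cannot forge or backdate it."""
    return hmac.new(SECRET_KEY, issued.encode(), hashlib.sha256).hexdigest()


def issue_token(now=None) -> str:
    """Create the hidden form field value when the comment form is served."""
    issued = str(int(time.time() if now is None else now))
    return f"{issued}:{_sign(issued)}"


def verify_token(token: str, now=None) -> bool:
    """Reject forged tokens, too-fast submissions, and stale forms."""
    now = time.time() if now is None else now
    issued, _, signature = token.partition(":")
    if not hmac.compare_digest(signature.encode(), _sign(issued).encode()):
        return False
    age = now - int(issued)
    return MIN_DELAY <= age <= MAX_AGE
```

A bot can of course just wait a second before posting, so this only raises the cost per post a little. It fits the layered approach better than it works as a defence on its own.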
Most effective systems I've seen use a layered approach, so these could be layers in a system that also uses CAPTCHA situationally as well.
To my thinking, the problem is not so much coming up with a system that discriminates human problem-solving from computer but rather to come up with one that imposes costs unacceptable to automated spam-bots but acceptable to well-intentioned humans.
Do you think these would be of any use?
Tom
Innovation makes enemies of all those who prospered under the old regime... -- Machiavelli
I had Baird as a professor and he would talk about his research once in a while. He's got some pretty neat stuff up his sleeve and I'm happy to see some of it's getting out there. Congrats, Professor Baird.
How about captcha mini-games, Ball in a Hole
Well it seems their computers can negotiate captchas at a higher ratio than I can. Where do I download that software?
While Captcha was designed to prevent scripts from working, it really is a form of a Turing Test - except the winner is the Human, not the AI.
Looking at it from that angle, Captcha can only be a short-term solution- and a constantly changing one at that. With time CPU power only increases, as does development in vision & pattern recognition AI. Captcha, to work, must frequently change to focus on that which is hard for computers (for now), and not too difficult for humans.
Even "Kitten Auth" can be defeated with some clever programming.
Captcha is doomed in the long run. I wouldn't build a business model that relied on failing a Turing Test...
I recently implemented a new CAPTCHA system using Flash as "secure container" on my guestbook. The spam immediately and completely stopped. You can see it here.
I also wrote an article about using Flash together with CAPTCHAs to achieve 100% security, which can be found here:
Effektives Bot-Blocking mit Flash (Original German)
Effective Bot-Blocking with Flash (Babelfish-translated)
The article outlines the technical implementation, its advantages and disadvantages and even discusses future hacking possibilities.
CAPTCHAs are meant to prevent scripts from (ab)using services designed for human beings.
Unfortunately, the man-in-the-middle workaround (CAPTCHA presented to human user in a different context, answer used by script) is dead easy to implement. So at best, you're cutting down on the number of registrations a script can make, but not actually solving the problem. Is it worth the effort?
The best side effect of the CAPTCHA arms race is that some amazing pattern-detection algorithms are being invented to defeat them the hard way.
Create a captcha that relies on a currently unsolved problem of computing (such as interpreting scratchy audio into words) and see what technology is hacked together to get past it.
"Enjoy what you're doing! If it becomes drudgery, you're doing it wrong!" - Jim Butterfield
My solution is simple. It also defeats the "porn server in the middle" attack. Assuming the page is in English, just ask a random English language question about the banner ad at the time of the page. You "kill two birds with one stone" by getting people to prove they are human and read the ads at the same time.
This should work fine for all users that don't block banner ... uh ... never mind.
now we need to go OSS in diesel cars
Basic image comparison techniques are pretty easy to fool. Change one pixel and the entire image hashes to something else.
Change one pixel and the peaks of the Fourier transform of the image remain mostly the same. It's the same reason one can hear a tone above white noise.
Some "dupe detectors" reduce the image to a grid of n*m, take the average color of each square, and hash that.
Which is the same as using only the low-pass parts of the Fourier fingerprint.
This can be defeated by changing the color of a significant block of pixels to a random color, though this would need to be arranged based on the picture itself so you don't hide the kitten.
To defeat this, use a short-time Fourier transform or a sharpen filter to detect areas with abnormal spatial-frequency properties, and reject those from the comparison before taking the Fourier transform.
That still leaves things like manually capturing every possible unique base kitten image, then doing a pixel-by-pixel comparison and marking everything mostly matching as a kitten.
That would just increase or decrease the DC component of the picture, again trivial for a Fourier based detector to rule out.
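To make the hashing point above concrete, here is a minimal sketch, not anyone's actual dupe detector. It assumes a greyscale image held as a list of rows of 0-255 values, and it uses the simpler grid-average idea mentioned above rather than a real Fourier transform. The function names and the test image are made up for illustration.

```python
import hashlib


def exact_hash(pixels):
    """SHA-256 over the raw pixel values: any single-pixel change alters it."""
    data = bytes(value for row in pixels for value in row)
    return hashlib.sha256(data).hexdigest()


def block_average_hash(pixels, grid=4):
    """Split the image into grid x grid blocks, average each block, and
    record whether each block is brighter than the mean of all blocks."""
    height, width = len(pixels), len(pixels[0])
    block_h, block_w = height // grid, width // grid
    means = []
    for gy in range(grid):
        for gx in range(grid):
            block = [pixels[y][x]
                     for y in range(gy * block_h, (gy + 1) * block_h)
                     for x in range(gx * block_w, (gx + 1) * block_w)]
            means.append(sum(block) / len(block))
    overall = sum(means) / len(means)
    return tuple(mean > overall for mean in means)


def make_image(size=16):
    """Hypothetical test image: dark left half, bright right half."""
    return [[40 if x < size // 2 else 200 for x in range(size)]
            for _ in range(size)]


original = make_image()
tweaked = [row[:] for row in original]
tweaked[3][5] = 41  # change a single pixel by one grey level

print(exact_hash(original) == exact_hash(tweaked))
print(block_average_hash(original) == block_average_hash(tweaked))
```

The exact hash differs after the one-pixel tweak while the block-average fingerprint does not, which is the low-pass behaviour described above. Repainting a large block would change the fingerprint, which is the counter-move the parent suggests.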
Yeah, I can see how that would stop all the spam.
My own weblog was recently hit by comment spam. I was extremely irritated, and initially considered captchas as a potential solution. But several problems with captchas ultimately led me to seek alternative solutions.
The first problem with captchas is the barrier they put up, however small, between you and the users of your site. Apologies for the corny analogy, but captchas are a speedbump on the information superhighway. People hate running into them.
The impediment to visually disabled users is also a big one to consider. It's not just fully blind people. People can be shortsighted, colour blind, or dyslexic, or may simply rely on specialist software to read your website. You're letting these people down by adopting this practice, and that's something I would really feel bad about doing.
But the biggest reason not to use captchas is spammers' increasing ability to interpret them. Even at a five percent success rate in interpreting captchas, a spammer can bombard your site with requests and still get something through. They're just using the same model as they did with email, and it will work.
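To put rough numbers on the five percent point, here is a back-of-the-envelope sketch. It assumes every attempt is independent and the solver's success rate is fixed, which is a simplification, and the function names are made up for illustration.

```python
def expected_successes(attempts, solve_rate):
    """Average number of submissions that get past the captcha."""
    return attempts * solve_rate


def chance_of_at_least_one(attempts, solve_rate):
    """Probability that at least one of the attempts gets through."""
    return 1 - (1 - solve_rate) ** attempts


for attempts in (10, 100, 10_000):
    print(attempts,
          expected_successes(attempts, 0.05),
          round(chance_of_at_least_one(attempts, 0.05), 4))
```

Under those assumptions a bot that can afford a hundred tries is all but guaranteed to land at least one comment, so the captcha only raises the cost per spam rather than blocking it.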
Instead I chose some other plugins available for WordPress to help with the spam. Akismet sounds like it could work as a kind of distributed spam check/blacklist of sorts, though I am wary of the fact that a private company is running the service. I also installed Bad Behaviour, though it's clear that eventually some spammers will adapt their behaviour to this.
Ideally what I'd like is a true Bayesian comment spam filter plugin for WordPress, but so far I haven't been able to find one. Such filters have done wonders for me in Thunderbird for my email spam, with something like a 99.99% success rate and no false positives. Clearly the situation is quite different with comment spam, but all the same it would be nice to have one.
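For anyone wondering what such a filter boils down to, here is a minimal naive Bayes sketch with add-one smoothing. It is not how Thunderbird or any existing plugin does it. The class name, the whitespace tokenizer and the tiny training set are all assumptions for illustration, and it needs at least one trained example of each label before it can score anything.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Toy two-class naive Bayes text classifier (hypothetical, for illustration)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase whitespace splitting is good enough for a sketch.
        return text.lower().split()

    def train(self, text, label):
        self.word_counts[label].update(self.tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        # Requires at least one trained document per label (else log(0) below).
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.doc_counts[label] / total_docs)
            for word in self.tokenize(text):
                count = self.word_counts[label][word]
                # Add-one smoothing so unseen words do not zero out the score.
                score += math.log((count + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # Turn the two log scores into a probability without overflow.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


spam_filter = NaiveBayesFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("buy cheap watches online", "spam")
spam_filter.train("great post thanks for sharing", "ham")
spam_filter.train("I disagree with your point about captchas", "ham")

print(spam_filter.spam_probability("buy cheap pills"))
print(spam_filter.spam_probability("thanks for the great post"))
```

A real plugin would need far more training data and a way for the blog owner to correct mistakes, which is where the "adaptable" part comes in.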
I envisage that the comment spam situation will get a lot worse as time goes by, regardless of any pagerank type algorithm changes. Comment spam will no doubt become as ubiquitous as regular spam and I can foresee dozens of "splog" posts per day in the not too distant future. My opinion is that blog software should come with robust, adaptable and self-updating anti-spam software on by default before this problem escalates out of control.
May the Maths Be with you!
"Press enter when the clock shows 11:45".
OK, you try being in the body of a person with a tremor and trying to stop the clock at :45 and not a second early or late. Or try doing it without using your eyes.
if due to learning disabilities one does not know that the color of cloudless midday sky is blue or the four round rubber things a car uses to move are tires, then chances are this person has somebody helping them with their daily activities.
Or the person lives outside the United States. For instance, a person's first language may not be English, and the person has never needed to learn English automotive terminology (¿cómo se dice neumáticos en inglés?). Or a person's first language may be Commonwealth English (where it's spelt "tyres").
you cant get much simpler than a dictionary word followed by numbers, and its easily readable
Even in JAWS? Or did you intend to "shutt" your web site to blind people?
just ask a random English language question about the banner ad at the time of the page.
If the banner is not textual, then it's just as impossible for potential customers with blindness as any other visual CAPTCHA. If the banner is textual, a porn site can just fetch the whole page, parse it, and send it to the porn customer for evaluation.
I hate captchas too, but one thing I've wondered about is using ASCII art. ... but what about rendering the ;-)
It turns out to be rather automatable to solve if given the plain text output (we did it in a Perl quiz of the week)
ASCII art as an image with further obfuscation to foil OCR? Or ASCII art to SVG? Heck, pick some of the wackier fonts off of the myriad free sites and render text with them in SVG and you've got a nearly indecipherable mess
Were that I say, pancakes?
I got one from LinkShare once that said "r A p e." It was pretty disconcerting. I should have taken a screenshot.
How about a rapidly animated image where each frame contains separate pieces of the characters and each frame uses different colors? Persistence of vision would end up yielding an image.
chances are that if youre going to a site, you know enough of that sites language to at least be able to navigate through it.
But unless I'm visiting a site that covers automotive topics, why should I be expected to know about automotive lingo? A textual CAPTCHA should be somehow related to the scope of the site; otherwise it will be seen more as region coding than as anything else. And you misspelled "you're", which a textual CAPTCHA would likely call you on.
so neumáticos would be the correct answer on the italian site.
Nit: It was Spanish. I assume that by "italian" you meant Italian; a textual CAPTCHA would likely call you on the capitalization. In addition, the accents usually go the other way in Italian (Spanish á vs. Italian à), and Italian plurals mutate the final vowel instead of adding "s" or "es" as in Spanish. An "identify this language" CAPTCHA for linguistic sites would likely call you on that too.
if you are joining a site that has all its content in commonwealth english (you know what i mean), then obviously tyres would be the correct answer.
You mean like BBC News? Not all readers of BBC News know that the spelling "tyre" is used in the UK. Just look at the flamewars that occasionally erupt on Slashdot with respect to "color" vs. "colour".
But to an extent, I understand what you are trying to say: if a speaker of Spanish or British English wants to join an automotive community that is known to use the word "tires" throughout the site, "tires" is the correct answer. But not all sites are automotive, and not all sites are that consistent.
Please someone contact them about putting out a Firefox plugin. The spammers already have these things figured out (man in the middle attacks described further down in the comments) and I just want to get into my bank account and forums without having to take my glasses off, get about an inch away from the monitor and then have to try two or three times before getting one that's legible. Thankfully I got image-zoom on here so I haven't had to do the first two steps in a while. It's only a matter of time before they start using Flash for these things though and then it's back to practically felching the monitor just to read the stupid things.
As an aside I hadn't logged in yet to post and the person in the next cube over tells me the captchkacinno thing for this particular comment was "accuracy". Funny.
my point was simple questions that require simple, one word, case insensitive answers. regardless of the type of site one is visiting. the color of blood (three letter word), what is a three letter word for frozen water, what do chickens lay that has a yolk...
and if, in turkish, the word ice has seven letters, then the question, which would be in turkish when on a turkish site, would reflect that.
the point isnt to check for capitals or proper use of apostrophes. the point is that unless a bot is built to recognize specific questions and have a matching answer, it shouldnt be able to enter a correct response. multiple choice questions might be easier but a bot can be built to guess an answer.
an issue that i believe would make textual captcha useless is if the questions are presented too simply, such as the ones above, you could probably build a bot that passes the question to a search engine and tries to determine which word is the answer based on the first page of the results. so maybe it would work better in the form of:
blood is: [input box]
hint: color, 3 letters
again, this is a harder system to develop than the current because it requires a lot of creativity while maintaining simplicity and you dont have the option of random generation. but if done correctly, its easier for humans yet just as difficult (if not more) to decipher programmatically.
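A minimal sketch of the question-and-answer check described above, assuming a small hand-written question bank. The questions are taken from this thread, while the data layout and function names are made up for illustration. Each question accepts a set of answers, so regional spellings such as "tires" and "tyres" can both pass.

```python
import random

# Hypothetical question bank: (prompt, accepted one-word answers).
QUESTIONS = [
    ("blood is: (hint: color, 3 letters)", {"red"}),
    ("frozen water is: (hint: 3 letters)", {"ice"}),
    ("chickens lay: (hint: has a yolk)", {"egg", "eggs"}),
    ("the four round rubber things a car uses to move are:", {"tires", "tyres"}),
]


def pick_question(rng=random):
    """Choose one question at random; returns (prompt, accepted_answers)."""
    return rng.choice(QUESTIONS)


def check_answer(accepted, response):
    """Case-insensitive match that ignores surrounding whitespace."""
    return response.strip().casefold() in accepted


prompt, accepted = pick_question()
print(prompt)
print(check_answer({"red"}, "  RED "))
print(check_answer({"tires", "tyres"}, "Tyres"))
```

With a bank this small a bot could simply memorise every answer, so the real work is in writing enough questions, as the comment says.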
I've thought it would be easier on the user and harder for a machine to recognize a little game, like 'drop the red ball into the yellow square' with several colored balls and various targets and a randomized instruction... I think a user would be less annoyed (I say "less" annoyed). One aspect of the annoyance factor is having to switch to keyboard from mouse and back (browsing is usually mouse-only) Also, sometimes these text images are impossible for me to read.
I keep seeing incredibly stupid implementations of captchas, which can't possibly slow down a script, but only impede legit users.
For instance, visit: http://xoompages.com/cgi-bin/xpanel/register.cgi
After you select a domain name, it will present you with a SWF captcha at the bottom of the page. Not having Flash installed, I couldn't see it, so I used the "View Page Info" option, and it was trivial to figure out. The last value of the request is the number that the captcha will show... So if the embed URL ends in "?cval=31337", you input 31337, and you're through.
It's like they're going out of their way to prevent PEOPLE from using their site, while making it easy for SCRIPTS to create all the accounts they want.
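To show how little a script needs here, a sketch assuming the embed URL really does carry the answer in a "cval" query parameter as described above. The example URL and the function name are invented for illustration.

```python
from urllib.parse import parse_qs, urlparse


def leaked_captcha_value(embed_url, param="cval"):
    """Return the captcha answer exposed in the embed URL's query string,
    or None if the parameter is absent."""
    query = parse_qs(urlparse(embed_url).query)
    values = query.get(param)
    return values[0] if values else None


# Hypothetical embed URL of the kind described above.
print(leaked_captcha_value("http://example.com/captcha.swf?cval=31337"))
```

The general fix is to keep the expected answer on the server and send the client only the rendered challenge, never the value itself.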
But the worst of them all are the smashed-together and overlapping captchas that are almost completely unreadable, and to make matters worse, half of the 20-line form you filled out has to be re-entered every time you get it wrong... Damn idiots designing websites!
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
In a 4 x 4 square, ask them to click the 4 cute animals. There's no way a machine can know "cute"
Same with a human. Someone may think a puppy is cuter than Natalie Portman; others the other way around. Are Precious Moments cuter than Care Bears?
Such as an animated GIF instead of a fixed image; like slowly changing the colours on certain parts of the background or foreground... The animated sequence can emphasize or reveal the writing, without making it obvious to the computer, what is being revealed.
The user is to apprehend the writing, and the human can tell which frames have some text, and which just have a bunch of lines.
But the task is harder for the computer program, since it may have to take multiple frames of the animation to figure out what the text says.
It seems like processing dynamic information, or making predictions about where an animation will go should be much harder for the computer, but easy for the human.
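A toy sketch of the multi-frame idea, assuming the text is a small bitmap whose lit pixels get scattered at random across the frames of the animation. The glyph, frame count and function names are made up for illustration, and no actual GIF is written.

```python
import random

# Hypothetical 5x5 bitmap of the letter "H".
GLYPH = [
    "#...#",
    "#...#",
    "#####",
    "#...#",
    "#...#",
]


def split_into_frames(glyph, n_frames, rng):
    """Assign every lit pixel of the glyph to exactly one random frame."""
    height, width = len(glyph), len(glyph[0])
    frames = [[[False] * width for _ in range(height)]
              for _ in range(n_frames)]
    for y, row in enumerate(glyph):
        for x, ch in enumerate(row):
            if ch == "#":
                frames[rng.randrange(n_frames)][y][x] = True
    return frames


def overlay(frames):
    """Combine the frames the way persistence of vision would."""
    height, width = len(frames[0]), len(frames[0][0])
    return ["".join("#" if any(frame[y][x] for frame in frames) else "."
                    for x in range(width))
            for y in range(height)]


frames = split_into_frames(GLYPH, 4, random.Random(0))
for row in overlay(frames):
    print(row)
```

No single frame holds the whole glyph, but the overlay step is only a few lines, so a solver that grabs every frame recovers the text just as the eye does. Any real difficulty for the computer would have to come from what else is mixed into the frames, not from the splitting itself.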
You gotta watch those ticks. Might catch Lyme disease...
i am a soviet space shuttle
I was trying to talk with some security guy who had criticized our antispam project. So I typed a long (about 10 paragraph) letter in the comments... put my name, e-mail address... and filled in a 12-word captcha.
:-/ What kind of joke is this?
:) and I mailed the guy.
SUBMIT
And guess what? It required a valid login!
At least he could have said "Note: To post comments you are required to log in to our advertising service" or something. Fortunately, I had taken care of copying the text to the clipboard before submitting
Having said that, the problem with captchas is that spammers use them in porn pages to get their slaves, er... porn visitors, to type in the captchas for them.
is botnets. They're the ones used to send email spam, form spam, click fraud, DoS attacks and all that nasty stuff. While it's true that you can't eliminate botnets located in countries like China, you CAN eliminate botnets in the US by running virus checks and all that. Then it's just a matter of blocking other countries' IPs (or at least their known botnet addresses).
:(
But we need the government to give tax deductions to companies that dedicate themselves to cleaning people's PCs, for example. I'm still shocked when I see people saying that they don't have an antivirus. Worse, their Windows versions are still unpatched.