Web Users Angered by Anti-Spam 'Captcha'

Posted by ryuzaki0 on Thursday June 1, 2006 @02:36AM from the web-user-smash dept.

Carl Bialik from WSJ writes "Captchas -- the jumbles of letters that users must type to gain access to some websites -- are a growing irritation, the Wall Street Journal reports. But programmers hope to make new variations that are both easier to decipher and harder to crack. From the article: 'Some captchas have been solved with more than 90% accuracy by scientists specializing in computer vision research at the University of California, Berkeley, and elsewhere. Hobbyists also regularly write code to solve captchas on commercial sites with a high degree of accuracy. ... Henry Baird, a professor of computer science at Lehigh University who studies PC users' responses to the codes, has been working with colleagues to develop new generations of captchas that are designed to be easier on humans but baffling for computers.'"

15 of 267 comments

Different method entirely by Volante3192 · 2006-06-01 02:42 · Score: 5, Interesting

Just throwing this out, but maybe there should be a very basic question asked instead? Since these already presume literacy, maybe something like:

Which of these is a number: A 2 R P?

Seems that regardless of what they come up with there's going to be some part of the population that won't figure it out anyway, and if the whole point is to confuse auto-registerers, then I'd think it'd be harder for those to account for every possible question and answer set.

(Yea, it's in TFA, but mentioned like an aside...)
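
A minimal sketch of the question-style challenge suggested above, in Python. The function names, the option count, and the choice to leave out look-alike characters (0/O, 1/I) are all assumptions for illustration; nothing here comes from TFA.

```python
import random

# Assumption: drop characters that are easy to confuse (0/O, 1/I).
DIGITS = "23456789"
LETTERS = "ABCDEFGHJKLMNPQRSTUVWXYZ"


def make_challenge(rng, n_options=4):
    """Build one question: a single digit hidden among distinct letters."""
    digit = rng.choice(DIGITS)
    options = rng.sample(LETTERS, n_options - 1) + [digit]
    rng.shuffle(options)
    question = "Which of these is a number: " + " ".join(options) + "?"
    return question, digit


def check_answer(expected, submitted):
    """Accept the answer regardless of surrounding whitespace."""
    return submitted.strip() == expected


if __name__ == "__main__":
    question, answer = make_challenge(random.Random())
    print(question)
```

A single template like this is trivial for a script to parse, so the idea only holds up if a site rotates through many differently worded question types, which is the "every possible question and answer set" part.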
captchas discriminate against the blind by Speare · 2006-06-01 02:42 · Score: 4, Interesting

The captcha concept breaks down if the user can't see the image, either through the limitations of their browser (links) or the limitations of their eyes. A US government site would have a hard time justifying captcha in light of their legal and moral responsibilities to the disabled citizenry.

--
[ .sig file not found ]
Not the point by Reality+Master+101 · 2006-06-01 02:52 · Score: 2, Interesting

Just as the point of DRM isn't to be completely bulletproof (there's always the analog hole), the point of a captcha is to be enough of a nuisance that someone doesn't spend the time to crack it. Obviously, for a site like Yahoo and its zillions of sites, it pays to spend time breaking the captcha. But for your average site, the captcha just has to be "good enough" that someone won't bother to write a crack to spam a small fish.

--
Sometimes it's best to just let stupid people be stupid.
Re:captcha isn't that bad.... by Jeff+DeMaagd · 2006-06-01 02:53 · Score: 2, Interesting

I think that's a problem. eBay has one where, if you don't fill it in quickly enough, they say you entered it incorrectly and make you try again. Once it put me in a loop, making me enter a new one every time. Each time it actually did send the response email without telling me, so my customer got five copies of the same email.

Sites should have alternate means, but even the ones that claim to have alternate means never really follow up on anyone.
Re:News for Nerds? by Red+Flayer · 2006-06-01 03:01 · Score: 5, Interesting

And yet, the discussion of the article will prove to be much more illuminating than the article.

What's wrong with an article being a spark for more in-depth discussion? How else are things rarely discussed in the media and never in depth (like most tech topics) going to be discussed on slashdot?

Sure, I know this post (and the parent) are off-topic, but it bugs me when people think that the purpose of slashdot is just to accumulate articles... that's what RSS feeds are for.

The discussion is what keeps me coming back, and typically, no matter how moronic the article is, there are several posts that give the kind of information that I wish was included in the article (but isn't). At the very least, people provide links to more comprehensive information and/or discussion of the issues concerned.

--
"Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
Server in the Middle by Doc+Ruby · 2006-06-01 03:01 · Score: 4, Interesting

Captchas are not hard to crack, now that someone has produced my favorite crack strategy. A "man in the middle" attack server hits pages with captcha challenges. That server advertises a "free porn" website, presenting to its human audience the captchas it hit. The porn-seeking humans decode and enter the captchas and get the porn (or not); the server sends their entries to the original captcha page and gets past it as often as humans seeking porn would. There are so many humans seeking porn that the middleman transactions happen in real time, indistinguishable from direct human responses to the original captcha.

This is v1.0 of the Matrix, where human brains are harnessed to solve problems by a more powerful and wise, though less "intelligent" computer network.

--
make install -not war
Re:Image Key Sets & Dynamic Captchas by Nos. · 2006-06-01 03:01 · Score: 3, Interesting

I spent some time working on an alternative to captcha that I call AOMIS: http://aomis.net./ I haven't had a chance to work on it for a while, but the basic idea is to provide a piece of media whose content the user must identify.

In most cases, it would be an image. So, I might show you a picture of an elephant, and to submit the form, you would have to enter 'elephant' into the box. Each image would have a number of correct answers, to account for common spelling mistakes and the most common correct responses. It's built to handle multiple languages and different types of media. Thus, you could use audio files for the blind. Audio files could ask a simple question such as "What is two plus two?"

Now, to deal with checksums, each piece of media is regenerated dynamically on a regular schedule. For example, changing one or two pixels in an image is probably not noticeable to a person, but it changes the checksum, making it impossible to catalog the database by checksum.
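
To illustrate the checksum point, here is a rough Python sketch. It is not AOMIS code: the image is just a flat list of grayscale bytes, and `checksum` and `perturb` are hypothetical helper names.

```python
import hashlib
import random


def checksum(pixels):
    """SHA-256 over raw grayscale bytes (one byte per pixel)."""
    return hashlib.sha256(bytes(pixels)).hexdigest()


def perturb(pixels, rng):
    """Return a copy with one randomly chosen pixel nudged by 1 level."""
    out = list(pixels)
    i = rng.randrange(len(out))
    # Step down instead of up at the top of the range so the value stays a byte.
    out[i] = out[i] + 1 if out[i] < 255 else out[i] - 1
    return out


if __name__ == "__main__":
    image = [128] * 64  # stand-in for an 8x8 grayscale image
    print(checksum(image) == checksum(perturb(image, random.Random())))
```

This only defeats exact-match lookups. A comparison that tolerates small differences would still see the two images as the same picture.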

I just wish I had the time to get it to a point where people could start trialing it.
Re:To read this comment enter the text by saifrc · 2006-06-01 03:05 · Score: 3, Interesting

There's a geographic/cultural/educational problem with KittenAuth -- what if you're not familiar with kittens? Or foxes? What if you've never seen real cattle? These situations are not as rare as you might think, and certainly not invalid. I personally would have had a little trouble identifying the foxes on the KittenAuth page, were they not highlighted with a red border.

I think it's a step in the right direction, though. It's an interesting insight into what human memes can be considered universal.
Re:Image Key Sets & Dynamic Captchas by odyaws · 2006-06-01 03:08 · Score: 5, Interesting

'In order to use the p0rn site he ran, you had to either pay money or spend time identifying captchas.'
I saw a talk recently by Luis von Ahn, one of the inventors of captchas. There were two interesting ways he said people were getting around captchas. One was a real-time approach similar to what you describe. Rather than storing a big database of these things, the bot that was signing up for email addresses or whatever would, upon encountering the captcha, send that image off to someone browsing the porn site (posing as a legitimate captcha - "We need to verify you're a person and not some bot stealing our porn for another site"). In order to continue browsing, the user would have to solve the captcha. Naturally they tend to do this very quickly and accurately :)
The second approach was simply to set up captcha solving sweatshops somewhere in Asia with cheap labor, with people paid a few cents an hour to sit and solve captchas all day. This brought the cost of a new email address up to something like 1/3 cent, which for many spammers is still a viable price. The cost does limit this approach, though, so the captcha still helps.
The interesting thing about both of these strategies is that they use humans to solve a problem that is difficult for computers, which is von Ahn's research area - he's also one of those behind The ESP Game (caution - this can be shockingly addictive). There's essentially nothing that can be done to defeat either approach without also making a system a huge pain in the ass for legitimate users. From this point of view, spending time trying to come up with more advanced captchas is kind of pointless.

--
Still trying to think of a clever sig...
Re:To read this comment enter the text by Qzukk · 2006-06-01 03:48 · Score: 3, Interesting

Basic image comparison techniques are pretty easy to fool. Change one pixel and the entire image hashes to something else. Some "dupe detectors" reduce the image to a grid of n*m, take the average color of each square, and hash that. This can be defeated by changing the color of a significant block of pixels to a random color, though this would need to be arranged based on the picture itself so you don't hide the kitten.
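
For anyone who wants to see the grid-average "dupe detector" spelled out, here is a rough Python sketch. The grid size, the bucket width `step`, and the function names are assumptions, and the image is a plain list of rows of grayscale values rather than a real image file.

```python
import hashlib


def grid_averages(image, rows, cols):
    """Average grayscale value of each cell in a rows x cols grid."""
    height, width = len(image), len(image[0])
    cells = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            block = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    return cells


def fingerprint(image, rows=4, cols=4, step=32):
    """Quantize each cell average into buckets of width `step`, then hash."""
    buckets = bytes(int(avg) // step for avg in grid_averages(image, rows, cols))
    return hashlib.sha256(buckets).hexdigest()
```

A one-pixel change barely moves a cell average, so the fingerprint usually stays the same, while repainting a whole cell pushes its average into a different bucket. The "usually" matters: an average sitting right on a bucket boundary can flip from a tiny change.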

That still leaves things like manually capturing every possible unique base kitten image, then doing a pixel-by-pixel comparison and marking everything mostly matching as a kitten. It can be slowed down by changing the brightness or tint of the overall image slightly, but too much would make the image unrecognizable.

It would be more interesting to combine several ideas. Rather than "click on the kitten" have each picture marked with a random letter, and "enter the letters of the pictures with kittens". Or maybe change it up, pick brown kittens or black kittens or white kittens, kittens playing with a ball, etc.
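
A quick Python sketch of the "enter the letters of the pictures with kittens" variant. The picture catalog, its labels, and the function names are made up for illustration.

```python
import random
import string


def build_challenge(pictures, target, rng):
    """Tag each (filename, label) picture with a distinct random letter.

    Returns the tagged pictures and the set of letters on `target` pictures.
    """
    letters = rng.sample(string.ascii_uppercase, len(pictures))
    tagged = list(zip(letters, pictures))
    answer = {letter for letter, (_, label) in tagged if label == target}
    return tagged, answer


def is_correct(answer, submitted):
    """Order, case and spacing are ignored; no missing or extra letters."""
    return {ch for ch in submitted.upper() if ch.isalpha()} == answer


if __name__ == "__main__":
    catalog = [("a.jpg", "kitten"), ("b.jpg", "fox"),
               ("c.jpg", "kitten"), ("d.jpg", "cow")]
    tagged, answer = build_challenge(catalog, "kitten", random.Random())
    for letter, (filename, _) in tagged:
        print(letter, filename)
```

Because the letters are reassigned on every request, a stored answer for one page load is useless on the next one, though this does nothing about someone cataloging the pictures themselves.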

--
If I have been able to see further than others, it is because I bought a pair of binoculars.
animated gifs? by psbrogna · 2006-06-01 04:30 · Score: 2, Interesting

In response to the people asking about animated gifs, I think they could be algorithmically defeated. However, what about something requiring mouse movement? For example, using a mouse gesture as an unlocking code: a text (or audio) cue tells the user to do something with the mouse. This wasn't my first thought after answering the animated gif question, but it followed from the first thought: instead of animated gifs, what about the Apple QuickTime things that allowed you to move the mouse to view a 3D scene? The entire scene wouldn't be visible at once, and the user would have to move the mouse to see enough of it to answer the question. Obvious problem: hard to generate. But mouse-gesture-based unlocking? Isn't that doable?
1. Re:animated gifs? by AnalystX · 2006-06-01 05:24 · Score: 2, Interesting
  
  'However, what about something requiring mouse movement?'
  
  I have something like that. In fact, it's part of a three-tier security measure I came up with last year. Having spent a lot of time programming A.I. and automation routines in the past, I realized there was a class of processes that could be guaranteed to work against automated spammers. One tier involves recognizing patterns of movement between fields on a form and data entry patterns. There is usually a very distinctive pattern to the way a human fills out a form. There is a plethora of options open to the spam-blocking people if they simply consider far more interactive solutions (e.g. think data streams and anthropological forms). It's much harder for a computer to pretend to be a human in most situations than for a computer to tell the difference. It's like a one-way social algorithm.
Works for non-static sets just as well by Moraelin · 2006-06-01 05:14 · Score: 2, Interesting

Let's say you have your super-duper captcha generator where no two are ever alike, and thus can't be indexed. Let's say I also want to crap-flood you with automated posts linking to my product, or just a site I want brought forward on Google's index. Think you're safe?

Hell, let's use Slashdot as an example, since everyone has seen the captchas here.

It works like this: I'll set up a porn site all right. Gets people's interest easier than anything else. I promise some free porn, or heck, even some links to other thumbnail galleries, but make people go through a captcha each time. Except it's _your_ captcha. Consider the following sequence:

1. Random J Hornyguy wants to see the porn. He makes the request that'll give him the captcha page.

2. My server automatically makes a request for a message posting form on your site. (Think simulating clicking on a "Reply To This" link on Slashdot as Anonymous Coward.) Your server gives me the form, complete with session cookie, etc, which I store, and also a captcha. Ah-ha. Guess what I do with that captcha...

3. Random J Hornyguy finally gets his login page, complete with the captcha I just got at step 2. Which he mutters a bit about and finally fills in as plaintext and submits.

4. I now finally submit my post to your site, complete with the captcha text that Random J Hornyguy dutifully filled in for me.

5. If that doesn't go through, I'll make another request and politely ask Random J Hornyguy to try again. (I'm user-friendly, eh?) If it went through, I'll also let him see the porn. After all, I'll want him to come again later and do some more free work for me, so no use annoying the hell out of him. But if I'm an evil SOB and have an endless supply of suckers (e.g., a spamming or phishing operation reeling in the suckers), I might tell him that he typed wrong anyway, and see how many I can get him to solve before it dawns upon him that there's no reward and he can't ever get past the captcha.

Note that at no point did this rely on you having a repeating set of images. My site just acted as a captcha proxy between yours and a human sucker, in real time.

Sure, it needs a bit more work coding it like that, but not much more. (I'd have to store the session, recognize links, simulate form responses, etc, anyway if I want to automatically crap-flood your site.) And it'll keep working no matter how you alter your captcha generator, as long as it's still readable at all by a human. And if I have enough users, I can add modules to automate that captcha proxying for several sites: each user randomly gets to break the captcha for another site, so the crapflooding is more distributed instead of swamping one site solid.

Also note that a lot of sites, Slashdot included, only make you use the captcha once, when you create or log in a user. If you choose to get a permanent cookie, you can post thousands of posts without ever seeing a captcha again. So I don't have to rely on you reusing captchas, if I create a new user for each one my users solved for me and store the permanent cookie. Since each such user can post more than once before the site admins catch on to it and ban the user id, I can generate a lot more crapflood posts than I get users solving captchas for me.

(Or maybe I won't crapflood some message board, but generate ids to free mail accounts and send spam from that. Again, that escalates quite nicely. As long as you don't require a captcha for every single bloody email a legit user sends, I can send thousands of emails per captcha solved by some Random J Hornyguy. And when that user gets banned, some other Random J Hornyguy will solve the next captcha for me.)

So, to wrap this long rant up, TFA just made me go "didn't it ever occur to these people that they're doing a brilliant technical solution, but it solves the _wrong_ problem?" It's such a tunnel view of the problem, it ranks up with MPAA's being surprised that people tell their friend whether a movie was good or bad. It's the typical "idiot savant"

--
A polar bear is a cartesian bear after a coordinate transform.
Fourier to the rescue? by tepples · 2006-06-01 06:11 · Score: 2, Interesting

'Basic image comparison techniques are pretty easy to fool. Change one pixel and the entire image hashes to something else.'

Change one pixel and the peaks of the Fourier transform of the image remain mostly the same. It's the same reason one can hear a tone above white noise.

'Some "dupe detectors" reduce the image to a grid of n*m, take the average color of each square, and hash that.'

Which is the same as using only the low-pass parts of the Fourier fingerprint.

'This can be defeated by changing the color of a significant block of pixels to a random color, though this would need to be arranged based on the picture itself so you don't hide the kitten.'

To defeat this, use a short-time Fourier transform or a sharpen filter to detect areas with abnormal spatial-frequency properties, and reject those from the comparison before taking the Fourier transform.

'That still leaves things like manually capturing every possible unique base kitten image, then doing a pixel-by-pixel comparison and marking everything mostly matching as a kitten. It can be slowed down by changing the brightness or tint of the overall image slightly'

That would just increase or decrease the DC component of the picture, again trivial for a Fourier based detector to rule out.
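
To make the Fourier argument concrete, here is a naive 2-D DFT in pure Python, only practical for tiny images. It is a sketch of the two claims above (one pixel barely moves the spectrum, a brightness shift only moves the DC term), not a working detector. It does not attempt the short-time transform or sharpen-filter step, and the function names are made up.

```python
import cmath


def dft2(image):
    """Naive 2-D discrete Fourier transform of a small grayscale image."""
    h, w = len(image), len(image[0])
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            total = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2 * cmath.pi * (u * y / h + v * x / w)
                    total += image[y][x] * cmath.exp(1j * angle)
            out[u][v] = total
    return out


def magnitudes(image):
    """Magnitude spectrum; entry [0][0] is the DC component."""
    return [[abs(c) for c in row] for row in dft2(image)]


def max_change(a, b, skip_dc=False):
    """Largest absolute difference between two magnitude spectra."""
    return max(abs(a[u][v] - b[u][v])
               for u in range(len(a)) for v in range(len(a[0]))
               if not (skip_dc and u == 0 and v == 0))
```

Adding a constant k to every pixel of an N-pixel image adds k*N to the DC term and nothing to any other coefficient. Changing one pixel by d can move any single coefficient's magnitude by at most |d|, which is small next to peaks that sum over the whole image.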
Just Had To Consider This by ObsessiveMathsFreak · 2006-06-01 06:43 · Score: 2, Interesting

My own weblog was recently hit by comment spam. I was extremely irritated, and initially considered captchas as a potential solution. But several problems with captchas ultimately led me to seek alternative solutions.

The first problem with captchas is the barrier they put up, however small, between you and the users of your site. Apologies for the corny analogy, but captchas are a speedbump on the information superhighway. People hate running into them.

The impediment to visually disabled users is also a big one to consider. It's not just fully blind people. Users can be shortsighted, colour blind, or dyslexic, or may simply rely on specialist software to read your website. You're letting these people down by adopting this practice, and that's something I would really feel bad about doing.

But the biggest reason not to use captchas is spammers' increasing ability to interpret them. At even a five percent success rate in interpreting captchas, a spammer can bombard your site with requests and still get something through. They're just using the same model as they did with email, and it will work.
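
A back-of-the-envelope check on the five percent figure, in Python. It assumes every attempt is independent and has the same success rate, which is a simplification (a site could rate-limit or ban after repeated failures).

```python
import math


def p_at_least_one(p, attempts):
    """Chance that at least one of `attempts` independent tries succeeds."""
    return 1 - (1 - p) ** attempts


def attempts_for(p, target):
    """Smallest number of tries giving at least `target` chance of a success."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))


if __name__ == "__main__":
    p = 0.05
    print("expected tries per success:", 1 / p)
    print("chance of a hit in 100 tries:", round(p_at_least_one(p, 100), 4))
    print("tries for a 99% chance:", attempts_for(p, 0.99))
```

At p = 0.05 a solver needs 20 tries per success on average, and 90 tries are enough for a 99% chance of at least one post getting through. For a script, that is nothing.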

Instead I chose some other plugins available for Wordpress to help with the spam. Akismet sounds like it could work as a kind of distributed spam check/blacklist of sorts, though I am wary of the fact that a private company is running the service. I also installed Bad Behaviour, though it's clear that eventually some spammers will adapt their behaviours to this.

Ideally what I'd like is a true Bayesian comment spam filter plugin for WordPress, but so far I haven't been able to find one. Such filters have done wonders for me in Thunderbird for my email spam, with something like a 99.99% success rate and no false positives. Clearly the situation is quite different with comment spam, but all the same it would be nice to have one.
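
For the curious, this is roughly what the core of such a filter looks like: a toy naive Bayes classifier in Python with add-one smoothing. It is not Thunderbird's filter or any WordPress plugin; the class name, the tokenizer and the smoothing choices are all assumptions.

```python
import math
import re
from collections import Counter


class NaiveBayesFilter:
    """Toy two-class (spam/ham) naive Bayes text classifier."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        self.words[label].update(self.tokens(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        total_docs = self.docs["spam"] + self.docs["ham"]
        toks = self.tokens(text)
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed class prior.
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            n = sum(self.words[label].values())
            for tok in toks:
                # Add-one smoothing; the extra +1 is a slot for unseen words.
                score += math.log((self.words[label][tok] + 1) / (n + vocab + 1))
            log_score[label] = score
        # Turn the two log scores into P(spam | text), clamped to avoid overflow.
        diff = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1 / (1 + math.exp(diff))


if __name__ == "__main__":
    f = NaiveBayesFilter()
    f.train("cheap pills buy now", "spam")
    f.train("great post thanks for the maths", "ham")
    print(f.spam_probability("buy cheap pills"))
```

The filter is only as good as its training data, so a plugin would need a "mark as spam / not spam" button feeding `train`, much like the junk button in a mail client.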

I envisage that the comment spam situation will get a lot worse as time goes by, regardless of any pagerank-type algorithm changes. Comment spam will no doubt become as ubiquitous as regular spam, and I can foresee dozens of "splog" posts per day in the not too distant future. My opinion is that blog software should come with robust, adaptable and self-updating anti-spam software on by default before this problem escalates out of control.

--
May the Maths Be with you!