Fallout From the Fall of CAPTCHAs
An anonymous reader recommends Computerworld's look at the rise and fall of CAPTCHAs, and at some of the ways bad guys are leveraging broken CAPTCHAs to ply their evil trade. "CAPTCHA used to be an easy and useful way for Web administrators to authenticate users. Now it's an easy and useful way for malware authors and spammers to do their dirty work. By January 2008, Yahoo Mail's CAPTCHA had been cracked. Gmail was ripped open soon thereafter. Hotmail's top got popped in April. And then things got bad. There are now programs available online (no, we will not tell you where) that automate CAPTCHA attacks. You don't need to have any cracking skills. All you need is a desire to spread spam, make anonymous online attacks against your enemies, propagate malware or, in general, be an online jerk. And it's not just free e-mail sites that can be made to suffer..."
I hate the fact that a computer can view these things better than I can. Lately, a lot of the CAPTCHAs have become unreadable by human viewers.
Heh, at the end of the article they have a link to a site that requires you to solve a calculus problem to register (it gets easier if you reload the page a few times, down to simple arithmetic). I have a site that is only of interest to people who use Verilog (a hardware design language), and I've toyed with requiring some digital logic problem to be solved, but the volume of spam signups isn't big enough for me to be bothered yet...
Of course this solution isn't going to work for gmail - which seems to be the preferred email provider for the spam signups I do get these days.
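A digital-logic question like the one floated above could be generated fresh for each signup. The code below is a minimal, hypothetical version: the function name, the gate set and the question wording are all assumptions, not taken from any existing site. It picks two random gates and three random input bits and computes the expected one-bit answer.

```python
import random

# Hypothetical gate table; each entry maps a gate name to a function on bits (0 or 1).
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
    "NOR": lambda a, b: 1 - (a | b),
}


def make_logic_challenge(rng=random):
    """Return (question, expected_answer) for a two-gate expression on random bits."""
    g1 = rng.choice(list(GATES))
    g2 = rng.choice(list(GATES))
    a, b, c = (rng.randint(0, 1) for _ in range(3))
    answer = GATES[g2](GATES[g1](a, b), c)
    question = f"If a={a}, b={b}, c={c}, what is (a {g1} b) {g2} c? Answer 0 or 1."
    return question, str(answer)


question, expected = make_logic_challenge()
print(question)
```

With only 0 or 1 as possible answers a blind guess succeeds half the time, so a real form would probably ask several of these at once or use longer expressions.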
Combine it with a mix of simple math and image recognition? E.g.:
"What colour hair does the (2+four)/3 girl from the left have?"
Hell, skip the math part if that's too easy.
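One way to read that suggestion in code: the hidden position is written as a small expression mixing numerals and number words, so a bot has to parse the arithmetic before it can even start on the picture. This is a sketch only; the per-picture labels are assumed to come from a hypothetical, human-labelled image set, and the function names are illustrative.

```python
import random

WORDS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]


def obfuscate(n, rng):
    """Write a single digit either as a numeral or as an English word, at random."""
    return WORDS[n] if rng.random() < 0.5 else str(n)


def make_position_question(hair_colours, rng=random):
    """Build a "(2+four)/3"-style question about a labelled picture.

    hair_colours: hypothetical labels for the girls in the picture, left to right.
    Returns (question, expected_answer).
    """
    if not 1 <= len(hair_colours) <= 9:
        raise ValueError("expected between 1 and 9 labelled people")
    target = rng.randint(1, len(hair_colours))              # 1-based position from the left
    d = rng.choice([k for k in (2, 3) if target * k <= 18])  # keep x + y within two digits' reach
    total = target * d                                       # numerator x + y, at most 18
    x = rng.randint(max(0, total - 9), min(total, 9))        # both terms stay in 0..9
    y = total - x
    expr = f"({obfuscate(x, rng)}+{obfuscate(y, rng)})/{obfuscate(d, rng)}"
    return f"What colour hair does the {expr} girl from the left have?", hair_colours[target - 1]


question, answer = make_position_question(["red", "black", "blond", "brown"])
print(question)
```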
Spammers are cracking some of the hardest problems of AI research.
How can they do that, and yet all the great academic minds can't? Two things:
* funding
* a willingness to use "anything that works"
What's really scary is that, in the end, spamming may turn out to be an agent of good.
How come /. is so spam-free?
Do the hackers just not care about us,
or:
is this like one of those "safe zones" where geeks and hackers can hang out as long as nobody asks or tells? (Looks at guy to his left... "Say, is that a CAPTCHA in your pocket, or are you just excited to be here?")
It's depressing to me that things like Viagra spam are still profitable enough to make spamming them financially worthwhile. Sure, the way the economics work out you only need a really low response rate to break even, but hasn't everyone already gotten enough of those emails? I'd imagine that whatever market there is for sketchy Viagra distributors would be saturated by now.
At least with phishing spam I get to see new scams on a regular basis (some quite cleverly disguised, too). But some of the more vanilla spam just seems pointless.
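The "really low response rate" point is just a ratio: a campaign breaks even when the fraction of recipients who buy equals the cost of sending one message divided by the profit per sale. A small sketch follows; the dollar figures in it are made-up placeholders, not real spam economics.

```python
def break_even_response_rate(cost_per_message: float, profit_per_sale: float) -> float:
    """Fraction of recipients who must buy for a campaign to cover its sending cost."""
    if profit_per_sale <= 0:
        raise ValueError("profit_per_sale must be positive")
    return cost_per_message / profit_per_sale


# Hypothetical inputs: $0.00001 to send each message, $50 profit on each sale.
rate = break_even_response_rate(0.00001, 50.0)
print(f"break-even rate {rate:.1e}, i.e. one sale per {1 / rate:,.0f} messages")
```

With those placeholder numbers the break-even rate is 2e-7, one buyer per five million recipients, so even a nearly saturated market can keep a campaign going.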
Interesting.
A few months ago I tried to post on a blog (sorry, I forget which one), entered the CAPTCHA and got a message that I was a suspected bot and my IP address was banned from posting for 48 hours.
I went back and carefully read the terms of use (just above the posting window) and buried in the middle of the terms was the phrase, "Do not enter the captcha, instead enter the first three letters of the fifteenth word in the second paragraph followed by the third word after the eighth word in the first paragraph in all capital letters."
A neat idea, but I suppose it won't be long before that one is cracked as well.
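The hidden-instruction trick is easy for the site to check, since the expected answer is a pure function of the terms text. A sketch under a few assumptions the post doesn't settle: paragraphs are separated by blank lines, a "word" is a run of letters, digits or apostrophes, "followed by" means plain concatenation with no space, and "in all capital letters" applies to the whole answer. The function names are hypothetical.

```python
import re


def words(paragraph):
    # Assumption: punctuation is not part of a word.
    return re.findall(r"[A-Za-z0-9']+", paragraph)


def expected_answer(terms_text):
    """First 3 letters of word 15 of paragraph 2, then the 3rd word after word 8 of paragraph 1."""
    paragraphs = [p for p in terms_text.split("\n\n") if p.strip()]
    first, second = words(paragraphs[0]), words(paragraphs[1])
    part1 = second[15 - 1][:3]   # fifteenth word of the second paragraph, first three letters
    part2 = first[8 + 3 - 1]     # the third word after the eighth word is the eleventh word
    return (part1 + part2).upper()


def classify_submission(submitted, terms_text, captcha_text):
    """Return 'human', 'suspected bot' (typed the visible CAPTCHA) or 'wrong'."""
    answer = submitted.strip()
    if answer.upper() == expected_answer(terms_text):
        return "human"
    if answer.lower() == captcha_text.strip().lower():
        return "suspected bot"   # solved the decoy instead of reading the terms
    return "wrong"
```

The blog described above answered what looks like the "suspected bot" case with a 48-hour posting ban for the IP address.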
The best way I've seen CAPTCHAs get broken is by "free porn sites." The website itself is what cracks another site's CAPTCHA. When the bot behind it gets a CAPTCHA to solve, it passes it to one of its "porn viewers": "please type the word that this CAPTCHA says in order to prove you are old enough to view the porn". Then the porn is displayed, and the bot running on the website has a human-made solution to do its botting with.
This method will suffice to crack ANY CAPTCHA!
--jeffk++
Integrate OpenID-based signatures with email by inserting a line into the email header.
Not a new idea; it's the same old third-party trust situation, so clearly the trusted OpenID servers would be targeted. However, if you added a simplistic peer ranking system on those user IDs (extending OpenID a little), then the bad IDs would get ranked down by real people.
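The peer-ranking idea could start as simply as a tally of up and down votes per sender identity. This is a toy sketch, not part of any OpenID specification; the class name, the one-vote-per-voter rule and the threshold of -3 are assumptions. A real system would also have to weight voters, since spammers can create identities to vote with too.

```python
from collections import defaultdict


class PeerRanking:
    """Toy reputation tally for sender identities such as OpenID URLs (hypothetical)."""

    def __init__(self, ranked_down_at=-3):
        self.votes = defaultdict(dict)   # sender_id -> {voter_id: +1 or -1}
        self.ranked_down_at = ranked_down_at

    def vote(self, voter_id, sender_id, is_spam):
        # One vote per voter per sender; a later vote replaces the earlier one.
        self.votes[sender_id][voter_id] = -1 if is_spam else +1

    def score(self, sender_id):
        return sum(self.votes.get(sender_id, {}).values())

    def is_ranked_down(self, sender_id):
        return self.score(sender_id) <= self.ranked_down_at


ranking = PeerRanking()
for voter in ("https://alice.example/", "https://bob.example/", "https://carol.example/"):
    ranking.vote(voter, "https://spammer.example/", is_spam=True)
print(ranking.is_ranked_down("https://spammer.example/"))  # True
```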
This would also provide a means of verifying multiple email addresses used by the same individual's OpenID, which could shield their actual identity (but with no better privacy than you already have).
Additional headers for the point-of-origin server could also be useful, as some servers are less trustworthy than others (note: spam ranking is fuzzy, and a slight nudge either way near the threshold value can make a noticeable difference). Server identity issues are already being worked on, but emails are not tied securely to the original server.
I'd like to see a standard email header line for spam ranking (0-100?); I'm sick of the "{spam?}" tags I see inserted into subject lines from time to time.
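A numeric score header is straightforward to add with Python's standard email package. In the sketch below the header name X-Spam-Rank and the 0-100 clamping are assumptions taken from the suggestion above, not an existing standard; the point is that the Subject line is left alone.

```python
from email.message import EmailMessage


def set_spam_rank(msg: EmailMessage, score: float, header: str = "X-Spam-Rank") -> EmailMessage:
    """Store a 0-100 spam score in its own header instead of tagging the Subject."""
    rank = max(0, min(100, round(score)))   # clamp to the proposed 0-100 range
    if header in msg:
        del msg[header]                       # replace any earlier rank rather than stacking copies
    msg[header] = str(rank)
    return msg


msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Lunch on Friday?"
msg.set_content("Noon at the usual place.")
set_spam_rank(msg, 12.4)
print(msg["X-Spam-Rank"], "|", msg["Subject"])
```

SpamAssassin's X-Spam-Status header, which carries a score, is an existing non-standard example of the same idea; a mail client can filter on such a header without anyone touching the subject.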
An OpenID-based solution would get OpenID heavily tested, since spammers may solve the big AI problems as well as letting us know where to get Viagra.
Absolutely correct.
I run a mid-sized web development shop. A few years ago we were doing mostly retail sites. Vanilla and boring but we worked it down to a science and had some really great "modules" that made these sites super profitable for us. Of course, everything has its seedy side and with retail it was SEO.
Everybody wanted it. About 80% of our customers were of the "Do whatever, just indemnify me" stripe. (And these are established companies paying high five figures for these sites.) We drew our own line on what we would and wouldn't do. (An excessive internal-link structure is OK; zombie sites are not.)
Now most of our work is social networking.
We, too, followed the "rise" of CAPTCHA, and we've been happy with our results. We always used a custom CAP for each site, and we tried to keep them relatively readable, believing that making it too hard will only keep out humans: if somebody wants to crack it, they will.
We still use them regularly. I noticed that about a year ago people actually began to request them specifically. (Isn't that what Buffett said about the home mortgage mess? When the regular Joes started flipping houses, he knew it was over?)
Anyhoo, I think the real fault in CAPs is that they worked too well. They became too big a target. Now we try to mix and match a number of different techniques to identify humans.
Solutions range from dirt-simple: An input box named, say, "City" that has a label that reads "13 plus 8 equals:" or "What is the 3rd word on this page?"
To the more complex "what is the color of the front-door in this picture?"
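Both dirt-simple variants are a few lines each. A sketch with hypothetical function names: one generator for the "13 plus 8 equals:" style, one for "What is the Nth word on this page?", and a helper that puts the question on an input misleadingly named city, as in the post. Comparing answers in lowercase is an assumption.

```python
import html
import random
import re


def ordinal(n):
    """1 -> '1st', 2 -> '2nd', 3 -> '3rd', 11 -> '11th', 22 -> '22nd'."""
    suffix = "th" if 10 <= n % 100 <= 20 else {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def arithmetic_question(rng=random):
    a, b = rng.randint(1, 20), rng.randint(1, 20)
    return f"{a} plus {b} equals:", str(a + b)


def nth_word_question(page_text, rng=random, max_n=10):
    words = re.findall(r"[A-Za-z']+", page_text)
    if not words:
        raise ValueError("page_text has no words")
    n = rng.randint(1, min(max_n, len(words)))
    return f"What is the {ordinal(n)} word on this page?", words[n - 1].lower()


def render_field(label, field_name="city"):
    # The misleading name is deliberate: a form-filling bot that goes by the
    # field name will type a city, while a human reads the label.
    return (f'<label for="{field_name}">{html.escape(label)}</label> '
            f'<input type="text" id="{field_name}" name="{field_name}">')


question, expected = arithmetic_question()
print(render_field(question))
```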
We have a simple library we use for these things that pulls the questions (and, if applicable, the pictures) from a database of about 25,000 different Turing tests.
The thing is, none of them are too complex. Any mediocre programmer could write an application to crack it. But your bot will probably never see that same exact question again, so it becomes irrelevant.
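The library described above boils down to "draw one of many small tests at random, check the answer." A sketch under assumptions: the tests live in an in-memory list rather than a database, each stores a set of acceptable normalized answers, and the class and field names are hypothetical. The 25,000-test corpus itself is the hard part and isn't reproduced here.

```python
import random
from dataclasses import dataclass
from typing import Optional


def normalize(text):
    """Case- and whitespace-insensitive form used for comparing answers."""
    return " ".join(text.lower().split())


@dataclass(frozen=True)
class TuringTest:
    question: str
    answers: frozenset               # accepted answers, already normalized
    image: Optional[str] = None      # hypothetical image path for picture questions


class QuestionBank:
    """Draws tests at random from a large pool and checks submitted answers."""

    def __init__(self, tests, rng=None):
        self.tests = list(tests)
        self.rng = rng or random.SystemRandom()

    def draw(self):
        # Keep the returned index server-side (e.g. in the session), never in the form.
        index = self.rng.randrange(len(self.tests))
        return index, self.tests[index]

    def check(self, index, submitted):
        return normalize(submitted) in self.tests[index].answers


bank = QuestionBank([
    TuringTest("13 plus 8 equals:", frozenset({"21", "twenty-one"})),
    TuringTest("What is the color of the front door in this picture?",
               frozenset({"red"}), image="door_0001.jpg"),
])
index, item = bank.draw()
print(item.question)
```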
And, to tie it in to the parent, we chose this technique precisely because of what we learned from CAPs. Before there were software hacks, there were the "porn hack" and the "sweatshop labor hack."
In this case, when a bot visits the site, it's fairly difficult for it even to detect which item is the Turing test. We auto-generate the location and even the name of the form field, so it's always a bit different.
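Randomizing names and positions per page view might look like the code below. It is a minimal sketch assuming a dict-like server-side session, and it goes slightly further than the post describes by giving every field a fresh random name (a lone random name would make the challenge stand out); the server keeps the only map back to what each field means. Names like build_form and the f_ prefix are hypothetical.

```python
import html
import secrets

CHALLENGE = "__challenge__"   # hypothetical internal key for the Turing-test field


def random_name():
    return "f_" + secrets.token_hex(6)


def build_form(field_labels, challenge_label, session):
    """Render inputs with fresh random names; the challenge lands at a random position.

    `session` stands in for server-side session storage (a plain dict here).
    """
    mapping = {}
    rows = []
    for label in field_labels:
        name = random_name()
        mapping[name] = label
        rows.append(f'<label>{html.escape(label)} <input type="text" name="{name}"></label>')
    challenge_name = random_name()
    mapping[challenge_name] = CHALLENGE
    challenge_row = (f'<label>{html.escape(challenge_label)} '
                     f'<input type="text" name="{challenge_name}"></label>')
    rows.insert(secrets.randbelow(len(rows) + 1), challenge_row)
    session["field_map"] = mapping
    return "\n".join(rows)


def decode_form(form_data, session):
    """Translate submitted random names back to labels; unknown names are dropped."""
    mapping = session.pop("field_map", {})   # single use: a replayed submission decodes to nothing
    return {mapping[name]: value for name, value in form_data.items() if name in mapping}


session = {}
print(build_form(["Name", "Email"], "13 plus 8 equals:", session))
```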
A good solution here is to include this as part of the Turing test itself.
As I mentioned upthread, I'm a partner in a web dev shop. We do a lot of social networking (of course), and about a year ago we developed a utility to create just this type of Turing test. For example, we'll have a picture and ask the question "What is the color of the 3rd fish from the left?"
What we do is pair these tests on a page. We'll include a known test, like the one above, and we'll also show an unclassified image and might ask "How many people are in this picture?"
There is no wrong answer for that test, and the visitor's answer is recorded. Soon, the same question will be asked about the same picture. As soon as an answer is confirmed twice, the picture gets classified as having n people. Soon after, it would be displayed again asking "How many females are in this pic?" or "What color shirt is the person on the right wearing?"
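The pairing scheme amounts to collecting answers to an unclassified question until enough visitors agree. A sketch with assumptions the post doesn't spell out: "confirmed 2 times" is read as two matching answers (the needed parameter makes that adjustable), answers are compared after lowercasing and trimming, and an answer only counts when the same visitor passed the known test shown alongside it. Class and method names are hypothetical.

```python
from collections import Counter, defaultdict


def normalize(text):
    """Case- and whitespace-insensitive form used to compare answers."""
    return " ".join(text.lower().split())


class CrowdClassifier:
    """Promotes an (image, question) pair to a known test once enough answers agree."""

    def __init__(self, needed=2):
        self.needed = needed                   # matching answers required to classify
        self.pending = defaultdict(Counter)    # (image_id, question) -> answer counts
        self.classified = {}                   # (image_id, question) -> agreed answer

    def record(self, image_id, question, answer, passed_known_test):
        """Record one visitor's answer; return the agreed answer once classified, else None."""
        key = (image_id, question)
        if key in self.classified:
            return self.classified[key]
        if not passed_known_test:
            return None   # assumption: ignore answers from visitors who failed the known test
        norm = normalize(answer)
        self.pending[key][norm] += 1
        if self.pending[key][norm] >= self.needed:
            self.classified[key] = norm
            del self.pending[key]
            return norm
        return None


question = "how many people are in this picture?"
crowd = CrowdClassifier()
crowd.record("img_42", question, "3", passed_known_test=True)    # first answer: pending
crowd.record("img_42", question, "4", passed_known_test=True)    # disagreement: still pending
print(crowd.record("img_42", question, " 3 ", passed_known_test=True))  # 3
```

A follow-up question on the same picture, such as "how many females are in this pic?", would simply be a new (image, question) key.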
When we created the app, the DB had about 5,000 Turing tests in it. We then attached a DB of about 100,000 images that were pre-classified, but not to an extent that would allow us to write a test off them.
Now, after a year in use across a couple dozen moderately trafficked websites, we have nearly 25,000 Turing tests. All 20,000 new tests were created through the technique I described above.
The real reason we did it wasn't to save on some development costs. We could've hired temp workers and paid them $8 an hour to classify pictures.
We did it because I believe strongly that the key to simple Turing tests like this is a large corpus of data. If a bot only encounters the same test once or twice EVER, then the problem becomes difficult to solve. This is like the ANTI-CAPTCHA.
CAPTCHA was all about taking a specific technique to its maximum extent: Challenge a computer system by taking a narrow field (OCR) and pushing it beyond the current state-of-the-art.
These tests are all about a general technique that's broad, where CAPTCHA is just deep.
The only way to build a bot to solve each test in our DB would be to give it genuine intelligence. It would have to be capable of determining context, reference, connotation, image ID, etc.
As a programmer, if you said "Here's a CAPTCHA, write a program to solve it," I wouldn't know HOW, but I'd at least have an idea of where to begin.
Now, if you show me a picture with the Turing test "What object is in the hands of the 3rd woman from the left?"... well... I wouldn't know where to begin.
Search Engines help humans find web pages that the humans might find interesting, and they do this by having robots spider the web looking for patterns. Search Engine Optimizers try to get humans to read their customers' web pages in three ways:
Bill Stewart