Fallout From the Fall of CAPTCHAs
An anonymous reader recommends Computerworld's look at the rise and fall of CAPTCHAs, and at some of the ways bad guys are leveraging broken CAPTCHAs to ply their evil trade. "CAPTCHA used to be an easy and useful way for Web administrators to authenticate users. Now it's an easy and useful way for malware authors and spammers to do their dirty work. By January 2008, Yahoo Mail's CAPTCHA had been cracked. Gmail was ripped open soon thereafter. Hotmail's top got popped in April. And then things got bad. There are now programs available online (no, we will not tell you where) that automate CAPTCHA attacks. You don't need to have any cracking skills. All you need is a desire to spread spam, make anonymous online attacks against your enemies, propagate malware or, in general, be an online jerk. And it's not just free e-mail sites that can be made to suffer..."
I hate the fact that a computer can view these things better than I can. Lately, a lot of the CAPTCHAs have become unreadable by human viewers.
Heh, at the end of the article they have a link to a site that requires you to solve a calculus problem to register (it gets easier if you reload the page a few times, down to simple arithmetic). I have a site that is only of interest to people who use Verilog (a hardware design language). I've toyed with requiring some digital logic problem to be solved, but the volume of spam signups isn't big enough for me to be bothered yet...
Of course this solution isn't going to work for gmail - which seems to be the preferred email provider for the spam signups I do get these days.
ccalam - acoustic versions of new songs.
Combine it with a mix of simple math and image recognition? I.e.
"What colour hair does the (2+four)/3 girl from the left have?"
Hell, skip the math part if that's too easy.
We do not live in the 21st century. We live in the 20 second century.
Correct me if I'm wrong, but wouldn't something capable of "automating captcha attacks" be, um, a major advance in artificial cognition, and quite a wealth of scientific information, since that means it can solve an arbitrary captcha like a human can?
Information theory is life. The rest is just the KL divergence.
CAPTCHAs are only able to protect things worth $.0025, no matter how good they are. Simply because at about that price, you can pay humans to solve them for you.
Thus for preventing mail spam, it can work. But to prevent, say, bots from harvesting Ticketmaster, they will always fail, no matter how good they are.
Test your net with Netalyzr
But rather an over-reliance on turnkey solutions to the problem. The overwhelming majority of places that use them all use the same format (hard-to-read words), which in turn gives someone an incentive to break it, since the crack can easily be applied to other CAPTCHAs. The solution is a wide variety of them in rotation at any given time: "what number is on the picture of the girl in the blue shirt" one day, "pick the picture of the elephant" a week later. I predict that a company like Google will step up to implement a turnkey system like this for AdWords users and the like in the near future.
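A minimal Python sketch of that rotation idea, assuming a hypothetical set of challenge families (the names in `CHALLENGE_FAMILIES` are illustrative, not any real service's). Seeding a random generator with the ISO year and week keeps the active mix stable for a week and reshuffles it the next, so a solver written against one format goes stale.

```python
import random
from datetime import date

# Hypothetical challenge families; a real site would back each with its own corpus.
CHALLENGE_FAMILIES = [
    "read_distorted_word",
    "number_on_shirt",
    "pick_the_animal",
    "colour_of_object",
]

def families_for_week(day: date, k: int = 2) -> list:
    """Return the k challenge families active during the ISO week containing `day`.

    Seeding with (year, week) keeps the choice stable within a week
    and changes it from one week to the next.
    """
    year, week, _ = day.isocalendar()
    rng = random.Random(year * 100 + week)
    return rng.sample(CHALLENGE_FAMILIES, k)

def pick_challenge(day: date, rng: random.Random = None) -> str:
    """Pick one family at random from this week's active set."""
    rng = rng or random.Random()
    return rng.choice(families_for_week(day))
```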
Does anyone else find it as depressing as I do that such obviously intelligent, motivated individuals can't find a more productive use of their talents?
My blog
CAPTCHA is still useful for small to medium sites that aren't specifically targeted. Your average blog, for example, is only hit by random bots that try to get quick and easy posts. Only the largest sites like GMail need to find something better today.
For example, I use reCAPTCHA on DocForge to block the standard wiki spam bots. Since my site's not large enough to be under heavy attack very little gets through. Someday CAPTCHA may be so easy to break that everyone's at risk, but not today.
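For reference, a server-side check against Google's current reCAPTCHA `siteverify` endpoint might look like the sketch below. It is not how DocForge was wired up (the API has changed since 2008), and the `opener` parameter is an assumption added only so the network call can be stubbed out.

```python
import json
import urllib.parse
import urllib.request

# Google's documented server-side verification endpoint for reCAPTCHA v2/v3.
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"

def verify_recaptcha(secret, token, remote_ip=None, opener=urllib.request.urlopen):
    """Return True only if Google reports the user's reCAPTCHA token as valid.

    `opener` is injectable so the HTTP request can be replaced in tests.
    """
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip
    data = urllib.parse.urlencode(fields).encode("ascii")
    with opener(VERIFY_URL, data=data, timeout=10) as resp:
        result = json.load(resp)
    # Fail closed: anything other than an explicit success is treated as a failed check.
    return result.get("success") is True
```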
Developers: We can use your help.
Spammers are cracking some of the hardest problems of AI research.
How can they do that, and yet all the great academic minds can't? Two things:
* funding
* a willingness to use "anything that works"
What's really scary is that, in the end, spamming may turn out to be an agent of good.
How we know is more important than what we know.
How come /. is so spam-free?
Do the hackers just not care about us,
or:
is this like one of those "safe zones" where geeks and hackers can hang out as long as nobody asks or tells? (looks at guy to his left... "say, is that a CAPTCHA in your pocket or are you just excited to be here...")
Seven Days with Ubuntu Unity
Put 1,000 computers on the problem and allow them to share information from their successes ... and you've cracked a CAPTCHA implementation.
And there are hundreds of thousands of zombies out there.
This CAPTCHA has text from six emails. Five are randomly selected from those sent by people who have opened an email account in the past month. One is from an email account that is a honeypot. "Please select all emails that are spam." Note the obvious secondary benefit: it doubles as a spam detector. Then of course there is the simple rule: "Our free email accounts cannot be used to send more than 20 emails per day. If you need more, please sign up for our deluxe account, which charges you $1 per year of service."
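A rough sketch of the six-email challenge, assuming a pool of recent outgoing emails and one known-spam message from the honeypot; the function names are hypothetical. Counting a solver's other picks as spam reports only when they caught the known spam is an added assumption, one way to get the secondary benefit without letting bots poison the tally.

```python
import random
from collections import Counter

def build_challenge(recent_emails, honeypot_spam, rng=random):
    """Mix five randomly chosen recent emails with one known-spam honeypot email."""
    items = rng.sample(recent_emails, 5) + [honeypot_spam]
    rng.shuffle(items)
    return items

def grade(items, selected, honeypot_spam, spam_votes):
    """Pass if the known spam was selected; tally the other picks as spam reports.

    `selected` is a set of indices into `items`; `spam_votes` is a Counter
    that accumulates the crowd-sourced spam signal.
    """
    passed = items.index(honeypot_spam) in selected
    if passed:  # only trust votes from solvers who caught the known spam
        for i in selected:
            if items[i] != honeypot_spam:
                spam_votes[items[i]] += 1
    return passed
```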
excitingthingstodo.blogspot.com
it is no wonder that the "under 25" crowd now says "myspace me" or "facebook me" and no longer uses email. why would they?
in a globally connected world with several billion possible users, open email simply won't work much longer.
what we need are permission-based systems: ones in which people need permission before they can contact another person. by integrating whitelists into mail clients, it would eliminate spam entirely. because no one has built a system like this that leverages and extends existing email servers, private organizations leveraging social connections have moved in to fill the gap. sadly, because facebook messages and myspace messages are not built on an open standard, you have to go through those companies to contact people.
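A toy sketch of the permission-based idea, assuming a single inbox with a whitelist of approved senders; the `Mailbox` class and its methods are hypothetical and not modeled on any real mail server.

```python
from dataclasses import dataclass, field

@dataclass
class Mailbox:
    """A hypothetical permission-based inbox: only approved senders are delivered."""
    owner: str
    approved: set = field(default_factory=set)   # lowercase sender addresses
    pending: set = field(default_factory=set)    # strangers who have asked to make contact
    inbox: list = field(default_factory=list)

    def receive(self, sender: str, body: str) -> str:
        key = sender.lower()
        if key in self.approved:
            self.inbox.append((sender, body))
            return "delivered"
        # Unknown senders never reach the inbox; they are only recorded as a request.
        self.pending.add(key)
        return "permission requested"

    def approve(self, sender: str) -> None:
        key = sender.lower()
        self.approved.add(key)
        self.pending.discard(key)
```

The weak point the sketch leaves open is the permission request itself, which becomes the channel spammers would target next.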
BONGARD PROBLEMS. No machine will crack them for at least 10 years. And when one does, baby, we'll have genuine AI.
The first thing to actually pass the Turing test will probably be a spam-bot. Isn't that disgusting?
Much of this is finding a way to brute-force the methods used on particular sites, overwhelming randomness, etc. It's not really a computer reading any difficult text.
Nyet, but haf you conzidered ze amazing affordability uff zer timezhare at Lake Baikal? Operatorz iz schtanding by!
Any technology distinguishable from magic is insufficiently advanced.
The irony about this is that a CAPTCHA is a Turing test, a form of authentication designed to prove that a human is making the request. Given that some CAPTCHAs are rapidly becoming too hard for people to read, the outcomes of the tests are reversed - humans cannot win the test, only computers.
I have CAPTCHAs on my blog, but only deny posters who actually fill them in. Goes a long way to deterring spammers.
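One common reading of this is the "honeypot field" trick: the CAPTCHA box is hidden from humans with CSS, so only bots that fill in every input supply a value. A minimal sketch, with a hypothetical field name and a plain dict standing in for the submitted form:

```python
# Hypothetical field name; the form would hide it from humans with CSS
# (e.g. style="display:none"), so only bots that fill every input supply a value.
HONEYPOT_FIELD = "captcha"

def is_probably_bot(form: dict) -> bool:
    """Flag a submission if the hidden honeypot field came back non-empty."""
    return bool((form.get(HONEYPOT_FIELD) or "").strip())

def handle_comment(form: dict) -> str:
    if is_probably_bot(form):
        return "rejected"
    return "accepted"
```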
M
On Gmail some simple rules should suffice. Don't allow a brand-new account to send out more than a few (20?) emails a day. Make sure that most of the email varies. Make sure the account receives and reads email as well as sends it, and that the email is actually accessed.
The trick is, you keep rotating these measures and don't tell anyone just what they are. You don't automatically disable anyone who breaks the rules; you just hold on to any large number of similar messages until a human reviews them, possibly through some mechanism similar to the "picture matching game" where multiple people identify a message as spam.
If it's determined to be spam, never tell them you caught on; just stop email from that account from being sent, silently. Log the IP addresses and use them to help you identify other accounts from the same computer if possible.
You could also use the IP addresses to notify people that they are a spambot the next time that IP address is used to look up something on any Google service.
Wow, that's a broad action with a lot of chances for failure, but I bet it could be refined enough to work, and the worst-case failure isn't bad at all: just one time when you go to search Google, you get a warning page back instead of your search results.
Really, this just takes some dedicated effort and creative thinking by a strong, creative engineer with some power within Google (I know there are quite a few of those).
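The first two rules (a daily cap on new accounts, holding bursts of near-identical messages for human review without telling the sender) can be sketched as follows. The 30-day "new account" window, the hold threshold of 5 and the digit-stripping fingerprint are assumptions for illustration, not anything Google is known to use.

```python
import hashlib
import re
from collections import Counter, defaultdict

NEW_ACCOUNT_DAILY_CAP = 20   # the "(20?)" figure from the comment
NEW_ACCOUNT_DAYS = 30        # assumption: accounts younger than this count as new
SIMILAR_MESSAGE_LIMIT = 5    # assumption: more near-identical sends than this triggers a hold

def fingerprint(body: str) -> str:
    """Crude near-duplicate key: lowercase, drop digits, collapse whitespace, then hash."""
    normalized = re.sub(r"\s+", " ", re.sub(r"\d+", "", body.lower())).strip()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

class OutboundFilter:
    def __init__(self):
        self.sent_today = Counter()                 # account -> messages sent today
        self.by_fingerprint = defaultdict(Counter)  # account -> fingerprint -> count
        self.review_queue = []                      # held for a human to review

    def submit(self, account: str, body: str, account_age_days: int) -> str:
        """Return the internal verdict; the sender's UI would show success either way."""
        fp = fingerprint(body)
        self.by_fingerprint[account][fp] += 1
        is_new = account_age_days < NEW_ACCOUNT_DAYS
        if is_new and self.sent_today[account] >= NEW_ACCOUNT_DAILY_CAP:
            self.review_queue.append((account, body))
            return "held"
        if self.by_fingerprint[account][fp] > SIMILAR_MESSAGE_LIMIT:
            self.review_queue.append((account, body))
            return "held"
        self.sent_today[account] += 1
        return "sent"
```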
Maybe the poster should've RTFA. But this is Slashdot after all. Nobody reads the articles.
http://it.slashdot.org/comments.pl?sid=467856&cid=22568696
You may be able to pay humans to solve them for you, but you can't pay humans to solve them for you at the same quantity. Human beings are slow and require extensive resources.
It makes a big difference when you're talking about creating a crime syndicate with thousands of employees vs. one lonely script kiddie. The former solution doesn't scale very well, and has a much higher barrier to entry. Even if you don't stop spam you are certainly cutting back on the quantity.
If they can break the captcha, that's a bit less helpful, because whoever did it can sell the solution. However, it's still better than if setting up an automated agent for spamming your site is nothing more than a scant few hours of work to anyone who can program. And the quicker you can change your captcha the less profitable/useful it becomes to crack it.
It's not about being utterly victorious. That would involve tracking down spammers and hiring hitmen to take them out. What it is about is harm mitigation, and captchas will still do that even after being broken.
I dunno. I recently installed reCaptcha on a site that received dozens of spam messages through its online forms, and they all instantly stopped. None of them have returned. It's a low-traffic site, but still... made me think reCaptcha was doing a decent job.
$nice = $webHosting + $domainNames + $sslCerts
This misleadingly implies that CAPTCHA somehow enables spammers. On the contrary, broken CAPTCHA does not enable spammers to do anything they couldn't already do; we're just back where we were before CAPTCHA.
And to be fair, CAPTCHA is still reducing the rate at which attackers are able to create accounts, keeping some smaller, less sophisticated players out of the game entirely, and protecting lower-value targets (e.g., most small-time bloggers with comment spam problems still see a drastic improvement when they set up CAPTCHA).
If everyone stopped using CAPTCHA, the spam problem would get noticeably worse.
In a Turing test, obviously, a human does the verification. Unless you have an army of extremely low-wage laborers doing the verification, or a machine capable of passing a real Turing test, the CAPTCHA will *never* work. The only solution for now, I think, would be to force multiple layers of authentication on users, i.e., you can have your Craigslist account, but you're gonna need to pay 2.95 S&H and wait 5-7 days to get your key chain dongle before you can log in. Obviously, the average user is not going to be up for that. So you're stuck with spam. It sucks, but there's no way around it.
I've toyed with the idea of making users write a 500 word essay on a random topic. I would then send this to my high school English teacher, and if it got maybe a B or above I would consider it legit.
Saying "I'll probably get modded down for this" in a post is the best way to get it modded up.
Fun fact, by replying to all his posts to call him an idiot you drastically increase his exposure. Ever hear of "don't feed the trolls"?
Obligatory XKCD reference: http://imgs.xkcd.com/comics/a_new_captcha_approach.png
Integrate OpenID based signatures with email by inserting a line into the email header.
Not a new idea; it's the same old third-party trust situation, so clearly the trusted OpenID servers would be targeted. However, if you added a simple peer-ranking system on those user IDs (extending OpenID a little), then the bad IDs would get ranked down by real people.
This would also provide a means of verification for multiple email addresses used by the same individual's OpenID, which could shield their actual identity (but with no better privacy than you have already).
Additional headers for the point-of-origin server could also be useful, as some servers are less trustworthy than others (note: spam ranking is fuzzy, and a slight nudge either way near the threshold value can make a noticeable difference). Server identity issues are already being worked on, but emails are not tied securely to the originating server.
I'd like to see a standard email header line for spam ranking (0-100?); I'm sick of the "{spam?}" tags I see inserted into subject lines from time to time.
An OpenID based solution would get OpenID heavily tested since spammers may solve the big AI problems as well as letting us know where to get Viagra.
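A sketch of what such a spam-ranking header could look like, using Python's standard `email` package. The header name `X-Spam-Rank` is hypothetical (nothing standardizes it); existing filters such as SpamAssassin already add their own `X-Spam-*` headers along similar lines.

```python
from email.message import EmailMessage

# Hypothetical header name; not part of any standard.
RANK_HEADER = "X-Spam-Rank"

def stamp_rank(msg: EmailMessage, rank: int) -> EmailMessage:
    """Record a 0-100 spam rank in a header instead of rewriting the Subject line."""
    if not 0 <= rank <= 100:
        raise ValueError("rank must be between 0 and 100")
    del msg[RANK_HEADER]          # drop any rank the sender tried to pre-set
    msg[RANK_HEADER] = str(rank)
    return msg

def read_rank(msg: EmailMessage, default: int = 0) -> int:
    """Read the rank back, falling back to `default` if it is missing or malformed."""
    value = msg.get(RANK_HEADER)
    try:
        return int(value)
    except (TypeError, ValueError):
        return default
```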
Democracy Now! - uncensored, anti-establishment news
The spammers have a new solution to CAPTCHAs in place - offshore outsourcing. This has become a sizable operation. System status earlier today:
Current Status: Volumes are exceedingly high. -- Automatically dispatching more labor
Queued Captchas: 91
Total outsourced volume: 4564301
This service is integrated with Craigslist auto posting tools, allowing high-speed spamming of Craigslist. It's also used for other services, like obtaining GMail accounts.
Even Craigslist's callback-by-phone system is starting to crack. Temporary phone numbers for Craigslist verification, provided by marginal telephony providers, have dropped to $1.50 in bulk.
The overall effect of Craigslist's new protections is that the cost of spamming has gone up, enough to slow down the low-rent operators but not by enough to stop it.
As I've pointed out previously, Google plays a central role in this. Google's services provide a facade of anonymity for scammers to hide behind. GMail for anonymous mail, YouTube for anonymous infomercials, AdWords for anonymous advertising, Checkout for anonymous money transfer, and Blogger/Blogspot for anonymous redirectors to zombie machines are all valuable services for scammers and spammers. All those services are used heavily by Craigslist spammers.
Others have provided some of the same services, but the competing services had bad reputations. Anybody trying to do business via Hotmail just had to be phony. Many mail agents just block all Hotmail mail. Anyone running a business off of "freewebpage.org" probably wasn't someone you'd want to deal with. So you had some strong indications of lack of legitimacy there.
Google, though, still has a good reputation. The combination of Google's reputation and low customer standards offers a great opportunity for scammers, and they're taking it.
Digital Spy have an interesting, but unfortunately very annoying, way of dealing with Captcha. If you sign up from a Hotmail, Gmail or Yahoo account, then you have to pay Digital Spy £5 to register that account. Business email addresses or ones from ISPs don't require a fee.
A simple albeit incredibly annoying solution.
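The Digital Spy rule boils down to a check on the address's domain. A minimal sketch; the domain list is an illustrative assumption, since their actual list isn't given.

```python
# Illustrative set of free webmail domains; Digital Spy's actual list isn't known.
FREE_WEBMAIL = {"hotmail.com", "gmail.com", "googlemail.com", "yahoo.com", "yahoo.co.uk"}
SIGNUP_FEE_GBP = 5

def signup_fee(email: str) -> int:
    """Return the registration fee in pounds: 5 for free webmail, 0 otherwise."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    return SIGNUP_FEE_GBP if domain in FREE_WEBMAIL else 0
```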
Summation 2
I had thought of using something similar to what I have posted at the link below. The user must solve three of these in a row. Of course the number of fonts/numbers/backgrounds would be much larger. I also planned to introduce letters, letter pairs and shapes. But the key concept is that the instructions to solve are also embedded in the image. Much tougher, I would think.
And what does /. think?
Next gen CAPTCHA link here.
Note - this is just a random sample image, not an actual implementation.
Most of these attacks come from zombies, and I don't think anyone wants to block potential customers.
Though if they did, maybe people would start paying attention to computer security.
Liberte, Egalite, Fraternite (TM)
A good solution here is to include this as part of the turing test itself.
As I mentioned upthread, I'm a partner in a web dev shop. We do a lot of social networking (of course) and about a year ago we developed a utility to create just this type of turing test. For example, we'll have a picture, and ask the question "What is the color of the 3rd fish from the left?"
What we do, is we pair these tests on a page. We'll include a known test, like the one above. And we'll also show an unclassified image and we might ask "how many people are in this picture?"
There is no wrong answer for that test, and their answer is recorded. Soon, that same question will be asked for that same picture. As soon as it's confirmed 2 times, it gets classified as having n people. Soon after, it would be displayed again, asking "how many females are in this pic?" or "what color shirt is the person on the right wearing?"
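The pairing scheme described above can be sketched roughly as follows. This is not the poster's code: the class and function names are hypothetical, "confirmed 2 times" is read as two matching answers, and counting labeling votes only from users who passed the graded test is an added assumption to keep bots from voting.

```python
import random
from collections import Counter, defaultdict

CONFIRMATIONS_NEEDED = 2   # read as: two matching answers promote a label

def normalize(answer):
    return str(answer).strip().lower()

class TestBank:
    def __init__(self, known, unlabeled):
        self.known = dict(known)          # (image, question) -> accepted answer
        self.unlabeled = list(unlabeled)  # (image, question) pairs with no answer yet
        self.votes = defaultdict(Counter)

    def next_pair(self, rng=random):
        """Serve one graded test plus one ungraded labeling question."""
        graded = rng.choice(list(self.known))
        probe = rng.choice(self.unlabeled) if self.unlabeled else None
        return graded, probe

    def submit(self, graded, graded_answer, probe=None, probe_answer=None):
        passed = normalize(graded_answer) == normalize(self.known[graded])
        # Only count labeling votes from users who passed the graded test.
        if passed and probe is not None and probe in self.unlabeled:
            tally = self.votes[probe]
            tally[normalize(probe_answer)] += 1
            answer, count = tally.most_common(1)[0]
            if count >= CONFIRMATIONS_NEEDED:
                self.known[probe] = answer   # promoted to a new graded test
                self.unlabeled.remove(probe)
                del self.votes[probe]
        return passed
```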
When we created the app, the DB had about 5000 turing tests in it. We then attached a DB of about 100,000 images that were pre-classified but not to an extent that would allow us to write a test off it.
Now, after a year in use across a couple dozen moderately trafficked websites, we have nearly 25,000 turing tests. All 20,000 new tests have been created thru the technique I described above.
The real reason we did it wasn't to save on some development costs. We could've hired temp workers and paid them $8 an hour to classify pictures.
We did it because I believe strongly that the key to simple turing tests like this is a large corpus of data. If a bot only encounters the same test once or twice EVER, then the problem becomes difficult to solve. This is like the ANTI-CAPTCHA.
CAPTCHA was all about taking a specific technique to its maximum extent: Challenge a computer system by taking a narrow field (OCR) and pushing it beyond the current state-of-the-art.
These tests are all about a general technique that's broad, where CAPTCHA is just deep.
The only way to build a bot to solve each test in our DB would be to give it genuine intelligence. It would have to be capable of determining context, reference, connotation, image ID, etc.
As a programmer, if you say "Here's a captcha, write a program to solve it" I wouldn't know HOW, but I'd at least have an idea of where to begin.
Now, if you show me a picture with the turing test of "What object is in the hands of the 3rd woman from the left" ... well... I wouldn't know where to begin.
Now, now gents... No more of this alt.cascade shit -- USENET is dead, remember?
Method of processing duck feet
A lot of blind people surf the web too, you know. How do you think they like to be confronted with a CAPTCHA?
The end of CAPTCHAs is a win for web usability.
Search Engines help humans find web pages that the humans might find interesting, and they do this by having robots spider the web looking for patterns. Search Engine Optimizers try to get humans to read their customers' web pages in three ways:
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Humans may not be as fast as robots, but they can be surprisingly cheap. There's enough of the world where $1/hour* is an attractive wage and people speak some English, and if the people there can solve a CAPTCHA in 9 seconds, that's at the $0.0025 price level that Nick was referring to. (Hi, Nick!)
If you're a scammer and there's a website that you want to crack, but it's not big enough to pay somebody to develop an algorithm for (either because the CAPTCHA's too hard or changes too often etc.), you can find some corrupt Nigerian generals' orphaned children who'll do it, or some Chinese guys who are tired of beating up monsters to get gold pieces or magic swords.
I don't know the going price of zombies or mail relay accounts, and it's probably dropping faster than Moore's Law, but some sites are probably worth attacking.
* "Make good money $5 a day... Made any more I might move away..."
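The arithmetic behind the $0.0025 figure, as a quick check: at 9 seconds per CAPTCHA a worker solves 3600 / 9 = 400 per hour, so $1/hour works out to $1 / 400 = $0.0025 each. The same function can be used to plug in other wage or speed assumptions.

```python
def cost_per_captcha(hourly_wage: float, seconds_per_captcha: float) -> float:
    """Labor cost of one human-solved CAPTCHA."""
    solves_per_hour = 3600 / seconds_per_captcha
    return hourly_wage / solves_per_hour

print(round(cost_per_captcha(1.00, 9), 4))  # 400 solves/hour at $1/hour -> 0.0025
```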
"What's the problem? The solution to the problem is simple... just solve it!"
Brilliant! Why didn't any of us think of that?
And your solution is...?
Please bear in mind "The system does not do X and Y" is not generally the form a real solution takes. Although it gives me one hell of an idea for the next joke computer language, one that requires you to enumerate all the things it shouldn't do...
The word is "use".
http://www.urbandictionary.com/define.php?term=leverage
Is logic puzzles. "You are in a room with three guards, one of these guards always lies, one of them always tells the truth, and one of them lets you register this email address. Who do you ask?" Let's see a computer solve that!
Way to go: using a post about the cracking of CAPTCHAs (which, by the way, is done with standard techniques developed by academic researchers and with the 'let an unwary human solve it to get to porn' approach, both of which researchers foresaw as reasons why CAPTCHAs would not work in the long term) to deliver a baseless critique of academia.
Academia is probably the least dogmatic and bureaucratic environment there is. My personal experience with this comes from a physics lab, but I've heard similar stories from colleagues researching biology and information science, so I think this'll hold true for most exact sciences. People are researching whatever looks promising to them, sometimes radically changing the landscape of their field in the process.
Academics may start out as regular folk, but people do get smarter when they have to use their brain. Most academics are actually a lot smarter than normal folk, not because they were born smarter per se, but because they have during their career honed their thinking skills to an extent that normal people cannot even begin to appreciate. Thinking doesn't come naturally to people. When you're born, you're just a (relatively bad) pattern matcher, prone to seeing things that aren't there, to inventing causes where none exist. To get a grasp of logic and of how people often unwittingly abuse it, of the advanced math that is needed to understand how the world works, of how people can delude themselves, and so on, and of course to actually learn all the theory, you have to work hard. And in doing so, you will get smarter.
As for prior research being just a load of baggage, if people start to do research in a field without prior knowledge, they almost always end up like Neal Adams.
Further, academia is made of critique. Academia is pretty much the only environment where really everything stands up for discussion and no theory or argument stands longer than the time it takes to refute it. Try to find that in the private sector or politics, with their power games, or the personal sphere where what counts is only the number of adherents of an idea, even if it's totally debunked. Oh the bitter irony of a Slashdotter accusing academia of groupthink.
Instead of solving the CAPTCHA, they want you to pay up for the paid service that doesn't have the CAPTCHA.
Rapidshare WANTS to delay you and make it hard because the free users just cost them money.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
Has anyone tried Flash for CAPTCHA? Seems like that might stop 'em for a little bit.
Or better yet Silverlight! That'll stop even more of 'em