Fallout From the Fall of CAPTCHAs
An anonymous reader recommends Computerworld's look at the rise and fall of CAPTCHAs, and at some of the ways bad guys are leveraging broken CAPTCHAs to ply their evil trade. "CAPTCHA used to be an easy and useful way for Web administrators to authenticate users. Now it's an easy and useful way for malware authors and spammers to do their dirty work. By January 2008, Yahoo Mail's CAPTCHA had been cracked. Gmail was ripped open soon thereafter. Hotmail's top got popped in April. And then things got bad. There are now programs available online (no, we will not tell you where) that automate CAPTCHA attacks. You don't need to have any cracking skills. All you need is a desire to spread spam, make anonymous online attacks against your enemies, propagate malware or, in general, be an online jerk. And it's not just free e-mail sites that can be made to suffer..."
I hate the fact that a computer can view these things better than I can. Lately, a lot of the CAPTCHAs have become unreadable by human viewers.
Heh, at the end of the article they have a link to a site that requires you to solve a calculus problem to register (it gets easier if you reload the page a few times, down to simple arithmetic). I have a site that is only of interest to people who use verilog (a hardware design language) I've toyed with requiring a some digital logic problem to be solved, but the volume of spam signups it's big enough for me to be bothered yet...
Of course this solution isn't going to work for gmail - which seems to be the preferred email provider for the spam signups I do get these days.
ccalam - acoustic versions of new songs.
Combine it with a mix of simple math and image recognition? I.e.
"What colour hair does the (2+four)/3 girl from the left have?"
Hell, skip the math part if that's too easy.
We do not live in the 21st century. We live in the 20 second century.
CAPTCHAs are only able to protect things worth $.0025, no matter how good they are. Simply because at about that price, you can pay humans to solve them for you.
Thus for preventing mail spam, it can work. But to prevent, say, bots from harvesting Ticketmaster, they will always fail, no matter how good they are.
Test your net with Netalyzr
CAPTCHA is still useful for small to medium sites that aren't specifically targeted. Your average blog, for example, is only hit by random bots that try to get quick and easy posts. Only the largest sites like GMail need to find something better today.
For example, I use reCAPTCHA on DocForge to block the standard wiki spam bots. Since my site's not large enough to be under heavy attack very little gets through. Someday CAPTCHA may be so easy to break that everyone's at risk, but not today.
Developers: We can use your help.
Spammers are cracking some of the hardest problems of AI research.
How can they do that, and yet all the great academic minds can't? Two things:
* funding
* a willingness to use "anything that works"
What's really scary is that, in the end, spamming may turn out to be an agent of good.
How we know is more important than what we know.
Howcome /. is so spam free?
Do the hackers just not care about us,
or:
is this like one of those "safe zones" where geeks and hackers can hang out as long as nobody asks or tells? (looks at guy to his left..."say is that a CAPTCHA in your pocket or are you just excited to be here...")
Seven Days with Ubuntu Unity
it is no wonder that the "under 25" crowd now says "myspace me" or "facebook me" and no longer use email. why would they?
in a globally connected world with several billion possible users - open email simply won't work much longer.
when we need are permission based systems - ones in which people need permission before they can contact another person. it would eliminate spam entirely, by integrating whitelists into mail clients. because no one has built a system like this that leverages and extends existing email servers - private organizations leveraging social connections have moved in to fill the gap. sadly, because facebook messages and myspace messages are not built on an open standard - you have to go through those companies to contact people.
BONGARD PROBLEMS. No machine can crack them in at least 10 years time. And when one does, baby, we'll have genuine AI.
The first thing to actually pass the Turing test will probably be a spam-bot. Isn't that disgusting?
The irony about this is that a CAPTCHA is a Turing test, a form of authentication designed to prove that a human is making the request. Given that some CAPTCHAs are rapidly becoming too hard for people to read, the outcomes of the tests are reversed - humans cannot win the test, only computers.
I have CAPTCHAs on my blog, but only deny posters who actually fill them in. Goes a long way to deterring spammers.
M
On gMail some simple rules should suffice. Don't allow a brand-new account to send out more than a few (20?) emails a day. Make sure that most of the email varies. Make sure the account gets and reads email as well as sends it, and that the email is accessed.
The trick is, you keep rotating these measures and don't tell anyone just what they are. You don't automatically disable anyone who breaks the rules, you just hold on to any large number of similar messages until a human reviews them--possibly through some mechanism similar to the "picture matching game" where multiple people identify a message as spam.
If it's determined to be spam, never tell them you caught on, just stop email from that account from being sent, silently. Log the ip addresses and use them to help you identify other accounts from the same computer if possible.
You could also use the ip addresses to notify people that they are a spambot next time that IP address is used to look up something on any google service.
Wow, that's a broad action with a lot of chances for failure, but I bet it could be refined enough to work--and worst case failure isn't bad at all--just one time when you go to search google you get a warning page back instead of your search results.
Really this just takes some dedicated effort and creative thinking by a strong, creative engineer with some power within google (I know there are quite a few of those)
This is misleadingly implies that CAPTCHA somehow enables spammers. On the contrary, broken CAPTCHA does not enable spammers to do anything they couldn't already do -- we're just back where we were before CAPTCHA.
And to be fair, CAPTCHA is still reducing the rate at which attackers are able to create accounts, keeping some smaller, less sophisticated players out of the game entirely, and protecting lower-value targets (e.g., most small-time bloggers with comment spam problems still see a drastic improvement when they set up CAPTCHA)
If everyone stopped using CAPTCHA, the spam problem would get noticeably worse.
I've toyed with the idea of making users write a 500 word essay on a random topic. I would then send this to my high school English teacher, and if it got maybe a B or above I would consider it legit.
Saying "I'll probably get modded down for this" in a post is the best way to get it modded up.
The best way I've seen that captcha's got broken are by "free porn sites". The web site is what is cracking another captcha. When it gets a captcha to solve, it passes it to one if it's "porn viewers" - "please type the word that this captcha says in order to prove you are old enough to view the porn". Then the porn is displayed and the bot running on the website has a potential solution made by a human to do it's botting with.
This method will suffice to crack ANY CAPTCHA!
--jeffk++
ipv6 is my vpn
The spammers have a new solution to CAPTCHAs in place - offshore outsourcing. This has become a sizable operation. System status earlier today:
Current Status: Volumes are exceedingly high. -- Automatically dispatching more labor
Queued Captchas: 91
Total outsourced volume: 4564301
This service is integrated with Craigslist auto posting tools, allowing high-speed spamming of Craigslist. It's also used for other services, like obtaining GMail accounts.
Even Craigslist's callback-by-phone system is starting to crack. Temporary phone numbers for Craiglist verification, provided by marginal telephony providers, have dropped to $1.50 in bulk.
The overall effect of Craigslist's new protections is that the cost of spamming has gone up, enough to slow down the low-rent operators but not by enough to stop it.
As I've pointed out previously, Google plays a central role in this. Google's services provide a facade of anonymity for scammers to hide behind. GMail for anonymous mail, YouTube for anonymous infomercials, AdWords for anonymous advertising, Checkout for anonymous money transfer, and Blogger/Blogspot for anonymous redirectors to zombie machines are all valuable services for scammers and spammers. All those services are used heavily by Craigslist spammers.
Others have provided some of the same services, but the competing services had bad reputations. Anybody trying to do business via Hotmail just had to be phony. Many mail agents just block all Hotmail mail. Anyone running a business off of "freewebpage.org" probably wasn't someone you'd want to deal with. So you had some strong indications of lack of legitimacy there.
Google, though, still has a good reputation. The combination of Google's reputation and low customer standards offers a great opportunity for scammers, and they're taking it.
Absolutely correct.
I run a mid-sized web development shop. A few years ago we were doing mostly retail sites. Vanilla and boring but we worked it down to a science and had some really great "modules" that made these sites super profitable for us. Of course, everything has its seedy side and with retail it was SEO.
Everybody wanted it. About 80% of our customers were of the "Do whatever, just ideminfy me" stripe. (And these are established companies paying high 5-figures for these sites). We drew our own demarcation about what we would and wouldn't do. (Excessive Internal-link structure is OK, zombie sites are not).
Now most our work is social networking.
We, too, followed the "rise" of CAPTCHA and we've been happy with our results. We always used a custom CAP for each site, and we tried to keep them relatively readable, being of the belief that making it too hard will only keep out Humans: If somebody wants to crack it, they will.
We still use them regularly. I noticed that about a year ago we actually had people begin to request them specifically. (Isn't that what Buffett said about the home mortgage mess? When the regular joe's started flipping houses, he knew it was over?)
Anyhoo, I think the real fault in CAP's is that they worked too well. They became too big of a target. Now, we try to mix and match a number of different techniques to identify humans.
Solutions range from dirt-simple: An input box named, say, "City" that has a label that reads "13 plus 8 equals:" or "What is the 3rd word on this page?"
To the more complex "what is the color of the front-door in this picture?"
We have a simple library we use for these things that pulls the questions (and, if applicable, the pics) from a Database of about 25,000 different turing tests.
The thing is, none of them are too complex. Any mediocre programmer could write an application to crack it. But your bot will probably never see that same exact question again, so it becomes irrelevant.
And, to tie it in to the parent, we chose this technique precicely because of what we learned from CAPs. Before there were software hacks, there was the "porn hack" and the "sweatshop labor hack."
In this case, when a bot the site, it's fairly difficult for it to even detect which item is the turing test. We auto-generate the location and even the name of the form field so it's always a bit different.
A good solution here is to include this as part of the turing test itself.
As I mentioned upthread, I'm a partner in a web dev shop. We do a lot of social networking (of course) and about a year ago we developed a utility to create just this type of turing test. For example, we'll have a picture, and ask the question "What is the color of the 3rd fish from the left?"
What we do, is we pair these tests on a page. We'll include a known test, like the one above. And we'll also show an unclassified image and we might ask "how many people are in this picture?"
There is no wrong answer for that test, and their answer is recorded. Soon, that same question will be asked for that same picture. As soon as its confirmed 2 times, it gets classified as having n people. Soon after it would be displayed again asking "how many females are in this pic?" or "what color shirt is the person on the right wearing?"
When we created the app, the DB had about 5000 turing tests in it. We then attached a DB of about 100,000 images that were pre-classified but not to an extent that would allow us to write a test off it.
Now, after a year in use across a couple dozen moderately trafficked websites, we have nearly 25,000 turing tests. All 20,000 new tests have been created thru the technique I described above.
The real reason we did it wasn't to save on some development costs. We could've hired temp workers and paid them $8 an hour to classify pictures.
We did it because I believe strongly that the key to simple turing tests like this is a large corpus of data. If a bot only encounters the same test once or twice EVER, then the problem becomes difficult to solve. This is like the ANTI-CAPTCHA.
CAPTCHA was all about taking a specific technique to its maximum extent: Challenge a computer system by taking a narrow field (OCR) and pushing it beyond the current state-of-the-art.
These tests are all about a general technique thats broad where CAPTCHA is just deep.
The only way to build a bot to solve each test in our DB would be to give it genuine intelligence. It would have to be capable of determining context, reference, connotation, image ID, etc.
As a programmer, if you say "Here's a captcha, write a program to solve it" I wouldn't know HOW, but I'd at least have an idea of where to begin.
Now, if you show me a picture with the turing test of "What object is in the hands of the 3rd woman from the left" ... well... i wouldn't know where to begin.
Search Engines help humans find web pages that the humans might find interesting, and they do this by having robots spider the web looking for patterns. Search Engine Optimizers try to get humans to read their customers' web pages in three ways:
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks