Slashdot Mirror


Fallout From the Fall of CAPTCHAs

An anonymous reader recommends Computerworld's look at the rise and fall of CAPTCHAs, and at some of the ways bad guys are leveraging broken CAPTCHAs to ply their evil trade. "CAPTCHA used to be an easy and useful way for Web administrators to authenticate users. Now it's an easy and useful way for malware authors and spammers to do their dirty work. By January 2008, Yahoo Mail's CAPTCHA had been cracked. Gmail was ripped open soon thereafter. Hotmail's top got popped in April. And then things got bad. There are now programs available online (no, we will not tell you where) that automate CAPTCHA attacks. You don't need to have any cracking skills. All you need is a desire to spread spam, make anonymous online attacks against your enemies, propagate malware or, in general, be an online jerk. And it's not just free e-mail sites that can be made to suffer..."

31 of 413 comments

  1. Cracaked CAPTHAs!!! oh no! by xpuppykickerx · · Score: 5, Interesting

    I hate the fact that a computer can view these things better than I can. Lately, a lot of the CAPTCHAs have become unreadable by human viewers.

    1. Re:Cracaked CAPTHAs!!! oh no! by xpuppykickerx · · Score: 2, Interesting

      It's come to a point where the messages are so jumbled, faded, etc. that I'm avoiding sites that use them.

    2. Re:Cracaked CAPTHAs!!! oh no! by fm6 · · Score: 3, Interesting

      Or from failing 999 times out of 1,000. Computers have an infinite amount of patience. Security schemes that don't acknowledge that are doomed to failure.
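To put a number on the "999 out of 1,000" point, here is a small worked sketch. It assumes each attempt is independent and succeeds with the same probability, which is a simplification; the function names are made up for illustration.

```python
def p_at_least_one_success(p_single: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p_single) ** attempts


def expected_successes(p_single: float, attempts: int) -> float:
    """Average number of successful tries (e.g. accounts created)."""
    return p_single * attempts


if __name__ == "__main__":
    # A solver that fails 999 times out of 1,000 (assumed rate: 0.001).
    for n in (1_000, 10_000, 100_000):
        print(n, round(p_at_least_one_success(0.001, n), 4),
              round(expected_successes(0.001, n), 1))
```

Under those assumptions a bot that is wrong 99.9% of the time still expects about one new account per thousand tries, so the failure rate only sets the price per account.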

    3. Re:Cracaked CAPTHAs!!! oh no! by Kismet · · Score: 2, Interesting

      If patience were something we could quantify reliably, I suspect that we would find computers to have none at all.

      The reason? Computers also have no boredom.

  2. Anyone usinging specialised tests? by niceone · · Score: 5, Interesting

    Heh, at the end of the article they have a link to a site that requires you to solve a calculus problem to register (it gets easier if you reload the page a few times, down to simple arithmetic). I have a site that is only of interest to people who use Verilog (a hardware design language). I've toyed with requiring some digital logic problem to be solved, but the volume of spam signups isn't big enough for me to be bothered yet...

    Of course this solution isn't going to work for gmail - which seems to be the preferred email provider for the spam signups I do get these days.

    1. Re:Anyone usinging specialised tests? by stomv · · Score: 4, Interesting

      what is the opposite of up?
      what day is after friday?
      what does seven plus three equal?
      what letter of the alphabet comes before d?
      how many wheels does a bicycle have?
      what is the third word of this sentence?

      These are generally difficult for computers to solve, can be programmed to have permutations, and since the quiz answer can be tied to the account, if a particular question or style is getting spammed frequently, it can be removed from the list of questions.

      It's an arms race, and this system won't work forever, but it's fairly easy to implement and fairly difficult to overcome.
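A rough sketch of the question list described above, with permutations per question style and a way to retire a style that is being spammed. Everything here (the style names, `report_spam`, the threshold of 3) is an assumption for illustration, not anyone's actual implementation.

```python
import random
import string
from collections import Counter

NUMBER_WORDS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]
DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]


def make_addition(rng):
    a, b = rng.randint(0, 9), rng.randint(0, 9)
    return f"what does {NUMBER_WORDS[a]} plus {NUMBER_WORDS[b]} equal?", {str(a + b)}


def make_day_after(rng):
    i = rng.randrange(7)
    return f"what day is after {DAYS[i]}?", {DAYS[(i + 1) % 7]}


def make_letter_before(rng):
    i = rng.randrange(1, 26)  # skip 'a', nothing comes before it
    letters = string.ascii_lowercase
    return f"what letter of the alphabet comes before {letters[i]}?", {letters[i - 1]}


GENERATORS = {"addition": make_addition,
              "day_after": make_day_after,
              "letter_before": make_letter_before}
disabled = set()          # styles removed from the list
abuse_counts = Counter()  # spam accounts seen per style


def new_challenge(rng):
    """Pick an enabled style and generate one permutation of it."""
    enabled = sorted(s for s in GENERATORS if s not in disabled)
    if not enabled:
        raise RuntimeError("every question style has been disabled")
    style = rng.choice(enabled)
    question, answers = GENERATORS[style](rng)
    # Store `style` with the account so spam can be traced back to it.
    return {"style": style, "question": question, "answers": answers}


def check(challenge, reply):
    return reply.strip().lower() in challenge["answers"]


def report_spam(style, threshold=3):
    """Called when an account that passed `style` turns out to be a spammer."""
    abuse_counts[style] += 1
    if abuse_counts[style] >= threshold:
        disabled.add(style)
```

Usage would be roughly: call `new_challenge(random.Random())` when rendering the signup form, `check` on submit, and `report_spam(account_style)` whenever a moderator kills an account.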

    2. Re:Anyone usinging specialised tests? by Anonymous Coward · · Score: 1, Interesting

      You could just make the users sort through your archive of unknown kittycat (or whatever) photos.
      reCaptcha works by giving users two scrambled words: a 'known' (by the system) one and an unknown one.

      If the user gets the known word, it's assumed (tentatively) that they also correctly entered the unknown word, which can then (after a few people supply the same answer for that word) become another known word.

      The nice thing about reCaptcha is that even if the 'bad guys' are running bots (or sweatshops) to solve the puzzles, at the very least they are doing some useful work in helping to digitize texts.
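The known/unknown pairing described above could be sketched like this. It is not reCAPTCHA's actual code; the class name, the image IDs and the `promote_after=3` threshold ("a few people") are assumptions.

```python
from collections import Counter, defaultdict


class PairedWordChallenge:
    """Gate on a known word, and collect votes for an unknown one."""

    def __init__(self, known, promote_after=3):
        self.known = dict(known)           # image id -> accepted answer
        self.votes = defaultdict(Counter)  # image id -> answer -> count
        self.promote_after = promote_after

    def submit(self, known_id, known_reply, unknown_id, unknown_reply):
        # Only the known word decides whether the user passes.
        if known_reply.strip().lower() != self.known[known_id]:
            return False
        if unknown_id in self.known:
            return True  # already promoted by earlier users
        # Tentatively trust the unknown answer and count it as a vote.
        guess = unknown_reply.strip().lower()
        self.votes[unknown_id][guess] += 1
        if self.votes[unknown_id][guess] >= self.promote_after:
            self.known[unknown_id] = guess  # becomes another known word
            del self.votes[unknown_id]
        return True
```

Votes from users who fail the known word are thrown away, which is what keeps random guessing from polluting the unknown pool.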

    3. Re:Anyone usinging specialised tests? by Anonymous Coward · · Score: 1, Interesting

      I don't think you're thinking big enough. There was a /. story a while back where they did this with an existing humane society database (millions of dogs and cats). However, the kicker was that you had to select all of the dogs or cats, not just a random one.

    4. Re:Anyone usinging specialised tests? by mstahl · · Score: 2, Interesting

      Add random (but light) noise to the images while they're being served and randomize their filenames. There will be no way for an automated system to identify if it's been served the same image twice because the filename and checksum of the image would have been different.
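A minimal sketch of that idea. To stay self-contained it treats an image as raw 8-bit grayscale bytes; a real site would decode and re-encode a PNG or JPEG with an imaging library, which is not shown. The function names and the `.raw` extension are made up.

```python
import hashlib
import random
import secrets


def add_light_noise(pixels: bytes, amplitude: int = 2, rng=None) -> bytes:
    """Nudge every pixel by at most +/- amplitude, clamped to 0..255."""
    rng = rng or random.SystemRandom()
    return bytes(min(255, max(0, p + rng.randint(-amplitude, amplitude)))
                 for p in pixels)


def serve_variant(pixels: bytes):
    """Return a one-off (filename, data) pair for a single page view."""
    noisy = add_light_noise(pixels)
    filename = secrets.token_hex(8) + ".raw"  # random name, no stable URL
    return filename, noisy


if __name__ == "__main__":
    image = bytes([128] * 64 * 64)  # stand-in for a decoded 64x64 photo
    for _ in range(2):
        name, data = serve_variant(image)
        print(name, hashlib.sha256(data).hexdigest()[:16])
```

This only defeats exact-match lookups by filename or checksum. Whether it also defeats perceptual hashing or a trained classifier is a separate question.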

  3. Mix it up a bit? by Hektor_Troy · · Score: 4, Interesting

    Combine it with a mix of simple math and image recognition? I.e.

    "What colour hair does the (2+four)/3 girl from the left have?"

    Hell, skip the math part if that's too easy.

    --
    We do not live in the 21st century. We live in the 20 second century.
  4. The problem isnt the CAPTCHA itself... by ragethehotey · · Score: 2, Interesting

    But rather an over-reliance on turnkey solutions to the problem. The overwhelming majority of places that use them all use the same format (hard-to-read words), which in turn creates an incentive for someone to break it, as the break will be easily applied to other CAPTCHAs. The solution is a wide variety of them that rotate over time: "what number is on the picture of the girl in the blue shirt" one day, but "pick the picture of the elephant" a week later. I predict that a company like Google will step up to implement a turnkey system like this for AdWords users and the like in the near future.

  5. The best part is.. by QuantumG · · Score: 4, Interesting

    Spammers are cracking some of the hardest problems of AI research.

    How can they do that, and yet all the great academic minds can't? Two things:

    * funding
    * a willingness to use "anything that works"

    What's really scary is that, in the end, spamming may turn out to be an agent of good.

    --
    How we know is more important than what we know.
  6. A dumb question: by AndGodSed · · Score: 4, Interesting

    How come /. is so spam free?

    Do the hackers just not care about us,
    or:
    is this like one of those "safe zones" where geeks and hackers can hang out as long as nobody asks or tells? (looks at guy to his left..."say is that a CAPTCHA in your pocket or are you just excited to be here...")

    1. Re:A dumb question: by p0tat03 · · Score: 3, Interesting

      Because it's difficult to get spam accounts *and* have good karma. Spam posts get modded to oblivion nice and quick :)

  7. And they share better. by khasim · · Score: 2, Interesting

    Put 1,000 computers on the problem and allow them to share information from their successes ... and you've cracked a CAPTCHA implementation.

    And there are hundreds of thousands of zombies out there.

    1. Re:And they share better. by statusbar · · Score: 5, Interesting

      The best way I've seen CAPTCHAs get broken is by "free porn sites". The web site is what is cracking another CAPTCHA. When it gets a CAPTCHA to solve, it passes it to one of its "porn viewers": "please type the word that this captcha says in order to prove you are old enough to view the porn". Then the porn is displayed and the bot running on the website has a potential solution made by a human to do its botting with.

      This method will suffice to crack ANY CAPTCHA!

      --jeffk++

      --
      ipv6 is my vpn
    2. Re:And they share better. by encoderer · · Score: 5, Interesting

      Absolutely correct.

      I run a mid-sized web development shop. A few years ago we were doing mostly retail sites. Vanilla and boring but we worked it down to a science and had some really great "modules" that made these sites super profitable for us. Of course, everything has its seedy side and with retail it was SEO.

      Everybody wanted it. About 80% of our customers were of the "Do whatever, just indemnify me" stripe. (And these are established companies paying high 5-figures for these sites). We drew our own demarcation about what we would and wouldn't do. (Excessive internal-link structure is OK, zombie sites are not).

      Now most of our work is social networking.

      We, too, followed the "rise" of CAPTCHA and we've been happy with our results. We always used a custom CAP for each site, and we tried to keep them relatively readable, being of the belief that making it too hard will only keep out Humans: If somebody wants to crack it, they will.

      We still use them regularly. I noticed that about a year ago we actually had people begin to request them specifically. (Isn't that what Buffett said about the home mortgage mess? When the regular Joes started flipping houses, he knew it was over?)

      Anyhoo, I think the real fault in CAPs is that they worked too well. They became too big of a target. Now, we try to mix and match a number of different techniques to identify humans.

      Solutions range from dirt-simple: An input box named, say, "City" that has a label that reads "13 plus 8 equals:" or "What is the 3rd word on this page?"

      To the more complex "what is the color of the front-door in this picture?"

      We have a simple library we use for these things that pulls the questions (and, if applicable, the pics) from a Database of about 25,000 different turing tests.

      The thing is, none of them are too complex. Any mediocre programmer could write an application to crack it. But your bot will probably never see that same exact question again, so it becomes irrelevant.

      And, to tie it in to the parent, we chose this technique precisely because of what we learned from CAPs. Before there were software hacks, there was the "porn hack" and the "sweatshop labor hack."

      In this case, when a bot hits the site, it's fairly difficult for it to even detect which item is the turing test. We auto-generate the location and even the name of the form field so it's always a bit different.
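For illustration only, here is one way the "misleading field name in a random position" trick could look. This is not the library described above; `attach_challenge`, `verify`, the decoy names and the dict-based `session` are all assumptions.

```python
import secrets

# Innocent-looking prefixes; the real label carries the question.
DECOY_NAMES = ["city", "company", "website", "nickname", "fax"]


def attach_challenge(session, real_fields, question, answer):
    """Insert the test among the real fields and remember it server-side.

    Returns a list of (input name, label text) pairs to render in order.
    """
    field_name = f"{secrets.choice(DECOY_NAMES)}_{secrets.token_hex(3)}"
    fields = [(name, name.capitalize() + ":") for name in real_fields]
    position = secrets.randbelow(len(fields) + 1)  # random location
    fields.insert(position, (field_name, question))
    session["challenge"] = (field_name, answer.strip().lower())
    return fields


def verify(session, posted):
    """Check the posted form; each challenge is good for one attempt."""
    field_name, expected = session.pop("challenge", (None, None))
    if field_name is None:
        return False
    return posted.get(field_name, "").strip().lower() == expected
```

A call such as `attach_challenge(session, ["name", "email"], "13 plus 8 equals:", "21")` gives a form whose third input might be named `city_4f9a1c`, and a bot that fills "City" with a city name fails.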

    3. Re:And they share better. by Ploum · · Score: 2, Interesting

      I've built my own "invisible" captcha mechanism : http://ploum.frimouvy.org/?150-the-invisible-captcha-mechanism-icm-against-form-spam And in 2 years, it was so efficient that I almost completely forgot the existence of spam on my blog. And nobody ever complained about a false positive. The only drawback I see is that if you write a script to attack me now, it could work well and spam me for one day before my captcha blocks it.

  8. Suggested New CAPTCHA method. by gurps_npc · · Score: 2, Interesting

    This CAPTCHA has text from six emails. Five are randomly selected from those sent by people that have opened an email account in the past month. One is from an email account that is a honeypot. "Please select all emails that are spam." Note, the obvious secondary benefit is that it is used as a spam detector. Then of course there is the simple rule: "Our free email accounts can not be used to send more than 20 emails per day. If you need more, please sign up for our deluxe account, that charges you $1 per year of service."
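A sketch of how the six-email challenge and the 20-per-day rule might be wired up. The pass rule used here (the user passes if the honeypot email is among the selections) is an assumption, since the proposal does not spell it out; the function names are hypothetical and email texts are assumed to be unique.

```python
import random

DAILY_FREE_LIMIT = 20  # from the proposed rule for free accounts


def build_spam_challenge(recent_emails, honeypot_emails, rng):
    """Five emails from new accounts plus one known-spam honeypot email."""
    honeypot_text = rng.choice(honeypot_emails)
    shown = rng.sample(recent_emails, 5) + [honeypot_text]
    rng.shuffle(shown)
    return shown, shown.index(honeypot_text)


def evaluate(selected, honeypot_index):
    """Return (passed, indexes the user reported besides the honeypot)."""
    passed = honeypot_index in selected
    flagged = sorted(i for i in selected if i != honeypot_index)
    return passed, flagged


def may_send(sent_today, deluxe):
    """Free accounts stop at the daily limit; deluxe accounts do not."""
    return deluxe or sent_today < DAILY_FREE_LIMIT
```

The `flagged` list is the secondary benefit: each extra selection is a human vote that an email from a new account looks like spam.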

    --
    excitingthingstodo.blogspot.com
  9. Re:Depressing by cowscows · · Score: 3, Interesting

    It's depressing to me that things like Viagra spam are still profitable enough to make spamming them financially useful. Sure, the way the economics of it work out you only need a really low response rate to break even, but hasn't everyone already gotten enough of those emails? I'd imagine that whatever market there is for sketch Viagra distributors would be saturated by now.

    At least with phishing spam I get to see new scams on a regular basis (some quite cleverly disguised too). But some of the more vanilla spam just seems pointless.

    --

    One time I threw a brick at a duck.

  10. Re:fall of open email by 91degrees · · Score: 2, Interesting

    There's spam on myspace. I get people friending my virtually empty page from time to time. Myspace deletes them pretty quickly but I presume they just have a front page with a load of spam on it.

  11. Re:Bound to happen by Dekortage · · Score: 2, Interesting

    I dunno. I recently installed reCaptcha on a site that received dozens of spam messages through its online forms, and they all instantly stopped. None of them have returned. It's a low-traffic site, but still... made me think reCaptcha was doing a decent job.

    --
    $nice = $webHosting + $domainNames + $sslCerts
  12. Re:The Irony by Telecommando · · Score: 4, Interesting

    Interesting.

    A few months ago I tried to post on a blog (sorry, I forget which one), entered the CAPTCHA and got a message that I was a suspected bot and my IP address was banned from posting for 48 hours.

    I went back and carefully read the terms of use (just above the posting window) and buried in the middle of the terms was the phrase, "Do not enter the captcha, instead enter the first three letters of the fifteenth word in the second paragraph followed by the third word after the eighth word in the first paragraph in all capital letters."

    A neat idea, but I suppose it won't be long before that one is cracked as well.
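The quoted rule is mechanical enough to compute. A small sketch of what the server side might do, assuming paragraphs are separated by blank lines, words are runs of letters, and "in all capital letters" applies only to the second word (the sentence is ambiguous on that point):

```python
import re


def words(paragraph: str):
    """Split a paragraph into words, ignoring punctuation."""
    return re.findall(r"[A-Za-z']+", paragraph)


def expected_answer(terms: str) -> str:
    paragraphs = [p for p in terms.split("\n\n") if p.strip()]
    first, second = words(paragraphs[0]), words(paragraphs[1])
    prefix = second[14][:3]           # fifteenth word, first three letters
    third_after_eighth = first[7 + 3]  # eighth word is index 7, so index 10
    return prefix + third_after_eighth.upper()


if __name__ == "__main__":
    sample = ("alpha bravo charlie delta echo foxtrot golf hotel "
              "india juliet kilo lima\n\n"
              "one two three four five six seven eight nine ten "
              "eleven twelve thirteen fourteen fifteen sixteen")
    print(expected_answer(sample))  # fifKILO
```

Which is also why it won't last: once someone reads the terms, the rule is a few lines of code, so it only works while nobody targets that particular blog.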

    --
    Beta sux! Join the Slashcott! http://hardware.slashdot.org/comments.pl?sid=4760465&cid=46173047
  13. OpenID signatures by bussdriver · · Score: 3, Interesting

    Integrate OpenID based signatures with email by inserting a line into the email header.

    Not a new idea, it's the same old 3rd party trust situation-- so clearly the trusted OpenID servers would be targeted; however, if you added a simplistic peer ranking system on those user IDs (extending openID a little) then the bad IDs would get ranked down by real people.

    This would also provide a means for verification for multiple emails used by the same individual's OpenID which could shield their actual identity (but not any better privacy than you have already.)

    Additional headers for point of origin server could also be useful as some servers are less trustworthy than others (note: spam ranking is fuzzy and a slight nudge either way near the threshold value can make a noticeable difference.) Server identity issues are already being worked on; but emails are not tied securely to the original server.

    I'd like to see a standard email header line for spam ranking (0-100?); I'm sick of these "{spam?}" lines inserted in subject lines that I see from time to time.

    An OpenID based solution would get OpenID heavily tested since spammers may solve the big AI problems as well as letting us know where to get Viagra.
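To make the header idea concrete, a small sketch using Python's standard `email` package. The header name `X-Spam-Rank` and the threshold of 50 are invented here for illustration; no such standard header is implied.

```python
from email.message import EmailMessage

HEADER = "X-Spam-Rank"  # hypothetical header name, 0 (clean) to 100 (spam)


def tag_spam_rank(msg: EmailMessage, rank: int) -> EmailMessage:
    """Record the rank in a header and leave the Subject line alone."""
    if not 0 <= rank <= 100:
        raise ValueError("rank must be between 0 and 100")
    del msg[HEADER]  # drop any earlier value; no error if absent
    msg[HEADER] = str(rank)
    return msg


def is_probably_spam(msg: EmailMessage, threshold: int = 50) -> bool:
    """Client-side filter decision based only on the header."""
    return int(msg.get(HEADER, "0")) >= threshold


if __name__ == "__main__":
    m = EmailMessage()
    m["Subject"] = "Quarterly report"
    m.set_content("See attached.")
    tag_spam_rank(m, 73)
    print(m["Subject"], m[HEADER], is_probably_spam(m))
```

Each mail client can then pick its own threshold, and nothing has to be spliced into the subject line.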

  14. blacklists by mcelrath · · Score: 1, Interesting

    Why isn't anyone making systematic IP blacklists? I mean, after the usual kind of spam crap, you've just identified the attacker, or a piece of a botnet. Keep it all in a list and just deny those IPs any access at all. (e.g. firewall rules) By sharing these rules, you nullify the effect of the botnets. Tough shit for the people with cracked computers. They should have been more diligent in applying patches...

    I do this with denyhosts which checks logs for ssh dictionary attacks and then blocks them. By sharing these lists, and cross referencing them between different hosts, you should have a very reliable list, and can remove the effect of IP spoofing which may be possible with some protocols/attacks.
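The cross-referencing step could look something like the sketch below. This is not how denyhosts itself works internally; `merge_reports` and the "seen by at least two hosts" rule are assumptions chosen to illustrate why cross-checking helps against spoofed or one-off reports.

```python
import ipaddress
from collections import defaultdict


def merge_reports(reports, min_hosts=2):
    """Combine per-host offender lists into one shared blocklist.

    `reports` maps a reporting host name to the set of IP strings it
    caught attacking it. An IP is only blocked once at least
    `min_hosts` different hosts have reported it.
    """
    seen_by = defaultdict(set)
    for host, ips in reports.items():
        for ip in ips:
            # Normalise and validate; raises ValueError on junk input.
            seen_by[str(ipaddress.ip_address(ip))].add(host)
    return {ip for ip, hosts in seen_by.items() if len(hosts) >= min_hosts}


if __name__ == "__main__":
    reports = {
        "web1": {"203.0.113.7", "198.51.100.20"},
        "web2": {"203.0.113.7"},
        "mail": {"203.0.113.7", "192.0.2.55"},
    }
    print(sorted(merge_reports(reports)))
```

The resulting set is what you would feed into firewall rules. Raising `min_hosts` trades coverage for fewer false positives.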

    --
    1^2=1; (-1)^2=1; 1^2=(-1)^2; 1=-1; 1=0.
  15. Digital Spy by Rik+Sweeney · · Score: 2, Interesting

    Digital Spy have an interesting, but unfortunately very annoying, way of dealing with Captcha. If you sign up from a Hotmail, Gmail or Yahoo account, then you have to pay Digital Spy £5 to register that account. Business email addresses or ones from ISPs don't require a fee.

    A simple albeit incredibly annoying solution.

  16. I've been playing around with next gen CAPTCHAs... by Panaqqa · · Score: 2, Interesting

    I had thought of using something similar to what I have posted at the link below. The user must solve three of these in a row. Of course the number of fonts/numbers/backgrounds would be much larger. Also I planned to introduce letters, letter pairs and shapes. But the key concept is that the instructions to solve are also embedded in the image. Much tougher I would think.

    And what does /. think?

    Next gen CAPTCHA link here.

    Note - this is just a random sample image, not an actual implementation.

  17. Re:Just use by linhares · · Score: 2, Interesting

    Where do I get the 10 year figure? easy... Harry Foundalis, a former Ph.D. student under Douglas Hofstadter, spent 11 years on his thesis. It's a profoundly brilliant piece of work. However, it can only solve 15 problems, out of hundreds and hundreds tried. BPs require bottom-up, data-driven, perception processing, and top-down, hypothesis-driven, conceptual processing, both intermingled, as argued in the AI paper. In other words, you have to look and create concepts on-the-fly about what's going on. You can't take objects for granted. BP91, for example, has different, incompatible interpretations of boxes. That is why we need flexibility way beyond what's available today.

  18. A good solution here... by encoderer · · Score: 4, Interesting

    A good solution here is to include this as part of the turing test itself.

    As I mentioned upthread, I'm a partner in a web dev shop. We do a lot of social networking (of course) and about a year ago we developed a utility to create just this type of turing test. For example, we'll have a picture, and ask the question "What is the color of the 3rd fish from the left?"

    What we do, is we pair these tests on a page. We'll include a known test, like the one above. And we'll also show an unclassified image and we might ask "how many people are in this picture?"

    There is no wrong answer for that test, and their answer is recorded. Soon, that same question will be asked for that same picture. As soon as it's confirmed 2 times, it gets classified as having n people. Soon after it would be displayed again asking "how many females are in this pic?" or "what color shirt is the person on the right wearing?"

    When we created the app, the DB had about 5000 turing tests in it. We then attached a DB of about 100,000 images that were pre-classified but not to an extent that would allow us to write a test off it.

    Now, after a year in use across a couple dozen moderately trafficked websites, we have nearly 25,000 turing tests. All 20,000 new tests have been created thru the technique I described above.

    The real reason we did it wasn't to save on some development costs. We could've hired temp workers and paid them $8 an hour to classify pictures.

    We did it because I believe strongly that the key to simple turing tests like this is a large corpus of data. If a bot only encounters the same test once or twice EVER, then the problem becomes difficult to solve. This is like the ANTI-CAPTCHA.

    CAPTCHA was all about taking a specific technique to its maximum extent: Challenge a computer system by taking a narrow field (OCR) and pushing it beyond the current state-of-the-art.

    These tests are all about a general technique that's broad where CAPTCHA is just deep.

    The only way to build a bot to solve each test in our DB would be to give it genuine intelligence. It would have to be capable of determining context, reference, connotation, image ID, etc.

    As a programmer, if you say "Here's a captcha, write a program to solve it" I wouldn't know HOW, but I'd at least have an idea of where to begin.

    Now, if you show me a picture with the turing test of "What object is in the hands of the 3rd woman from the left" ... well... I wouldn't know where to begin.

    1. Re:A good solution here... by Mr2001 · · Score: 2, Interesting

      What we do, is we pair these tests on a page. We'll include a known test, like the one above. And we'll also show an unclassified image and we might ask "how many people are in this picture?"

      This is basically what reCAPTCHA does, although they only use words. They take images of words that off-the-shelf OCR software failed to read, apply more distortions, and serve them up two at a time. One of the words is known; the other is unknown but becomes known after enough people have submitted the same answer.

      And as a bonus, the answers aren't just used to grant access to a web site - they're used to digitize the old books that the images came from in the first place.

      --
      Visual IRC: Fast. Powerful. Free.
  19. SEOs - Lying to Robots so Robots Lie to Humans by billstewart · · Score: 5, Interesting

    Search Engines help humans find web pages that the humans might find interesting, and they do this by having robots spider the web looking for patterns. Search Engine Optimizers try to get humans to read their customers' web pages in three ways:

    • Making it easy for the robots to find the content. Google's how-to page tells you pretty much everything you need to know, and it's not hard, but I guess there are companies who want to hire somebody to clean up their web page structure for them instead of doing the work themselves, or to tell their graphic designers to stop using complex Flash-based mouseover gesture interactions instead of simpler links and good indexing. Usually people who do that call themselves "consultants" or "web designers" instead of "SEOs", but not always.
    • Helping their customers write more interesting web pages instead of boring ones. Usually people who do that call themselves "editors" or "content consultants" or whatever instead of "SEOs", but not always.
    • Lying to the search engines' robots so that the customers' uninteresting-to-humans web pages match patterns that the robots identify as "interesting", so the robots will lie to humans about the interestingness of those pages. Sometimes this includes building link farms or generating vast reams of uninteresting content with popular keywords and ad banners or kiting millions of domain names. Usually people who do this call themselves "SEOs" or "Search Engine Optimization Consultants" instead of "lying scum polluting the Internet". But sometimes they pretend to be something else, like "Advertising specialists" or whatever.
    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks