Slashdot Mirror


HTML Encoded Captchas

rangeva writes to tell us about a twist he has developed on the common Captcha technique to discourage spam bots: HECs encode the Captcha image into HTML, thus presenting an unsolved challenge to the bots' programmers. From the writeup: "The Captcha is no longer an image and therefore not a resource they can download and process. The owner of the site can change the properties of the Captcha's HTML, making it unique,... add[ing] another layer of complication for the bot to crack." HECs are not exactly lightweight — the one on the linked page weighs in at 218K — but this GPL'd project seems like a nice advance on the state of the art.
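The writeup does not spell out the encoding, but Zaph0dB's comment below describes the result as an HTML table rendered pixel by pixel. Here is a minimal sketch of that idea. It assumes the captcha bitmap is already available as a grid of `#rrggbb` color strings. Each pixel becomes one `<td>` with an inline background color, and the hypothetical `table_class` parameter stands in for the per-site variation the writeup mentions. This is an illustration, not the HEC project's actual code.

```python
from html import escape


def encode_as_html_table(pixels, cell_px=2, table_class="hec"):
    """Render a 2-D grid of '#rrggbb' color strings as an HTML table.

    Hypothetical sketch: one <td> per pixel, colored via inline CSS, so the
    captcha is shipped as markup rather than as a downloadable image file.
    """
    rows = []
    for row in pixels:
        # Each cell is a fixed-size colored box; no text content at all.
        cells = "".join(
            f'<td style="background:{escape(color)};'
            f'width:{cell_px}px;height:{cell_px}px"></td>'
            for color in row
        )
        rows.append(f"<tr>{cells}</tr>")
    # cellspacing/cellpadding of 0 make adjacent cells touch like pixels.
    return (
        f'<table class="{escape(table_class)}" cellspacing="0" cellpadding="0">'
        + "".join(rows)
        + "</table>"
    )


if __name__ == "__main__":
    checker = [["#000000", "#ffffff"], ["#ffffff", "#000000"]]
    print(encode_as_html_table(checker))
```

Every pixel costs a few dozen bytes of markup this way, which is why a full-size captcha encoded like this runs to hundreds of kilobytes.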

7 of 177 comments

  1. I failed to see how this'll help by Rosco P. Coltrane · · Score: 5, Interesting

    At the end of the day, this captcha is displayed on the screen as colorful, harder-to-read mumbo-jumbo, just like JPEG captchas, so all a bot has to do is use an HTML renderer to turn it into a regular image that can be processed. The added complication is linking one of the existing captcha decoders with an HTML engine such as Gecko, maybe half a day's work. Not exactly uncrackable...

    --
    "A door is what a dog is perpetually on the wrong side of" - Ogden Nash
    1. Re:I failed to see how this'll help by rangeva · · Score: 5, Insightful

      "so all a bot has to do is use an HTML renderer to turn it into a regular image that can be processed"

      It's not that simple. Since the Captcha is no longer an image that you can download, the bot first has to locate the Captcha on the page. The site owner can modify the layout of the page and of the Captcha, making them unique. Rendering the image as HTML effectively changes its encoding to a new and unique one, which makes it very difficult to create a generic bot that learns to decode all the HTML variations out there.

      The problem today is automated software that downloads Captcha images from a predefined location (URL) and cracks them. HECs make it much harder to locate this resource.

      Oh, and everything is crackable ;)

  2. Bad form by Zaph0dB · · Score: 5, Insightful

    I think using a captcha like this one (rendered as an HTML table) is bad web manners. Rendering such a table pixel by pixel takes a huge toll on browsers. Even on my (relatively) new and (relatively) powerful machine, Firefox took a noticeable amount of time to render the image, and my hard drive crunched a little. I don't even want to imagine less powerful machines or, random-fluctuation-of-time-and-space forbid, mobile devices. All in all, I think this method severely limits who can use the site.

    --
    When in danger or in doubt, run in circles, scream and shout [Robert Heinlein]
  3. workaround... by zozzi · · Score: 5, Informative
    Spammers already have a workaround for captchas:

    1. Show the image in an alternate pornographic/warez/whatever website

    2. Ask the user to type it in to access the site

    3. Use the user's input to access the original protected site

    4. There is no step 4.

    --
    ---
    1. Re:workaround... by Phillup · · Score: 5, Funny

      "When it comes to porn, I'm no slouch and I can count the number of times I've seen sites that give you free access after entering a captcha on one hand."

      One hand, eh?

      Guess we don't really need to ask how you know this...

      --

      --Phillip

      Can you say BIRTH TAX
  4. Broken by Kurayamino-X · · Score: 5, Interesting

    All text-based captchas are broken. It doesn't matter how they're rendered; they're still a predefined set of characters that a bot can eventually pick out. Now, the "Click three kittens" captcha, that was fucking genius: no bot on the planet will be able to tell the difference between a kitten and a ham sandwich. Why isn't it being used? People seem to think that obscuring text and making it harder for humans to read is a better idea than using something a computer will not be able to identify.

    --
    ...I got nothing.
  5. No need to download the image by lintux · · Score: 5, Interesting

    There's no need to download the image. Look at the source. Somewhere it says:

    Now, just go to MD5Lookup.Com and convert that little "hidden" MD5Sum back to the original text:

    ad6ade8a0b6e2f748b80a390ff45cf31 - &NMTB

    Maybe the author should add some salt. :-)
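Rosco P. Coltrane's objection in the first thread can be made concrete. If the captcha is a plain pixel table, a bot does not even need a full rendering engine. The standard-library HTML parser recovers the pixel grid, which can then be handed to an ordinary image-based decoder. The sketch below assumes each pixel is a `<td>` whose inline `style` carries a `background` or `background-color` hex value. `PixelTableParser` and `decode_pixel_table` are hypothetical names. A site that varies its markup, as rangeva suggests, would break this exact parser, and that is where a real renderer such as Gecko comes in.

```python
import re
from html.parser import HTMLParser

# Matches "background:#rrggbb" or "background-color: #rrggbb" in a style attribute.
_BG = re.compile(r"background(?:-color)?\s*:\s*(#[0-9a-fA-F]{6})")


class PixelTableParser(HTMLParser):
    """Collect the background color of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])  # start a new pixel row
        elif tag == "td" and self.rows:
            style = dict(attrs).get("style") or ""
            match = _BG.search(style)
            # None marks a cell whose color this simple parser cannot read.
            self.rows[-1].append(match.group(1).lower() if match else None)


def decode_pixel_table(html_text):
    """Return the captcha as a list of rows of '#rrggbb' strings."""
    parser = PixelTableParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows
```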
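Kurayamino-X's comment does not say how the "Click three kittens" captcha worked internally. The following is a hypothetical sketch of the general idea, assuming a hand-labeled pool of image IDs. The server shows a shuffled grid that mixes three kitten images with decoys, and it accepts the answer only if the user selects exactly the kittens. The names `make_challenge` and `check_answer`, and the grid size of nine, are illustrative assumptions.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageChallenge:
    image_ids: tuple       # IDs of the images shown, in display order
    kitten_ids: frozenset  # the subset the user must select


def make_challenge(kittens, others, n_kittens=3, n_total=9):
    """Build a shuffled grid of n_kittens kitten images plus decoys.

    `kittens` and `others` are lists of image IDs from a (hypothetical)
    hand-labeled pool; the labels never leave the server.
    """
    rng = secrets.SystemRandom()  # OS-backed randomness, not guessable
    chosen = rng.sample(kittens, n_kittens)
    decoys = rng.sample(others, n_total - n_kittens)
    grid = chosen + decoys
    rng.shuffle(grid)
    return ImageChallenge(tuple(grid), frozenset(chosen))


def check_answer(challenge, selected_ids):
    """Pass only if the user picked exactly the kittens, no more and no fewer."""
    return frozenset(selected_ids) == challenge.kitten_ids
```

Because the check is an exact set comparison, selecting only two kittens fails, and so does adding a ham sandwich to the three.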
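lintux's point is that an unsalted MD5 of the answer, embedded in the page, can be reversed with a lookup table because the answer space is tiny. This is a minimal sketch of a stronger swap-in, assuming the server keeps a secret key. Instead of `md5(answer)`, the page embeds a per-challenge nonce and an HMAC-SHA256 tag, and the server recomputes the tag when the form is submitted. `SERVER_KEY`, `make_token`, and `verify` are hypothetical names, not part of the HEC project.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; a real site would load it from configuration.
SERVER_KEY = secrets.token_bytes(32)


def make_token(answer, key=SERVER_KEY):
    """Return (nonce, tag) to embed in the form instead of md5(answer)."""
    nonce = secrets.token_hex(16)  # fresh per challenge, so equal answers differ
    tag = hmac.new(key, f"{nonce}:{answer}".encode(), hashlib.sha256).hexdigest()
    return nonce, tag


def verify(nonce, tag, submitted, key=SERVER_KEY):
    """Recompute the tag for the submitted answer; compare in constant time."""
    expected = hmac.new(key, f"{nonce}:{submitted}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)
```

A keyed HMAC is used here rather than a public salt because a salt shown in the page would still let anyone brute-force a five-character answer offline, whereas the key never leaves the server. This sketch does not stop replay of an already-solved nonce and tag. A real deployment would also record nonces that have been used.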