Slashdot Mirror


HTML Encoded Captchas

rangeva writes to tell us about a twist he has developed on the common Captcha technique to discourage spam bots: HECs encode the Captcha image into HTML, thus presenting an unsolved challenge to the bots' programmers. From the writeup: "The Captcha is no longer an image and therefore not a resource they can download and process. The owner of the site can change the properties of the Captcha's HTML, making it unique,... add[ing] another layer of complication for the bot to crack." HECs are not exactly lightweight — the one on the linked page weighs in at 218K — but this GPL'd project seems like a nice advance on the state of the art.

5 of 177 comments (clear)

  1. Bad form by Zaph0dB · · Score: 5, Insightful

    I think using a captcha like this one (html-table rendered) is bad web-manners. The rendering of such a table, pixel by pixel, is a huge toll on browsers. Even on my (relatively) new and (relatively) powerful machine, it took Firefox a noticeable amount of time to render the image, and caused my hard drive to crunch a little. I don't even want to imagine less powerful machines or, random-fluctuation-of-time-and-space forbid, mobile devices. All in all, I think this method severely limits the users accessing this site.

    --
    When in danger or in doubt, run in circles, scream and shout [Robert Heinlein]
  2. Re:I failed to see how this'll help by rangeva · · Score: 5, Insightful

    "so all a bot has to do is use a html renderer to turn it into a regular image that can be processed"

    It's not that simple. Since the Captcha is no longer an image that you can download, the bot will first has to locate the position of the Captcha. The owner of the site can modify the layout of the page and Captcha making it unique. By rendering the image into HTML you practically modify to encoding of the image to a new and unique one - making it highly difficult to create a generic bot that will learn to decode all the HTML variations out there.

    The problem today is with automated software that download the Captcha images from a pre-defined location (URL) and crack them. HECs makes it much harder to locate this resource.

    Oh and everything is Crackable;)

  3. Re:I failed to see how this'll help by Aladrin · · Score: 3, Insightful

    I should have added this disclaimer to the post:

    Yes, I see that they recommend adding in random divs and crap. If it's still a table, it's still very very easy to parse, even without a parser. If they intend for you to replace the table with 'random elements' ... Do you KNOW how hard it would be to get it to show up correctly on each different browser? Another nightmare.

    --
    "If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
  4. Re:What are the gotchas with these captchas by YrWrstNtmr · · Score: 4, Insightful

    Blind, color blind, text only browsers, more of a hassle, just to name a few.

  5. Captcha's are annoying by tacocat · · Score: 4, Insightful

    While this has little to do with the original post I have a really annoying experience with captchas

    I have 20/20 vision and am not color blind. Captchas are becoming so complicated and garbled that I get the code wrong about 40% of the time. Another portion of the time I take to long trying to answer the code question and type in the right characters. I typically get screwed on the number Zero and the letter 'O' and lowercase 'L' and the number 1.

    It'b becoming, for me, an entry barrier to signing up and gaining access to websites. It would be much easier to simply use email authentication. What do you do with the people who are color blind? I spent some years dealing with display design and this was a legitimate concern that we addressed at the time for a specialized group of people. In the common population there are a lot more occurrences of people who are color blind.

    Are captcha's really worth the effort compared to other more human friendly processes? Is anyone working on what we will be doing next? Considering that there are decades of technology in machine vision technology to pull from I think it will be fairly trivial for the bots to become better at reading captchas than humans.

    It might be effective to take the email authentication process and apply everything that mail servers do to authenticate the user. What I mean by this is apply all the mail server rules like FQDN requirements for HELO, fully resolvable email domains, valid email addresses, non-open relays. Much of this would eliminate either the bots or the ISP's who are too stupid to properly configure a mail server. Similarly it might be sufficient to code the HTML/HTTP to expect a properly responding client and not some hacked up bot that can't do most of it right.