Slashdot Mirror


HTML Encoded Captchas

rangeva writes to tell us about a twist he has developed on the common Captcha technique to discourage spam bots: HECs encode the Captcha image into HTML, thus presenting an unsolved challenge to the bots' programmers. From the writeup: "The Captcha is no longer an image and therefore not a resource they can download and process. The owner of the site can change the properties of the Captcha's HTML, making it unique,... add[ing] another layer of complication for the bot to crack." HECs are not exactly lightweight — the one on the linked page weighs in at 218K — but this GPL'd project seems like a nice advance on the state of the art.

15 of 177 comments

  1. I failed to see how this'll help by Rosco P. Coltrane · · Score: 5, Interesting

    At the end of the day, this captcha is displayed on the screen as colorful, harder-to-read mumbo-jumbo, just like JPEG captchas, so all a bot has to do is use an HTML renderer to turn it into a regular image that can be processed. So the added complication is linking one of the existing captcha decoders to the Gecko engine, for example: maybe half a day's work. Not exactly uncrackable...

    --
    "A door is what a dog is perpetually on the wrong side of" - Ogden Nash
    1. Re:I failed to see how this'll help by Aladrin · · Score: 3, Interesting

      Even worse, this captcha would be -easier- than a regular one. It lists every pixel as a TD, in rows... So easy to render that it's idiotic. And the image itself is simple as well... The background letters are much lighter in color and could easily be filtered.

      Add in the huge size of the html and the annoyance factor of captchas in general, and this is amazingly stupid.
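
A minimal sketch of the point above, using only the Python standard library. It assumes each pixel is a `<td>` whose colour sits in a `bgcolor="#rrggbb"` attribute; the real markup is not shown in this thread, so that attribute, the threshold value and the sample table are all hypothetical. It reads the table row by row and drops the lighter cells with a simple brightness cutoff.

```python
from html.parser import HTMLParser


class PixelTableParser(HTMLParser):
    """Collect the background colour of every <td>, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            # Assumption: the colour is stored as bgcolor="#rrggbb".
            self.rows[-1].append(dict(attrs).get("bgcolor", "#ffffff"))


def luminance(hex_colour):
    """Rough perceived brightness (0-255) of a '#rrggbb' colour."""
    r, g, b = (int(hex_colour[i:i + 2], 16) for i in (1, 3, 5))
    return 0.299 * r + 0.587 * g + 0.114 * b


def to_bitmap(html, threshold=128):
    """Return rows of '#' (dark pixel) and '.' (light pixel or background)."""
    parser = PixelTableParser()
    parser.feed(html)
    return ["".join("#" if luminance(c) < threshold else "." for c in row)
            for row in parser.rows]


# Hypothetical 3x2 "captcha" table, just to exercise the parser.
SAMPLE = """
<table>
<tr><td bgcolor="#000000"></td><td bgcolor="#dddddd"></td><td bgcolor="#000000"></td></tr>
<tr><td bgcolor="#ffffff"></td><td bgcolor="#101010"></td><td bgcolor="#eeeeee"></td></tr>
</table>
"""

if __name__ == "__main__":
    print("\n".join(to_bitmap(SAMPLE)))
```

The result is an ordinary bitmap, which is the same input any existing image-captcha decoder already expects.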

      --
      "If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
    2. Re:I failed to see how this'll help by Anonymous Coward · · Score: 1, Interesting

      Do you really think it's going to be a problem? A dynamic page keeps a given structure, so I'd say it takes 10 minutes, in the worst case, to figure out how to extract the data you need to decode the captcha. Even if you move the text around, that's still going to be done programmatically, and that is a big limitation, isn't it?

      What would I do? Simply look for all the td's with one single colored pixel, and then count the tr's in between.

      Everything else is made easier, since this in fact gives you the chance to develop a simple, successful scanner without the need for third-party modules (gd, Image::Magick and the like).

      Give up. If I can read that, I know I'm going to be able to make a script that does just that. This is just not the way.
      You can make a script that makes things difficult for me, but that's just delaying the day when the captcha will be broken.

      Stefano

  2. Render, PrintScr, OCR? by Frogular · · Score: 3, Interesting

    Can't the bot simply render and OCR it?

    A better solution might be the authentication system old 386 games had where you have to do some simple but human intelligence requiring task. "Find the word in the upper right of manual pg 4" -> "Enter the 3rd word from the following paragraph"

  3. watermarking by dattaway · · Score: 2, Interesting

    How about watermarking the captcha with the site's address and a short message?

  4. Spy vs spy by Anonymous Coward · · Score: 1, Interesting

    This scheme will work until it is widely enough used that it is worth the spammers' while to write a crack. As the author suggests, the ultimate solution is probably to have so many of these schemes that the spammers can't keep up.

    I have a question. How much of a problem are these spammed responses to blogs? I go to several blogs that don't have captchas and haven't noticed anything that could be called spam. Is this a response to a non-problem?

  5. A captcha is still a captcha by Cee · · Score: 4, Interesting

    One of the main objections to captchas is that an attacker could steal the image file and simply use it on their own site (XXX sites...) to get it "cracked".
    An HTML-generated captcha would prevent that, since there is no image file to copy.
    However, what prevents the attacker from simply copying the relevant HTML source and putting it on his or her site, just like the image? Sure, you can make it quite complicated by adding CSS layers and whatnot, but in the end that would merely be an extra annoyance.

    And stopping the attacker from using OCR on the captcha won't really work either. It's not that hard to render HTML code to an image, which you can feed to the OCR software.

    In short, this hack is just another step in the arms race, one that merely buys us some time.

  6. Do others use such spam-bot blockers? by msobkow · · Score: 2, Interesting

    I've had sessions that took an inordinately long time to initialize with various web service providers (it's very noticeable on dial-up.) I'm wondering whether similar techniques might be used to attack rather than defend, possibly including rogue AJAX code.

    --
    I do not fail; I succeed at finding out what does not work.
  7. Screen Captcha! by mrmeval · · Score: 2, Interesting

    It's easy, no?

    The file size is what intrigues me. Just make a 'hidden' captcha that a bot would download. Now figure out how to make a JPEG decompressor uncompress that to 2 gigs or better.

    It's like the old "I'll compress 2gigs of the letter A with zip and upload it to that BBS and let the virus checker gag" gag.

    Or maybe a GIF file. I wonder how well solid black or white compresses...
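
For a rough sense of the effect being described, here is a small sketch that compresses a run of one repeated byte and reports the ratio. It uses zlib (DEFLATE, the scheme used by zip), not JPEG or GIF, so it only illustrates the general idea; the 1 MiB size and the function name are arbitrary choices for the example.

```python
import zlib


def compression_ratio(size_bytes, fill=b"A"):
    """Compress `size_bytes` copies of one byte; return (compressed_len, ratio)."""
    payload = fill * size_bytes
    packed = zlib.compress(payload, 9)
    return len(packed), size_bytes / len(packed)


if __name__ == "__main__":
    packed_len, ratio = compression_ratio(1024 * 1024)
    print(f"1 MiB of 'A' -> {packed_len} bytes (about {ratio:.0f}:1)")
```

Whether a given image format gets anywhere near this depends on its own encoding, which this sketch does not model.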

    --
    I'd go on a Vegan diet but the delivery time from Vega is too long. --brownkitty
  8. Lunacy by Stormx2 · · Score: 4, Interesting

    Lunacy! I've made apps which can do this sort of thing before, and this one is totally unoptimized! Take a look at this:

    With the limited amount of colours used, it would make much more sense to
    a) give the table an id, then:
    table.tabid td { width:1px; height:1px; }
    b) give some classes for each colour used
    td.colid { background-color: blah; }

    I'm sure that would halve the source code size... How can you trust an HTML solution that hasn't even been properly thought through?
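
A quick way to check that kind of estimate is to generate both variants and compare their lengths. The sketch below assumes the unoptimized version repeats an inline `style` on every cell; the original markup is not shown in this thread, so that form, the `cap` id, the `c0`/`c1` class names and the sample grid are all made up for illustration.

```python
def inline_table(pixels):
    """Every cell carries its own size and colour (assumed unoptimized style)."""
    rows = []
    for row in pixels:
        cells = "".join(
            f'<td style="width:1px;height:1px;background-color:{c}"></td>'
            for c in row)
        rows.append(f"<tr>{cells}</tr>")
    return f"<table>{''.join(rows)}</table>"


def class_table(pixels):
    """One rule for the cell size, plus one short class per distinct colour."""
    colours = sorted({c for row in pixels for c in row})
    palette = {c: f"c{i}" for i, c in enumerate(colours)}
    css = "table#cap td{width:1px;height:1px}" + "".join(
        f"td.{name}{{background-color:{colour}}}"
        for colour, name in palette.items())
    rows = []
    for row in pixels:
        cells = "".join(f'<td class="{palette[c]}"></td>' for c in row)
        rows.append(f"<tr>{cells}</tr>")
    return f'<style>{css}</style><table id="cap">{"".join(rows)}</table>'


if __name__ == "__main__":
    # Hypothetical 40x10 image using two colours in a checkerboard.
    grid = [["#000000" if (x + y) % 2 else "#ffffff" for x in range(40)]
            for y in range(10)]
    print(len(inline_table(grid)), "bytes inline")
    print(len(class_table(grid)), "bytes with classes")
```

The actual saving depends on how many distinct colours the image uses and on how verbose the original per-cell markup really is.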

  9. Processing by jones_supa · · Score: 2, Interesting

    The Captcha is no longer an image and therefore not a resource they can download and process.

    Err...but the HTML captcha is a resource they can download and process.

  10. Broken by Kurayamino-X · · Score: 5, Interesting

    All text-based captchas are broken. It doesn't matter how they're rendered; they're still a pre-defined set of characters that a bot can pick out eventually. Now, the "Click three kittens" captcha, that was fucking genius: no bot on the planet will be able to tell the difference between a kitten and a ham sandwich. Why isn't it being used? People seem to think obscuring text and making it harder for humans to read is a better idea than using something a computer will not be able to identify.

    --
    ...I got nothing.
  11. A matter of time by superbrose · · Score: 2, Interesting

    The advantage of this captcha is that it is not widespread yet and so the chances that a bot can crack it are lower.

    Funny that when OCR software is supposed to work it often fails, but when there is some effort to hinder recognition then bots can deal with that. Maybe general OCR software should try to crack input instead!

  12. Congratulations... FOOL! by Tom · · Score: 1, Interesting

    Great, so blocking images in E-Mail will no longer get those image-spams thrown out, because now a bright-but-not-intelligent geek has given the spammer assholes a way to encode their crap in simple HTML which no spam filter will manage to get.

    Congratulations. How much did they pay you?

    Oh, as for the "official" purpose. I give it a life expectancy of 3 weeks before the spammers have found a way around it. If they bother at all.

    --
    Assorted stuff I do sometimes: Lemuria.org
  13. No need to download the image by lintux · · Score: 5, Interesting

    There's no need to download the image. Look at the source. Somewhere it says:

    Now, just go to MD5Lookup.Com and convert that little "hidden" MD5Sum back to the original text:

    ad6ade8a0b6e2f748b80a390ff45cf31 - &NMTB

    Maybe the author should add some salt. :-)
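
To illustrate why an unsalted digest of a short answer gives so little protection, here is a sketch that precomputes every three-letter uppercase string and looks a digest up in that table. The answer `CAT`, the three-letter length and the secret key are hypothetical, and the keyed HMAC is only one possible reading of "add some salt", not what the project actually does.

```python
import hashlib
import hmac
import itertools
import string


def md5_hex(text):
    """Plain, unsalted MD5 of the text as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def build_lookup(alphabet, length):
    """Precompute digest -> plaintext for every string of the given length."""
    table = {}
    for chars in itertools.product(alphabet, repeat=length):
        word = "".join(chars)
        table[md5_hex(word)] = word
    return table


def keyed_token(answer, secret):
    """HMAC-SHA256 of the answer under a server-side secret (hypothetical fix)."""
    return hmac.new(secret, answer.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    table = build_lookup(string.ascii_uppercase, 3)  # 17,576 entries
    leaked = md5_hex("CAT")       # what an unsalted hidden field would expose
    print(table.get(leaked))      # the answer, recovered from the digest alone
    token = keyed_token("CAT", b"hypothetical-server-secret")
    print(table.get(token))       # None: the table was built without the key
```

Even with a key, a token like this could still be replayed unless it is tied to a session or expires, so it is a sketch of the idea rather than a complete fix.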