HTML Encoded Captchas
rangeva writes to tell us about a twist he has developed on the common Captcha technique to discourage spam bots:
HECs encode the Captcha image into HTML, thus presenting an unsolved challenge to the bots' programmers. From the writeup: "The Captcha is no longer an image and therefore not a resource they can download and process. The owner of the site can change the properties of the Captcha's HTML, making it unique,... add[ing] another layer of complication for the bot to crack." HECs are not exactly lightweight — the one on the linked page weighs in at 218K — but this GPL'd project seems like a nice advance on the state of the art.
At the end of the day, this captcha is displayed on the screen as colorful, harder-to-read mumbo-jumbo, just like JPEG captchas, so all a bot has to do is use an HTML renderer to turn it into a regular image that can be processed. So the added complication is linking one of the existing captcha decoders to the Gecko engine, for example: maybe half a day's work. Not exactly uncrackable...
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
Can't the bot simply render and OCR it?
A better solution might be the authentication system old 386 games had, where you have to do some simple task that still requires human intelligence. "Find the word in the upper right of manual pg 4" -> "Enter the 3rd word from the following paragraph"
How about watermarking the captcha with the site's address and a short message?
This scheme will work until it is widely enough used that it is worth the spammers' while to write a crack. As the author suggests, the ultimate solution is probably to have so many of these schemes that the spammers can't keep up.
I have a question. How much of a problem are these spammed responses to blogs? I go to several blogs that don't have captchas and haven't noticed anything that could be called spam. Is this a response to a non-problem?
One of the main objections to a captcha is that an attacker could steal the image file and simply use it on their site (XXX sites...) to get it "cracked".
An HTML-generated captcha would prevent that, since there is no image file to copy.
However, what prevents the attacker from simply copying the relevant HTML source and putting it on his or her site, just like the image? Sure, you can make it quite complicated by adding CSS layers and whatnot, but in the end that would merely be an extra annoyance.
And stopping the attacker from using OCR on the captcha won't really work either. It's not that hard to render HTML code to an image, which you can feed to the OCR software.
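To make the "render it back to an image" point concrete, here is a minimal sketch in Python using only the standard library. It assumes the captcha is a table with one cell per pixel and the colour in a `bgcolor` attribute; that markup is my guess, not the actual HEC format, and the class and function names are made up for illustration. A real attack would hand the resulting grid to an OCR tool instead of printing it.

```python
from html.parser import HTMLParser


class TableToPixels(HTMLParser):
    """Collect one colour string per <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            # Assumption: the colour sits in a bgcolor attribute.
            self.rows[-1].append(dict(attrs).get("bgcolor", "#ffffff"))


def html_to_pixels(markup):
    """Turn the table markup into a list of rows of colour strings."""
    parser = TableToPixels()
    parser.feed(markup)
    return parser.rows


def to_ascii(pixels, background="#ffffff"):
    """Crude 'render': background cells become spaces, the rest '#'."""
    return "\n".join(
        "".join(" " if colour == background else "#" for colour in row)
        for row in pixels
    )


# Hypothetical 2x2 captcha fragment, invented for this example.
sample = (
    "<table>"
    "<tr><td bgcolor='#000000'></td><td bgcolor='#ffffff'></td></tr>"
    "<tr><td bgcolor='#ffffff'></td><td bgcolor='#000000'></td></tr>"
    "</table>"
)
pixels = html_to_pixels(sample)
print(to_ascii(pixels))
```

Once the cells are in a plain grid, the HTML wrapper no longer matters: it is the same pixel data a JPEG captcha would have given the bot.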
In short, this hack is just another step in the arms race, one that merely buys us some time.
I've had sessions that took an inordinately long time to initialize with various web service providers (it's very noticeable on dial-up.) I'm wondering whether similar techniques might be used to attack rather than defend, possibly including rogue AJAX code.
I do not fail; I succeed at finding out what does not work.
It's easy, no?
The file size is what intrigues me. Just make a 'hidden' captcha that a bot would download. Now figure out how to make a JPEG decompressor uncompress that to 2 gigs or better.
It's like the old "I'll compress 2gigs of the letter A with zip and upload it to that BBS and let the virus checker gag" gag.
Or maybe a GIF file. I wonder how well solid black or white compresses...
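For a rough feel of the ratios involved, here is a small Python sketch that compresses a run of one repeated byte with zlib. It uses DEFLATE, the same family of compression zip normally uses, not JPEG or GIF, so it only illustrates the general idea. The 10 MiB size is an arbitrary stand-in for the 2 gigs so it finishes quickly, and `compressed_size` is a name made up for this example.

```python
import zlib


def compressed_size(byte_value, length):
    """Return the zlib-compressed size of `length` copies of one byte."""
    data = bytes([byte_value]) * length
    return len(zlib.compress(data, 9))


original = 10 * 1024 * 1024  # 10 MiB rather than 2 GB, to keep the demo quick
packed = compressed_size(ord("A"), original)
print(f"{original} bytes of 'A' -> {packed} bytes (about {original // packed}:1)")
```

Whether a given bot's image decoder would actually choke on the expanded data is a separate question; this only shows how little has to go over the wire.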
I'd go on a Vegan diet but the delivery time from Vega is too long. --brownkitty
Lunacy! I've made apps which can do this sort of thing before, and this one is totally unoptimized! Take a look at this:
With the limited amount of colours used, it would make much more sense to
a) give the table an id, then:
table.tabid td { width:1px; height:1px; }
b) give some classes for each colour used
td.colid { background-color: blah; }
I'm sure that would halve the source code size... How can you trust an HTML solution that hasn't even been properly thought through?
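A quick way to check the size claim is to generate both versions and compare lengths. This Python sketch assumes the unoptimized captcha puts width, height and colour in an inline style on every cell, which is my guess at what the original markup looks like. The function names, the `cap` id and the `c0`, `c1`... class names are all invented for the example.

```python
def inline_table(pixels):
    """Assumed 'unoptimized' form: full inline style on every cell."""
    rows = []
    for row in pixels:
        cells = "".join(
            f'<td style="width:1px;height:1px;background-color:{colour}"></td>'
            for colour in row
        )
        rows.append(f"<tr>{cells}</tr>")
    return "<table>" + "".join(rows) + "</table>"


def class_table(pixels, table_id="cap"):
    """Suggested form: one rule for cell size, one class per colour."""
    palette = {}  # colour -> class name, in order of first appearance
    rows = []
    for row in pixels:
        cells = []
        for colour in row:
            cls = palette.setdefault(colour, f"c{len(palette)}")
            cells.append(f'<td class="{cls}"></td>')
        rows.append("<tr>" + "".join(cells) + "</tr>")
    css = f"table#{table_id} td{{width:1px;height:1px}}"
    css += "".join(
        f"td.{cls}{{background-color:{colour}}}" for colour, cls in palette.items()
    )
    return f'<style>{css}</style><table id="{table_id}">' + "".join(rows) + "</table>"


# Hypothetical 40x20 image with three colours in a repeating pattern.
colours = ["#ff0000", "#00ff00", "#0000ff"]
pixels = [[colours[(x + y) % 3] for x in range(40)] for y in range(20)]

print(len(inline_table(pixels)), "bytes inline")
print(len(class_table(pixels)), "bytes with classes")
```

The saving depends entirely on how verbose the per-cell markup was to begin with, so treat the numbers as a comparison of these two sketches only.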
The Captcha is no longer an image and therefore not a resource they can download and process.
Err...but the HTML captcha is a resource they can download and process.
All text-based captchas are broken. It doesn't matter how they're rendered; they're still a pre-defined set of characters that a bot can pick out eventually. Now, the "Click three kittens" captcha, that was fucking genius: no bot on the planet will be able to tell the difference between a kitten and a ham sandwich. Why isn't it being used? People seem to think obscuring text and making it harder for humans to read is a better idea than using something a computer will not be able to identify.
...I got nothing.
The advantage of this captcha is that it is not widespread yet and so the chances that a bot can crack it are lower.
Funny that when OCR software is supposed to work it often fails, but when there is some effort to hinder recognition then bots can deal with that. Maybe general OCR software should try to crack input instead!
Great, so blocking images in E-Mail will no longer get those image-spams thrown out, because now a bright-but-not-intelligent geek has given the spammer assholes a way to encode their crap in simple HTML which no spam filter will manage to get.
Congratulations. How much did they pay you?
Oh, as for the "official" purpose. I give it a life expectancy of 3 weeks before the spammers have found a way around it. If they bother at all.
Assorted stuff I do sometimes: Lemuria.org
There's no need to download the image. Look at the source. Somewhere it says:
:-)
Now, just go to MD5Lookup.Com and convert that little "hidden" MD5Sum back to the original text:
ad6ade8a0b6e2f748b80a390ff45cf31 - &NMTB
Maybe the author should add some salt.
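To show why the bare hash is a problem and what a fix could look like, here is a small Python sketch. It swaps the "salt" for a keyed hash (HMAC-SHA256 with a secret that stays on the server), which is my suggestion and not anything the project does. The three-letter answer, the uppercase-only alphabet, the secret and the function names are all made up so the brute force finishes in well under a second.

```python
import hashlib
import hmac
import itertools
import string

ALPHABET = string.ascii_uppercase  # tiny alphabet so the demo finishes quickly


def reverse_md5(digest, length):
    """Brute-force every string of the given length until the MD5 matches."""
    for chars in itertools.product(ALPHABET, repeat=length):
        guess = "".join(chars)
        if hashlib.md5(guess.encode()).hexdigest() == digest:
            return guess
    return None


def keyed_token(answer, secret):
    """Hypothetical replacement: HMAC of the answer under a server-side secret."""
    return hmac.new(secret, answer.encode(), hashlib.sha256).hexdigest()


answer = "NMT"  # stand-in answer, shorter than a real captcha string
leaked = hashlib.md5(answer.encode()).hexdigest()
print(reverse_md5(leaked, 3))  # the plain MD5 gives the answer away

secret = b"server-side secret, never sent to the client"
token = keyed_token(answer, secret)
print(reverse_md5(token, 3))  # the same lookup finds nothing for the keyed token
```

A lookup site is just a precomputed version of that loop. A keyed token alone still would not stop someone replaying a captcha they have already solved, so a per-challenge nonce or expiry would be needed on top.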