HTML Encoded Captchas
rangeva writes to tell us about a twist he has developed on the common Captcha technique to discourage spam bots:
HECs encode the Captcha image into HTML, thus presenting an unsolved challenge to the bots' programmers. From the writeup: "The Captcha is no longer an image and therefore not a resource they can download and process. The owner of the site can change the properties of the Captcha's HTML, making it unique,... add[ing] another layer of complication for the bot to crack." HECs are not exactly lightweight — the one on the linked page weighs in at 218K — but this GPL'd project seems like a nice advance on the state of the art.
At the end of the day, this captcha is displayed on the screen as colorful, harder-to-read mumbo-jumbo, just like JPEG captchas, so all a bot has to do is use an HTML renderer to turn it into a regular image that can be processed. The added complication amounts to linking one of the existing captcha decoders to the Gecko engine, for example: maybe half a day's work. Not exactly uncrackable...
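For a sense of how little work that conversion might be, here is a minimal sketch that skips the browser engine entirely and reads a table-based captcha straight back into a pixel grid. It assumes the captcha is a plain table with one cell per pixel and the color in a `bgcolor` attribute; the real HEC markup may well differ (CSS styles, nested elements), and `TableToPixels`, `html_table_to_pixels`, `to_ascii`, and `SAMPLE` are hypothetical names made up for illustration.

```python
from html.parser import HTMLParser


def hex_to_rgb(value):
    """Convert '#rrggbb' into an (r, g, b) tuple of ints."""
    value = value.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


class TableToPixels(HTMLParser):
    """Collect one RGB tuple per <td>, grouped into one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            # Assumption: the pixel color lives in a bgcolor attribute.
            color = dict(attrs).get("bgcolor") or "#ffffff"
            self.rows[-1].append(hex_to_rgb(color))


def html_table_to_pixels(markup):
    """Return the table as a list of rows of (r, g, b) tuples."""
    parser = TableToPixels()
    parser.feed(markup)
    parser.close()
    return parser.rows


def to_ascii(pixels, threshold=128):
    """Render dark pixels as '#' and light pixels as '.' for inspection."""
    return "\n".join(
        "".join("#" if sum(p) / 3 < threshold else "." for p in row)
        for row in pixels
    )


# Tiny made-up 2x2 "captcha" used only to exercise the parser.
SAMPLE = (
    "<table>"
    '<tr><td bgcolor="#000000"></td><td bgcolor="#ffffff"></td></tr>'
    '<tr><td bgcolor="#ffffff"></td><td bgcolor="#000000"></td></tr>'
    "</table>"
)

if __name__ == "__main__":
    print(to_ascii(html_table_to_pixels(SAMPLE)))
```

The resulting grid is an ordinary bitmap in all but name, so it could be handed to whatever decoder already works on image captchas.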
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
I think using a captcha like this one (html-table rendered) is bad web-manners. The rendering of such a table, pixel by pixel, is a huge toll on browsers. Even on my (relatively) new and (relatively) powerful machine, it took Firefox a noticeable amount of time to render the image, and caused my hard drive to crunch a little. I don't even want to imagine less powerful machines or, random-fluctuation-of-time-and-space forbid, mobile devices. All in all, I think this method severely limits the users accessing this site.
When in danger or in doubt, run in circles, scream and shout [Robert Heinlein]
1. Show the image in an alternate pornographic/warez/whatever website
2. Ask the user to type it in to access the site
3. Use the user's input to access the original protected site
4. There is no step 4.
---
All text-based captchas are broken. It doesn't matter how they're rendered; they're still a predefined set of characters that a bot can pick out eventually. Now, the "Click three kittens" captcha, that was fucking genius: no bot on the planet will be able to tell the difference between a kitten and a ham sandwich. Why isn't it being used? People seem to think obscuring text and making it harder for humans to read is a better idea than using something a computer will not be able to identify.
...I got nothing.
There's no need to download the image. Look at the source. Somewhere it says:
:-)
Now, just go to MD5Lookup.Com and convert that little "hidden" MD5Sum back to the original text:
ad6ade8a0b6e2f748b80a390ff45cf31 - &NMTB
Maybe the author should add some salt.
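A rough sketch of what "adding some salt" could look like, with one caveat: because a five-character answer can be brute-forced in moments, a public salt alone would not help much, so this example swaps in a keyed hash (HMAC-SHA256) with a server-side secret and a per-challenge nonce. None of this is taken from the HEC code; `SERVER_KEY`, `unsalted_token`, `keyed_token`, `verify`, and `build_lookup` are hypothetical names chosen for illustration.

```python
import hashlib
import hmac
import secrets

# Assumption: a secret generated and kept on the server, never sent to the page.
SERVER_KEY = secrets.token_bytes(32)


def unsalted_token(answer):
    """The weak scheme: a bare MD5 of the answer, embedded in the page."""
    return hashlib.md5(answer.encode("utf-8")).hexdigest()


def build_lookup(candidates):
    """What a lookup site effectively holds: hash -> original text."""
    return {unsalted_token(c): c for c in candidates}


def keyed_token(answer, nonce, key=SERVER_KEY):
    """Keyed hash over a per-challenge nonce plus the answer."""
    message = nonce.encode("utf-8") + b":" + answer.encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(answer, nonce, token, key=SERVER_KEY):
    """Constant-time check of a submitted answer against the page token."""
    return hmac.compare_digest(keyed_token(answer, nonce, key), token)


if __name__ == "__main__":
    lookup = build_lookup(["AAAAA", "&NMTB", "ZZZZZ"])
    # The bare hash is reversed by a simple dictionary lookup.
    print(lookup.get(unsalted_token("&NMTB")))

    nonce = secrets.token_hex(8)
    token = keyed_token("&NMTB", nonce)
    # The keyed token is not in the table, yet the server can still verify it.
    print(lookup.get(token), verify("&NMTB", nonce, token))
```

The nonce would travel in the page next to the token, while the key stays on the server, so a precomputed table of plain hashes is of no use to the bot.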