Slashdot Mirror


HTML Encoded Captchas

rangeva writes to tell us about a twist he has developed on the common Captcha technique to discourage spam bots: HECs encode the Captcha image into HTML, thus presenting an unsolved challenge to the bots' programmers. From the writeup: "The Captcha is no longer an image and therefore not a resource they can download and process. The owner of the site can change the properties of the Captcha's HTML, making it unique,... add[ing] another layer of complication for the bot to crack." HECs are not exactly lightweight — the one on the linked page weighs in at 218K — but this GPL'd project seems like a nice advance on the state of the art.

177 comments

  1. I failed to see how this'll help by Rosco+P.+Coltrane · · Score: 5, Interesting

    At the end of the day, this captcha is displayed on the screen as a colorful harder-to-read mumbo-jumbo, just like jpeg captchas, so all a bot has to do is use a html renderer to turn it into a regular image that can be processed. So the added complication is linking one of the existing captcha decoders and the gecko engine for example, maybe a half day's work. Not exactly uncrackable...

    --
    "A door is what a dog is perpetually on the wrong side of" - Ogden Nash
    1. Re:I failed to see how this'll help by Anonymous Coward · · Score: 3, Informative

      Well, considering that the sample captcha is just a large table where every pixel is set as a background color, I'd say it would probably be a ten-line Perl script that you could write in a lot less than half a day.

    2. Re:I failed to see how this'll help by stg · · Score: 1

      Seems unfair that the parent has been modded down - the comment is very relevant in that case. While the page recommends using other methods, most other methods are going to be a lot easier to crack than doing good OCR on complex CAPTCHAs.

    3. Re:I failed to see how this'll help by rangeva · · Score: 5, Insightful

      "so all a bot has to do is use a html renderer to turn it into a regular image that can be processed"

      It's not that simple. Since the Captcha is no longer an image that you can download, the bot will first have to locate the position of the Captcha. The owner of the site can modify the layout of the page and the Captcha, making it unique. By rendering the image into HTML you practically modify the encoding of the image to a new and unique one - making it highly difficult to create a generic bot that will learn to decode all the HTML variations out there.

      The problem today is with automated software that downloads the Captcha images from a pre-defined location (URL) and cracks them. HECs make it much harder to locate this resource.

      Oh and everything is Crackable;)

    4. Re:I failed to see how this'll help by Aladrin · · Score: 3, Interesting

      Even worse, this captcha would be -easier- than a regular one. It lists every pixel as a TD, in rows... So easy to render that it's idiotic. And the image itself is simple as well... The background letters are much lighter in color and could easily be filtered.

      Add in the huge size of the html and the annoyance factor of captchas in general, and this is amazingly stupid.

      --
      "If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
    5. Re:I failed to see how this'll help by Aladrin · · Score: 3, Insightful

      I should have added this disclaimer to the post:

      Yes, I see that they recommend adding in random divs and crap. If it's still a table, it's still very very easy to parse, even without a parser. If they intend for you to replace the table with 'random elements' ... Do you KNOW how hard it would be to get it to show up correctly on each different browser? Another nightmare.

      --
      "If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
    6. Re:I failed to see how this'll help by stg · · Score: 1

      While I have to agree with your "everything is crackable", don't HECs use a whole lot more bandwidth (moving the HEC, even compressed) and/or processing on both sides to handle a gzipped stream than regular CAPTCHAs?

      How poorly are the CAPTCHAs doing these days against bots, anyway? I see a few that are probably easy to OCR, but there are quite a few where I have to make an effort to read them myself...

    7. Re:I failed to see how this'll help by rangeva · · Score: 1

      The HEC is heavy - although you can change the size of the HEC and make it smaller.
      The HEC should only be on the form page (registration, forum submission etc) so it won't harm the user's experience too much.

      I created the HEC because I used to get about 20 spam posts a day on my phpBB forums and other forms on my sites. I also read on many boards that this is a real problem. Since I started using HECs the spam amount went to 0.

    8. Re:I failed to see how this'll help by Anonymous Coward · · Score: 0

      Everything is crackable, granted.

      However, everything should be accessible - most (arguably all) captcha methods are not.

      If you could add ways for users with accessibility issues (particularly in this case screenreaders) to also access the service using your method you'd have something really worth shouting about.

    9. Re:I failed to see how this'll help by Giorgio+Maone · · Score: 2, Informative

      Gecko is absolutely overkill there: the HTML "encoding" is pretty lame, as the image is entirely made of 1px table cells, each one carrying its color information inlined in the style attribute.

      Just one Perl line can extract the color matrix and pass it straight to your OCR algorithm.

      Maybe if they used JavaScript to render the table on the client side, that would require Gecko or something like that (SpiderMonkey or Rhino would likely suffice), but still the complexity of a captcha cracker is noise reduction and character recognition, rather than image decoding.

      That said, I've seen no "Content-encoding: gzip" in their response: gzip encoding cannot be remotely compared to jpeg compression, but it would nevertheless cut the weight of a very redundant HTML table by a 1:16 factor or more... (hurry up guys, you've been slashdotted!)
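
      A minimal sketch of the gzip point, assuming a synthetic table rather than the real HEC output: the cell markup, the 100x35 size and the four-colour palette below are all made up for illustration. A real captcha with noise in it will compress less well than this very regular pattern, so the ratio printed here is not a measurement of the actual page.

```python
import gzip

# Hypothetical cell markup: one 1px <td> per pixel with an inline colour.
CELL = '<td style="width:1px;height:1px;background-color:%s"></td>'
PALETTE = ["#ffffff", "#cccccc", "#336699", "#000000"]


def synthetic_table(width, height):
    """Build a made-up pixel table of the given size as ASCII bytes."""
    rows = []
    for y in range(height):
        cells = "".join(
            CELL % PALETTE[(x * 7 + y * 3) % len(PALETTE)] for x in range(width)
        )
        rows.append("<tr>%s</tr>" % cells)
    return ("<table>%s</table>" % "".join(rows)).encode("ascii")


raw = synthetic_table(100, 35)
packed = gzip.compress(raw)
ratio = len(raw) / len(packed)
print("raw bytes:", len(raw), "gzipped bytes:", len(packed), "ratio: %.1f" % ratio)
```

      On the server side this would normally be done by the web server's own compression setting rather than by hand; the snippet only shows how redundant this kind of markup is.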

      --
      There's a browser safer than Firefox, it is Firefox, with NoScript
    10. Re:I failed to see how this'll help by Anonymous Coward · · Score: 0

      Agree with parent. Very clever. Instead of a crackable captcha, we get a crackable captcha which also breaks standards compliance and rendering performance. This has 'stupidity' written all over it.

    11. Re:I failed to see how this'll help by Lehk228 · · Score: 1

      the trouble is finding where that set of tables is. the site can move it around on the page each time it is loaded, so the bot has to be much smarter than existing bots which just find the right URL to download the image

      --
      Snowden and Manning are heroes.
    12. Re:I failed to see how this'll help by stg · · Score: 1
      > The HEC is heavy - although you can change the size of the HEC and make it smaller.

      Wouldn't pretty much anything larger than a single letter in HEC be larger than a full CAPTCHA?

      > The HEC should only be on the form page (registration, forum submission etc) so it won't harm the user's experience too much.

      My problem with the idea is that if it got popular, it'd probably be in a well-known script, at which point it'd be fairly easy to crack (even with random HTML spread around, it's a whole lot easier to turn the text into a visible captcha than to do OCR). So we'd keep the problem of the spammers, and add a new problem of large HECs.

      > I created the HEC because I used to get about 20 spam posts a day on my phpBB forums and other forms on my sites. I also read on many boards that this is a real problem. Since I started using HECs the spam amount went to 0.

      Were you using well-known phpBB CAPTCHAs? Comparing a brand new system to a popular system where serious time was spent on developing bots is unfair.

      Unless your forums are targeted on a huge scale (i.e. 20 spam posts per day would be nothing), ANYTHING at all you added that wasn't known to the bots would cut the amount of spam to 0.

      It's a clever concept... But I'm afraid it won't scale all that well.
    13. Re:I failed to see how this'll help by harrkev · · Score: 1
      > even with random HTML spread around, it's a whole lot easier to turn the text into a visible captcha than to do OCR

      This is still an image. Instead of sending a JPG or GIF, you are sending an actual bitmap in HTML. In my three-second preview, it just looks like a table with one-pixel cells. Then, you set the color of each cell (pixel) in HTML.

      So, this still requires OCR, but there is just an extra obfuscation step in getting the image from HTML to a standard graphics format. The downside is that it is incredibly inefficient. Each pixel takes probably a dozen bytes or more (too lazy for an exact count right now).
      --
      "-1 Troll" is the apparently the same as "-1 I disagree with you."
    14. Re:I failed to see how this'll help by (Score.5,+Interestin · · Score: 1
      > At the end of the day, this captcha is displayed on the screen as a colorful harder-to-read mumbo-jumbo, just like jpeg captchas, so all a bot has to do is use a html renderer to turn it into a regular image that can be processed. So the added complication is linking one of the existing captcha decoders and the gecko engine for example, maybe a half day's work. Not exactly uncrackable...

      Exactly. It's just a really inefficient way to encode an uncompressed bitmap image. If you can encode it, the bad guys can decode it.

      (Not to mention the fact that any well-organized attacker will be going at these things with an Internet cafe full of minimum-wage humans in some third-world country, so it doesn't matter how you fiddle the encoding. This looks like an example of what happens when you don't understand your problem space.)

    15. Re:I failed to see how this'll help by TubeSteak · · Score: 1
      > By rendering the image into HTML you practically modify the encoding of the image to a new and unique one - making it highly difficult to create a generic bot that will learn to decode all the HTML variations out there.

      OCR programs are already designed to accept per-site profiles.

      Once an HTML-to-image rendering engine is added, profiles can be updated to include the site-specific HTML layout.
      --
      [Fuck Beta]
      o0t!
    16. Re:I failed to see how this'll help by Anonymous Coward · · Score: 0

      There's no need for an HTML renderer even; you only need a couple of regexps to find a table with 1px cells styled with a background color, and pass those 1px colors to a script that generates the resulting image cell by cell, row by row.
      The same process that was used to create the image can easily be reversed.
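
      A rough sketch of that reversal, under loud assumptions: the sample markup below is invented (one td per pixel with an inline background-color in six-digit hex), and the real HEC output may order or name its attributes differently. It only rebuilds the pixel grid; the noise reduction and character recognition that other posters call the actual hard part are not touched.

```python
import re

# Hypothetical 2x2 "image" in the style the thread describes.
SAMPLE = (
    "<table>"
    '<tr><td style="width:1px;height:1px;background-color:#ff0000"></td>'
    '<td style="width:1px;height:1px;background-color:#00ff00"></td></tr>'
    '<tr><td style="width:1px;height:1px;background-color:#0000ff"></td>'
    '<td style="width:1px;height:1px;background-color:#ffffff"></td></tr>'
    "</table>"
)

ROW_RE = re.compile(r"<tr[^>]*>(.*?)</tr>", re.IGNORECASE | re.DOTALL)
COLOR_RE = re.compile(r"background(?:-color)?\s*:\s*#([0-9a-f]{6})", re.IGNORECASE)


def table_to_pixels(html):
    """Return the table as a list of rows, each a list of (r, g, b) tuples."""
    pixels = []
    for row in ROW_RE.findall(html):
        colors = COLOR_RE.findall(row)
        if colors:  # skip rows that carry no pixel cells
            pixels.append(
                [tuple(int(c[i:i + 2], 16) for i in (0, 2, 4)) for c in colors]
            )
    return pixels


pixels = table_to_pixels(SAMPLE)
print(len(pixels), "rows,", len(pixels[0]), "columns")
```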

    17. Re:I failed to see how this'll help by Jerf · · Score: 2, Insightful

      Oh, piffle. That's not hard either.

      The "HTML renderer" in question will be either Mozilla or IE, both of which offer through Javascript the ability to find the absolute position of an element, and its absolute width and height. So the only "hard" part left is identifying the HTML location of the test, probably with something like XPath, or Mozilla's DOM Inspector which already allows you to just click on the element (and maybe go up in the hierarchy a bit.)

      And I'm pretty sure the spammers already have programs to make it easy to have a human do just the hard parts, like identifying the location of the test, because I'm pretty sure that I've seen them have that sort of program to figure out the form field names easily. (Unique blogs, that is, blogs not based on any common software, have gotten blog spam too quickly and thoroughly before for any other explanation to make sense.)

      You can try to move the test around, but you're right back to an arms race (which is where we already were, so no progress), and it's one where the spammers have a system that automatically notifies them of when they need to make changes.

      The only spam solution is total moderation of the comment queue. If everyone did that there would be no spam anymore. (Somewhat ironically.)

    18. Re:I failed to see how this'll help by Anonymous Coward · · Score: 0
      > it's still very very easy to parse, even without a parser

      WTF? If you're parsing it, you have a parser, by definition.
    19. Re:I failed to see how this'll help by Anonymous Coward · · Score: 1, Interesting

      Do you really think it's going to be a problem? A dynamic page keeps a given structure and therefore I say it takes, in the worst scenario, 10 minutes - to figure out how to extract the data you need to decode the captcha. Even if you move the text around, that's still going to be done programmatically, and that is a big limitation, isn't it?

      What would I do? Simply look for all the td's with one single colored pixel, and then count the tr's in between.

      Everything else is made easier, in fact, because you get the chance to develop a simple, working scanner without any need for third-party modules (GD, Image::Magick and the like).

      Give up. If I can read that, I know I'm going to be able to make a script that does just that. This is just not the way.
      You can make a script that makes things difficult for me, but that's just delaying the day when the captcha will be broken.

      Stefano

    20. Re:I failed to see how this'll help by v1 · · Score: 1

      And even if that fails (and I don't see how) then they could just resort to screen scrapers and feed that output to their captcha image processing engine.

      --
      I work for the Department of Redundancy Department.
    21. Re:I failed to see how this'll help by stg · · Score: 1
      > This is still an image. Instead of sending a JPG or GIF, you are sending an actual bitmap in HTML. In my three-second preview, it just looks like a table with one-pixel cells. Then, you set the color of each cell (pixel) in HTML.
      >
      > So, this still requires OCR, but there is just an extra obfuscation step in getting the image from HTML to a standard graphics format. The down side is that it is incredibly inefficient. Each pixel takes probably a dozen bytes or more (too lazy for an exact count right now).

      Yes, I realize the spammers still have to do an OCR step. But the obfuscation step seems way too simple compared to OCR, which was my point.

      BTW - dozen bytes per pixel? You wish! It's 64 bytes per pixel in the current sample!
    22. Re:I failed to see how this'll help by Anonymous Coward · · Score: 0

      He means (I guess) that the image given to you in a table in the HTML is, in fact, already parsed. It's just a bitmap with 60 bytes per pixel.

      Stefano

    23. Re:I failed to see how this'll help by sbaker · · Score: 1

      You can achieve random positioning just by putting the captcha into a larger image - that costs more bandwidth - but so does this approach. I don't see the benefit.

      I think a better approach is to use some natural language: "Please type the following word in backwards: WIBBLE" - "Please type in every alternate letter of this word XPQUNTF" - "Please tell me the name of a baby dog". "What is the first word in this paragraph?"

      Just think up a few dozen of these and you're done. Providing no two websites use the same list of questions, the bad guy can't break into a large number of sites automatically without a truly industrial strength AI package! Remember we don't need a defense against one-off attacks because those guys can just type in the captcha anyway.
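
      A small sketch of how a site might store and check challenges like the two word puzzles above. Everything here is hypothetical (the function names, the CHALLENGES list, the case-insensitive matching), and "every alternate letter" is read as the 2nd, 4th and 6th letters, which is only one possible interpretation.

```python
def reverse_word(word):
    """Expected reply for 'type the following word in backwards'."""
    return word[::-1]


def alternate_letters(word, start=1):
    """Every second letter; start=1 means the 2nd, 4th, ... (an assumption)."""
    return word[start::2]


# Each entry pairs the question shown to the visitor with the expected reply.
CHALLENGES = [
    ("Please type the following word in backwards: WIBBLE",
     reverse_word("WIBBLE")),
    ("Please type in every alternate letter of this word XPQUNTF",
     alternate_letters("XPQUNTF")),
]


def check(index, reply):
    """Compare a visitor's reply, ignoring case and surrounding spaces."""
    return reply.strip().upper() == CHALLENGES[index][1]


for question, expected in CHALLENGES:
    print(question, "->", expected)
```

      The free-form questions ("the name of a baby dog") would need a list of accepted answers instead of a computed one, which is where the per-site human effort comes in.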

      --
      www.sjbaker.org
    24. Re:I failed to see how this'll help by Reziac · · Score: 1

      Also, even on 1.5Mbit, it took so long to download and render that at first I thought the site had stalled. Probably a good 20 seconds.

      And it didn't render at all in my everyday browser.

      --
      ~REZ~ #43301. Who'd fake being me anyway?
    25. Re:I failed to see how this'll help by nacturation · · Score: 1

      > So the added complication is linking one of the existing captcha decoders and the gecko engine for example, maybe a half day's work.

      Do you have any references for this? I was wondering if there is a library which can be linked to where you can simply say render http://www.google.com/ as an image in PNG format to filename "foo.png"?
      --
      Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
    26. Re:I failed to see how this'll help by Lehk228 · · Score: 1

      Traditional captcha bots can work on hundreds of blogs with only enough brains to figure out which picture is the captcha. This captcha is easy to mess with, so a bot pretty much must be custom coded for your site, especially if you sometimes slip the captcha code into existing page tables and use non-square and unequal-sized cells to break things up for the bot. Mix some page content into a few more of those cells, even replace a section with an actual text character that belongs inside the captcha, and botting will be made much more difficult.

      --
      Snowden and Manning are heroes.
    27. Re:I failed to see how this'll help by ProfessionalCookie · · Score: 1

      http://www.www.browsercam.com/. There are free ones too but I'm too lazy to look them up. Anyway, now you know it already exists. Cheers, Ed

    28. Re:I failed to see how this'll help by Anonymous Coward · · Score: 0

      I would agree as well; see a post responding to the HEC here: http://www.kerrywong.com/PermaLink,guid,f4326447-df20-4dcd-993e-69da5a4956da.aspx

  2. Render, PrintScr, OCR? by Frogular · · Score: 3, Interesting

    Can't the bot simply render and OCR it?

    A better solution might be the authentication system old 386 games had where you have to do some simple but human intelligence requiring task. "Find the word in the upper right of manual pg 4" -> "Enter the 3rd word from the following paragraph"

    1. Re:Render, PrintScr, OCR? by AlecLyons · · Score: 1

      Problem with that is it requires some human intelligence to set the challenge.

    2. Re:Render, PrintScr, OCR? by vtcodger · · Score: 1

      Maybe, but one of the few times I ever went to the effort to hack a binary was to modify one of those games to get around that sort of authentication scheme. I, at least, found it to be far more aggravating than Captchas are today.

      --
      You can't see ANYTHING from a car, You've got to get out of the goddamned contraption and walk...Edward Abbey
    3. Re:Render, PrintScr, OCR? by BiggyP · · Score: 1

      This makes me wonder if spammers might pick up on this method to get around FuzzyOCR and the like, unless of course HTML tables are discarded anyway.

      If anyone wants to produce HTML table graphics then The GIMP comes with an export plugin, good fun but don't try exporting or rendering anything too large, it can put a lot of strain on the browser.

    4. Re:Render, PrintScr, OCR? by Phillup · · Score: 1

      > Problem with that is it requires some human intelligence to set the challenge.

      And to solve it.

      Chances are that if you make something that Joe Sixpack can pass... so can a bot written by an Einstein.
      --

      --Phillip

      Can you say BIRTH TAX
    5. Re:Render, PrintScr, OCR? by Millenniumman · · Score: 1

      You have 9 semi-random pictures. One is a dog. The rest are not. "Pick the dog".

      --
      Stupidity is like nuclear power, it can be used for good or evil. And you don't want to get any on you.
    6. Re:Render, PrintScr, OCR? by Anonymous Coward · · Score: 0

      > Can't the bot simply render and OCR it?

      What about an animated graphic? You could OCR it, but you'd have to catch it first.
    7. Re:Render, PrintScr, OCR? by PAjamian · · Score: 1

      > You have 9 semi-random pictures. One is a dog. The rest are not. "Pick the dog".

      Hard to actually solve for a computer program, but a very short keylength. I can have my script guess one of the 9 pics at random and an average of one out of every 9 attempted spams will go through.
      --
      Windows is a bonfire, Linux is the sun. Linux only looks smaller if you lack perspective.
    8. Re:Render, PrintScr, OCR? by sheean.nl · · Score: 1

      Ask the question two or three times (pick a dog out of 9 pics, a cat, a horse) and block anyone who got it wrong more than, say, 20 times.

      --

      If at first you don't succeed, then sky diving definitely isn't for you.
    9. Re:Render, PrintScr, OCR? by Millenniumman · · Score: 1

      Not if the image refreshed when you got it wrong, which is what most captchas do now.

      --
      Stupidity is like nuclear power, it can be used for good or evil. And you don't want to get any on you.
    10. Re:Render, PrintScr, OCR? by Anonymous Coward · · Score: 1, Insightful

      > You have 9 semi-random pictures. One is a dog. The rest are not. "Pick the dog".

      Been done. kittenauth.com It's even nine pictures, so I suspect you're just not giving credit.

    11. Re:Render, PrintScr, OCR? by nacturation · · Score: 1

      How about this:

      "One of the pictures below is of a dog lying on a pile of newspaper headlines. What's on the dog's nametag?"

      On second thought, I guess that could eventually be solved as well. Maybe add in a legend with "What kind of dog is it?" etc. The joys of a never-ending arms race.

      --
      Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
    12. Re:Render, PrintScr, OCR? by Geoffreyerffoeg · · Score: 2, Funny

      human intelligence requiring task

      "Prove or disprove P=NP. (You have 500 characters remaining.)"

    13. Re:Render, PrintScr, OCR? by Anonymous Coward · · Score: 0

      Given N=1, P=1P --> P=P --> P=NP

    14. Re:Render, PrintScr, OCR? by bogado · · Score: 1

      No, the grandparent is suggesting that a simple "luck"-based attack will succeed 1 out of 9 times. If the spammer hits your site with hundreds or thousands of spams, 1/9th of those would pass, which is still too much.

      I made a similar algorithm for my captcha. I use an image with some, but not too much, randomization; this step is only to make the spammer use an OCR, which makes things a little harder but not by much. The image has a question in the grandparent's format and 6 written answers; it then asks the user to enter the nth vowel or consonant, followed by another letter, of the correct answer.

      For instance, one possible result of this scheme is:

      Which of these is a marine being?
      From the correct answer, type vowel number 1 and consonant number 2:

      1) dolphin
      2) Arnold Schwarzenegger
      3) Garfield
      4) fly
      5) United States

      --
      []'s Victor Bogado da Silva Lins

      ^[:wq

    15. Re:Render, PrintScr, OCR? by PAjamian · · Score: 1

      > Ask the question two or three times (pick a dog out of 9 pics, a cat, a horse) and block anyone who got it wrong more than, say, 20 times.

      OK, let's say three times...

      9^3 = 729.

      So now a spammer succeeds one in 729 times. If you're blocking after 20 incorrect attempts then each computer on a network of 20,000 zombies gets 20 tries, so a total of 400,000 tries for our spammer, one out of 729 will go through, so your site will still receive almost 550 spams.
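
      The arithmetic above as a small sketch, assuming every try is an independent random guess on all three questions and that each zombie uses up exactly its 20 tries. The function name and parameters are only for illustration.

```python
def expected_spam(choices, rounds, zombies, tries_per_zombie):
    """Expected number of blind guesses that pass every round."""
    p_success = 1 / choices ** rounds       # 1 in 9^3 = 1 in 729
    attempts = zombies * tries_per_zombie   # 20,000 * 20 = 400,000
    return attempts * p_success


print(round(expected_spam(9, 3, 20000, 20), 1))
```

      That comes out at roughly 549, matching the "almost 550" figure; with a single round of nine pictures the same botnet would get about one in nine of its 400,000 tries through.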
      --
      Windows is a bonfire, Linux is the sun. Linux only looks smaller if you lack perspective.
  3. watermarking by dattaway · · Score: 2, Interesting

    How about watermarking the captcha with the site's address and a short message?

    1. Re:watermarking by user24 · · Score: 1

      my freecap PHP CAPTCHA does this; puremango.co.uk

  4. What are the gotchas with these captchas by Timesprout · · Score: 1

    Anyone?

    --
    Do not try to read the dupe, thats impossible. Instead, only try to realize the truth
    What truth?
    There is no dupe
    1. Re:What are the gotchas with these captchas by YrWrstNtmr · · Score: 4, Insightful

      Blind, color blind, text only browsers, more of a hassle, just to name a few.

    2. Re:What are the gotchas with these captchas by Anonymous Coward · · Score: 0

      How hard would it be to convert this back to an image? AFAICS, the PHP code simply creates a normal captcha image (using the PHP ImageTTFText() function, among others) then converts that into a small table with as many rows and columns as the image has pixels. Each cell has a background color the same as the color at the corresponding point in the image.

      One possibility that occurred to me was to use XSLT in text output mode to try and generate the image. This is just one possibility of many.

      OTOH, I think it's good that people are trying to stay ahead of the spammers, and anything which makes their job more difficult is a good thing.

      --Robin

    3. Re:What are the gotchas with these captchas by Nyh · · Score: 2, Insightful

      Or just users who have the settings for Firefox on 'Always use my colors' because they don't like the angry fruit salads of most sites.

      Nyh

    4. Re:What are the gotchas with these captchas by smillie · · Score: 1
      > Always use my colors

      I was wondering why I can normally see captchas but saw nothing in the sample box. Being colorblind I need to force most pages to colors I can see. Since my browser doesn't allow me to set colors for one site but not another, even the good sites get changed.

      --

      Dyslexics Untie!

    5. Re:What are the gotchas with these captchas by springbox · · Score: 1

      The same limitations as other image-based CAPTCHAs

    6. Re:What are the gotchas with these captchas by AusIV · · Score: 1

      I'm colorblind, and I frequently find myself refreshing a page numerous times to get a captcha I can actually read. I find the things really annoying. I understand a need to keep bots from spamming sites, but some of these captchas are absolutely ridiculous.

  5. Bad form by Zaph0dB · · Score: 5, Insightful

    I think using a captcha like this one (html-table rendered) is bad web-manners. The rendering of such a table, pixel by pixel, is a huge toll on browsers. Even on my (relatively) new and (relatively) powerful machine, it took Firefox a noticeable amount of time to render the image, and caused my hard drive to crunch a little. I don't even want to imagine less powerful machines or, random-fluctuation-of-time-and-space forbid, mobile devices. All in all, I think this method severely limits the users accessing this site.

    --
    When in danger or in doubt, run in circles, scream and shout [Robert Heinlein]
    1. Re:Bad form by jellomizer · · Score: 1

      Heck, it took time on my (very) new and (very) powerful machine. Even the fastest chips available still need to work on it (more CPUs or cores will not help because the browser calculates this on one CPU right now, maybe in the future). That makes it too slow for normal users. Watching my mom load it on an iMac G3 with dialup will be painful.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    2. Re:Bad form by Xeriar · · Score: 1

      Took next to no time at all on any of my machines in Firefox. One is modern, three would have been considered top of the line about six years ago. If it's slow in your browser, either

      1: Your browser does not prerender (ie. IE) - though rendering was pretty instantaneous in IE6 for me too.
      2: Something is wrong with your machine
      3: You should consider looking into the purchase of a new machine if you are obviously so anal about a registration scheme that you will go through -once- taking a few extra seconds.

    3. Re:Bad form by the_womble · · Score: 3, Informative

      It did not take a noticeable time to either download or render: Firefox, Linux and dialup.

    4. Re:Bad form by multipartmixed · · Score: 1

      Yeah. I didn't even notice a rendering delay on FF2; my box is a P3-900 w/ 512MB RAM with a bunch of puttys and photoshop 7 already running.

      --

      Do daemons dream of electric sleep()?
    5. Re:Bad form by j00r0m4nc3r · · Score: 1

      Who cares about form? If it stops/slows spam then I support it. How often do you have to solve captchas anyway? Once a month maybe? Big deal... It's not like every website you visit every day has captchas for you to solve...

    6. Re:Bad form by Anonymous Coward · · Score: 0
      > It did not take a noticeable time to either download or render: Firefox, Linux and dialup.

      It did not take a noticeable time (less than 0.5 sec) in my case either: GNU/Linux, Firefox, AMD Athlon 64 3000+, DSL
    7. Re:Bad form by gmerideth · · Score: 1

      A client's Celeron 500 machine with 256MB of RAM rendered the page in the blink of an eye along with the resulting (fake) image. I think the poster is under the impression that every single page on the 'net will have one of these images. If the purpose is to block spammers, and the HTML table will only be rendered when you attempt to post a message to a board, then what is the point of his post? I'm sure the extra 1/10th of a second it took to render that table is worth the wait to the owner of the board, and the readers, to keep posts on content and not filled with spam garbage.

      --
      Why do overlook and oversee mean opposite things?
    8. Re:Bad form by Rich+Klein · · Score: 1

      > HECs are not exactly lightweight -- the one on the linked page weighs in at 218K...

      I was thinking that the files may be large, but they're highly compressible, but you hit on a good point. On my 933MHz PowerPC G4 running Firefox 2 it's not terribly slow, but it's definitely slower than any other captcha I've seen. It's an interesting technique, in any case.

      --
      -Rich
    9. Re:Bad form by Geoffreyerffoeg · · Score: 1

      > Even on my (relatively) new and (relatively) powerful machine, it took Firefox a noticeable amount of time to render the image, and caused my hard drive to crunch a little.

      In Safari, on a 1.83 GHz Core Duo, the rendering was completely not noticeable. Perhaps Gecko is poor at rendering giant tables?

      (This isn't unlikely. On Mozilla 1.4 on a 1.8 GHz HP, about a year ago, I tried to render a 320x200 image as a table - I was trying to find a quick hack to get an image out of QBasic. It took less time to realize netpbm would be easier and implement that than to render the table.)

    10. Re:Bad form by insane_coder · · Score: 1

      So what you're saying is that this is a good thing, since the spam bots will require much more raw horsepower and thus take longer to go around spamming the web.

      --
      You can be an insane coder too, read: Insane Coding
  6. Far easier ways to prevent the image being dl'd by Anonymous Coward · · Score: 0

    If the goal was to prevent the image from being immediately downloadable, they could have used the data: URI scheme to embed the image data directly into the actual page, or embedded SVG, or used regular CSS to obfuscate the captcha.
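
    A short sketch of the data: option mentioned above, assuming the captcha image is already available as bytes on the server. The helper name and the alt text are made up, and the placeholder bytes below are only the PNG file signature, not a viewable image.

```python
import base64


def img_tag_from_bytes(image_bytes, mime="image/png"):
    """Inline image bytes into an <img> tag using a base64 data: URI."""
    payload = base64.b64encode(image_bytes).decode("ascii")
    return '<img alt="captcha" src="data:%s;base64,%s">' % (mime, payload)


# Placeholder input: the 8-byte PNG signature stands in for a real image.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
tag = img_tag_from_bytes(PNG_SIGNATURE)
print(tag)
```

    Base64 adds roughly a third to the image size, which is still far smaller than one table cell per pixel, though a bot that already parses the page can decode it just as easily.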

  7. Spy vs spy by Anonymous Coward · · Score: 1, Interesting

    This scheme will work until it is widely enough used that it is worth the spammers' while to write a crack. As the author suggests, the ultimate solution is probably to have so many of these schemes that the spammers can't keep up.

    I have a question. How much of a problem are these spammed responses to blogs? I go to several blogs that don't have captchas and haven't noticed anything that could be called spam. Is this a response to a non-problem?

    1. Re:Spy vs spy by Anonymous Coward · · Score: 0

      Comment spam is a serious problem in my opinion and growing very strong. Your question is like back in 1996: how much of a problem is email spam, I get only 4 such emails a day.

      See: http://johnbokma.com/mexit/2006/07/13/

      If you have your own blog, writing down some of the links in the spam, and checking Google for them after a few days might show you 10,000-20,000 blogs/guestbooks with the same links.

  8. April Fool's 3 months early? by Anonymous Coward · · Score: 0

    I read TFA and solved their little captcha, but didn't get no Pr0n!

  9. 218kb by Joebert · · Score: 1

    Screw trying to solve it, it would be easier to use that 218kb chunk of junk that's no doubt going to need a bunch of dynamic processing against them, thus forcing them to wish they never used it in the first place.

    --
    Wanna fight ? Bend over, stick your head up your ass, and fight for air.
  10. workaround... by zozzi · · Score: 5, Informative
    Spammers already have a workaround for captchas:

    1. Show the image in an alternate pornographic/warez/whatever website

    2. Ask the user to type it in to access the site

    3. Use the user's input to access the original protected site

    4. There is no step 4.

    --
    ---
    1. Re:workaround... by rjamestaylor · · Score: 2, Funny

      Brilliantly devious. Hundreds of pr0n-seeking addicts are itching at any given moment to get their fix. Only problem is that there probably aren't enough CAPTCHAs available on the web to meet the pr0n-seekers demand! Either free "inventory" will be given away for repeated CAPTCHA solving or, if repeats not used, CAPTCHA won't be available and will frustrate the frustrated seeker even more. So, PhpBB-admins do your part: enable CAPTCHAs to meet the demand!

      --
      -- @rjamestaylor on Ello
    2. Re:workaround... by iangoldby · · Score: 1
      Spammers already have a workaround for captchas:
      1. Show the image in an alternate pornographic/warez/whatever website ...

      Except this isn't an image.
    3. Re:workaround... by iamdrscience · · Score: 1

      People bring that up whenever there's news about Captchas, but I have to say I don't believe it. When it comes to porn, I'm no slouch and I can count the number of times I've seen sites that give you free access after entering a captcha on one hand. Far more Captchas are compromised because some OCR nerd has figured out how to crack it.

    4. Re:workaround... by Phillup · · Score: 5, Funny

      When it comes to porn, I'm no slouch and I can count the number of times I've seen sites that give you free access after entering a captcha on one hand.

      One hand eh?

      Guess we don't really need to ask how you know this...

      --

      --Phillip

      Can you say BIRTH TAX
    5. Re:workaround... by Dystopian+Rebel · · Score: 1
      When it comes to porn, I'm no slouch


      At least you're maintaining good posture while you're stunting your growth.
      --
      Rich And Stupid is not so bad as Working For Rich And Stupid.
    6. Re:workaround... by Anonymous Coward · · Score: 0

      That's what this solves. Idiot.

      Daniel

    7. Re:workaround... by skiingyac · · Score: 1

      Actually, that's not a problem, the more views the better. The spammers have no easy way to verify what the user typed in is correct (if they could, then this whole discussion would be pointless), so they should show the captcha to tens or hundreds of people, and whatever text is most commonly entered should be attempted to get past the captcha... Or if they can afford incorrect guesses, just keep trying until some helpful user guesses correctly.

      The point is it takes multiple users to access a single captcha-protected item.

      The moral is that when you're getting your pr0n/warez/etc. fix, just enter random letters for the captcha response, because the pr0n server doesn't know any better.
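
      The voting idea can be sketched roughly like this in Python. The names, the vote threshold and the assumption that the captcha is case-insensitive are all mine, not anything a real spammer is known to use.

```python
from collections import Counter


def most_common_answer(answers, min_votes=2):
    """Return the most frequent normalized answer, or None if there is no consensus."""
    # Assumption: the captcha is case-insensitive, so answers are lowercased.
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if not normalized:
        return None
    answer, votes = Counter(normalized).most_common(1)[0]
    return answer if votes >= min_votes else None


# Three honest answers and two people typing random letters.
responses = ["x7Kq2", "X7KQ2", "asdfg", "x7kq2 ", "qwert"]
print(most_common_answer(responses))  # prints x7kq2
```

      Random junk only gets outvoted when enough people answer honestly, which is why each captcha costs the spammer several views.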

    8. Re:workaround... by BCoates · · Score: 1

      Most existing captchas are weaker in some more trivial way so the porn trick isn't necessary. The reason it matters is that the porn attack is pretty much unstoppable (it's the grandmaster problem) and low-cost enough that if captchas got more popular and less weak in other ways it or something like it will be the attack of choice. Captchas are a technological dead-end, though they can be used in the short term as a way to make your site slightly harder to spam than everyone else's, as long as you don't care about screwing over the blind, people not set up to take images, and well-behaved bots.

  11. A captcha is still a captcha by Cee · · Score: 4, Interesting

    One of the main objections of a captcha is that an attacker could steal the image file and simply use it on their site (XXX sites...) to get it "cracked".
    A HTML generated captcha would prevent that, since there is no image file to copy.
    However, what prevents the attacker from simply copying the relevant HTML source and putting it on his or her site, just like the image? Sure, you can make it quite complicated by adding CSS layers and whatnot, but in the end that would merely be an extra annoyance.

    And stopping the attacker on using OCR on the captcha won't really work either. It's not that hard to render HTML code to an image, which you can feed to the OCR software.

    In short, this hack is just another step in the arms race, that just buys us some time.

    1. Re:A captcha is still a captcha by Joebert · · Score: 0, Redundant
      A HTML generated captcha would prevent that, since there is no image file to copy.

      Someone could go to Microsoft.com & download Visual C++ Express Edition for free & put together something that uses Internet Explorer to render the entire page & save a bitmap of it in less than a day.
      If they already had something to recognise a CAPTCHA, I bet they could dig the CAPTCHA image out of that bitmap while they're there.
      --
      Wanna fight ? Bend over, stick your head up your ass, and fight for air.
    2. Re:A captcha is still a captcha by cmallinson · · Score: 1

      "One of the main objections of a captcha is that an attacker could steal the image file and simply use it on their site (XXX sites...) to get it "cracked"."

      You could also just steal the html table code, and show it on another site. It's almost easier, since there's no file to deal with.

    3. Re:A captcha is still a captcha by sbaker · · Score: 1

      The problem is that anything you can put up on the screen can be rendered using the code inside an open-sourced browser and saved as an image file. Hence there is no possible means to encode or encrypt or otherwise mangle the image that can't be read by a sufficiently good font recognition algorithm.

      The trick has to be to make life harder for the image recognition step - not to make it harder to feed the image into that stage.

      So - more noise in the background - more crazy font choices - more 'meta' stuff like "Please type this in backwards" that the AI program will find hard to understand. If they get too good at AI image extraction you can use cultural knowledge that the program won't have "Who lives at the north pole and brings presents to children at Christmas?"

      In the limit, this is like a Turing test - can the bad guy's program answer questions sufficiently like a human?

      If the bad guys can solve this AI problem then they are doing better than the great artificial intelligence minds of our time!

      --
      www.sjbaker.org
    4. Re:A captcha is still a captcha by noidentity · · Score: 1

      "It's not that hard to render HTML code to an image, which you can feed to the OCR software."

      And you could also just render the HTML into an image and present this on your human-powered CAPTCHA-defeating site (and lower your bandwidth costs too, since it'll be smaller than this bloated HTML).

  12. Do others use such spam-bot blockers? by msobkow · · Score: 2, Interesting

    I've had sessions that took an inordinately long time to initialize with various web service providers (it's very noticeable on dial-up.) I'm wondering whether similar techniques might be used to attack rather than defend, possibly including rogue AJAX code.

    --
    I do not fail; I succeed at finding out what does not work.
    1. Re:Do others use such spam-bot blockers? by msobkow · · Score: 1

      What I'm trying to get at is that with Flash and similar technologies, I can just remove the plugin or disable it in the browser. But with an AJAX or any other interface that uses ECMAScript, it might well be possible to deliver attack code. People forget it's called JavaScript because it's a similar syntax, but it is NOT sandboxed like real Java applets.

      --
      I do not fail; I succeed at finding out what does not work.
  13. Not a resource they can download and process? by Tim+C · · Score: 1

    Really? Firefox doesn't seem to have any problems downloading and processing it, and as I wasn't aware that Firefox or Gecko used voodoo magic, I'm going to assume that the same would be true of any purpose-written code...

    It's a nice idea, but it's little more than a speed-bump at best. (And not a particularly high one, at that)

    1. Re:Not a resource they can download and process? by Professor_UNIX · · Score: 1
      Really? Firefox doesn't seem to have any problems downloading and processing it, and as I wasn't aware that Firefox or Gecko used voodoo magic
      Took about 5 seconds to fully render that HEC on my 1.6 GHz Powerbook running Firefox. It could just be the time associated with downloading all that HTML though I guess. It definitely seems to not be friendly compared to a 30K JPEG of the same thing.
    2. Re:Not a resource they can download and process? by JamesTRexx · · Score: 1

      Took about 0.5 seconds on my FreeBSD Dell 1.13 GHz P III in Konqueror while compiling KDElibs3 in the background. At every reload of the page.
      Or it could just be that FreeBSD/KDE has magic powers. :-)

      --
      home
    3. Re:Not a resource they can download and process? by Tim+C · · Score: 1

      It took my copy of FF2 "a couple of seconds"* to render it first time round on my X2 4400+ once the page had fully loaded; subsequent loads showed the captcha more or less instantaneously, despite it being different each time.

      By "no problems" I mean that it's right there in the page, and can be scraped out with relative ease. In fact, it's not really any harder than searching for the appropriate img tag, in either case you have to identify an enclosing block of text and pull out the relevant HTML fragment.

      It might stop the odd kiddy when the script they found on an IRC channel can't handle it, but it's not going to stand up to anyone with even half a clue for very long. Don't get me wrong, it's definitely an interesting idea, and might be a (very small) step in the right direction, if someone can strengthen it somehow; I'm just not convinced that it'll be possible.

    4. Re:Not a resource they can download and process? by henryhbk · · Score: 1

      I tested it on my core-duo mac mini (with the crappy GMA graphics) in Safari and it was essentially instant. I will admit it was over FIOS so download speed was 15mbit, but it didn't slow down at all.

    5. Re:Not a resource they can download and process? by user24 · · Score: 1

      crashed firefox when I tried to view source; winXP's virtual memory crap.

    6. Re:Not a resource they can download and process? by Anonymous Coward · · Score: 0

      switcheur \'swi`ch &r\, n.
      A person who thinks that they are a Mac user but are really just trying to be. The mistake they make is to try to become a Mac user, when real Mac users are all about not trying to be anything and following your own rules. There is no fashion code to being a Mac user. There are no rules as to what applications you have to run.

      Recent converts like you are ruining the old school Mac community because you are posers. Apple releases one OS that popularizes Fitts' law and the Genie effect, and suddenly people assume being a Mac user is all about owning a Mac. But a real Mac user is born, not made. You "switchers" are misrepresenting yourselves and the Mac platform. You're giving people the wrong idea of what Macintosh is.

      switcheur: shops at hot topic, thinks Firefox is a good Mac app, waiting for OS X port of PayrollPro 2000, follows any hint of a fashion trend (instead of setting them!), wouldn't know Clarus from Carl Sagan.

      real Mac user: someone true to who they are, the misfits, the rebels, the troublemakers, the round pegs in the square holes. The ones who see things differently. They're not fond of rules and they have no respect for the status quo. The ones who are crazy enough to think that they can change the world.

  14. Screen Captcha! by mrmeval · · Score: 2, Interesting

    It's easy no?

    The file size is what intrigues me. Just make a 'hidden' captcha that a bot would download. Now figure out how to make a jpeg decompressor uncompress that to 2 gigs or better.

    It's like the old "I'll compress 2gigs of the letter A with zip and upload it to that BBS and let the virus checker gag" gag.

    Or maybe a gif file. I wonder how solid black or white compress......

    --
    I'd go on a Vegan diet but the delivery time from Vega is too long. --brownkitty
    1. Re:Screen Captcha! by KillerBob · · Score: 1

      If it's a "hidden" field, the legit browsers will still see it, though. The user may not see it, but it'll still be loaded by the browser.

      As to how to make it compress really well, simple. Save it as a 2-colour bitmap (with all the pixels "on"). Of some obscenely high resolution. Like 168,000x105,000. 17.6 billion pixels. Will compress really small, but will also suck up a huge amount of RAM to display.
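
      A rough way to see the effect, sketched in Python. It uses zlib (DEFLATE) as a stand-in rather than an actual JPEG or GIF encoder, and the 8000x8000 size is just an example small enough to try safely.

```python
import zlib


def compressed_size(width, height):
    """Return (raw_bytes, compressed_bytes) for an all-ones 1-bit bitmap."""
    row_bytes = (width + 7) // 8          # 1 bit per pixel, rows padded to whole bytes
    raw = b"\xff" * (row_bytes * height)  # every pixel "on"
    return len(raw), len(zlib.compress(raw, 9))


raw, packed = compressed_size(8000, 8000)
print(f"raw: {raw} bytes, compressed: {packed} bytes, ratio ~{raw // packed}:1")
```

      The same imbalance is what makes the old zip-of-the-letter-A gag work: the decoder has to allocate the full raw size no matter how small the download was.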

      --
      If you believe everything you read, you'd better not read. - Japanese proverb
    2. Re:Screen Captcha! by quis · · Score: 1

      I just ran a quick test in Photoshop. Starting with a 4000px square image, filled with white I got a 48Mb document. Saving this for web, as compressed as I could get it I ended up with a 12kb file.

    3. Re:Screen Captcha! by mrmeval · · Score: 1

      Is there any way to fool a robot into reading such a file but not a browser?

      --
      I'd go on a Vegan diet but the delivery time from Vega is too long. --brownkitty
    4. Re:Screen Captcha! by mrmeval · · Score: 1

      Cool. I was trying that with Gimp and it choked. :( It could be that my system is too wimpy.

      --
      I'd go on a Vegan diet but the delivery time from Vega is too long. --brownkitty
    5. Re:Screen Captcha! by JoshJ · · Score: 1

      That's probably due to the repeated pattern of td/tl/#ffffff . Changing the colors would increase the filesize rather dramatically, I imagine.

  15. Lunacy by Stormx2 · · Score: 4, Interesting

    Lunacy! I've made apps which can do this sort of thing before, and this one is totally unoptimized! Take a look at this:

    With the limited amount of colours used, it would make much more sense to
    a) give the table an id, then:
    table#tabid td { width:1px; height:1px; }
    b) give some classes for each colour used
    td.colid { background-color: blah; }

    I'm sure that would halve the source code size... How can you trust a HTML solution that hasn't even been properly thought through?
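
    To put a rough number on it, here is a small Python sketch that builds the same pixel grid both ways and compares the sizes. The exact markup of the real HEC is an assumption on my part (I'm guessing one styled td per pixel), and the function names and test pattern are made up.

```python
def inline_table(pixels):
    """One <td> per pixel with size and colour repeated inline in every cell."""
    rows = []
    for row in pixels:
        cells = "".join(
            f'<td style="width:1px;height:1px;background-color:{c}"></td>' for c in row
        )
        rows.append(f"<tr>{cells}</tr>")
    return '<table cellspacing="0" cellpadding="0">' + "".join(rows) + "</table>"


def class_table(pixels, table_id="tabid"):
    """Same grid, but cell size and colours are declared once in a stylesheet."""
    colours = sorted({c for row in pixels for c in row})
    palette = {c: f"c{i}" for i, c in enumerate(colours)}
    css = [f"table#{table_id} td{{width:1px;height:1px}}"]
    css += [f"td.{name}{{background-color:{colour}}}" for colour, name in palette.items()]
    style = "".join(css)
    rows = []
    for row in pixels:
        cells = "".join(f'<td class="{palette[c]}"></td>' for c in row)
        rows.append(f"<tr>{cells}</tr>")
    table = f'<table id="{table_id}" cellspacing="0" cellpadding="0">' + "".join(rows) + "</table>"
    return f"<style>{style}</style>" + table


# Hypothetical 120x40 two-colour test pattern.
pixels = [["#ffffff" if (x + y) % 3 else "#000000" for x in range(120)] for y in range(40)]
print(len(inline_table(pixels)), len(class_table(pixels)))
```

    The saving depends on how few colours the image uses, since every distinct colour needs its own class rule.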

  16. Processing by jones_supa · · Score: 2, Interesting

    The Captcha is no longer an image and therefore not a resource they can download and process.

    Err...but the HTML captcha is a resource they can download and process.

    1. Re:Processing by Phillup · · Score: 1

      The Captcha is no longer an image and therefore not a resource they can download and process.

      Err...but the HTML captcha is a resource they can download and process.

      Not without getting the whole page you can't... that is the point.

      You still need to separate the captcha from the rest of the page.
      --

      --Phillip

      Can you say BIRTH TAX
  17. When bad ideas go live by billcopc · · Score: 1, Insightful

    Having a 200kb block of text, no matter how well it compresses, will add anywhere from 10 to 40 seconds to download on a dial-up line, and that was for a ridiculously small CAPTCHA. A larger, more human-readable size might use up 500kb or more. Even on a high-speed link that's a noticeable pause. The fact that it only shows up on the sign-up page doesn't make it excusable; in fact it makes it counter-productive. If I find some cool site, eagerly hit the sign-up link and end up staring at a half-rendered page for more than 15-20 seconds, I'll just leave and find some other site that loads faster, because I really don't care what's going on behind the scenes... I have no compassion for an elaborate security device if it bungles my experience.

    This is what happens when bad ideas are brought to life. This will only waste the site owner's bandwidth, maybe slow down the attacker slightly while the algorithm is modified.. we're talking AT MOST a couple days work. You could achieve the same result by adding a 2-second delay to the CAPTCHA cgi, the same idea as adding a delay to failed logins... if you can't properly defeat the attackers, at least slow them down.

    We've reached a point where, with security/copy protection, if it is something that can be done by a human sitting at a computer, the human can be removed from the equation. The greatest shortcoming of any system like CAPTCHA, or even asking "human intelligence" questions like "What do monkeys eat" or other things that computers don't innately "know", is that a human has to computerize those actions in the first place. You have to teach YOUR computer what the answer to the monkey question is, and there are only so many answers you will teach it until you run out of ideas (or exhaust the body of humankind's knowledge). Eventually the attacker will know all the answers to your challenges and you've just wasted a whole lot of time.

    A better strategy here is the psychological approach. How do you get rid of a tireless attacker ? What motivates an attacker ? They WANT something of value to them. That something can be email addresses, zombie hosts, or in the case of blog spam they just want eyeballs. There are two ways to demotivate them: get rid of what's luring them, or make your prize harder to get than everyone else's. The first solution might mean crippling your site, even making it totally worthless (think site owners that give up, communities that are abandoned after relentless attacks). The second solution only buys you time, because the more vulnerable sites will ramp up their security, sooner or later, and then you're back at square one.

    Actually there is a solution 3: find the attackers and attack THEM. Hey it's not the higher road, but it's damn effective.

    --
    -Billco, Fnarg.com
  18. Captcha's are annoying by tacocat · · Score: 4, Insightful

    While this has little to do with the original post I have a really annoying experience with captchas

    I have 20/20 vision and am not color blind. Captchas are becoming so complicated and garbled that I get the code wrong about 40% of the time. Another portion of the time I take too long trying to answer the code question and type in the right characters. I typically get screwed on the number Zero and the letter 'O' and lowercase 'L' and the number 1.

    It's becoming, for me, an entry barrier to signing up and gaining access to websites. It would be much easier to simply use email authentication. What do you do with the people who are color blind? I spent some years dealing with display design and this was a legitimate concern that we addressed at the time for a specialized group of people. In the common population there are a lot more occurrences of people who are color blind.

    Are captchas really worth the effort compared to other more human friendly processes? Is anyone working on what we will be doing next? Considering that there are decades of machine vision technology to pull from I think it will be fairly trivial for the bots to become better at reading captchas than humans.

    It might be effective to take the email authentication process and apply everything that mail servers do to authenticate the user. What I mean by this is apply all the mail server rules like FQDN requirements for HELO, fully resolvable email domains, valid email addresses, non-open relays. Much of this would eliminate either the bots or the ISP's who are too stupid to properly configure a mail server. Similarly it might be sufficient to code the HTML/HTTP to expect a properly responding client and not some hacked up bot that can't do most of it right.

    1. Re:Captcha's are annoying by KillerBob · · Score: 1
      Well, I administer a couple of forums, and I can honestly tell you that captcha is mostly useless. That said, so is e-mail validation. The bots are using throwaway e-mail addresses to get around e-mail validation. Sometimes, they're registering their own domains and using a catch-all so that the bot can put in random junk for the e-mail address, sometimes they're using free e-mail providers.

      The thing is, it's a losing battle. You can either shrug your shoulders, and let it happen, or you can take up arms. At its peak, an average of 100 bots/day were registering on my forums. I was able to stymie them by blocking websites from their profiles and preventing them from posting links until they'd had 10 posts, but they were still registering.

      100/day may not sound like many, until I tell you that there's only about 2,000 legitimate users on the messageboard. You need to employ every weapon at your disposal to prevent bots from registering, and even then, you may not get all of them.

      Now, addressing what you said specifically:

      I have 20/20 vision and am not color blind. Captchas are becoming so complicated and garbled that I get the code wrong about 40% of the time. Another portion of the time I take too long trying to answer the code question and type in the right characters. I typically get screwed on the number Zero and the letter 'O' and lowercase 'L' and the number 1.

      That's the fault of the website designer more than anything. The smart website designer will use a font that puts a slash through the numeral zero, for example, or will remove certain characters from the list of available characters, such as you describe. Most canned captchas actually support that, but it's up to the person using it to tell the script not to use characters like 'l'.
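
      As a sketch of what "remove certain characters" means in practice, in Python: the confusable set below is my own guess (the complaint above names zero/O and l/1; I added capital I) and the names are hypothetical, not any particular captcha script's options.

```python
import secrets
import string

# Assumed set of look-alike characters; adjust for the font actually in use.
AMBIGUOUS = set("0O1lI")

# Letters and digits with the look-alikes removed.
ALPHABET = [ch for ch in string.ascii_letters + string.digits if ch not in AMBIGUOUS]


def make_code(length=6):
    """Pick a random captcha code that never contains an ambiguous character."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(make_code())
```

      Dropping five characters out of 62 costs very little keyspace compared to what it saves in failed human attempts.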

      It's becoming, for me, an entry barrier to signing up and gaining access to websites. It would be much easier to simply use email authentication. What do you do with the people who are color blind? I spent some years dealing with display design and this was a legitimate concern that we addressed at the time for a specialized group of people. In the common population there are a lot more occurrences of people who are color blind.

      Provide an alternative method for registration. My e-mail address is provided on my registration page, asking users to send me an e-mail if they have any problems registering and I'll manually create the account. I check my e-mail daily, and at most, you'll have to wait 24 hours.

      Again, that's up to the website administrator to make allowances for people who get screwed over by the captcha and other methods.

      Are captchas really worth the effort compared to other more human friendly processes? Is anyone working on what we will be doing next? Considering that there are decades of machine vision technology to pull from I think it will be fairly trivial for the bots to become better at reading captchas than humans.

      Yes. While I did say they were mostly useless, I also said that you need to employ every weapon at your disposal to keep the bots from registering. They do block some of the less sophisticated bots, and every little bit helps.

      I've been tossing around an idea of an anti-captcha, though. Throw in a captcha, and right below it, have a note that says "now, disregard the above captcha and type 'notabot' in the box". I'll probably implement it to see what happens.

      It might be effective to take the email authentication process and apply everything that mail servers do to authenticate the user. What I mean by this is apply all the mail server rules like FQDN requirements for HELO, fully resolvable email domains, valid email addresses, non-open relays. Much of this would eliminate either the bots or the ISP's who are too stupid to properly configure a mail server. Similarly it might be sufficient to code the HTML/HTTP to expect a properly r

      --
      If you believe everything you read, you'd better not read. - Japanese proverb
    2. Re:Captcha's are annoying by AusIV · · Score: 1
      I have 20/20 vision and am not color blind. Captchas are becoming so complicated and garbled that I get the code wrong about 40% of the time.
      I have 20/40 vision and am colorblind. I find a site with a captcha, I give up on it unless it's something I'm really interested in. There have definitely been websites that have lost my business because of their obnoxious captchas.
    3. Re:Captcha's are annoying by tacocat · · Score: 1

      I suppose the next step is to have people write a physical letter or make a phone call to you personally. But I wonder how long it will be before electronic speech gets better.

    4. Re:Captcha's are annoying by brassman · · Score: 1

      I signed up for a new board today (vBulletin based) and had to refresh the Captcha four times before I got one I could read.

      --
      "Ain't no right way to do a wrong thing."
    5. Re:Captcha's are annoying by NaDrew · · Score: 1
      I suppose the next step is to have people write a physical letter or make a phone call to you personally.
      Sounds like the old BBS days. Many of the 'leeter boards required you to answer the phone and talk to the SYSOP briefly before they'd let you in.
      --
      Vista:XPSP2::ME:98SE
  19. Broken by Kurayamino-X · · Score: 5, Interesting

    All text based captchas are broken, it doesn't matter how they're rendered, they're still a pre-defined set of characters that a bot can pick out eventually. Now, the "Click three kittens" captcha, that was fucking genius, no bot on the planet will be able to tell the difference between a kitten and a ham sandwich. Why isn't it being used? People seem to think obscuring text and making it harder for humans to read is a better idea than using something a computer will not be able to identify.

    --
    ...I got nothing.
    1. Re:Broken by springbox · · Score: 1

      I see a few problems with that CAPTCHA. First, it's one of the few CAPTCHAs that requires JavaScript to work, which is not its biggest problem. All of the images for the CAPTCHA are thrown out onto the page so it's just a matter of having a human identify each animal in each picture and an automated program can find "x" number of "y"s on the page. Not only that, but the CAPTCHA images themselves are easily accessible since they're put in the same directory with file names like 0.jpg, 1.jpg, etc.

    2. Re:Broken by izam_oron · · Score: 1

      In that case, one could render them all into one image and use alphanumeric identifiers for the kittens instead. Even better, throw in a bunch of random pictures and ask the user to type in the ID of whichever one is item "x". Someone suggested that you use flickr with the kittens, so having access to multiple "x" pictures is a must. You get all the combinations of strange alphanumeric strings and the usability/cuteness factor of contrasting fuzzy animals.

    3. Re:Broken by eugene+ts+wong · · Score: 1

      Another idea would be to show a picture of many people, and ask how many of them have brown/black hair. In the picture would be all coloured people with brown/black hair, and black people with different coloured hair. The problem with my suggestion is that the number of people could only be so many [ie: between 1 - 10], so it would only take so long for bots to guess all possibilities, but then again, every little bit helps.

    4. Re:Broken by eugene+ts+wong · · Score: 1

      Maybe they can make it more complex by instructing the user to click on them in a certain order. For example, "Click on the black bunny, then the white bunny, then the black kitten.".

      Maybe another idea is to have a question asking which item doesn't belong, and explaining why. Things like that can be subjective, unfortunately.

    5. Re:Broken by toddestan · · Score: 1

      The problem with image captcha is that you would be starting from a pre-defined set of images. All you would have to do is teach the bot which images are kittens, and you would be set. The other problem would be guessing - the bots could try to get in just by randomly clicking on the images. If you want to keep the chances of that low, you would have to display a lot of images, or have the user pick out the kittens multiple times.
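
      The guessing odds are easy to work out. A quick Python sketch, assuming a hypothetical grid where the user must pick exactly the right set of kittens and every attempt gets a fresh, independent challenge:

```python
from math import comb


def guess_probability(total_images, kittens, attempts=1):
    """Chance that blind guessing picks exactly the right set within `attempts` tries."""
    p_single = 1 / comb(total_images, kittens)
    return 1 - (1 - p_single) ** attempts


print(guess_probability(9, 3))               # 3 kittens among 9 images: 1 in 84
print(guess_probability(9, 3) ** 2)          # passing two rounds in a row
print(guess_probability(20, 3))              # 3 kittens among 20 images: 1 in 1140
print(guess_probability(9, 3, attempts=100)) # a bot that is allowed 100 tries
```

      So a small grid with a single round is weak against a bot that can retry cheaply; more images or more rounds help, at the cost of annoying the humans.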

    6. Re:Broken by Anonymous Coward · · Score: 0

      The HotOrNot CAPTCHA asks you to pick three "hot" men or women as determined by HotOrNot.com users. This gives you a very large selection of already-categorized images, although maybe Flickr tags would be less controversial.

    7. Re:Broken by spitzak · · Score: 1

      The problem is not coming up with a question that a computer program cannot answer. The problem is making a computer program that can create such questions.

      If the questions are created by a human, there is going to be a limited set. The spammers only have to figure out the answers to that limited set. Only by having the computer generate a (essentially) infinite set of questions can this workaround be avoided.

    8. Re:Broken by nuzak · · Score: 1

      > no bot on the planet will be able to tell the difference between a kitten and a ham sandwich.

      Kitten tastes like veal, actually. Mmm.

      Blacklisting every single bot IP ought to do it. Turn off the internet for the botnets, that's all. Yes, it's "enumerating badness", but it's still a reasonably finite and discoverable space. Blogs aren't at the state of the art anti-spam was in 5 years ago, and they could be adding to the anti-spam arsenal instead of trying to catch up with it. Ah well.

      --
      Done with slashdot, done with nerds, getting a life.
    9. Re:Broken by Anonymous Coward · · Score: 0

      I'm sure it's more than possible to train a neural net to tell the difference between a kitten and a ham sandwich.

    10. Re:Broken by izam_oron · · Score: 1

      It might help if I mention that something like this could be done using session ids in cookies. The server generates a random position for said "holy object", and associates it with a session id. This goes on, and for each new request, a new id and new position are generated and associated. This ought to keep the bot guessing for a very long time, especially if there are multiple correct answers. The hair thing is a good idea, but what about the colorblind? I think that identifying generic objects would be a bit more accessible, but having multiple correct answers (and having to get them all correct) is a nice idea.
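
      Roughly what I have in mind, as a Python sketch. The class and method names are invented for illustration, the store is just an in-memory dict, and a real site would also need expiry and persistence.

```python
import secrets


class ChallengeStore:
    """In-memory map of session id -> expected position (illustrative only)."""

    def __init__(self, grid_size=9):
        self.grid_size = grid_size
        self._expected = {}

    def new_challenge(self):
        """Create a fresh session id and a random position for the target image."""
        session_id = secrets.token_urlsafe(16)
        position = secrets.randbelow(self.grid_size)
        self._expected[session_id] = position
        # The position stays on the server; only the session id goes in the cookie.
        return session_id, position

    def verify(self, session_id, clicked_position):
        """Check the click; pop() makes each challenge single-use."""
        expected = self._expected.pop(session_id, None)
        return expected is not None and expected == clicked_position


store = ChallengeStore()
sid, pos = store.new_challenge()
print(store.verify(sid, pos))  # True the first time
print(store.verify(sid, pos))  # False: the challenge was already used
```

      Making each challenge single-use is the important part, since otherwise a bot could just retry every position against the same session.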

    11. Re:Broken by eugene+ts+wong · · Score: 1
      I like the session id idea and the position idea. That seems pretty complex to me. Even if it is easily overcome from an intellectual stand point, it just makes it that much harder.

      The hair thing is a good idea, but what about the colorblind?
      Yes, that is a problem, but it might not be a problem where the colourblind aren't invited. A legitimate example is a web site where you send in an application to be a military pilot. The more the users are required to be fully functional in their day to day lives, the less accessible the site needs to be. The site could go so far as to make a form as complex as a cockpit, if potential pilots are applying.

      Another use could be where the images have colour but the questions deal with black and white. It will be much easier for the spammers to guess which people have black hair and white hair, than guessing various other colours, but it should still be harder than not having to guess at all.

      For web sites which require perfect vision and fully functioning browsers, you could have no HTML text and have only images with text asking questions, like when did Columbus sail the ocean blue? or what time is it?
      Other layers of complexity could be made. All photos could be black and white. All photos could contain pictures of people holding photos. Some pictures could be cartoons.

      Even more complex questions could be made. Who most likely has natural black hair? The spammer must identify the photos of people who have black hair roots, and/or black skin. How will the spammers identify the Orientals?

      An important aspect of trying to foil the spammers is each site using a completely different technique. If each bot were only able to access 1 site, then the usefulness of spamming would go down. The spammers could create bots that attack different sites, but do they really want to create 1 bot with each feature only able to attack 1 site? A lot of those captchas aren't really all that difficult, but it forces the spammer to assess whether or not it is worth attacking a certain forum, where only a handful of people may be present.

      Honestly, as for handicapped people, I haven't the foggiest idea of how to deal with them.
  20. not very effective by varunvnair · · Score: 1

    This is going to be effective only as long as it is not popular and not worth somebody's time to sit down and write a script to convert it into a genuine image.

    How difficult is it to translate this matrix into a normal image? Not very difficult I am afraid.
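
    To make the question concrete, here is a rough sketch that reads the per-cell background colours back into a grid, assuming the markup is a table of one-pixel cells with inline styles as described elsewhere in this discussion. The class and function names are made up for illustration:

```python
from html.parser import HTMLParser
import re

class TableToPixels(HTMLParser):
    """Collect the background colour of every <td>, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            style = dict(attrs).get("style") or ""
            match = re.search(r"background(?:-color)?\s*:\s*(#[0-9a-fA-F]{6})", style)
            self.rows[-1].append(match.group(1).lower() if match else None)

def table_to_pixels(html):
    """Return a list of rows, each a list of '#rrggbb' strings (or None)."""
    parser = TableToPixels()
    parser.feed(html)
    return parser.rows
```

    The resulting grid is already a bitmap in all but file format, which is the point being made here.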

    1. Re:not very effective by Anonymous Coward · · Score: 0

      How difficult? Press the "Prt Scr" key. Paste in image editor. Enjoy.

  21. Imagine encrypting your webforms by Anonymous Coward · · Score: 0

    You have a user feedback or similar with your nice new captcha. You add the Javascript PGP encryption to it, and suddenly all the snooping is a thing of the past because all the user submitted replies are encrypted with PGP.

    I'm not sure why you're throwing mod points to kick a 0 score comment that most people will never see to -1, and no, I have nothing to do with either Hanewin or Enigmail and the code is freely available.

    1. Re:Imagine encrypting your webforms by Anonymous Coward · · Score: 0

      Your first comment was (with hindsight incorrectly) moderated "off-topic" because it seemed totally irrelevant to the topic of this article. You failed to explain how it related to captchas. Although you then tried to explain the relevance in your follow-up post, your explanation was still not inadequate! Re-read the first paragraph of your comment entitled "Not offtopic" and think how the English could be corrected to improve your communication. If you post as an AC, try harder to make yourself crystal clear. If you use good English and express your thoughts in a logical order, the relevance of what you write should be clear from the beginning. You're not doing that yet. I was not the moderator.

    2. Re:Imagine encrypting your webforms by Anonymous Coward · · Score: 0

      typo: "not inadequate" --> "not adequate"

  22. Opera doesn't seem to have that problem by pathological+liar · · Score: 1

    For what it's worth, Opera doesn't seem to have that problem. The page loaded/rendered so fast on my laptop that I thought they'd cheated and just stuck an image in there.

  23. A matter of time by superbrose · · Score: 2, Interesting

    The advantage of this captcha is that it is not widespread yet and so the chances that a bot can crack it are lower.

    Funny that when OCR software is supposed to work it often fails, but when there is some effort to hinder recognition then bots can deal with that. Maybe general OCR software should try to crack input instead!

    1. Re:A matter of time by jonbryce · · Score: 1

      Even if the crackbot OCR software works only a small percentage of the time, it is still worth their while using it, as they just need to keep it running again and again until they get in. That's very different from OCRing a document many times, and hoping that one of them comes out right.
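
      As a back-of-the-envelope sketch, assuming each attempt is independent and succeeds with the same probability p (both are simplifying assumptions, not measurements):

```python
def success_probability(p, attempts):
    """Chance of at least one success in `attempts` independent tries."""
    return 1 - (1 - p) ** attempts

# A solver that is right only 5% of the time, given 100 tries.
print(round(success_probability(0.05, 100), 3))
```

      Under those assumptions, even a very unreliable solver gets in almost every time once it is allowed to retry.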

  24. You must have a very poor dialup connection. by goldcd · · Score: 1

    That page gzips down to 12k - so ~2 seconds download speed.
    The larger problem would probably be load on the server - possibly you could get around this by pre-compressing and then randomly serving. I don't think this was supposed to be a perfect solution, it's just a nice little demo showing how something common can be done in a new way.

  25. capchas by iviagnus · · Score: 0

    Kudos! An awesome idea. If some here cannot see the relevance, they are n00bs with no brains.

  26. 218k of junk by suv4x4 · · Score: 2, Informative

    This GPL-ed project can be reproduced by a junior coder in an hour so the fact it's GPL-ed I guess isn't of so much help.

    Also on the subject of it being 218k, each pixel looks like:

    ... tr... <td style='height:1px;width:1px;background-color:#fcfbff'></td> ... /tr...

    which is badly redundant, the very first thing is you can make all "td"-s in the table be 1px/1px with a simple: table.captcha td {width:1px; height:1px} rule, then background-color can be shortened to just "background" and still be valid.

    Furthermore, you don't need a table with rows and columns: if you float the pixels to the left, you only need a container of the right width, and columns/rows will naturally form. To keep the size down we can style a shorter tag for our purposes, like <b>.

    So at this stage we arrive at the much simpler:

    <b style="background:#abcdef"></b>

    But this can be simplified even further by indexing the colors used as around 40-50 CSS classes (given the image has a lot more than 40-50 pixels and 40-50 colors are enough for it, it's still a net gain), for example: .cA {background:#abcdef} .cB {background:#ffaabb}, at which point we get not only more obfuscation for the captcha crackers to solve, but much lighter code:

    <b class="cA"></b>

    and again the original:

    ... tr... <td style='height:1px;width:1px;background-color:#fcfbff'></td> ... /tr...

    And this is before we start putting JavaScript in the picture...
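
    A rough generator for the class-indexed version, as a sketch only. The `.cap` container, the `c0`, `c1`, ... class names and the `encode_pixels` function are hypothetical, and it assumes the image is already available as rows of '#rrggbb' strings:

```python
def encode_pixels(rows):
    """Return (css, html) for a grid of '#rrggbb' colours, one class per colour."""
    classes = {}
    for row in rows:
        for colour in row:
            if colour not in classes:
                classes[colour] = "c%d" % len(classes)

    width = len(rows[0])
    css = [
        ".cap{width:%dpx}" % width,                # container as wide as one row
        ".cap b{float:left;width:1px;height:1px}",  # every <b> is one floated pixel
    ]
    css += [".%s{background:%s}" % (name, colour) for colour, name in classes.items()]

    body = "".join('<b class="%s"></b>' % classes[c] for row in rows for c in row)
    return "".join(css), '<div class="cap">%s</div>' % body
```

    Shuffling which class name maps to which colour on every request would give the per-page variation the article talks about at no extra size.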

    1. Re:218k of junk by belg4mit · · Score: 1

      Someone needs a swift kick, this is not a troll...
      A terser comment with some of the same content not
      a few screens down is marked +2 interesting.

      --
      Were that I say, pancakes?
  27. Congratulations... FOOL! by Tom · · Score: 1, Interesting

    Great, so blocking images in E-Mail will no longer get those image-spams thrown out, because now a bright-but-not-intelligent geek has given the spammer assholes a way to encode their crap in simple HTML which no spam filter will manage to get.

    Congratulations. How much did they pay you?

    Oh, as for the "official" purpose. I give it a life expectancy of 3 weeks before the spammers have found a way around it. If they bother at all.

    --
    Assorted stuff I do sometimes: Lemuria.org
    1. Re:Congratulations... FOOL! by WWWWolf · · Score: 1
      Great, so blocking images in E-Mail will no longer get those image-spams thrown out, because now a bright-but-not-intelligent geek has given the spammer assholes a way to encode their crap in simple HTML which no spam filter will manage to get.

      In other news, people are celebrating on the streets when HTML email is finally dead and all spam filters are universally configured to throw HTML attachments away. Everyone is surprised that the 1990s technology is working so well, and understands why every old beard has been complaining about HTML mail...

      ...or maybe it's just that someone writes a new rule for SpamAssassin (adds "no megacrappy HTML" rule after the "no crappy HTML" rule) and calls it a day.

    2. Re:Congratulations... FOOL! by jonbryce · · Score: 1

      Spamassassin already blocks messages with a very high ratio of html tags to text, so it would get those messages.
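
      To illustrate the kind of measurement involved (this is not SpamAssassin's actual rule, just a sketch of the idea with made-up names), a tag-to-text ratio can be computed like this:

```python
from html.parser import HTMLParser

class _Counter(HTMLParser):
    """Count opening tags and non-whitespace-trimmed text characters."""

    def __init__(self):
        super().__init__()
        self.tags = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        self.tags += 1

    def handle_data(self, data):
        self.text_chars += len(data.strip())

def tag_to_text_ratio(html):
    """Opening tags per visible text character (inf if there is no text)."""
    counter = _Counter()
    counter.feed(html)
    counter.close()
    if counter.text_chars == 0:
        return float("inf")
    return counter.tags / counter.text_chars
```

      A message drawn entirely out of empty table cells has no text at all, so by this measure it is as extreme as a message can get.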

    3. Re:Congratulations... FOOL! by Anonymous Coward · · Score: 0

      Yeah and fuck GNU and freesoftware. With GCC they can compile trojans !

    4. Re:Congratulations... FOOL! by Geoffreyerffoeg · · Score: 1

      a bright-but-not-intelligent geek has given the spammer assholes a way to encode their crap in simple HTML which no spam filter will manage to get.

      Are you seriously trying to imply that the concept of rendering an image in HTML via 1-pixel table cells is new? The innovation here is connecting table-rendered-images and CAPTCHAs, not one or the other.

  28. This will leave spammers saying.... by sbben · · Score: 1

    What the HEC?

  29. No need to download the image by lintux · · Score: 5, Interesting

    There's no need to download the image. Look at the source. Somewhere it says:

    Now, just go to MD5Lookup.Com and convert that little "hidden" MD5Sum back to the original text:

    ad6ade8a0b6e2f748b80a390ff45cf31 - &NMTB

    Maybe the author should add some salt. :-)

    1. Re:No need to download the image by Anonymous Coward · · Score: 0

      that probably means its even easier than that. you could just make your own form, to post your own word & hash to the server.

    2. Re:No need to download the image by lintux · · Score: 1

      Yeah, that's what I realized a few seconds after posting this too. The author certainly has some work to do. :-)

      Adding a long salt should work though, as long as the salt is secret.
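
      A minimal sketch of the secret-salt idea, with one swap stated plainly: it uses HMAC-SHA256 from the Python standard library instead of a salted MD5. `SECRET_KEY`, `make_token` and `verify` are hypothetical names:

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; it must never be sent to the client.
SECRET_KEY = secrets.token_bytes(32)

def make_token(answer):
    """Keyed digest of the expected answer, to embed in the form."""
    return hmac.new(SECRET_KEY, answer.encode("utf-8"), hashlib.sha256).hexdigest()

def verify(answer, token):
    """True if `token` was produced by this server for `answer`."""
    return hmac.compare_digest(make_token(answer), token)
```

      Without the key, a client can neither look the digest up in a table nor compute a matching one for a word of its own choosing. A valid answer and token pair can still be replayed, so this does not remove the need for some server-side record of what has been used.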

    3. Re:No need to download the image by gwait · · Score: 1

      I tried twice to do that, and crashed Firefox both times!
      (Version 1.5.0.8 on KUbuntu)

      Seems like the hash input code is to the right of the html encoded image,
      the HTML encoded image is all on one line, so the view source window is now very wide virtually.

      To crash:
      1. View source
      2. Search for the word "hash"
      3. See firefox lock up till KUbuntu pops up a kill box for it..
      4. Profit! (umm?)

      --
      Bavarian Purity Law of Rice Krispie Squares: Rice Krispies, Marshmallows, Butter, Vanilla.
    4. Re:No need to download the image by cnettel · · Score: 1

      Long, and variable, but deterministic, is also quite ok. However, it should still be stored serverside, since it seems (to me) that the only point of including the hash in a production system would be to avoid storing the session information. Then, the creative hacker will naturally switch the hash value and add a matching input string when the form is posted.

    5. Re:No need to download the image by lintux · · Score: 1

      Oops! You're totally right. Why didn't I think of that... :-(

  30. A simple screen capture defeats this by BarnabyWilde · · Score: 0

    A simple screen capture defeats this, since everything is ultimately a map of bits (a bitmap!) on the screen that can easily be converted to a file.

  31. I just wrote one.... by Skylinux · · Score: 1

    I just wrote a sample CAPTCHA system as well but kept in black and white for various reasons. I also use whole words to make text input simpler for humans. Here is a competing article explaining a different approach.
    http://www.network-technologies.org/Projects/Virtual_Brain_Online/article/user_validation_image_verification_code_captcha/

    --
    Everyone who buys Wild Hunt will receive 16 specially prepared DLCs absolutely for free, regardless of platform.
  32. Captchas by anasciiman · · Score: 1

    I loathe CAPTCHAs in any form. People who are blind or, like me, totally colourblind have no functional means to figure out just what in the hell we're supposed to "see" in them. Some places have noted this and added audio versions but the majority of sites using CAPTCHAs do not bother. For those webmasters thinking of using CAPTCHAs - there are a lot of us out here who are visually disabled and, um, we have money to spend with your competition if you can't at least meet us halfway.

    --
    Think of me when you shave your legs...
    1. Re:Captchas by /dev/trash · · Score: 1

      huh? What commercial sites require a captcha?

    2. Re:Captchas by linuxfanatic1024 · · Score: 1

      You can probably e-mail the webmaster and have her/him create an account for you. Most webmasters will understand if you have problems with them.

      --
      Microsoft-free since March 28, 2004
  33. Visually impaired? by cojsl · · Score: 1

    A quick skim of TFA didn't indicate whether these are any more accessible for the visually impaired. The current "audio captcha" options help, but standard captchas are a real barrier to the visually impaired.

  34. BitBloating the pixels to individual TD tags by TehBeer · · Score: 1

    BitBloating the pixels to individual 1X1 TD tags will hardly make a difference. All that is needed is to reconstruct the bitmap then use OCR on it as usual.

    What a sad solution.

  35. Doesn't render in Konqueror 3.5.2 by TheJohn · · Score: 1

    Not only is it bad form, it appears it may stress some browsers enough that it blocks legitimate users.

  36. Clever but no cigar. by MikeFM · · Score: 2, Insightful

    Locating the captcha in the rendered page can't take more than a couple seconds. You'd have to change it a lot to change that. It's a blocky, colorful, bit of screen near a form submit button. Even if you change it there are only so many ways you can change it without making it confusing to users. If a user can find it then I can write a script to find it.

    It's a useful tool to slow down script kiddies but it won't stop anyone that could actually write the code to grab the characters in the image in the first place.

    --
    At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
  37. But was it interesting to the /. audience? by Anonymous Coward · · Score: 0

    "because it seemed totally irrelevant to the topic of this article." ...
    "Re-read the first paragraph of your comment entitled "Not offtopic" and think how the English could be corrected to improve your communication."

    No, please, whoever moderated, stop misusing negative mod points. Just because they could not see how it relates, doesn't mean others have the same problem. Slashdot isn't a forum for professional writers and comments are not expected to be punchy journalism.

    1. Re:But was it interesting to the /. audience? by Thundersnatch · · Score: 1

      What you wrote had no positive value, and was moderated accordingly. We are all dumber for having read it.

  38. Size is not so bad./ by sugarmotor · · Score: 1

    The page compresses with gzip from 188,398 bytes down to 13,326 bytes. In plain text it displays ca. 5,000 bytes.

    So with HTML compression the size of this encoding isn't really a problem.
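
    The effect is easy to reproduce on a synthetic page. The sketch below builds a stand-in table (100 x 30 one-pixel cells cycling through four colours, which is an assumption and not the real demo page) and compares raw and gzipped sizes:

```python
import gzip

def compressed_size(html):
    """Return (raw_bytes, gzipped_bytes) for an HTML string."""
    raw = html.encode("utf-8")
    return len(raw), len(gzip.compress(raw))

# Synthetic stand-in for the demo page, not the actual captcha markup.
colours = ["#fcfbff", "#000000", "#abcdef", "#ffaabb"]
cell = "<td style='height:1px;width:1px;background-color:%s'></td>"
page = "<table>" + "".join(
    "<tr>" + "".join(cell % colours[(x + y) % 4] for x in range(100)) + "</tr>"
    for y in range(30)
) + "</table>"

raw_size, gz_size = compressed_size(page)
print(raw_size, gz_size)
```

    A real captcha has noisier colours than this toy pattern, so it will compress less well, but the repeated style strings are what gzip removes in both cases.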

    But as mentioned at http://en.wikipedia.org/wiki/Captcha the real hurdle is that the opponent can use low-paid data entry workers: http://it.slashdot.org/article.pl?sid=06/09/06/1217240 "Will Solve Captcha for Money?"

    Stephan

    --
    http://stephan.sugarmotor.org
  39. Use it to prevent hotlinking instead of captchas by Anonymous Coward · · Score: 0

    I can only see one good use for this as a method to prevent easy hotlinking/stealing of normal images, as long as it's CSS optimised as a previous post, and the page in TFA itself, mentions.

    Or even as a non-image way to render text on a page in fonts that don't exist on the users system, such as is used at http://mardeg.sitesled.com/

  40. Doesn't need to be this complicated by Anonymous Coward · · Score: 0

    There are still image captcha for phpbb that have not been cracked, although the default phpbb captcha has been cracked.

  41. Make the question a wav file by cheekyboy · · Score: 1

    Don't ask the question in text, use an audio file.

    Generate the audio file using a good natural text to speech maker.

    Of course, use 10 variations of grammar for the questions perhaps. Easy to do in real time.

    --
    Liberty freedom are no1, not dicks in suits.
    1. Re:Make the question a wav file by sulfur · · Score: 1

      As far as I know, quite a lot of people have their sound muted or no speakers at all (especially at work). I turn on my speakers only when I need it (thanks to those lame websites with background music), so audio captchas would be more annoying for me. Also, think of all people whose native language is not English (it's *much* harder to understand audible information than readable).

    2. Re:Make the question a wav file by linuxfanatic1024 · · Score: 1

      What about deaf visitors? Don't they count?

      --
      Microsoft-free since March 28, 2004
  42. Lots of security issues by Anonymous Coward · · Score: 0

    http://bmaurer.blogspot.com/2007/01/beware-random-captchas-found-on.html

    - If your OCR can read 1/2 the chars on the page, the md5sum lets you crack the others. Really quickly
    - Forget OCR. It doesn't check that the server itself generated the hashes. Hash "apple" then submit the hash and the word "apple".
    - There are no checks for duplicates. You can solve one captcha and submit it 1000000 times.
    - You can delete any jpeg file on the website, due to the non-checking of the hash for the word ".."
    - You can fill up the dude's disk by requesting lots of captchas but

  43. Captchas synonymous to DRM? by MoogMan · · Score: 1

    "Capchas" and similar technology are just DRM. Thankfully, the audience trying to crack the former are far more stupid than the audience that crack DRM.

  44. Seems Almost Misleading by Sheepeep · · Score: 1

    Although not technically an "image", it's still an image-based solution. If I wrote a CAPTCHA using SVG, it'd still be an image, even if it's a markup language. If I wrote it in Flash, it'd still be an image-based solution, even if it's Flash. I also don't see what would be difficult about automatically singling out an enormous, single-lined (in source) table full of CSS declarations without any 'data'. In fact, it's probably easier to spot as a script than an image...Probably with a similar time to decode it.

    --
    If your idea looks good on paper, you need more paper.
  45. Surely im not the first person by Anonymous Coward · · Score: 0

    #!/bin/bash
    #Pass the page url with the captcha image as the first argument ex "script http://www.omgili.com/captcha.php"
    wget $i -O - |sed -e '/.*tabl.*/!d' -e 's/\<\/table\>/\<\/table\>\n/g' >> table
    for i in `seq 1 $(wc -l table |sed -e 's/ .*//g')`; do
        echo $(head -n$i table| tail -1) >> table_$i
    done
    echo "<html>" `cat $(du -b table_* | sort -n |tail -1 |sed -e 's/[0-9].*t/t/')` "</html>" > html.html
    rm table*
    Not that I'm any more for spam; getting it into an image would not be hard though, with html2ps or something similar. Just a nice way to waste your server's bandwidth and kill your users' browsers.
    1. Re:Surely im not the first person by Anonymous Coward · · Score: 0
      I suppose replacing $i with $1 should actually make it work; it's what I get for trying to hack something like that together while tired.

      #!/bin/bash
      # Pass the URL of the page containing the captcha as the first argument,
      # e.g. "script http://www.omgili.com/captcha.php"
      # Keep only lines that mention a table and break the line after each </table>
      # (the \n in the replacement needs GNU sed).
      wget "$1" -O - | sed -e '/tabl/!d' -e 's/<\/table>/<\/table>\n/g' > table
      # The captcha is by far the longest table, so pick the longest line.
      longest=$(awk 'length($0) > max { max = length($0); line = $0 } END { print line }' table)
      printf '%s\n' "<html>$longest</html>" > html.html
      rm table
  46. Cleaner way by Metasquares · · Score: 1

    If you absolutely must use something like this, you can easily confuse spambots (and with far less code!) by interspersing some elements containing the CAPTCHA text itself and making them contiguous on the screen using absolute positioning. Such a thing is an accessibility nightmare, but no worse than the technique in the article.

  47. 'notabot' anti-captcha by Jamie+Lokier · · Score: 1
    I've been tossing around an idea of an anti-captcha, though. Throw in a captcha, and right below it, have a note that says "now, disregard the above captcha and type 'notabot' in the box". I'll probably implement it to see what happens.

    Obviously that will only work until the spammers add a rule to check for what you're doing. If your method remains a minority, they probably won't bother.

    Given that, why bother asking the user to type anything or show them the graphic? Just use Javascript to enter the text into a hidden field and hide the image ("visibility: hidden") in a way that looks to a bot like it's shown. Thus, the users aren't bothered but you'll hopefully catch the (current) bots.
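
    A server-side sketch of that, assuming a hidden input named `human_check` that a small inline script fills in. The field name and both function names are hypothetical:

```python
import secrets

def make_form_fields():
    """Return (expected_value, html) for a hidden field that a script fills in."""
    expected = secrets.token_hex(8)
    html = (
        '<input type="hidden" name="human_check" id="human_check" value="">'
        '<script>document.getElementById("human_check").value="%s";</script>' % expected
    )
    return expected, html

def passes_check(form, expected):
    """True only if the submitted hidden field matches what the script set."""
    return secrets.compare_digest(form.get("human_check", ""), expected)
```

    A bot that posts the form without running scripts leaves the field empty and fails. One that bothers to pull the value out of the script passes, which is the same limitation as above.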

    -- Jamie

  48. Functional example of that way of Captcha by gromite · · Score: 1

    I have done it and published a functional, ready-to-use example of the captcha on root.cz; it is here: http://kregion.cz/discussion-forum-commentary-spam-filter/ (written in PHP and MySQL, using the GD library; licence GNU/GPL. It is not yet written as a function, only as code to copy and paste into your source; this will be corrected in a future version.)

  49. How smart can they be if... by IBitOBear · · Score: 1

    ...they don't know the difference between a "DOS Attack" and a simple slashdotting... 8-)

    --
    Innocent people shouldn't be forced to pay for inferior software development.
    --"Code Complete" Microsoft Press
  50. Nothing here to see by OhHellWithIt · · Score: 1

    The page has been taken down. The author says it was subject to a DoS attack. I guess that's what /. readers are, eh?

    --
    "Who controls the past controls the future. Who controls the present controls the past." -- George Orwell
  51. Re:Not offtopic by Anonymous Coward · · Score: 0

    HAHA, fucktard! U got modded down!

  52. wow ! DOS on the author's website by chrisranjana.com · · Score: 0

    "Please note: I removed the HEC demo since it was used to perform a DoS attack on this server. The HECs are quite heavy and I didn't add a mechanism against Dos " So is it reliable to use this technique at all ? I mean the author's website itself has been taken down just like that ! I tell you, internet nowadays is an evil place indeed.

    --
    Chris ,
    Php Programmers.