Slashdot Mirror


Baffling the Spam Bots

dumpster_dave writes "Scientific American is running an article, Baffling the Bots, on techniques to outsmart and subvert spam bots and their chat-room cousins via CAPTCHA. You have probably seen these in the form of images containing text that act as gatekeepers to various online services. The latest evolution uses non-words and distorts the text so that even the best AI systems cannot decipher it, yet humans cannot help but do so [cf. Gestalt psychology]."

38 of 350 comments

  1. Blind Users by X-rated Ouroboros · · Score: 5, Insightful

    I've often wondered how these types of systems can be made accessible to handicapped users.

    --
    Simple Machines in Higher Dimensions
    1. Re:Blind Users by The Clockwork Troll · · Score: 3, Interesting

      Instead of sending an image of distorted text, send a wave file of distorted speech - easy for the human ear to discern, but harder for run-of-the-mill speech recognition tools to do.

      --

      There are no karma whores, only moderation johns
    2. Re:Blind Users by zcat_NZ · · Score: 3, Interesting

      Easy: when you generate your mangled GIF image, also create a WAV/MP3 containing the same information (e.g., using TTS software, or by concatenating pre-recorded audio files).

      Most blind users are running Windows with JAWS or similar screen-reading software, and sites like ACB already release a lot of their content as MP3s, so I'd assume that most are well equipped to handle web audio.
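      The "concatenating pre-recorded audio files" route can be sketched with Python's standard `wave` module. This is a minimal illustration under stated assumptions, not how any particular site does it: it assumes one hypothetical pre-recorded clip per character, all sharing the same channel count, sample width and sample rate, and it adds none of the distortion a real audio challenge would need.

```python
import wave


def concat_wavs(clip_paths: list[str], out_path: str) -> None:
    """Join pre-recorded clips (e.g. one per character) into one WAV file."""
    with wave.open(clip_paths[0], "rb") as first:
        params = first.getparams()
    with wave.open(out_path, "wb") as out:
        out.setparams(params)  # frame count in the header is patched on close
        for path in clip_paths:
            with wave.open(path, "rb") as clip:
                # Channels, sample width and frame rate must all match.
                if clip.getparams()[:3] != params[:3]:
                    raise ValueError(f"{path}: audio format differs from first clip")
                out.writeframes(clip.readframes(clip.getnframes()))


# Hypothetical usage: one clip per character of the challenge "7QX".
# concat_wavs(["clips/7.wav", "clips/Q.wav", "clips/X.wav"], "challenge.wav")
```

      Keeping the clips uncompressed and identically formatted is what lets the frames be appended byte for byte; converting the result to MP3 would be a separate step.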

      --
      455fe10422ca29c4933f95052b792ab2
    3. Re:Blind Users by EvilNTUser · · Score: 3, Insightful

      "Then you have to worry about those with poor or no hearing, as well as those with poor or no sound equipment. Why not have someone solve a riddle or puzzle?"

      Because then you'd be discriminating against stupid people, and keeping them off the internet.

      Oh, wait...

      --
      My Sig: SEGV
    4. Re:Blind Users by Talez · · Score: 4, Funny

      It's part of the three pronged attack on spam.

      1) Obfuscate e-mail addresses
      2) Stop spammers from getting to places containing real email addresses
      3) Keep stupid people off the internet so the revenue stream of spammers is cut off.

    5. Re:Blind Users by Pathwalker · · Score: 2, Funny
      For some time, I've felt that math is the answer to verifying that a viewer is a human, and still keeping the test accessible to the widest number of disabled people.

      A couple of simple math/logic problems such as these should be suitable:
      1. Find the two roots of x*x - 16x + 60 = 0
      2. Which two numbers have a sum of 16 and a product of 60?
      3. From the following facts, what can you infer about Albert?
        • All men are mortal.
        • Albert is a man.
      Simple puzzles like this can be solved by almost anyone in a few seconds, and they can be expressed in plain text, making them accessible to the blind.
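      A plain-text challenge of the first kind is cheap to generate: pick two integer roots r1 and r2 and expand (x - r1)(x - r2) = x*x - (r1 + r2)x + r1*r2; the example above comes from the roots 6 and 10. This Python sketch assumes a hypothetical answer format of two integers separated by spaces or commas, and the function names are made up. Note that the same arithmetic that makes the challenge cheap to generate also makes it cheap for a script to solve.

```python
import random


def make_quadratic_challenge(rng: random.Random) -> tuple[str, set[int]]:
    """Build a plain-text challenge whose answer is two distinct integer roots."""
    # (x - r1)(x - r2) = x*x - (r1 + r2)x + r1*r2
    r1, r2 = rng.sample(range(2, 13), 2)
    question = f"Find the two roots of x*x - {r1 + r2}x + {r1 * r2} = 0"
    return question, {r1, r2}


def check_answer(reply: str, roots: set[int]) -> bool:
    """Accept replies such as '6, 10' or '10 6', in either order."""
    try:
        given = {int(tok) for tok in reply.replace(",", " ").split()}
    except ValueError:
        return False
    return given == roots


question, roots = make_quadratic_challenge(random.Random())
print(question)
print(check_answer("6, 10", {6, 10}))  # True
```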
    6. Re:Blind Users by vidarh · · Score: 2, Insightful

      Big problem with this: let's say this type of challenge is given 1 out of 100 times. It has the MASSIVE weakness that classified word lists are readily available (hint: computational linguistics; academics have spent decades preparing machine-readable databases of exactly this kind of thing for their research), and where they aren't, they can be built relatively quickly (think parsing dictionary.com output and looking for the category keywords). Say these methods solve 1 out of 10 of the challenges, which I think is very low given both the possibility of scanning a dictionary entry and the availability of specialized word lists. That means 1 in 1000, which means somebody will hammer your registration server and still be able to register hundreds of accounts a day that they can abuse.
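      The arithmetic behind "hundreds of accounts a day" is a simple expected value. This sketch assumes independent attempts and uses the comment's 1-in-1000 pass rate; the daily attempt volume is a hypothetical figure chosen for illustration.

```python
def expected_accounts(attempts: int, one_in: int) -> float:
    """Expected challenges passed when each attempt independently
    succeeds with probability 1/one_in."""
    return attempts / one_in


def attempts_needed(accounts: int, one_in: int) -> int:
    """Average number of attempts needed to pass `accounts` challenges."""
    return accounts * one_in


# Hypothetical volume: a bot hitting the registration form 300,000 times a day.
print(expected_accounts(300_000, 1000))  # 300.0
print(attempts_needed(300, 1000))        # 300000
```

      300,000 attempts a day works out to only about 3.5 attempts per second, which is why a low per-attempt pass rate on its own is not much protection.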

    7. Re:Blind Users by aridhol · · Score: 2, Informative

      Braille.

      --
      I can't say that I don't give a fuck. I've just run out of fuck to give.
  2. I've always thought by Sir Haxalot · · Score: 3, Insightful

    that just using johnsmithword-AT-hotmail.com works fine (where "word" is removed and "-AT-" is replaced with "@"). I use that and have yet to receive a single spam email.
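    The convention is easy to express in code. This Python sketch assumes the filler is always the literal "word" placed right before the "@", as in the example; the function names are hypothetical. It also shows the flip side: anyone, or any harvester, that knows the convention can undo it mechanically.

```python
def munge(address: str, filler: str = "word") -> str:
    """Turn user@host into user<filler>-AT-host for posting publicly."""
    local, domain = address.split("@", 1)
    return f"{local}{filler}-AT-{domain}"


def unmunge(munged: str, filler: str = "word") -> str:
    """Reverse munge(): drop the filler and put the '@' back."""
    local, domain = munged.split("-AT-", 1)
    if filler and local.endswith(filler):
        local = local[: -len(filler)]
    return f"{local}@{domain}"


print(munge("johnsmith@hotmail.com"))           # johnsmithword-AT-hotmail.com
print(unmunge("johnsmithword-AT-hotmail.com"))  # johnsmith@hotmail.com
```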

    --
    I have over 70 freaks, do you?
    1. Re:I've always thought by Grimster · · Score: 5, Interesting

      Yes this is a great solution if the only people you want to email you are a little towards the smart side. But speaking as someone who has to deal with "joe sixpack" daily I've seen people who are confused by user@NOSPAMdomain.com and when I tell them to go to http://webmail.domain.com/ to get their webmail they put www. on the front!

      If I were verbally giving these same people the URL for Slashdot, they would end up at http://www.slash..org/ (God, I wish I were making a joke, but I've seriously had this happen).

      Because of this, my email address is plainly visible on our web site, in my forums, on a few other forums and in the occasional Usenet message. With a combination of RBLs, Bayesian filtering, procmail soup and other goodies, my daily spam count is kept to a low roar (double figures rather than four figures; again, I wish I were joking).

      --
      --- www.f-theocean.com
    2. Re:I've always thought by gantrep · · Score: 2, Funny

      Are you sure they wouldn't end up at http;\\www.\..org/?

    3. Re:I've always thought by andih8u · · Score: 3, Informative

      I've been using http://jodrell.net/projects/mailto, which wraps your mailto link in a couple-hundred-character JavaScript. People can still click the mailto link as normal, but getting the address from the source is a different matter.
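      The exact output of the jodrell.net script isn't described here, so what follows is a hypothetical stand-in for the general idea: a Python helper that emits a small JavaScript snippet which assembles the address from character codes when the page loads. `String.fromCharCode` and `document.write` are standard browser JavaScript; the helper name is made up. The literal address never appears in the page source, though a harvester that executes JavaScript would still see it.

```python
def obfuscated_mailto(address: str) -> str:
    """Return a <script> snippet that writes a mailto link for `address`
    without the address appearing literally in the HTML source."""
    # Encode each character as its Unicode code point, e.g. '@' -> 64.
    codes = ",".join(str(ord(ch)) for ch in address)
    return (
        '<script type="text/javascript">'
        f"var a = String.fromCharCode({codes});"
        "document.write('<a href=\"mailto:' + a + '\">' + a + '</a>');"
        "</script>"
    )


print(obfuscated_mailto("me@example.com"))
```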

      --


      slashdot, news for crazed liberal socialist zealots
  3. I don't receive any spam by Dancin_Santa · · Score: 3, Informative

    Hotmail's spam filter has gotten really smart in the past few months. Yahoo's filter used to be the best among web mailers, but Hotmail has improved to the point that I don't get any spam in my Hotmail inbox anymore.

    I'm not one to go about shouting the praises of Microsoft, but someone over there's got their head out of their asses.

    1. Re:I don't receive any spam by benna · · Score: 2, Funny

      50 bucks says all the AC replies to this parent are from the same IP.

      --
      "It is not how things are in the world that is mystical, but that it exists." -Ludwig Wittgenstein
  4. Losing battle against false error by YellowSubRoutine · · Score: 2, Interesting

    This is a losing battle.
    Smart humans will outsmart computers for quite a while. The average human is already uncomfortable with such a test (what's the middle word in the second image?!).

    But those systems should work for the dumbest (within reason) humans. They're trying to design a test that's passed by the dumbest of six million, yet makes the smartest of a few (bots) fail.

    I give in.

    *comment about spambot overlords*

  5. Keep tabs on where your address goes by bigberk · · Score: 4, Insightful

    Everyone should know this by now, but you can control spam by keeping tabs on where your email address goes.

    The address I use to post to USENET is completely disposable. The Swen worm in fact picked up my USENET addy and spammed it with about 40,000 emails. The address is now dead, but I saw that coming.

    I have a public address which I give to casual contacts (who may not be totally trustworthy). This address changes yearly, and this keeps it spam free.

    My well guarded private address, which I only give to my closest friends, has gotten no spam for 5 years. I receive about 20 emails per day at that private address and there is 0 spam.

    1. Re:Keep tabs on where your address goes by penguin7of9 · · Score: 2, Insightful

      Well, lucky you. However, most people actually have some sort of public existence: they run a business and want clients to be able to contact them, they are teachers or professors and students need to be able to find out their address and contact them, etc. Hiding one's address simply isn't a solution.

  6. Instead of Text? by vraddict · · Score: 2, Informative

    Why not use a photograph of something easily recognizable by a human, e.g. a picture of a horse or a car? It would be much more difficult to program a bot to detect what is in the picture. Or better yet, use that plus CAPTCHA text located in the corner of the photograph. It doesn't seem like it would be much more trouble to enter two pieces of information instead of just the CAPTCHA text.

  7. CAPTCHAs are not the answer by Eponymous Cowboy · · Score: 4, Interesting

    Earthlink has an optional system like this, where unknown senders are blocked by default. They receive an autoreply giving them a URL to go to where they must enter the text from a CAPTCHA.

    Unfortunately, the system does not work very well. My dad sells on eBay, and a buyer in one of his auctions had an Earthlink account, which blocked the message telling the buyer how much the shipping would be, where to send payment, etc. When my dad went to the specified URL and entered the CAPTCHA text as requested, he simply got an error message saying he had entered it incorrectly. He forwarded me the Earthlink email and asked whether it was just him. It wasn't: I couldn't get it to work either. The random string of numbers and letters was very distorted, and there were four possible readings; I tried those plus at least ten more with no success. The message never got through.

    There are many problems with this type of system. Consider: what if both parties have CAPTCHA-enabled accounts, from different providers? The confirmation messages from both parties get blocked. Smarter systems whitelist people as messages are sent to them, but as in the eBay case, the recipient had no way of knowing my dad's email until AFTER a message from him was received. It's a Catch-22.

    And for people who are visually impaired, universal deployment of this system makes email essentially impossible. Earthlink's page had a link "if you cannot see the picture, click here", and when you followed it they said to call their 1-800 number if you had any problems. Right.

    Adding CAPTCHAs to everyone's email systems is NOT the way to solve the spam problem. We need a more realistic, permanent solution. For example: cryptographically authenticate the sender (the "From" field) at the originating ISP, rejecting messages from senders it cannot authenticate (by password or whatever means), and then have each relay in turn authenticate the previous relay if it trusts it. Headers can be inserted into the emails that sign the previous headers with private keys, whose public counterparts can be obtained from the ISPs by simple DNS lookups. This builds a chain of trust, which stops when a message gets outside of the sender's network, and therefore allows the original sender to be properly identified back through their ISP. Once we know who messages are from, people can be held responsible, and at that point anti-spam laws can handle the rest.
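    The per-hop signing idea can be sketched in a few lines of Python, with a loud caveat: the proposal calls for public-key signatures with keys published in DNS, but Python's standard library has no public-key signing, so this sketch substitutes HMAC with per-host shared secrets held in a hypothetical `KEYS` table that stands in for the DNS lookup. That substitution changes the trust model (anyone holding a secret could also forge with it), so read it only as an illustration of how each hop signs everything added before it, making later tampering detectable. The header name `X-Hop-Signature` and the host names are made up.

```python
import hashlib
import hmac

# Hypothetical stand-in for "public keys obtainable by DNS lookup".
# A real deployment would use public-key signatures, not shared secrets.
KEYS = {
    "mail.origin-isp.example": b"origin-secret",
    "relay1.example": b"relay1-secret",
}
SIG = "X-Hop-Signature: "


def _mac(host: str, headers: list[str]) -> str:
    """MAC over every header present when this hop signed."""
    return hmac.new(KEYS[host], "\n".join(headers).encode(), hashlib.sha256).hexdigest()


def sign_hop(headers: list[str], host: str) -> list[str]:
    """Append a header in which `host` vouches for every earlier header."""
    return headers + [f"{SIG}{host} {_mac(host, headers)}"]


def verify_chain(headers: list[str]) -> bool:
    """Check each hop signature against the headers that preceded it."""
    seen = False
    for i, line in enumerate(headers):
        if not line.startswith(SIG):
            continue
        host, digest = line[len(SIG):].split(" ", 1)
        if host not in KEYS or not hmac.compare_digest(_mac(host, headers[:i]), digest):
            return False
        seen = True
    return seen  # an unsigned message does not verify


msg = ["From: alice@origin-isp.example", "Subject: lunch"]
msg = sign_hop(msg, "mail.origin-isp.example")  # originating ISP signs first
msg = sign_hop(msg, "relay1.example")           # next relay signs on top
print(verify_chain(msg))                                             # True
print(verify_chain([msg[0].replace("alice", "mallory")] + msg[1:]))  # False
```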

    --
    It's hard for thee to kick against the pricks.
  8. Big problem by Lord_Dweomer · · Score: 2, Insightful
    I've always thought this was an incredibly creative solution. However...sometimes it works a little too well. I've encountered sites where I can't make out the word no matter what I try, and I'm not even colorblind or blind. The problem is that this filter does a good job of filtering out not just computers, which have difficulty piecing the information together visually, but also humans who might have trouble doing the same.

    One solution might be to offer multiple ways of deciphering. Such as an audio clip that could play a distorted version of the phrase that you could then type in. Or even ask simple questions, such as "What color is the background?".

    Then there's the other issue of the code not being visible simply because I'm using Mozilla, but that's a whole different can of worms.

    --
    Buy Steampunk Clothing Online!
  9. Could baffletext be used here ? by Rosco P. Coltrane · · Score: 2, Insightful

    Slashdot could benefit from such a human checker, each time someone posts, so that idiocies from crapflood scripts could be kept in check.

    --
    "A door is what a dog is perpetually on the wrong side of" - Ogden Nash
    1. Re:Could baffletext be used here ? by BrainInAJar · · Score: 2, Insightful

      The problem with that is that even though trolls seem subhuman, they're actually just extremely stupid humans.

  10. A better way to do this... by GPez · · Score: 2, Interesting

    A big problem with CAPTCHAs is that they can be "broken" with some vigilance and know-how, although not 100% of the time. Yahoo!'s has been broken by a UC Berkeley group, which claims a 92% success rate. The UCB algorithm looks at the image, then searches through a dictionary to find the most probable matches and spits them out (you can actually see on the site how it chooses and how close it gets when it misses, mistaking 'grip' for 'slip' and so on).

    What is really needed for a *good* CAPTCHA is not pure image obscurity, but something that combines hard-to-read images with aspects of language that humans know intuitively but computers find very difficult to sort out. Take word associations, for example. You probably learned how words are associated with each other in 1st grade, so for humans it is a very simple task to pick out words that share a common theme. Computers are a different story. Have a CAPTCHA randomly spit out 10 words and have the user pick the 3 that are associated with one another, for example HOUSE, LOG, FRONT, CAT, BROWN, DOG, CART, RUNNING, HOUR, MOUSE.

    Even if the algorithm were to correctly identify all 10 words, it would still have to figure out what the association is and then pick the words that fit it. Assuming that it did correctly identify all of the words, random guessing at that point would yield a success rate of 0.83%, and less if it misidentified even one of the words. Combine something like this with a slightly smarter word obfuscator and I think it'd be very hard to beat...unless you're human, of course :)
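    The 0.83% figure checks out: a blind guess has to hit the one correct unordered set of 3 words out of C(10, 3) = 120 possible sets, and 1/120 ≈ 0.83%. A quick Python check, assuming exactly one correct triple and a uniformly random guess:

```python
from math import comb


def blind_guess_rate(n_words: int = 10, n_pick: int = 3) -> float:
    """Probability that a uniformly random choice of n_pick words
    out of n_words is exactly the single correct set."""
    return 1 / comb(n_words, n_pick)


print(comb(10, 3))                  # 120
print(f"{blind_guess_rate():.2%}")  # 0.83%
```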

  11. Re:A better way to do this... by Rosco P. Coltrane · · Score: 5, Funny

    I have a better idea : present a complex differential equation and ask the person to solve it in less than 10s. If he fails, he's human.

    --
    "A door is what a dog is perpetually on the wrong side of" - Ogden Nash
  12. Aren't they trying too hard? by danila · · Score: 3, Insightful

    Am I the only one having trouble deciphering the second word in the second picture?

    --
    Future Wiki -- If you don't think about the future, you cannot have one.
    1. Re:Aren't they trying too hard? by AKnightCowboy · · Score: 2, Funny
      Am I the only one having trouble deciphering the second word in the second picture?

      It says:

      NVIRGIE
      OBVIOUSE
      HURCHES

      I'm not sure what the hell that means, but if they're expecting someone to come up with other words in place of those then they're really expecting too much. Anything this complicated isn't worth it.

    2. Re:Aren't they trying too hard? by danila · · Score: 2, Funny

      Well, second one might be ODVIOUSE or even ODVLOUSE, but I don't think the second letter is B.

      --
      Future Wiki -- If you don't think about the future, you cannot have one.
    3. Re:Aren't they trying too hard? by herrvinny · · Score: 2, Funny

      I thought at first it was "CDVIOUSE". The first letter looks a lot like a C, especially with that big chunk cut out of its right side. The second looks like a D to me, because the whole top of the B is cut off. Are you sure it's supposed to be "OBVIOUS"?

  13. And I thought the eye tests were hard enough... by Ron Bennett · · Score: 3, Insightful

    I'm not sure about others, but I have a difficult time with sites which use distorted numbers on a nearly matching background...and I'm not even color-blind.

    Sound is better, but even that can sometimes be difficult to understand. Also, I don't have speakers hooked up on some machines I use, and some folks disable sound because of obnoxious websites/ads that blast sound unexpectedly.

    Anyways, many of my relatives and friends can't get into sites that use distorted numbers, etc at all and are basically locked out; sometimes they get lucky and find a similar site (likely a competitor) to the site they desired, which doesn't use such nonsense...

    Seems to me a better way is to use geotracking (too many inbound connections from similar sources [IP ranges, routes, browser config, etc.]), email verification, etc., and perhaps even requiring the person to call a phone number to activate the account; that's ideal for financial sites such as banks, payment sites, etc.

    With good heuristics (really the key to stopping automated bots, in my view), any decent website should be able to filter out most of the bots and other junk. It's really no accident that many of the largest sites don't use distorted numbers, pictures, etc. How do they manage without them? Perhaps that would make a good Ask Slashdot item :)
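    One of the heuristics mentioned, flagging too many sign-ups from similar sources, might look like the following Python sketch. The grouping key (the IPv4 /24 network), the default limit of 5 sign-ups and the one-hour sliding window are all hypothetical choices; a real site would combine several signals such as routes and browser configuration.

```python
import ipaddress
import time
from collections import defaultdict, deque
from typing import Optional


class SignupThrottle:
    """Refuse sign-ups once one IPv4 /24 network exceeds a rate limit."""

    def __init__(self, max_per_window: int = 5, window_s: float = 3600.0):
        self.max_per_window = max_per_window
        self.window_s = window_s
        self._events = defaultdict(deque)  # network -> recent sign-up times

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Group addresses by their /24, e.g. 203.0.113.7 -> 203.0.113.0/24.
        net = str(ipaddress.ip_network(f"{ip}/24", strict=False))
        events = self._events[net]
        # Forget sign-ups that have aged out of the sliding window.
        while events and now - events[0] > self.window_s:
            events.popleft()
        if len(events) >= self.max_per_window:
            return False
        events.append(now)
        return True


throttle = SignupThrottle(max_per_window=2, window_s=60)
print(throttle.allow("203.0.113.5", now=0))   # True
print(throttle.allow("203.0.113.9", now=1))   # True
print(throttle.allow("203.0.113.77", now=2))  # False: third from the same /24
print(throttle.allow("198.51.100.1", now=2))  # True: different network
```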

    Ron

  14. Re:This is stupid. by jollis · · Score: 2, Insightful

    1. Block all email that contains HTML.. I mean how exciting can a text email be :)... Kills the marketing BS.

    Agreed, this is an immensely useful measure; HTML e-mail simply isn't very useful. This'll also kill all the tracking bugs.

    2. Institute a block all email except where you have whitelisted the sender...

    Powerful, but a huge sacrifice. Feels like throwing in the towel to me.

    3. Allow the sender to get prioritized by requiring them the first time to respond to an email and identify who they are and why they are contacting you.

    Challenge-Response causes backscatter to innocent bystanders. Think of worms and spam with falsified from: headers. Using C-R makes you a part of the problem, not the solution.
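    The first suggestion, blocking mail that contains HTML, only needs a walk over the MIME parts to detect. A minimal Python sketch using the standard `email` package; what to do with a flagged message (reject, quarantine, add to a score) is left out and would be a local policy decision.

```python
from email import message_from_string
from email.message import Message


def contains_html(msg: Message) -> bool:
    """True if the message, or any MIME part nested inside it, is text/html."""
    return any(part.get_content_type() == "text/html" for part in msg.walk())


plain = message_from_string("Subject: hi\nContent-Type: text/plain\n\nhello\n")
alternative = message_from_string(
    'Subject: offer\nContent-Type: multipart/alternative; boundary="b"\n\n'
    "--b\nContent-Type: text/plain\n\nbuy now\n"
    "--b\nContent-Type: text/html\n\n<b>buy now</b>\n"
    "--b--\n"
)
print(contains_html(plain))        # False
print(contains_html(alternative))  # True
```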

  15. Spam isn't that much of a problem ... by DaneelGiskard · · Score: 2, Insightful

    I use my email address for everything, including Usenet. My provider runs a spam filter which reduces my spam to about 10 pieces a day. Of course, that's after it filters out about 100-150 spam mails a day. When I'm bored I go through the filtered mail, but I haven't found a false positive yet, so it works pretty well for me.

    This is convenient, I don't have to care where my email address goes, I just use it.

    1. Re:Spam isn't that much of a problem ... by pe1chl · · Score: 2, Interesting

      Don't count yourself lucky just yet!
      I used the same method, plus my own mail server with aggressive filters, and it worked very well until... a Russian spammer started sending out spam with my address as the sender address. He did this via hacked systems (open proxies), so it was not possible to do any blocking.
      The load of crap that came in was just unbelievable, and all attempts to contact his spamvertized site or its providers came to nothing.

      In the end the only thing I could do was remove the MX record for the domain. I pointed it to the spamvertized site instead. Hopefully they are happy with their own bounces.
      Of course I cannot receive any legitimate mail on that address anymore :-(

  16. type what you see: by gfody · · Score: 3, Funny

    <img src="it_says_kitten.jpg">

    heh dumb bot

    --

    bite my glorious golden ass.
  17. The real problem with CAPTCHAs.. by gschmidt · · Score: 3, Interesting

    .. is that they can be brokered. If you give me a puzzle, *I* don't have to solve it; all I have to do is induce someone, somewhere, to solve it, and give me the answer. That means I can set up a CAPTCHA-solving factory in Taiwan, or field a porn site where users pay for their pictures in CAPTCHA answers. (*My* CAPTCHAs, the ones my script was assigned to answer in order to make Paypal transactions, not new ones I made up on the spot.)

    Suppose that a human can solve your CAPTCHA in an average of five seconds. Suppose unskilled labor costs $6/hour. Then it costs a bit under a cent to find the solution to your CAPTCHA, assuming that I want to solve at least a few thousand a day. As a result it is impractical to protect a service worth more than a penny with a CAPTCHA.

    Actually unskilled labor costs far less than $6/hour in some parts of the world, so if CAPTCHAs see wide use the value of the services they can protect is even lower. A tenth of a cent?
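    The arithmetic is straightforward: at $6/hour and 5 seconds per CAPTCHA, one solution costs 6 × 5 / 3600 ≈ $0.0083, a bit under a cent. For the cost to fall to a tenth of a cent at 5 seconds each, the wage would have to be about $0.72/hour. A small Python helper for playing with the assumptions (the function name is made up):

```python
def cost_per_solution(seconds_each: float, wage_per_hour: float) -> float:
    """Labor cost in dollars for one human-solved CAPTCHA."""
    return wage_per_hour * seconds_each / 3600


print(round(cost_per_solution(5, 6.00), 4))  # 0.0083
print(round(cost_per_solution(5, 0.72), 4))  # 0.001
```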

    CAPTCHAs should be seen as a proof-of-work mechanism, like "hash cash", not as an oracle that can determine whether a transaction was initiated by a human or a machine. Unlike proof-of-work schemes that burn CPU time, the value of a CAPTCHA won't inevitably be halved every 18 months by Moore's law; on the other hand, it could suddenly be reduced to zero by breakthroughs in image processing.
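    For comparison, the CPU-burning "hash cash" idea fits in a few lines of Python. This is a simplified sketch, not the actual Hashcash stamp format (which also carries fields such as a version, date and random string): the sender searches for a counter that gives the stamp a SHA-1 digest with a required number of leading zero bits, while the receiver verifies it with a single hash. The `resource:counter` layout and the function names are assumptions for illustration.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(resource: str, bits: int) -> str:
    """Sender's side: try counters until the stamp's SHA-1 has `bits` leading zeros.
    Expected work roughly doubles with each extra bit."""
    for counter in count():
        stamp = f"{resource}:{counter}"
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp


def verify(stamp: str, resource: str, bits: int) -> bool:
    """Receiver's side: one hash, however long minting took."""
    digest = hashlib.sha1(stamp.encode()).digest()
    return stamp.startswith(resource + ":") and leading_zero_bits(digest) >= bits


stamp = mint("bob@example.com", 16)
print(stamp, verify(stamp, "bob@example.com", 16))
```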

  18. What's wrong with this picture by hey · · Score: 2, Insightful

    How about those kids' puzzles with an image where many things are "wrong", like water from the tap flowing up? These are easy for people to solve but very hard for machines.

  19. Re:Baffling the spam-bots are easy... by CaptainBaz · · Score: 2, Insightful

    Yes, but this would also baffle users who browse without JavaScript. There are lots of them, and they have a variety of good reasons for doing so.

  20. Easy by fredrikj · · Score: 3, Funny

    Just do a Burrows-Wheeler transform on your e-mail address. Comes with the bonus of preventing stupid people from trying to contact you.
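    For the record, the transform really is reversible, which is what makes the joke work: a correspondent (or a spammer) who knows it can recover the address. A naive Python sketch, sorting all rotations for the forward pass and rebuilding the rotation table column by column for the inverse; the NUL end-of-string marker is an assumption, chosen because it does not occur in ordinary e-mail addresses.

```python
def bwt(text: str, marker: str = "\0") -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if marker in text:
        raise ValueError("end marker must not occur in the input")
    s = text + marker
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(row[-1] for row in rotations)


def inverse_bwt(last: str, marker: str = "\0") -> str:
    """Rebuild the sorted rotation table one column at a time."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(ch + row for ch, row in zip(last, table))
    # The original string is the rotation that ends with the marker.
    original = next(row for row in table if row.endswith(marker))
    return original[:-1]


scrambled = bwt("me@example.com")
print(repr(scrambled))
print(inverse_bwt(scrambled))  # me@example.com
```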

  21. Re:how about tons of fake emails on webpages? by Eustace Tilley · · Score: 2, Informative

    Take a look at WebPoison.