Slashdot Mirror


How are You Preventing Mailto-Link Harvesting?

mixwhit asks: "In our ever increasing effort against spam, we are now considering replacing all mailto: links on our website with something unharvestable (i.e. 'user (at) address', javascript mailto links, character entity evasion, etc.). Obviously this won't stop the spam, but it seems prudent to stop the harvesting so that the spam may slow down someday (year 2024 maybe?). What are others doing with this issue? We would prefer to preserve mailto link clickability, but also only want to make this adjustment once." One suggestion I would make is to put your email address in an image. People can read it, but harvesters won't be able to harvest it (unless they download the image for OCR), but any barrier you can place in front of the spammer, without blocking people honestly interested in communicating with you, is probably a good thing.

41 of 229 comments

  1. Mail form by NaDrew · · Score: 4, Insightful

    Just use a mail form instead of mailto: links. Once you reply to feedback mail, the sender has your address and you can correspond normally. Meanwhile, evil spambots can't harvest an address that isn't shown anywhere.

    --
    Vista:XPSP2::ME:98SE
    1. Re:Mail form by skookum · · Score: 3, Informative

      That is only the case if you are running an ancient, brain dead copy of the original (Matt's Script Archive) formmail.pl. But you'd be a retard for doing that and deserve everything you get. Modern formmail scripts do not allow spam through.

    2. Re:Mail form by innosent · · Score: 2, Interesting

      I agree 100%. Either use something like formmail.pl, or write your own custom CGI program to handle emails. It is trivial to write a mail form, and users who wish to contact you will be at your website anyways, so why make them read the address and fire up their mail client? Hell, depending on your site (if you have user registrations), you could even use a database-driven email system, and eliminate spam entirely. Just let the user fill out the form, store the message in the database, and when you reply, they should be able to view messages sent to them the next time they log in to your site. You won't get spam, since you aren't using SMTP, but you still have a good (and probably better, since it is more reliable) system of communicating with your customers.

      --
      --That's the point of being root, you can do anything you want, even if it's stupid.
    3. Re:Mail form by scrytch · · Score: 2, Informative

      > I agree 100%. Either use something like formmail.pl, or write your own custom CGI program to handle emails

      Ironic, that in order to stop spam to you, you would use the notoriously buggy and insecure formmail, turning your box into an open mail relay for spammers to use. Use a secure alternative (there are compatible versions, but really it's not hard to use MIME::Lite yourself). Matt has never fixed formmail to a satisfactory degree, and shows no inclination toward doing so.

      If you roll your own, it'd probably still be more secure than formmail, as long as you don't allow it to take addressing information from the outside. Hardwire the configuration into the script, and break it out into a nonreadable config file if you have to. But don't use a "flexible" form mailer unless you know you've got it nailed down.

      --
      I've finally had it: until slashdot gets article moderation, I am not coming back.
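
The "hardwire the configuration" advice above can be sketched roughly as follows. This is an illustration under assumptions, not any particular form mailer: `CONFIG`, `isHeaderSafe`, `buildMessage`, and the addresses are hypothetical names, and actually handing the result to a mail transport is left out.

```javascript
// Hypothetical fixed configuration: the recipient never comes from the request.
const CONFIG = Object.freeze({
  recipient: "webmaster@example.com", // placeholder address
  subjectPrefix: "[site feedback] ",
});

// Reject any value that could smuggle extra headers into the message.
function isHeaderSafe(value) {
  return typeof value === "string" && !/[\r\n]/.test(value);
}

// Build a message description from untrusted form fields.
// Only `from`, `subject`, and `body` are read; fields such as `to`,
// `recipient`, or `cc` are ignored even if a client submits them.
function buildMessage(form) {
  const from = String(form.from ?? "").trim();
  const subject = String(form.subject ?? "").trim();
  const body = String(form.body ?? "");

  if (!isHeaderSafe(from) || !isHeaderSafe(subject)) {
    throw new Error("header injection attempt rejected");
  }
  if (!/^[^\s@]+@[^\s@]+$/.test(from)) {
    throw new Error("invalid sender address");
  }
  return {
    to: CONFIG.recipient,
    replyTo: from,
    subject: CONFIG.subjectPrefix + subject,
    body,
  };
}
```

Because the recipient is a constant, a submitted `to` field has no effect, which is the property that keeps such a form from acting as a relay.
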
  2. Beware of disability advocates by bluelip · · Score: 4, Interesting

    People fighting for those who have difficulty seeing have been complaining about the sites that have a person type a number displayed in an image to verify that they're not a bot. They say it causes undue hardship on sight impaired folks. That may not be a legal fight your company would like to enter.

    I can see both sides of this. Can't say I know where to stand though.

    --

    Yep, I never spell check.
    More incorrect spellings can be found he
    1. Re:Beware of disability advocates by glivings · · Score: 4, Insightful

      The problem with having e-mail addresses encoded in images goes beyond excluding the blind. People with text-only browsers (a la lynx), screen readers, PDAs, cell phones, etc. are all excluded.

      It's important to remember that web pages are not always rendered visually.

  3. Un-what? by devphil · · Score: 5, Informative

    > replacing all mailto: links on our website with something unharvestable (i.e. 'user (at) address'

    What makes you think "user at mail dot foo dot com" is unharvestable? The web archives of all the development mailing lists at gcc.gnu.org use that scheme, and we still get spam to unique addresses used only for sending mail to those lists.

    It's a handy technique, and useful, but it's certainly not foolproof.

    --
    You cannot apply a technological solution to a sociological problem. (Edwards' Law)
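
To illustrate why spelled-out addresses are still harvestable, here is a minimal sketch of the kind of normalization a harvester could apply. The patterns and the function names `demunge` and `harvest` are assumptions for illustration; real harvesters may use many more variants.

```javascript
// Illustrative patterns only: turn common spelled-out forms back into symbols.
function demunge(text) {
  return text
    // "(at)", "[at]", " at ", " AT " -> "@"
    .replace(/\s*[\[(]\s*at\s*[\])]\s*|\s+at\s+/gi, "@")
    // "(dot)", "[dot]", " dot ", " DOT " -> "."
    .replace(/\s*[\[(]\s*dot\s*[\])]\s*|\s+dot\s+/gi, ".");
}

// Return every string that looks like an address after normalization.
function harvest(text) {
  const found = demunge(text).match(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g);
  return found ?? [];
}
```

For example, `harvest("write to user at mail dot foo dot com")` yields a one-element array containing `user@mail.foo.com`.
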
  4. Server side scripting by mikeswi · · Score: 2, Informative

    Any method of munging the address must still be clickable within the visitor's browser. If it is clickable, it can be harvested. Javascript and html encoding may stop most of the bots, but bots exist that can slurp the address no matter how much javascript you wrap it in.

    I use a PHP email form that never sends the address to the client accessing it. Short of hacking the server and looking at the php script in plain text, there is no way to harvest the address. I have no need to let the public know my address. If they want to email me, use the form or use my site's message board.

    I don't want the guy getting slashdotted, so I won't link his site. If you really want the script I use (available in PHP or ASP), go to hotscripts.com and search for dbmaster's mail form.

  5. simple js by anim8 · · Score: 5, Informative

    <script>
    <!--
    var u = "sales" ;
    var d = "example" ;
    var t = "com" ;
    var a = u + '@' + d + '.' + t ;
    document.write('<a href="mailto:'+a+'">'+a+'</a>') ;
    //-->
    </script>

    1. Re:simple js by xingdiego · · Score: 3, Interesting

      I recommend the above method plus:

      1) Randomize the variable names for u, d, t, and a
      2) Randomize the position of var XX = XX statements.

      This will defeat simple regex replacements if your site is big enough, with enough emails, that someone would want to write a simple regex to harvest it.
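
One way the two randomization steps could be automated is sketched below. It is a server-side generator that emits a fresh variant of the snippet on each call; `randomName`, `shuffle`, and `makeSnippet` are hypothetical helper names, not part of the original post.

```javascript
// Produce a random identifier such as "v_k3x9q2" that has not been used yet.
function randomName(used) {
  let name;
  do {
    name = "v_" + Math.random().toString(36).slice(2, 10);
  } while (used.has(name) || name.length < 3);
  used.add(name);
  return name;
}

// Fisher-Yates shuffle, in place.
function shuffle(items) {
  for (let i = items.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [items[i], items[j]] = [items[j], items[i]];
  }
  return items;
}

// Return the text of a <script> body that writes a mailto link for
// user@domain.tld, with fresh variable names and declaration order each call.
function makeSnippet(user, domain, tld) {
  const used = new Set();
  const [u, d, t, a] = [0, 1, 2, 3].map(() => randomName(used));
  const decls = shuffle([
    `var ${u} = ${JSON.stringify(user)};`,
    `var ${d} = ${JSON.stringify(domain)};`,
    `var ${t} = ${JSON.stringify(tld)};`,
  ]);
  return [
    ...decls,
    `var ${a} = ${u} + '@' + ${d} + '.' + ${t};`,
    `document.write('<a href="mailto:' + ${a} + '">' + ${a} + '</a>');`,
  ].join("\n");
}
```

The output of `makeSnippet("sales", "example", "com")` would be placed inside a `<script>` element. As other comments in this thread point out, a bot that actually executes JavaScript still sees the final address.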

    2. Re:simple js by Diplo · · Score: 2, Interesting

      Faced with this problem I ended up writing my own email-address encoder that has proved quite popular with friends. Whilst not as sophisticated as some, it works pretty well and will generate both HTML and JavaScript links via a simple web form. Try it out at www.diplo.co.uk/encode/. (Obviously, all email addresses entered into this are sold on :p )

  6. Hiveware's Enkoder by jpsowin · · Score: 3, Informative

    Just use this. Life is good, eh?

    1. Re:Hiveware's Enkoder by dimator · · Score: 3, Informative

      This is a really cool idea, actually. Two things though: it increases the document size a good deal, since my email address (19 characters) becomes a 1383 character string. This could really add up if you had more than one email address on the page (such as a mailing list archive). Although, in the world of broadband, that's a small price to pay.

      The other thing is, if you are using this, you'd be wise to change the string 'hiveware_enkoder' to something unique. The reason being, if spam harvesters really wanted to, they could recognize that string, and have their own javascript engine handy run the script to get at the email address hidden inside. That's a lot of work, but not entirely impossible. If the Hiveware system gains many users, it might be worthwhile for them.

      --
      python -c "x='python -c %sx=%s; print x%%(chr(34),repr(x),chr(34))%s'; print x%(chr(34),repr(x),chr(34))"
  7. I use an image by Kris_J · · Score: 2, Insightful
    My personal site uses a simple image of my email address with no link. So far no spam, but the odd real email. Even if it does start getting spam, it's a Spamcop address. At work, we have a generic text-only active link as you would expect for reception. For individual emails you need to be logged onto our student/staff portal.

    Meanwhile, I'm keeping an eye out for the next technology to replace email. IM was promising about five years ago, but went to hell faster than email.

  8. Uhh... by babbage · · Score: 2, Informative

    Quoth the original message...

    > What are others doing with this issue? We would prefer to preserve mailto link clickability, but also only want to make this adjustment once." One suggestion I would make is to put your email address in an image. People can read it, but harvesters won't be able to harvest it (unless they download the image for OCR)

    Err, doesn't this exactly not meet the given criteria? The guy wants links to be clickable. If you hide the address in an image, you can only get as far as, say:

    <a href="mailto:foo@bar.com">
    <img src="email_addy.png">
    </a>

    But that's just as easily harvestable as it would have been if you left the visible text as the plain address. What's the point?

    It's the contents of the href attribute that need to be obscured, not the visible text (or image, or video clip, or whatever). You can't embed an image in the href text, so I don't see how this suggestion gains us anything at all.

    ---

    The suggestion I like best is to encapsulate the address as HTML entities. Currently, this is enough to fend off the average address harvesting software, though if the practice catches on, I assume that the harvesters would start to take this into account -- at which point I don't know what the solution should be...

    Barring that, it seems like the only way to provide an address will be to use literal text such as "write to us at foo at bar.com" and hope people just get it.

    Alternatively, shy away from giving out your address, and provide a form where visitors can submit comments. This could allow you to filter out some of the incoming traffic (hint, if you're going to use "off the shelf" software for this, use NMS instead of Matt Wright's ancient Formmail.PL script, it's much safer). Avoiding any publication of email addresses might piss Jakob Nielsen off, but under the circumstances I think it's probably a reasonable approach to the situation -- it's way too easy for a public address to get abused...

    1. Re:Uhh... by Webmonger · · Score: 3, Interesting

      > You can't embed an image in the href text, so I don't see how this suggestion gains us anything at all.

      Actually, you can.
      data URL examples

      Sick, eh?

  9. How I do it... by Pathwalker · · Score: 2, Informative

    I've been looking at a couple of different techniques over the past year or so. They are closely tied into the Roxen Webserver, and probably won't work with Caudium, or any other webserver.

    The first technique I used (described here) was a simple RXML macro, that defined a tag called <cloak>. It would check to see if the client was on a list of known robots. If the client was a robot, a graphic version of the email address would be returned. If the client looked like a normal browser, then the address would be entity encoded, and returned as a mailto link.

    Shortly after I set that up, I realized that entity encoding was pretty much useless - that if a web browser can figure out the address, so can a spam bot.

    My second attempt appears to be working well. I wrote a Roxen module called mailcloak which takes addresses, and replaces them with a graphic link to a dynamically generated form to send an email to that address.

    As an example, the code <mailcloak> maileater@ofdoom.com</mailcloak> would be replaced with a graphical version of the address maileater@ofdoom.com and a link to this page.

    It also has support for finding and cloaking bare addresses in pages, and I'll probably add support for rewriting mailto tags sometime in the next few weeks.
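
The observation above that entity encoding is reversible by anything that parses HTML can be sketched in a few lines. This is an assumed illustration of what a harvester might do, not code from any known bot; `decodeNumericEntities` and `harvestMailto` are hypothetical names, and named entities such as `&amp;` are not handled.

```javascript
// Decode decimal (&#64;) and hexadecimal (&#x40;) numeric character references.
function decodeNumericEntities(html) {
  return html
    .replace(/&#x([0-9a-f]+);/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#([0-9]+);/g, (_, dec) => String.fromCodePoint(parseInt(dec, 10)));
}

// Pull mailto: targets out of entity-encoded markup.
function harvestMailto(html) {
  const decoded = decodeNumericEntities(html);
  return [...decoded.matchAll(/href="mailto:([^"?]+)/gi)].map((m) => m[1]);
}
```

Two regular-expression passes are enough to undo the encoding, which is why it only stops harvesters that scan the raw source.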

  10. Missing the point by jtheory · · Score: 4, Insightful

    You have to consider the trade-off of the inconvenience of your readers/customers with the amount of spam you get.

    I have a few websites with my email address all over them, in mailto links. I "mask" the email very lightly, by escaping most of the characters, and it has worked beautifully.

    Here is a webpage that will quickly convert your mailto link into a form that bots will miss.

    Could a bot be written that would be able to harvest these email addresses? YES. But would it be worth the spammer's time to code it? NO, so it probably won't happen.

    Put yourself in the spammer's shoes (or slime-covered bedroom slippers). Why would you want to go to a lot of work to build a bot that will harvest the email addresses of the very people you don't want to get your spam, because they will report you to spamcop, harass your ISP, and even hack your computer and post some very unattractive pictures of you on the internet?

    No, they want the chumps, and they want to find them without needing to check every webpage for dozens of patterns.

    --
    There are only 10 types of people: those who understand decimal, those who don't, and, uh, 8 other types I forget.
    1. Re:Missing the point by eugene ts wong · · Score: 2, Interesting
      You wish.

      Just like the mailing list archives that cloak everyone's address "foo AT bar DOT baz".
      I think that a partial solution is to speak about email addresses in a more casual form. For example, if my email address is foo@bar.biz.baz then I should tell people that they can contact me @ foo @ bar biz baz. You should have noticed 2 things.

      Notice that there is no word, "dot", in there? That's because most people should already be able to figure it out on their own. If they can't then they shouldn't be using your time.

      Also, did you notice that I used, "@", twice? That's because I use it as a part of my regular vocabulary. It consists of the same number of keystrokes, yet I end up filling the Internet with more "@" symbols, thus making it harder to find the real addresses.

      Spammers could try to figure out what is an email address by searching for the top level domain names, but I'm sure that that will be harder to find as people begin to smarten up & start using much more casual domain names. Maybe they'll use the regular domain names & split it up into 2 sentences. For example, "You can contact me @ my work email address. The user name is blahblah. The company name is bizbaz.". From there, it shouldn't be too hard to figure out.

      I hope that helps someone.
    2. Re:Missing the point by An Anonymous Hero · · Score: 3, Funny
      > Here is a webpage that will quickly convert your mailto link into a form that bots will miss.
      You know, there is a concept here. "STOP SPAM FOREVER IN TWO EASY STEPS:
      • enter your email adress HERE
      • click OK!
      This is the BEST, FOOLPROOF way to NOT GIVE YOUR ADDRESS AWAY!!"
  11. Re:Don't bother, it's too late by Rick the Red · · Score: 5, Interesting

    No kidding. Comcast gives us seven email addresses, so I set one up for each of us. My three month old gets spam, and nobody has EVER used that account (except me sending a test email when I first set it up). These scum just take a brute-force approach to generating email addresses, and don't care how many are undeliverable. They come with opt-out buttons, but all those do is confirm they found a valid address, and they never send from the same address twice, so adding them to a filter list doesn't work either. Bayesian filtering on the content is the only way to go.

    --
    If all this should have a reason, we would be the last to know.
  12. Use a Form by Alethes · · Score: 2, Informative

    I recommend that you use a form that does NOT have the user's email address in a hidden input. Just have the user's ID, then on the server, find the address based on that ID and send the message accordingly. I know you want to keep the mailto: link thing happening, but if you do that, harvesters will always find a way to decode whatever you're doing.
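
A minimal sketch of the ID-based lookup described above, assuming a small in-memory directory. `RECIPIENTS`, `resolveRecipient`, `renderForm`, the `/contact` path, and the `to_id` field name are all hypothetical.

```javascript
// Hypothetical directory kept only on the server; the page shows just the IDs.
const RECIPIENTS = new Map([
  ["sales", "sales-team@example.com"],
  ["support", "helpdesk@example.com"],
]);

// Resolve the submitted ID to an address, or return null for unknown IDs.
function resolveRecipient(id) {
  return RECIPIENTS.get(String(id)) ?? null;
}

// Render the form: the hidden input carries the ID, never the address.
function renderForm(id) {
  if (!RECIPIENTS.has(id)) throw new Error("unknown recipient id");
  return [
    '<form method="post" action="/contact">',
    `  <input type="hidden" name="to_id" value="${id}">`,
    '  <textarea name="message"></textarea>',
    '  <button type="submit">Send</button>',
    "</form>",
  ].join("\n");
}
```

Since unknown IDs resolve to `null`, a client cannot make the form deliver to an arbitrary address by editing the hidden field.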

  13. Re:it works like this by FrenZon · · Score: 3, Interesting

    Alternatively, to keep it transparently usable by end-users, you can do something like this:

    <a href="mailto:false@false.com" onmouseover="var a = 'in.com'; this.href = 'mailto:real@doma' + a;">email me</a>.

  14. "block images from this server" by KnightStalker · · Score: 3, Insightful

    I suspect you're using an ad-blocking browser or proxy, which has blocked the image itself but has left a large (clickable) white space that would be the image if you hadn't blocked it. That's the behavior Firebird shows for me, blocking ads.osdn.com. If you're using Mozilla or Firebird, and you right-click on the "background" I think you'll find "block images from this server" or "block images from ads.osdn.com" checked.

    --
    * And remember, it's spelled N-e-t-s-c-a-p-e, but it's pronounced "Mozilla."
  15. The other cost/benefit by jtheory · · Score: 2, Interesting

    As soon as any reasonable number of people start using the same scheme (and particularly if it's a mailto: designed to still be machine-readable) someone will take the time to harvest that kind of obfuscated address. It's just a matter of the cost/benefit ratio being high enough to make it worthwhile.

    I think you're right as more websites use automated obfuscation; then the spammers need to decode it to get to their victims. But as long as most websites aren't doing what I'm doing, I know they don't want to target the techies.

    Here's another POV, though -- I'm considering the *other* cost/benefits ratio. I want my users to be able to easily email me, and giving them a simple mailto: link is the best way to do that. We'll have to wait and see.

    Right now, it seems to be costing nothing, since I'm only getting spammed on the standard "guessed" names at my domains, like "sales@" and "webmaster@". But 5 spams a day would still be worth the trouble.

    If the bots do start to really catch up (they may... I'm hoping enforced laws will start to catch up over the next few years!), at some point I might move on to the next-least-inconvenient masking method, which is probably randomized JavaScript masking. I.e., the mailto: link is generated by custom JavaScript that builds the address across a few lines of code. That would prevent users w/o JavaScript from using the link, though, which is a cost I want to avoid.

    --
    There are only 10 types of people: those who understand decimal, those who don't, and, uh, 8 other types I forget.
  16. Unicode by vitaflo · · Score: 2, Informative

    I actually just use unicode for the @ symbol (&#064;). It seems that most of the time the harvesters just read the HTML source, and don't actually decode HTML entities or unicode. Thus the harvester will get user&#064;example.com, an invalid address, but a user on your site will see user@example.com and the mailto: link will function normally.

  17. Here is what we do by wolfson · · Score: 2, Interesting
    Here is the php code that I use on Aginet.com

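    function gen() {
        // Alphabet for the random suffix: 62 characters, indexes 0..61.
        $list = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
        $x = "aginet3";
        // Append seven random characters. mt_rand() seeds itself since PHP 4.2,
        // so the explicit mt_srand(make_seed()) call is not needed.
        for ($i = 0; $i < 7; $i++) {
            $x .= $list[mt_rand(0, 61)];
        }
        echo "<a href=\"mailto:$x@aginet.com\">$x@aginet.com</a> ";
        // Log the address with the time and requester details so spam can be traced.
        $x .= "\t" . date("m/d/Y h:i A");
        $x .= "\t" . $_SERVER["REMOTE_ADDR"];
        // These two headers are optional, so fall back to "-" when they are absent.
        $x .= "\t" . (isset($_SERVER["HTTP_REFERER"]) ? $_SERVER["HTTP_REFERER"] : "-");
        $x .= "\t" . (isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "-");
        $fp = fopen("/xxx/xxx/xxx.xxx", "a");
        // Cap the line length but always end the record with a newline.
        fwrite($fp, substr($x, 0, 1023) . "\n");
        fclose($fp);
    }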

    In my mail server I redirect the random addresses to a single e-mail. Then when I get spammed, I can trace it back to an IP, and contact the hosting company or ISP that it originated from.

    Visit blue.aginet.com for my other GPL'd code. Feel free to use the source code in this example. I only ask that you give me credit if it's used for a commercial purpose.
    --
    Scott Wolf Senior Software Engineer Slingpage
  18. blocking mailbots by engine matrix · · Score: 2, Interesting

    I have a 1 pixel transparent gif link at the very top of my page that links to /guestbook/jackhole. In my robots.txt file I have "User-agent: * Disallow: /jackhole/ Disallow: /jackhole/guestbook/". When a harvester traverses this link their IP is added to a text file via a php script that I wrote and they immediately get a 403 page.

    Each page of my site checks against this text file so the mailbot gets a 403 page for almost all pages/sites that I host. To deal with false positives there is a mailto link on the 403 page that goes to a TMDA address. At the very least it saves me bandwidth.
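
The trap-and-block logic described above might look roughly like this. It is a sketch that uses an in-memory set in place of the text file; `blocked`, `TRAP_PREFIX`, and `statusFor` are hypothetical names, and the trap path is taken from the link mentioned in the comment.

```javascript
// In-memory stand-in for the text file of blocked IPs.
const blocked = new Set();
const TRAP_PREFIX = "/guestbook/jackhole"; // hidden link target from the comment

// Decide the HTTP status for a request and record trap hits.
function statusFor(ip, path) {
  if (path.startsWith(TRAP_PREFIX)) {
    blocked.add(ip); // only a client ignoring robots.txt should get here
    return 403;
  }
  return blocked.has(ip) ? 403 : 200;
}
```

A well-behaved crawler never requests the trap path because robots.txt disallows it, so only clients that ignore the exclusion rules end up on the list.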

  19. Re:Hivelogic Enkoder by Yottabyte84 · · Score: 2, Informative

    This script requires a mail daemon that delivers user+anything@example.org to user@example.org.

    #!/usr/bin/perl -w

    use strict;
    use Socket;                # inet_aton
    use POSIX qw(strftime);    # timestamp without shelling out to /bin/date
    use CGI qw(:standard);     # header, start_html, a, end_html

    my $name   = "harvestbait";    # yourname
    my $domain = "example.org";    # yourdomain.tld

    my $ipaddr = $ENV{'REMOTE_ADDR'};                  # Get the requester's IP
    $ipaddr = unpack 'H*', inet_aton($ipaddr);         # Convert the IPv4 address to hex
    my $date = strftime("%H%M%m%d", localtime);        # Compact timestamp: HHMMmmdd
    my $addr = $name."+".$ipaddr.$date."\@".$domain;   # Make email addy from bits

    print header,                                      # Print HTTP header
        start_html(-meta=>{'robots'=>'noindex'},       # Ask well-behaved robots not to index
                   -title=>'Send me an email!'),       # Page title
        q(You can send me an email by clicking ),      # Page content
        a({href=>"mailto:$addr"},"here"),              # The time+ip tagged mailto:
        q(. No junk mail please! ^_^),                 # More content
        end_html;                                      # End the HTML document
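
When spam arrives at one of these tagged addresses, the tag can be unpacked again. The sketch below assumes the exact layout produced by the script above (eight hex digits for an IPv4 address followed by HHMMmmdd); `decodeTaggedAddress` is a hypothetical helper name.

```javascript
// Split "name+<8 hex digits><HHMMmmdd>@domain" back into its parts.
function decodeTaggedAddress(address) {
  const m = /^([^+@]+)\+([0-9a-f]{8})(\d{2})(\d{2})(\d{2})(\d{2})@(.+)$/i.exec(address);
  if (!m) return null;
  const [, name, hexIp, hour, minute, month, day, domain] = m;
  // Each pair of hex digits is one octet of the IPv4 address.
  const ip = hexIp.match(/../g).map((pair) => parseInt(pair, 16)).join(".");
  return { name, domain, ip, hour, minute, month, day };
}
```

The recovered IP and time identify which page request handed out the address, which is what makes reporting the harvester to its ISP possible.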

  20. There is a simpler one by zhiwenchong · · Score: 2, Informative

    This one doesn't use Javascript at all. And it's only 4k.
    Obfusticated Email Link Creator

    It does mixed dec and hex. Creates links like this. But check the underlying code....

    It's a Tripod site, so don't /. it.....
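
A rough sketch of a mixed decimal and hexadecimal encoding, assuming that "hex" here means hexadecimal character references such as `&#x40;`. The linked tool's actual output is not shown in the thread, so this is an illustration rather than a reimplementation; `encodeMixed` and `encodedMailto` are hypothetical names.

```javascript
// Encode every character as a numeric character reference, choosing
// decimal or hexadecimal form at random for each one.
function encodeMixed(text, random = Math.random) {
  return Array.from(text, (ch) => {
    const code = ch.codePointAt(0);
    return random() < 0.5 ? `&#${code};` : `&#x${code.toString(16)};`;
  }).join("");
}

// Build a complete link whose href and label are both encoded.
function encodedMailto(address) {
  return `<a href="${encodeMixed("mailto:" + address)}">${encodeMixed(address)}</a>`;
}
```

Browsers decode character references in attribute values, so the link still works without JavaScript, but the same decoding is available to any harvester that chooses to do it.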

  21. blind people by kipple · · Score: 2, Insightful

    already have a lot of trouble with that picture-of-the-email-address thing. it is a neat solution but it lacks portability, to state it another way.

    --
    -- There are two kind of sysadmins: Paranoids and Losers. (adapted from D. Bach)
  22. unicode, base-64 encoded by ubiquitin · · Score: 2, Informative

    I have a unicode converter that works really well. It will put your email address into a form like:

    & # 105;& # 032;& # 100;& # 111;& # 032;& # 105;& # 116;& # 032;& # 116;& # 104;& # 105;& # 115;& # 032;& # 119;& # 097;& # 121;

    For the past three years or so, the spammers haven't caught on to this, and they are unlikely to do so given the few people who take the effort to put this measure into place.

    P.S. It's not just mailto links that are being harvested here. They'll scrape anything with an @ or an "at" or ...

    --
    http://tinyurl.com/4ny52
  23. maybe, just maybe by DrSkwid · · Score: 2, Insightful

    they spam :
    info@yourdomain
    sales@yourdomain
    help@yourdomain
    webmaster@yourdomain
    postmaster@yourdomain

    etc.etc.

    --
    There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
  24. Fight the problem, not the symptoms by Baloo Ursidae · · Score: 2, Interesting
    Focus on reporting, not prevention. You'd be amazed how quickly making yourself a hostile target gets spammers to stop spamming you.

    Also, don't munge.

    --
    Help us build a better map!
  25. Re:Javascript mailto links... vulnerable? by Specialist2k · · Score: 4, Informative
    There are e-mail harvesting bots which use the Microsoft HTML ActiveX control, so they can and will execute any JavaScript present on the page.

    Wait... this provides some nice opportunities to cause them a major headache by including malicious JavaScript code on a page only seen by a bot not following the robots exclusion protocol (to prevent a "real" search engine spider from visiting the page) by linking to that page using some hidden link from your home page...

  26. Unicode actually works! by aquarian · · Score: 2, Insightful

    Believe it or not, this actually works. These days most harvester programs still don't read Unicode. Once I started doing this, I saw a great reduction in spam. It won't work forever, of course -- eventually the spambots will read Unicode, and the game will be over for this technique. But in the meantime, it's easy enough to do a search and replace of every "@" symbol.

    If you want to convert your whole address, E-cloaker is a neat little free program for converting text to Unicode.

  27. Not for Netscape 4 by extra88 · · Score: 2, Insightful

    I haven't checked the stats recently but Netscape 4.x and earlier do not support Unicode. Pretty much all browsers can handle the HTML entities given in other examples. You may not care.

  28. Re:Fraid Not by Mr Z · · Score: 2, Informative

    "Beg the question" is a shortening of "beggaring the question"--ie. answering a question with the question itself. "Why don't parallel lines cross? Because lines that never cross are parallel!"

    If you look at the definition for beggar, you'll see that one of the definitions is "One who assumes in argument what he does not prove." (Source: Webster's Revised Unabridged Dictionary, (C) 1996, 1998 MICRA, Inc.) In fact, this meaning of beggar has survived as a submeaning of 'beg.' This link on dictionary.reference.com supports my point. Look at definitions 3a and 3b.

    So, the parent poster to your post is quite correct. His statement was not a hypothesis, but rather closer to fact, based on accepted usage.

    Granted, standard American usage seems to treat "beg the question" as a synonym for "raise the question", but that's a rather incorrect usage, IMHO.

    --Joe
  29. Re:Mail form - bad idea by John Q. Public · · Score: 2, Insightful

    My problem with mail forms is that I don't have a record of any messages sent or any information if things go wrong with the delivery. Black hole for information == bad.

    That being said, if you have a copy sent to the sender as well it's not as evil.

  30. Re:Plug by greenhide · · Score: 2, Informative

    Hey, guess what.

    I was able to use your form to send myself spam!

    That's right.

    I entered my e-mail address, a from address, and the mail went through.

    Essentially, your web page is providing the equivalent of an open relay.

    You need to remove the "mailto" field, as that allows the form to be used to send mail to anybody. Once that's gone, your form should be secure again.

    --
    Karma: Chevy Kavalierma.
  31. Re:Javascript mailto links... vulnerable? by merlin_jim · · Score: 2, Interesting

    > this provides some nice opportunities to cause them a major headache by including malicious JavaScript code on a page only seen by a bot not following the robots exclusion protocol

    A lot of people do that with a malicious honeypot page. It just outputs X phony, but real-looking, mailto links, where X is a member of the set of Very Large Integers.

    (note to /. math freaks: yes I know there's no set called Very Large Integers. It's a joke. Laugh.)

    --
    I am disrespectful to dirt! Can you see that I am serious?!
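
A honeypot page of the kind described above could be generated along these lines. The sketch deliberately uses the reserved example.com, example.net, and example.org domains so that no real mailbox or domain is polluted; `NAMES`, `DOMAINS`, `pick`, and `fakeLinks` are hypothetical names.

```javascript
// Hypothetical word lists; example.* domains are reserved and never deliver.
const NAMES = ["alice", "bob", "carol", "dave", "erin", "frank"];
const DOMAINS = ["example.com", "example.net", "example.org"];

// Choose a random element of a list.
function pick(list) {
  return list[Math.floor(Math.random() * list.length)];
}

// Return `count` lines of plausible-looking but undeliverable mailto links.
function fakeLinks(count) {
  const lines = [];
  for (let i = 0; i < count; i++) {
    const address = `${pick(NAMES)}${Math.floor(Math.random() * 10000)}@${pick(DOMAINS)}`;
    lines.push(`<a href="mailto:${address}">${address}</a><br>`);
  }
  return lines.join("\n");
}
```

The page serving this output would be linked only from a hidden link and disallowed in robots.txt, so that legitimate crawlers never index it.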