How are You Preventing Mailto-Link Harvesting?

Posted by Cliff on Thursday October 2, 2003 @01:44PM from the making-it-harder-for-the-spammers dept.

mixwhit asks: "In our ever increasing effort against spam, we are now considering replacing all mailto: links on our website with something unharvestable (i.e. 'user (at) address', javascript mailto links, character entity evasion, etc.). Obviously this won't stop the spam, but it seems prudent to stop the harvesting so that the spam may slow down someday (year 2024 maybe?). What are others doing with this issue? We would prefer to preserve mailto link clickability, but also only want to make this adjustment once." One suggestion I would make is to put your email address in an image. People can read it, but harvesters won't be able to harvest it (unless they download the image for OCR), but any barrier you can place in front of the spammer, without blocking people honestly interested in communicating with you, is probably a good thing.

Un-what? by devphil · 2003-10-02 13:49 · Score: 5, Informative

replacing all mailto: links on our website with something unharvestable (i.e. 'user (at) address'

What makes you think "user at mail dot foo dot com" is unharvestable? The web archives of all the development mailing lists at gcc.gnu.org use that scheme, and we still get spam to unique addresses used only for sending mail to those lists.

It's a handy technique, and useful, but it's certainly not foolproof.
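To illustrate why the spelled-out form is only a speed bump, here is a minimal sketch of how a harvester might undo it. The pattern and the function name demunge are hypothetical; real harvesters may use broader or different rules, and this pattern will also produce false positives on ordinary prose.

```python
import re

# Hypothetical pattern: "local at label dot label [dot label ...]",
# also accepting the "(at)" and "(dot)" spellings.
MUNGED = re.compile(
    r"\b([\w.+-]+)\s+(?:at|\(at\))\s+"
    r"(\w[\w-]*(?:\s+(?:dot|\(dot\))\s+\w[\w-]*)+)\b",
    re.IGNORECASE,
)

def demunge(text):
    """Return addresses recovered from 'user at host dot tld' style text."""
    found = []
    for local, domain in MUNGED.findall(text):
        # Collapse each " dot " separator back into a literal period.
        domain = re.sub(r"\s+(?:dot|\(dot\))\s+", ".", domain, flags=re.IGNORECASE)
        found.append(f"{local}@{domain}")
    return found

print(demunge("Write to user at mail dot foo dot com for details."))
```

A dozen lines is all it takes, which fits the observation that addresses munged this way still get spam.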

--
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
Server side scripting by mikeswi · 2003-10-02 13:52 · Score: 2, Informative

Any method of munging the address must still be clickable within the visitor's browser. If it is clickable, it can be harvested. Javascript and html encoding may stop most of the bots, but bots exist that can slurp the address no matter how much javascript you wrap it in.

I use a PHP email form that never sends the address to the client accessing it. Short of hacking the server and looking at the php script in plain text, there is no way to harvest the address. I have no need to let the public know my address. If they want to email me, use the form or use my site's message board.

I don't want the guy getting slashdotted, so I won't link his site. If you really want the script I use (available in PHP or ASP), go to hotscripts.com and search for dbmaster's mail form.

--
Only on /.
simple js by anim8 · 2003-10-02 13:54 · Score: 5, Informative

<script>  </script>
Hiveware's Enkoder by jpsowin · 2003-10-02 13:55 · Score: 3, Informative

Just use this. Life is good, eh?
1. Re:Hiveware's Enkoder by dimator · 2003-10-02 14:35 · Score: 3, Informative
  
  This is a really cool idea, actually. Two things though: it increases the document size a good deal, since my email address (19 characters) becomes a 1383 character string. This could really add up if you had more than one email address on the page (such as a mailing list archive). Although, in the world of broadband, that's a small price to pay.
  
  The other thing is, if you are using this, you'd be wise to change the string 'hiveware_enkoder' to something unique. The reason being, if spam harvesters really wanted to, they could recognize that string, and have their own javascript engine handy run the script to get at the email address hidden inside. That's a lot of work, but not entirely impossible. If the Hiveware system gains many users, it might be worthwhile for them.
  
  --
  python -c "x='python -c %sx=%s; print x%%(chr(34),repr(x),chr(34))%s'; print x%(chr(34),repr(x),chr(34))"
Uhh... by babbage · 2003-10-02 13:59 · Score: 2, Informative

Quoth the original message...
What are others doing with this issue? We would prefer to preserve mailto link clickability, but also only want to make this adjustment once." One suggestion I would make is to put your email address in an image. People can read it, but harvesters won't be able to harvest it (unless they download the image for OCR)

Err, doesn't this exactly not meet the given criteria? The guy wants links to be clickable. If you hide the image, you can only get as far as, say:
<a href="mailto:foo@bar.com"> <img src="email_addy.png"> </a>

But that's just as easily harvestable as it would have been if you left the visible text as the plain address. What's the point?
It's the contents of the href attribute that need to be obscured, not the visible text (or image, or video clip, or whatever). You can't embed an image in the href text, so I don't see how this suggestion gains us anything at all.
---
The suggestion I like best is to encapsulate the address as HTML entities. Currently, this is enough to fend off the average address harvesting software, though if the practice catches on, I assume that the harvesters would start to take this into account -- at which point I don't know what the solution should be...
Barring that, it seems like the only way to provide an address will be to use literal text such as "write to us at foo at bar.com" and hope people just get it.
Alternatively, shy away from giving out your address, and provide a form where visitors can submit comments. This could allow you to filter out some of the incoming traffic (hint, if you're going to use "off the shelf" software for this, use NMS instead of Matt Wright's ancient Formmail.PL script, it's much safer). Avoiding any publication of email addresses might piss Jakob Nielsen off, but under the circumstances I think it's probably a reasonable approach to the situation -- it's way too easy for a public address to get abused...

--
DO NOT LEAVE IT IS NOT REAL
How I do it... by Pathwalker · 2003-10-02 14:03 · Score: 2, Informative

I've been looking at a couple of different techniques over the past year or so. They are closely tied into the Roxen Webserver, and probably won't work with Caudium, or any other webserver.

The first technique I used (described here) was a simple RXML macro, that defined a tag called <cloak>. It would check to see if the client was on a list of known robots. If the client was a robot, a graphic version of the email address would be returned. If the client looked like a normal browser, then the address would be entity encoded, and returned as a mailto link.

Shortly after I set that up, I realized that entity encoding was pretty much useless - that if a web browser can figure out the address, so can a spam bot.

My second attempt appears to be working well. I wrote a Roxen module called mailcloak which takes addresses, and replaces them with a graphic link to a dynamically generated form to send an email to that address.

As an example, the code <mailcloak> maileater@ofdoom.com</mailcloak> would be replaced with a graphical version of the address maileater@ofdoom.com and a link to this page.

It also has support for finding and cloaking bare addresses in pages, and I'll probably add support for rewriting mailto tags sometime in the next few weeks.
Not to give myself away by Anonymous Coward · 2003-10-02 14:07 · Score: 1, Informative

But I use a combination of things:

1. Images for the email text using php and some caching
2. ROT13 for the text to make the images (so the parts of the email address aren't as easily visible).
3. A mailto redirect, rather than a mailto: url. The redirect goes to the mailto:, and works fine in most clients, except it leaves users with a blank page (which they can easily go back from).
4. I leave "honeypot" email addresses on every page marked with the IP address and the time the page was viewed. Makes tracking down harvesters easier, and gives me guaranteed spam for filters/spamcop/etc. I also use a secondary domain which I don't use for any regular email, so I can always drop it if I decide I'm sick of it.
Use a Form by Alethes · 2003-10-02 14:21 · Score: 2, Informative

I recommend that you use a form that does NOT have the user's email address in a hidden input. Just have the user's ID, then on the server, find the address based on that ID and send the message accordingly. I know you want to keep the mailto: link thing happening, but if you do that, harvesters will always find a way to decode whatever you're doing.
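A minimal sketch of the ID lookup described above, assuming a small server-side table. The names RECIPIENTS, resolve_recipient and render_form, the /contact URL and the example addresses are all hypothetical placeholders.

```python
import html

# Hypothetical server-side table; it never leaves the server.
RECIPIENTS = {
    "webmaster": "webmaster@example.org",
    "sales": "sales@example.org",
}

def resolve_recipient(recipient_id):
    """Map a public form ID to a private address, or None if unknown."""
    return RECIPIENTS.get(recipient_id)

def render_form(recipient_id):
    """Build the contact form; only the opaque ID appears in the markup."""
    if recipient_id not in RECIPIENTS:
        raise KeyError(recipient_id)
    safe_id = html.escape(recipient_id, quote=True)
    return (
        '<form method="post" action="/contact">\n'
        f'  <input type="hidden" name="to" value="{safe_id}">\n'
        '  <textarea name="message"></textarea>\n'
        '  <input type="submit" value="Send">\n'
        '</form>'
    )

print(render_form("sales"))
```

Because unknown IDs resolve to None, a visitor cannot make the form deliver to an address of their own choosing, and the page source contains nothing for a harvester to collect.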
Re:Mail form by skookum · 2003-10-02 15:36 · Score: 3, Informative

That is only the case if you are running an ancient, brain dead copy of the original (Matt's Script Archive) formmail.pl. But you'd be a retard for doing that and deserve everything you get. Modern formmail scripts do not allow spam through.
Unicode by vitaflo · 2003-10-02 15:37 · Score: 2, Informative

I actually just use unicode for the @ symbol (&#64;). It seems that most of the time the harvesters just read the HTML source, and don't actually render HTML entities or unicode. Thus the harvester will get user&#64;example.com, a non-valid address, but a user on your site will see user@example.com and the mailto: link will function normally.
Re:Hivelogic Enkoder by Yottabyte84 · 2003-10-02 17:52 · Score: 2, Informative

This script requires a mail daemon that delivers user+anything@example.org to user@example.org.

#!/usr/bin/perl -w
use Socket;                                      # Load socket functions
use CGI qw(:standard);                           # Load CGI standard functions
my $name = "harvestbait";                        # yourname
my $domain = "example.org";                      # yourdomain.tld
my $ipaddr = $ENV{'REMOTE_ADDR'};                # Get the requester's IP
$ipaddr = unpack 'H*', inet_aton($ipaddr);       # Convert the IP to hex
my $date = `/bin/date +%H%M%m%d`;                # Get a compact timestamp
chomp($date);                                    # Get rid of the newline char
my $addr = $name."+".$ipaddr.$date."@".$domain;  # Make email addy from bits
print header,                                    # Print HTTP header
  start_html(-meta=>{'robot'=>'noindex'},        # Print HTML document header
             -title=>'Send me an email!'),       # Page title
  q(You can send me an email by clicking ),      # Page content
  a({href=>"mailto:$addr"},"here"),              # The time+ip tagged mailto:
  q(. No junk mail please! ^_^),                 # More content
  end_html;                                      # End the HTML document
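When spam arrives at one of these tagged addresses, the tag has to be unpacked again. A minimal sketch in Python, assuming the layout produced by the script above: 8 hex digits of IPv4 address followed by 8 digits of HHMMmmdd timestamp. The function name decode_tagged_address is hypothetical, and IPv6 requesters are not handled.

```python
import socket

def decode_tagged_address(address):
    """Split name+<hex ip><HHMMmmdd>@domain back into its parts."""
    local, _, domain = address.partition("@")
    name, _, tag = local.partition("+")
    if len(tag) != 16:
        raise ValueError("expected 8 hex digits of IP plus 8 digits of timestamp")
    # Reverse the hex conversion: 4 raw bytes back to dotted-quad form.
    ip = socket.inet_ntoa(bytes.fromhex(tag[:8]))
    stamp = tag[8:]
    return {
        "name": name,
        "domain": domain,
        "ip": ip,
        "hour": stamp[0:2],
        "minute": stamp[2:4],
        "month": stamp[4:6],
        "day": stamp[6:8],
    }

print(decode_tagged_address("harvestbait+c000020114351002@example.org"))
```

The decoded IP and time can then be matched against the web server logs to see which client fetched the page.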
There is a simpler one by zhiwenchong · 2003-10-02 17:57 · Score: 2, Informative

This one doesn't use Javascript at all. And it's only 4k.
Obfusticated Email Link Creator

It does mixed dec and hex. Creates links like this. But check the underlying code....

It's a Tripod site, so don't /. it.....
unicode, base-64 encoded by ubiquitin · 2003-10-02 20:23 · Score: 2, Informative

I have a unicode converter that works really well. It will put your email address into a form like:

& # 105;& # 032;& # 100;& # 111;& # 032;& # 105;& # 116;& # 032;& # 116;& # 104;& # 105;& # 115;& # 032;& # 119;& # 097;& # 121;

For the past three years or so, the spammers haven't caught on to this, and they are unlikely to do so given the few people who take the effort to put this measure into place.

P.S. It's not just mailto links that are being harvested here. They'll scrape anything with an @ or a "at" or ...
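A converter of this kind can be sketched in a few lines. This is an illustration only, not the converter mentioned above; the function name encode_entities is hypothetical, and the zero-padded decimal form is chosen to match the sample output.

```python
import html

def encode_entities(address):
    """Encode every character as a zero-padded decimal character reference."""
    return "".join(f"&#{ord(ch):03d};" for ch in address)

encoded = encode_entities("user@example.com")
print(encoded)                 # no literal "@" left in the page source
print(html.unescape(encoded))  # what a browser, or a smarter bot, recovers
```

The second print is the catch: decoding is a single standard-library call, so this only stops harvesters that scan raw HTML without interpreting character references.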

--
http://tinyurl.com/4ny52
Re:Javascript mailto links... vulnerable? by Specialist2k · 2003-10-03 00:30 · Score: 4, Informative

There are e-mail harvesting bots which use the Microsoft HTML ActiveX control, so they can and will execute any JavaScript present on the page.
Wait... this provides some nice opportunities to cause them a major headache by including malicious JavaScript code on a page only seen by a bot not following the robots exclusion protocol (to prevent a "real" search engine spider from visiting the page) by linking to that page using some hidden link from your home page...
Re:Mail form by scrytch · 2003-10-03 02:35 · Score: 2, Informative

> I agree 100%. Either use something like formmail.pl, or write your own custom CGI program to handle emails

Ironic, that in order to stop spam to you, you would use the notoriously buggy and insecure formmail, turning your box into an open mail relay for spammers to use. Use a secure alternative (there's compatible versions, but really it's not hard to use MIME::Lite yourself). Matt has never fixed formmail to a satisfactory degree, and shows no inclination toward doing so.

If you roll your own, it'd probably still be more secure than formmail, as long as you don't allow it to take addressing information from the outside. Hardwire the configuration into the script, and break it out into a nonreadable config file if you have to. But don't use a "flexible" form mailer unless you know you've got it nailed down.
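A minimal sketch of the hardwired approach, in Python rather than Perl and under stated assumptions: the constant RECIPIENT, the FORM_SENDER address and the function build_message are hypothetical, and actual delivery (for example via smtplib) is left out.

```python
from email.message import EmailMessage

# Hardwired on the server; nothing in the request can change these.
RECIPIENT = "webmaster@example.org"     # hypothetical address
FORM_SENDER = "webform@example.org"     # hypothetical address

def build_message(visitor_address, subject, body):
    """Build a message whose destination never comes from form input."""
    for value in (visitor_address, subject):
        # A line break in a header field could smuggle in extra headers.
        if "\r" in value or "\n" in value:
            raise ValueError("line breaks are not allowed in header fields")
    msg = EmailMessage()
    msg["To"] = RECIPIENT
    msg["From"] = FORM_SENDER
    msg["Reply-To"] = visitor_address
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

print(build_message("visitor@example.net", "Hello", "Nice site."))
```

The form has no "to" field at all, so there is nothing for a spammer to point at a third party, and the line-break check closes the usual route for injecting extra recipients through the subject or reply address.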

--
I've finally had it: until slashdot gets article moderation, I am not coming back.
Re:Fraid Not by Mr+Z · 2003-10-03 06:24 · Score: 2, Informative

"Beg the question" is a shortening of "beggaring the question"--ie. answering a question with the question itself. "Why don't parallel lines cross? Because lines that never cross are parallel!"

If you look at the definition for beggar, you'll see one of the definitions is "One who assumes in argument what he does not prove." (Source: Webster's Revised Unabridged Dictionary, (C) 1996, 1998 MICRA, Inc.) In fact, this meaning of beggar has survived as a submeaning of 'beg.' This link on dictionary.reference.com supports my point. Look at definitions 3a and 3b.

So, the parent poster to your post is quite correct. His statement was not a hypothesis, but rather closer to fact, based on accepted usage.

Granted, standard American usage seems to treat "beg the question" as a synonym for "raise the question", but that's a rather incorrect usage, IMHO.
--Joe

--
Program Intellivision!
Re:Plug by greenhide · 2003-10-03 09:40 · Score: 2, Informative

Hey, guess what.

I was able to use your form to send myself spam!

That's right.

I entered my e-mail address, a from address, and the mail went through.

Essentially, your web page is providing the equivalent of an open relay.

You need to remove the "mailto" field, as that allows the form to be used to send mail to anybody. Once that's gone, your form should be secure again.

--
Karma: Chevy Kavalierma.