How are You Preventing Mailto-Link Harvesting?
mixwhit asks: "In our ever-increasing effort against spam, we are now considering replacing all mailto: links on our website with something unharvestable (e.g. 'user (at) address', JavaScript mailto links, character entity evasion, etc.). Obviously this won't stop the spam, but it seems prudent to stop the harvesting so that the spam may slow down someday (year 2024 maybe?). What are others doing about this issue? We would prefer to preserve mailto link clickability, but we also want to make this adjustment only once." One suggestion I would make is to put your email address in an image. People can read it, but harvesters can't (unless they download the image and run OCR on it). Any barrier you can place in front of the spammer, without blocking people honestly interested in communicating with you, is probably a good thing.
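For reference, the "character entity evasion" mentioned in the question replaces each character of the address with an HTML numeric character reference, such as &#64; for "@". Browsers decode these before using the href, so the link stays clickable, while a scraper matching user@domain in the raw HTML finds nothing. A minimal sketch with hypothetical function names; note that it does nothing against a harvester that decodes entities first:

```javascript
// Encode every character as a decimal HTML character reference,
// e.g. "@" becomes "&#64;". (Hypothetical helper name.)
function toEntities(text) {
  let out = "";
  for (const ch of text) {
    // for...of walks code points, so characters outside the BMP stay intact
    out += "&#" + ch.codePointAt(0) + ";";
  }
  return out;
}

// Build a mailto: anchor whose href (and, by default, visible text) is
// entity-encoded. A label, if given, is assumed to be safe HTML already.
function entityMailtoLink(address, label) {
  const href = toEntities("mailto:" + address);
  const text = label === undefined ? toEntities(address) : label;
  return '<a href="' + href + '">' + text + "</a>";
}
```

For example, entityMailtoLink("user@example.com", "email me") yields an anchor whose href begins with &#109;&#97;&#105;&#108;&#116;&#111;&#58; (that is, "mailto:").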
Advocates for people who have difficulty seeing have been complaining about sites that make a person type a number displayed in an image to verify that they're not a bot. They say it causes undue hardship for sight-impaired folks. That may not be a legal fight your company would like to enter.
I can see both sides of this. Can't say I know where to stand though.
Yep, I never spell check.
No kidding. Comcast gives us seven email addresses, so I set one up for each of us. My three-month-old gets spam, and nobody has EVER used that account (except me, sending a test email when I first set it up). These scum take a brute-force approach to generating email addresses and don't care how many are undeliverable. The messages come with opt-out buttons, but all those do is confirm that the address is valid. And they never send from the same address twice, so adding senders to a filter list doesn't work either. Bayesian filtering on the content is the only way to go.
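A Bayesian filter scores a message by how often its words appeared in spam versus legitimate mail you have already sorted. Below is a toy sketch with hypothetical names, using the combining rule popularized by Paul Graham's "A Plan for Spam" (equal prior odds, tokens treated as independent) plus add-one smoothing. Real filters use far more training data, better tokenization, and only the most telling tokens:

```javascript
// Toy naive-Bayes spam scorer (hypothetical class name). Per-token
// probabilities use add-one smoothing and are combined as
// P = prod(p) / (prod(p) + prod(1 - p)), computed in log space.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9$'-]+/g) || [];
}

class BayesFilter {
  constructor() {
    this.spamCounts = new Map(); // token -> number of spam messages containing it
    this.hamCounts = new Map(); // token -> number of good messages containing it
    this.spamDocs = 0;
    this.hamDocs = 0;
  }

  train(text, isSpam) {
    const counts = isSpam ? this.spamCounts : this.hamCounts;
    if (isSpam) this.spamDocs++;
    else this.hamDocs++;
    // Count each distinct token once per message
    for (const tok of new Set(tokenize(text))) {
      counts.set(tok, (counts.get(tok) || 0) + 1);
    }
  }

  // P(spam | token), assuming equal prior odds of spam and good mail
  tokenSpamProb(tok) {
    const s = ((this.spamCounts.get(tok) || 0) + 1) / (this.spamDocs + 2);
    const h = ((this.hamCounts.get(tok) || 0) + 1) / (this.hamDocs + 2);
    return s / (s + h);
  }

  // Probability that the whole message is spam; log space avoids underflow
  score(text) {
    let logSpam = 0;
    let logHam = 0;
    for (const tok of new Set(tokenize(text))) {
      const p = this.tokenSpamProb(tok);
      logSpam += Math.log(p);
      logHam += Math.log(1 - p);
    }
    return 1 / (1 + Math.exp(logHam - logSpam));
  }
}
```

Train it on mail you have already classified, then treat scores close to 1 as spam. Because it looks at content rather than sender addresses, the never-repeat-a-sender trick does not help the spammer.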
If all this should have a reason, we would be the last to know.
Alternatively, to keep it transparently usable by end users, you can do something like this:
<a href="false@false.com" onmouseover="var a = 'in.com'; this.href = 'mailto:real@doma' + a;">email me</a>
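The same trick can be generated for any address. A sketch under a few assumptions: the function name is hypothetical, the decoy href defaults to "#" rather than a fake address, and the handler is also attached to onfocus (not in the snippet above) so keyboard users who tab to the link get the real address. Like the snippet, it cuts the mailto: URL halfway through the domain:

```javascript
// Hypothetical generator for the onmouseover trick: the static href is a
// decoy, and the real mailto: URL is assembled in the browser only when the
// pointer enters the link or the link gains keyboard focus.
function mouseoverMailtoLink(address, label, decoyHref = "#") {
  // Nothing is escaped here, so refuse characters that could break out of a
  // single-quoted JS string inside a double-quoted HTML attribute.
  if (!/^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+$/.test(address)) {
    throw new Error("unsupported address: " + address);
  }
  const url = "mailto:" + address;
  // Cut halfway through the domain, like 'real@doma' + 'in.com'
  const domainStart = url.indexOf("@") + 1;
  const cut = domainStart + Math.floor((url.length - domainStart) / 2);
  const head = url.slice(0, cut);
  const tail = url.slice(cut);
  const handler = `var a = '${tail}'; this.href = '${head}' + a;`;
  // label is assumed to be plain, already-safe HTML
  return `<a href="${decoyHref}" onmouseover="${handler}" onfocus="${handler}">${label}</a>`;
}
```

The full address never appears in the markup, though a harvester that runs page scripts, or simply pattern-matches this particular construction, would still recover it.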
As soon as any reasonable number of people start using the same scheme (particularly if it's a mailto: designed to still be machine-readable), someone will take the time to harvest that kind of obfuscated address. It's just a matter of the cost/benefit ratio tipping far enough in the spammers' favor to make it worthwhile.
I think you're right: once enough websites use automated obfuscation, spammers will need to decode it to get to their victims. But as long as most websites aren't doing what I'm doing, I'm betting they won't bother targeting the techies.
Here's another POV, though: I'm considering the *other* cost/benefit ratio. I want my users to be able to email me easily, and giving them a simple mailto: link is the best way to do that. We'll have to wait and see.
Right now, it seems to be costing nothing, since I'm only getting spammed at the standard "guessed" names on my domains, like "sales@" and "webmaster@". And even at 5 spams a day, the plain link would still be worth the trouble.
If the bots do start to really catch up (they may... I'm hoping enforced laws will start to catch up over the next few years!), at some point I might move on to the next-least-inconvenient masking method, which is probably randomized JavaScript masking: the mailto: link is generated by custom JavaScript that builds the address across a few lines of code. That would prevent users without JavaScript from using the link, though, which is a cost I want to avoid.
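A minimal sketch of that kind of JavaScript masking. The function name, the three-character fragment size and the "(at)/(dot)" noscript fallback are all illustrative assumptions. The generated inline script declares one variable per fragment, never contains a literal "@", and writes the link with document.write while the page is being parsed; visitors without JavaScript see the noscript text instead of losing the address entirely:

```javascript
// Hypothetical generator for JavaScript address masking. It emits an inline
// <script> that rebuilds the address from short fragments at page-load time,
// plus a <noscript> fallback for visitors with JavaScript disabled.
function scriptMaskedMailto(user, domain, label, chunkSize = 3) {
  // Restrict input so fragments and label can be embedded without escaping.
  if (!/^[A-Za-z0-9._%+-]+$/.test(user) || !/^[A-Za-z0-9.-]+$/.test(domain)) {
    throw new Error("unsupported characters in address");
  }
  if (!/^[A-Za-z0-9 .,!?-]*$/.test(label)) {
    throw new Error("label must be plain text in this sketch");
  }
  const chunks = (s) => {
    const out = [];
    for (let i = 0; i < s.length; i += chunkSize) out.push(s.slice(i, i + chunkSize));
    return out;
  };

  const lines = [];
  const terms = [];
  // null marks where "@" goes; it is produced at runtime, never written literally
  for (const part of [...chunks(user), null, ...chunks(domain)]) {
    if (part === null) {
      terms.push("String.fromCharCode(64)");
    } else {
      const name = "p" + lines.length;
      lines.push("var " + name + " = " + JSON.stringify(part) + ";");
      terms.push(name);
    }
  }
  lines.push("var addr = " + terms.join(" + ") + ";");
  lines.push("document.write('<a href=\"mailto:' + addr + '\">" + label + "</a>');");

  const fallback = user + " (at) " + domain.split(".").join(" (dot) ");
  return "<script>\n" + lines.join("\n") + "\n</script>\n<noscript>" + fallback + "</noscript>";
}
```

document.write only behaves this way in a script that runs during parsing; a deferred or async script would need to insert the link through the DOM instead.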
There are only 10 types of people: those who understand decimal, those who don't, and, uh, 8 other types I forget.
You can't embed an image in the href text, so I don't see how this suggestion gains us anything at all.
Actually, you can, using data: URLs.
Sick, eh?
I recommend the above method plus:
1) Randomize the variable names for u, d, t, and a.
2) Randomize the position of the var XX = XX statements.
This makes simple regex-based extraction harder if your site is big enough, with enough addresses on it, that someone would bother writing a custom regex to harvest them.
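The script those two steps refer to (with variables u, d, t and a) is not reproduced in the thread, so here is a hypothetical illustration of the idea only: every call emits equivalent address-building code with fresh random identifiers and a shuffled order of "var" declarations, so no single regex matches every page. The function names and the way the address is split are assumptions:

```javascript
// Random identifier such as "_qkzrwa"; the leading underscore keeps it
// clear of JavaScript reserved words.
function randomIdent(rng, used) {
  const letters = "abcdefghijklmnopqrstuvwxyz";
  let name;
  do {
    name = "_";
    for (let i = 0; i < 6; i++) name += letters[Math.floor(rng() * letters.length)];
  } while (used.has(name));
  used.add(name);
  return name;
}

// In-place Fisher-Yates shuffle
function shuffle(items, rng) {
  for (let i = items.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [items[i], items[j]] = [items[j], items[i]];
  }
  return items;
}

// Returns { code, resultVar }: after running code, resultVar holds the address.
function randomizedAddressScript(user, domain, rng = Math.random) {
  if (!/^[A-Za-z0-9._%+-]+$/.test(user) || !/^[A-Za-z0-9.-]+$/.test(domain)) {
    throw new Error("unsupported characters in address");
  }
  const used = new Set();
  const cut = Math.floor(domain.length / 2);
  // Pieces in concatenation order; their declaration order is free to shuffle.
  const pieces = [
    JSON.stringify(user),
    "String.fromCharCode(64)", // "@" without a literal "@"
    JSON.stringify(domain.slice(0, cut)),
    JSON.stringify(domain.slice(cut)),
  ].map((expr) => ({ name: randomIdent(rng, used), expr }));

  const decls = shuffle(pieces.map((p) => `var ${p.name} = ${p.expr};`), rng);
  const resultVar = randomIdent(rng, used);
  const assemble = `var ${resultVar} = ${pieces.map((p) => p.name).join(" + ")};`;
  return { code: decls.concat(assemble).join("\n"), resultVar };
}
```

The page would wrap code in a script element and use resultVar wherever the address is needed, for instance to set a link's href.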
In my mail server, I redirect the random addresses to a single mailbox. Then, when I get spammed, I can trace it back to an IP and contact the hosting company or ISP it originated from.
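The post doesn't say how the random addresses are generated or traced. One way to make the tracing automatic, offered only as a hypothetical sketch: encode the visitor's IPv4 address and a timestamp into each generated address, have the mail server deliver everything matching the prefix to one mailbox (server configuration not shown), and decode the local part when spam arrives. The encoding below is plain, not signed, so anyone can read or forge it; an HMAC over the local part would fix that:

```javascript
// Hypothetical per-visitor trap address: "t" + 8 hex digits of the IPv4
// address + "." + Unix time in base 36, e.g. t7f000001.<time>@example.com
// for a visitor at 127.0.0.1.
function makeTrapAddress(ipv4, date, domain) {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(ipv4);
  if (!m) throw new Error("not an IPv4 address: " + ipv4);
  const octets = m.slice(1).map(Number);
  if (octets.some((o) => o > 255)) throw new Error("octet out of range: " + ipv4);
  const ipHex = octets.map((o) => o.toString(16).padStart(2, "0")).join("");
  const seconds = Math.floor(date.getTime() / 1000).toString(36);
  return "t" + ipHex + "." + seconds + "@" + domain;
}

// Recover the harvester's IP and the page-view time from a spammed address.
// Returns null for addresses that makeTrapAddress did not produce.
function decodeTrapAddress(address) {
  const local = address.split("@")[0];
  const m = /^t([0-9a-f]{8})\.([0-9a-z]+)$/.exec(local);
  if (!m) return null;
  const ip = [0, 2, 4, 6].map((i) => parseInt(m[1].slice(i, i + 2), 16)).join(".");
  return { ip, when: new Date(parseInt(m[2], 36) * 1000) };
}
```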
Visit blue.aginet.com for my other GPL'd code. Feel free to use the source code in this example. I only ask that you give me credit if it's used for a commercial purpose.
Scott Wolf, Senior Software Engineer, Slingpage
I have a 1-pixel transparent GIF link at the very top of my page that links to /jackhole/guestbook/. My robots.txt file contains "User-agent: *", "Disallow: /jackhole/" and "Disallow: /jackhole/guestbook/". When a harvester follows this link anyway, a PHP script I wrote adds its IP to a text file, and it immediately gets a 403 page.
Each page of my site checks against this text file, so the mailbot gets a 403 page for almost all pages and sites that I host. To deal with false positives, there is a mailto link on the 403 page that goes to a TMDA address. At the very least, it saves me bandwidth.
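A hypothetical model of that setup, to make the moving parts explicit. An in-memory Set stands in for the text file, the handler is a plain function rather than the poster's PHP, and the 403 page's TMDA fallback is not modeled. The one check worth keeping in any real version is that the trap URL falls under a robots.txt Disallow prefix; otherwise crawlers that obey robots.txt get banned along with the harvesters:

```javascript
// robots.txt Disallow values are path prefixes; an empty value disallows nothing.
function isDisallowed(path, disallowPrefixes) {
  return disallowPrefixes.some((prefix) => prefix !== "" && path.startsWith(prefix));
}

// Returns a request handler: (ip, path) -> HTTP status code.
function makeHoneypot(trapPath, disallowPrefixes) {
  if (!isDisallowed(trapPath, disallowPrefixes)) {
    // A trap outside the disallowed area would catch well-behaved crawlers too.
    throw new Error("trap path " + trapPath + " is not disallowed in robots.txt");
  }
  const banned = new Set(); // stands in for the text file of banned IPs
  return function handle(ip, path) {
    if (path.startsWith(trapPath)) banned.add(ip);
    return banned.has(ip) ? 403 : 200;
  };
}
```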
I agree 100%. Either use something like formmail.pl, or write your own custom CGI program to handle email. It is trivial to write a mail form, and users who wish to contact you will be at your website anyway, so why make them read the address and fire up their mail client? Hell, depending on your site (if you have user registrations), you could even use a database-driven messaging system and eliminate email spam entirely. Just let the user fill out the form and store the message in the database; when you reply, they can view messages sent to them the next time they log in to your site. You won't get email spam, since you aren't using SMTP, but you still have a good (and probably better, since it is more reliable) system for communicating with your customers.
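A minimal sketch of the form-to-mail idea, assuming a server-side JavaScript environment. The function name, field names and recipient address are hypothetical, and actually sending the message is left out. The point is that the destination address lives only in server-side code, so there is nothing on the page to harvest. The CR/LF check guards against email header injection, a classic abuse of form-to-mail scripts that lets spammers turn the form into a relay:

```javascript
// Hypothetical server-side handler for a contact form. The recipient is a
// server-side constant and is never written into any HTML page.
const CONTACT_RECIPIENT = "contact@example.com"; // placeholder address

function buildContactMessage(form) {
  const name = String(form.name || "").trim();
  const replyTo = String(form.email || "").trim();
  const body = String(form.message || "");

  // CR or LF inside a header field could smuggle in extra headers (e.g. Bcc:).
  for (const [field, value] of [["name", name], ["email", replyTo]]) {
    if (/[\r\n]/.test(value)) throw new Error("invalid " + field);
  }
  // Deliberately loose sanity check, not full RFC 5322 validation.
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(replyTo)) throw new Error("invalid email");
  if (body.trim() === "") throw new Error("empty message");

  // Hand this object to whatever mail-sending code you use (not shown).
  return {
    to: CONTACT_RECIPIENT,
    replyTo,
    subject: "Website contact from " + (name || "anonymous"),
    text: body,
  };
}
```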
--That's the point of being root, you can do anything you want, even if it's stupid.
Notice that there is no word "dot" in there? That's because most people should already be able to figure it out on their own. If they can't, then they shouldn't be taking up your time.
Also, did you notice that I used "@" twice? That's because I use it as part of my regular vocabulary. It takes the same number of keystrokes, yet I end up filling the Internet with more "@" symbols, making it harder to find the real addresses.
Spammers could try to figure out what is an email address by searching for top-level domain names, but I expect that to get harder as people smarten up and start writing domain names much more casually. Maybe they'll use the regular domain names and split the address up into two sentences. For example: "You can contact me @ my work email address. The user name is blahblah. The company name is bizbaz." From there, it shouldn't be too hard for a human to figure out.
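To see why this works against a simple harvester, here is the kind of naive pattern such a tool might use (an assumption; real harvesters vary and may be smarter). It requires address characters immediately on both sides of "@" and a dotted domain after it, so "me @ my work" or an address split across sentences never matches:

```javascript
// A naive harvesting pattern (not any particular tool's): address
// characters, "@", then a domain containing at least one dot.
const NAIVE_ADDRESS = /[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+/g;

// Return every substring of text that the pattern treats as an address.
function harvest(text) {
  return text.match(NAIVE_ADDRESS) || [];
}
```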
I hope that helps someone.
testing out my trending skills
Also, don't munge.
Faced with this problem, I ended up writing my own email-address encoder, which has proved quite popular with friends. Whilst not as sophisticated as some, it works pretty well and will generate both HTML and JavaScript links via a simple web form. Try it out at www.diplo.co.uk/encode/. (Obviously, all email addresses entered into this are sold on :p )
This provides some nice opportunities to cause them a major headache, by including malicious JavaScript code on a page seen only by bots that don't follow the robots exclusion protocol.
A lot of people do that with a malicious honeypot page. It just outputs X phony, but real-looking, mailto links, where X is a member of the set of Very Large Integers.
(note to /. math freaks: yes, I know there's no set called Very Large Integers. It's a joke. Laugh.)
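A hypothetical sketch of such a honeypot page: it emits n plausible-looking mailto: links plus a link to the "next page", so a crawler that ignores robots.txt keeps going. A small seeded PRNG (mulberry32) makes a given page number always produce the same page. Every address is placed on a domain the caller passes in; pointing them at a domain you control, rather than at random real domains, keeps the resulting spam and bounces away from innocent third parties. The syllable list and function names are illustrative:

```javascript
// mulberry32: a tiny seeded PRNG returning floats in [0, 1)
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const SYLLABLES = ["al", "an", "ber", "cor", "da", "el", "fin", "gar", "ja", "ken",
                   "li", "mar", "ni", "or", "pat", "ro", "sa", "ti", "vin", "wes"];

// Plausible-looking local part, e.g. "marti", "kenro.saal", "dani42"
function fakeLocalPart(rng) {
  const pick = () => SYLLABLES[Math.floor(rng() * SYLLABLES.length)];
  let local = pick() + pick();
  if (rng() < 0.5) local += "." + pick() + pick();
  if (rng() < 0.3) local += Math.floor(rng() * 100);
  return local;
}

// n fake mailto: links on `domain`, plus a link to the next page.
function honeypotPage(pageNumber, n, domain) {
  const rng = mulberry32(pageNumber);
  const lines = [];
  for (let i = 0; i < n; i++) {
    const addr = fakeLocalPart(rng) + "@" + domain;
    lines.push(`<a href="mailto:${addr}">${addr}</a>`);
  }
  lines.push(`<a href="?page=${pageNumber + 1}">more</a>`);
  return lines.join("<br>\n");
}
```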
I am disrespectful to dirt! Can you see that I am serious?!