How are You Preventing Mailto-Link Harvesting?
mixwhit asks: "In our ever-increasing effort against spam, we are now considering replacing all mailto: links on our website with something unharvestable (i.e. 'user (at) address', javascript mailto links, character entity evasion, etc.). Obviously this won't stop the spam, but it seems prudent to stop the harvesting so that the spam may slow down someday (year 2024 maybe?). What are others doing with this issue? We would prefer to preserve mailto link clickability, but also only want to make this adjustment once." One suggestion I would make is to put your email address in an image. People can read it, but harvesters won't be able to harvest it (unless they download the image for OCR). Any barrier you can place in front of the spammer, without blocking people honestly interested in communicating with you, is probably a good thing.
See signature for a form mailer that uses MySQL to look up email addresses based on English names :)
Just use a mail form instead of mailto: links. Once you reply to feedback mail, the sender has your address and you can correspond normally. Meanwhile, evil spambots can't harvest an address that isn't shown anywhere.
Vista:XPSP2::ME:98SE
People fighting for those who have difficulty seeing have been complaining about the sites that have a person type a number displayed in an image to verify that they're not a bot. They say it causes undue hardship on sight impaired folks. That may not be a legal fight your company would like to enter.
I can see both sides of this. Can't say I know where to stand though.
Yep, I never spell check.
More incorrect spellings can be found he
What makes you think "user at mail dot foo dot com" is unharvestable? The web archives of all the development mailing lists at gcc.gnu.org use that scheme, and we still get spam to unique addresses used only for sending mail to those lists.
It's a handy technique, and useful, but it's certainly not foolproof.
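To illustrate why the spelled-out form is weak, here is a minimal sketch of how little work it takes to reverse it. The function name and the pattern are hypothetical and are not taken from any real harvesting tool.

```javascript
// Hypothetical sketch: undo "user at mail dot foo dot com" style munging.
function demunge(text) {
  // A local part, the word "at", then two or more labels joined by the word "dot".
  const pattern = /\b([\w.+-]+)\s+at\s+([\w-]+(?:\s+dot\s+[\w-]+)+)\b/gi;
  const found = [];
  for (const m of text.matchAll(pattern)) {
    const domain = m[2].replace(/\s+dot\s+/gi, ".");
    found.push(m[1] + "@" + domain);
  }
  return found;
}

console.log(demunge("write to user at mail dot foo dot com today"));
```

A dozen lines is all it takes, which fits the experience reported above with the list archives.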
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
Any method of munging the address must still be clickable within the visitor's browser. If it is clickable, it can be harvested. Javascript and html encoding may stop most of the bots, but bots exist that can slurp the address no matter how much javascript you wrap it in.
I use a PHP email form that never sends the address to the client accessing it. Short of hacking the server and looking at the PHP script in plain text, there is no way to harvest the address. I have no need to let the public know my address. If they want to email me, they can use the form or use my site's message board.
I don't want the guy getting slashdotted, so I won't link his site. If you really want the script I use (available in PHP or ASP), go to hotscripts.com and search for dbmaster's mail form.
Only on
I just clicked on the background (or so I thought), but apparently I clicked on an image http://ads.osdn.com/?ad_id=827&alloc_id=1959&site_id=1&request_id=330735&1065145900624
which was an ad. New kind of annoyance? Maybe Slashdot may want to remove such ads?
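<script type="text/javascript">
<!--
// Assemble the address from pieces so it never appears whole in the page source.
var u = "sales";
var d = "example";
var t = "com";
var a = u + '@' + d + '.' + t;
document.write('<a href="mailto:' + a + '">' + a + '</a>');
//-->
</script>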
Just use this. Life is good, eh?
Just Unicode the email addresses, not unharvestable, but it makes it slightly more difficult. No change to user functionality.
Meanwhile, I'm keeping an eye out for the next technology to replace email. IM was promising about five years ago, but went to hell faster than email.
Quoth the original message...
Err, doesn't this exactly not meet the given criteria? The guy wants links to be clickable. If you hide the image, you can only get as far as, say:
But that's just as easily harvestable as it would have been if you left the visible text as the plain address. What's the point?
It's the contents of the href attribute that need to be obscured, not the visible text (or image, or video clip, or whatever). You can't embed an image in the href text, so I don't see how this suggestion gains us anything at all.
---
The suggestion I like best is to encapsulate the address as HTML entities. Currently, this is enough to fend off the average address harvesting software, though if the practice catches on, I assume that the harvesters would start to take this into account -- at which point I don't know what the solution should be...
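A minimal sketch of the entity approach, assuming decimal character references for every character of the address. The function names are made up for illustration; nothing here is specific to any existing encoder.

```javascript
// Encode every character of a string as a decimal HTML character reference.
function toEntities(text) {
  return Array.from(text, (ch) => "&#" + ch.codePointAt(0) + ";").join("");
}

// Build a mailto link whose href and visible text are both entity-encoded.
// A browser decodes the references, so the link still works when clicked.
function entityMailto(address) {
  const encoded = toEntities(address);
  return '<a href="' + toEntities("mailto:") + encoded + '">' + encoded + "</a>";
}

console.log(entityMailto("foo@bar.com"));
```

The raw page source then contains no literal "@" and no literal "mailto:", which is what defeats the simplest pattern-matching harvesters.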
Barring that, it seems like the only way to provide an address will be to use literal text such as "write to us at foo at bar.com" and hope people just get it.
Alternatively, shy away from giving out your address, and provide a form where visitors can submit comments. This could allow you to filter out some of the incoming traffic (hint: if you're going to use "off the shelf" software for this, use NMS instead of Matt Wright's ancient Formmail.PL script, it's much safer). Avoiding any publication of email addresses might piss Jakob Nielsen off, but under the circumstances I think it's probably a reasonable approach to the situation -- it's way too easy for a public address to get abused...
DO NOT LEAVE IT IS NOT REAL
They already have your email address. They'll get your new one when you post to newsgroups, to mailing lists, when your virus-infested friends spew it around the net, and when you register software. Focus on solving the problem (by developing anti-spam software, by lobbying for laws, or by shooting spammers), rather than on trying to find new ways to hide.
I put my e-mail address in the middle of a giant Goatse graphic. It's worked pretty well so far: I've gotten no e-mail.
I've been looking at a couple of different techniques over the past year or so. They are closely tied into the Roxen Webserver, and probably won't work with Caudium, or any other webserver.
The first technique I used (described here) was a simple RXML macro, that defined a tag called <cloak>. It would check to see if the client was on a list of known robots. If the client was a robot, a graphic version of the email address would be returned. If the client looked like a normal browser, then the address would be entity encoded, and returned as a mailto link.
Shortly after I set that up, I realized that entity encoding was pretty much useless - that if a web browser can figure out the address, so can a spam bot.
My second attempt appears to be working well. I wrote a Roxen module called mailcloak which takes addresses, and replaces them with a graphic link to a dynamically generated form to send an email to that address.
As an example, the code <mailcloak> maileater@ofdoom.com</mailcloak> would be replaced with a graphical version of the address maileater@ofdoom.com and a link to this page.
It also has support for finding and cloaking bare addresses in pages, and I'll probably add support for rewriting mailto tags sometime in the next few weeks.
You have to consider the trade-off of the inconvenience of your readers/customers with the amount of spam you get.
I have a few websites with my email address all over them, in mailto links. I "mask" the email very lightly, by escaping most of the characters, and it has worked beautifully.
Here is a webpage that will quickly convert your mailto link into a form that bots will miss.
Could a bot be written that would be able to harvest these email messages? YES. But would it be worth the spammer's time to code it? NO, so it probably won't happen.
Put yourself in the spammer's shoes (or slime-covered bedroom slippers). Why would you want to go to a lot of work to build a bot that will harvest the email addresses of the very people you don't want to get your spam, because they will report you to spamcop, harass your ISP, and even hack your computer and post some very unattractive pictures of you on the internet?
No, they want the chumps, and they want to find them without needing to check every webpage for dozens of patterns.
There are only 10 types of people: those who understand decimal, those who don't, and, uh, 8 other types I forget.
Or maybe you could have a non-clickable email link that is just an image. I believe that is what the poster was referring to. Or, if you really wanted to have it clickable, have it look like this:
<a href="wewillnevergethere.html" onclick="alert('myreal' + 'addy@site.com'); return false;">
<img src="pictureofemailaddy.png" />
</a>
See, it works. Note that it is important to concatenate the email address, as I'm willing to bet mailto harvesters don't parse it out as being JavaScript. The extra obfuscation is necessary. (Apologies for any JavaScript mistakes; I suck at JavaScript.)
Photos.
But I use a combination of things:
1. Images for the email text using php and some caching
2. ROT13 for the text to make the images (so the parts of the email address aren't as easily visible).
3. A mailto redirect, rather than a mailto: URL. The redirect goes to the mailto:, and works fine in most clients, except it leaves users with a blank page (which they can easily go back from).
4. I leave "honeypot" email addresses on every page, marked with the IP address and the time the page was viewed. This makes tracking down harvesters easier, and gives me guaranteed spam for filters/spamcop/etc. I also use a secondary domain which I don't use for any regular email, so I can always drop it if I decide I'm sick of it.
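Item 2 in the list above can be sketched as follows. This assumes the ROT13 text is what gets passed to the image generator (for example in the image URL), which the post does not spell out; the function name is illustrative.

```javascript
// ROT13: rotate ASCII letters by 13 places; everything else is left untouched.
function rot13(text) {
  return text.replace(/[a-z]/gi, (ch) => {
    const base = ch <= "Z" ? 65 : 97; // 65 = "A", 97 = "a"
    return String.fromCharCode(((ch.charCodeAt(0) - base + 13) % 26) + base);
  });
}

console.log(rot13("sales@example.com"));
```

Because ROT13 is its own inverse, the server-side image script can call the same function to recover the address before drawing it.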
harvesters don't take the time to decode
Email Encoder
I recommend that you use a form that does NOT have the user's email address in a hidden input. Just have the user's ID, then on the server, find the address based on that ID and send the message accordingly. I know you want to keep the mailto: link thing happening, but if you do that, harvesters will always find a way to decode whatever you're doing.
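A minimal sketch of that ID lookup, assuming a hypothetical in-memory table standing in for whatever database the server really uses. The names and addresses are placeholders.

```javascript
// Hypothetical directory; in practice this would live in a database on the server.
const RECIPIENTS = new Map([
  ["1", "alice@example.com"],
  ["2", "bob@example.com"],
]);

// Resolve the recipient ID submitted by the form. Unknown IDs yield null,
// so the form cannot be abused to send mail to an arbitrary address.
function resolveRecipient(id) {
  const key = String(id);
  return RECIPIENTS.has(key) ? RECIPIENTS.get(key) : null;
}

console.log(resolveRecipient("1"));
```

The page only ever carries the ID, so there is nothing for a harvester to decode.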
Use an email address on your website that you don't use anywhere else. If you do start to collect spam there, change to a different email address.
Might be interesting to try encoding the month and year into the email address, and change the address each month. That way you could get some measurements of how much those addresses are being harvested for spam. Who knows, maybe you'd find out October is a big spam harvesting month, when you get deluged with spam to me-oct2003@blahblahblah.com over Thanksgiving break...
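A rough sketch of the month-stamped address, using UTC dates so the result does not depend on the server's time zone. The function name and the tag format are assumptions modelled on the example address above.

```javascript
const MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"];

// Build an address such as me-oct2003@blahblahblah.com for the given date.
function datedAddress(localPart, domain, date) {
  const tag = MONTHS[date.getUTCMonth()] + date.getUTCFullYear();
  return localPart + "-" + tag + "@" + domain;
}

console.log(datedAddress("me", "blahblahblah.com", new Date(Date.UTC(2003, 9, 15))));
```

Counting spam per tag on the mail server would then give the harvest-to-spam delay the poster is curious about.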
Just a thought.
I use some variant on this encoder from Hivelogic, where the whole address is encoded into javascript, which needs to be executed to decode any part of the name.
The downside is that javascript is necessary to read any portion of my email address, and it only works if spambots refuse to execute arbitrary javascript. But in a year of use, I haven't had any problems with it, and my primary email address is remarkably spam-free. Nothing the spam filters can't handle anyway.
In message forums, etc, I just don't use my email address, ever. My name is easily Googled if someone really wants to contact me.
On my website I have a homebrew solution that took me about 15 minutes to implement.
I change @ to [at] in all e-mail addresses.
Then I have a catch-all address, something-whateverhere@domain, and a PHP script on every page that creates a hidden mailto: link. The script converts the visitor's IP address to hex and includes it in the mailto: link, i.e. a visitor from 10.0.0.1 would have something-A001@domain put in the mailto.
That's only the first step. The second is to make use of (on my server) the .courier files (.qmail where qmail is installed works similarly, check the docs; I'm sure there are other servers with similar schemes) and execute a script for every e-mail to the catch-all address. The script parses the headers, adds the IP of the mail server that sent the mail to my server to a blacklist, and then adds the IP address of the bot that scraped my site to a deny list in an .htaccess file. The script ends with an error code (in this case 69) that bounces the mail (thus in many cases removing the address from the list anyway).
Then all mail I receive I run through another script that checks the above-mentioned blacklist and, if it matches, bounces that mail as well.
Since I've implemented this scheme, I haven't gotten a single scraped spam e-mail. In fact, along with my other spam protection methods which are easy to implement as well (using a unique address for every site I use an e-mail on, blocking and boycotting those that spam me; blacklisting servers that spam me [I run my own personal rbl]), I get no more than 5-10 spams a month - and this is all the e-mail addresses I use *combined* (about 5 of them) -- and I've had most of the addresses for several years.
If there is enough demand, I may tar up the relevant files and make it available online.
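The IP-tagging step of this scheme might look like the sketch below. Note one deliberate difference, flagged as an assumption: each octet is padded to two hex digits (10.0.0.1 becomes 0A000001 rather than A001), so that the tag can be decoded unambiguously. All function names are hypothetical.

```javascript
// Encode a dotted IPv4 address as eight hex digits, two per octet.
function ipToTag(ip) {
  return ip
    .split(".")
    .map((octet) => Number(octet).toString(16).toUpperCase().padStart(2, "0"))
    .join("");
}

// Build the hidden trap address for a visitor.
function trapAddress(ip, domain) {
  return "something-" + ipToTag(ip) + "@" + domain;
}

// Recover the scraper's IP from the recipient of an incoming mail, or null
// if the recipient is not one of the trap addresses.
function ipFromRecipient(recipient) {
  const m = /^something-([0-9A-F]{8})@/i.exec(recipient);
  if (!m) return null;
  return m[1].match(/.{2}/g).map((h) => parseInt(h, 16)).join(".");
}

console.log(trapAddress("10.0.0.1", "example.com"));
```

The mail-side script would call something like ipFromRecipient on each message to the catch-all and feed the result to the deny list.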
----
Emailing me...
Unfortunately, due to "spam" I can't put my email address on the web without "email harvesting programs" finding my email address and sending me unsolicited email. However, you can probably work out what my email address is...
If you can guess what my email address is, feel free to email me. Most computer programs won't be clever enough to work it out, however I hope you are.
----
(names have been changed to protect the innocent/guilty)
Seems to work well, and keeps visitors to my site amused. But would not work so well on a large site.
Elivs
I suspect you're using an ad-blocking browser or proxy, which has blocked the image itself but has left a large (clickable) white space that would be the image if you hadn't blocked it. That's the behavior Firebird shows for me, blocking ads.osdn.com. If you're using Mozilla or Firebird, and you right-click on the "background" I think you'll find "block images from this server" or "block images from ads.osdn.com" checked.
* And remember, it's spelled N-e-t-s-c-a-p-e, but it's pronounced "Mozilla."
<a href="/x.cgi/mailto:abuse@localhost">mail me</a>
And then had x.cgi be a PERL script that generated an HTTP "Location" header to the real mailto: URL.
If I wanted more complexity, I'd substitute in whatever I felt like for the @ in the address, and have the PERL script un-do that. It's probably also doable in PHP, shells, TCL, or whatever. I like to leave something resembling a "real" address in the HREF, so the most clueless harvesters can grab it.
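A sketch of what such a redirect script has to do, written as a plain function rather than a real CGI program. The "(at)" placeholder and the function name are assumptions for illustration; the post leaves the substitution up to the reader.

```javascript
// Turn a request path like "/x.cgi/mailto:abuse(at)localhost" into the
// redirect response that sends the browser to the real mailto: URL.
function mailtoRedirect(path, placeholder = "(at)") {
  const marker = "/mailto:";
  const idx = path.indexOf(marker);
  if (idx === -1) return null; // not a mailto request
  const munged = path.slice(idx + marker.length);
  const address = munged.split(placeholder).join("@"); // undo the substitution
  return { status: 302, headers: { Location: "mailto:" + address } };
}

console.log(mailtoRedirect("/x.cgi/mailto:abuse(at)localhost"));
```

A harvester reading the page source sees only a relative link into the script; it would have to follow the link and inspect the Location header to get the address.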
As soon as any reasonable number of people start using the same scheme (and particularly if it's a mailto: designed to still be machine-readable) someone will take the time to harvest that kind of obfuscated address. It's just a matter of the cost/benefit ratio being high enough to make it worthwhile.
I think you're right as more websites use automated obfuscation; then the spammers need to decode it to get to their victims. But as long as most websites aren't doing what I'm doing, I know they don't want to target the techies.
Here's another POV, though -- I'm considering the *other* cost/benefits ratio. I want my users to be able to easily email me, and giving them a simple mailto: link is the best way to do that. We'll have to wait and see.
Right now, it seems to be costing nothing, since I'm only getting spammed on the standard "guessed" names at my domains, like "sales@" and "webmaster@". But 5 spams a day would still be worth the trouble.
If the bots do start to really catch up (they may... I'm hoping enforced laws will start to catch up over the next few years!), at some point I might move on to the next-least-inconvenient masking method, which is probably randomized JavaScript masking. I.e., the mailto: link is generated by custom JavaScript that builds the address across a few lines of code. That would prevent users w/o JavaScript from using the link, though, which is a cost I want to avoid.
There are only 10 types of people: those who understand decimal, those who don't, and, uh, 8 other types I forget.
I've only seen Flash used for spamproof mailtos on one or two sites, but I think it's a pretty good idea as long as all of your users have the Flash player. Just make a little .swf of clickable text linking to the mailto: you want. You can probably even have them dynamically generated if you have a lot of different addresses across your site. PHP, for example, can do this with its built-in Flash functions.
My php based site has a form that allows people to email me. They never get my email address until I reply to them.
My previous site was only allowed [X]HTML, no PHP/ASP. To combat harvesters, I had in my XHTML:
Then, in an embedded JavaScript file (email.js) I had:
How would this extend to multiple authors on a site? I would give each author a sample link with javascript:emailAuthor("firstname lastname"). The JavaScript file would then need an array to find the corresponding address and change the document.location.href.
I actually just use a numeric character reference for the @ symbol (for example, &#64;). It seems that most of the time the harvesters just read the HTML source, and don't actually render HTML entities or Unicode. Thus the harvester will get user&#64;example.com, a non-valid address, but a user on your site will see user@example.com and the mailto: link will function normally.
Great way of handling it. I'd still use an image or spam armored text for the actual address though. Not all browsers have JS turned on. But killer way to implement it. I knew my JScript knowledge was substandard.
Photos.
One of my colleagues came up with the following the other day:
If you put your email address in a table with the border set to '0' cell-padding and cell-spacing also set to '0', then it will still be readable by humans. But, the code to create the table will obfuscate the address enough that it won't be harvestable.
How are You Preventing Mailto-Link Harvesting? I'm not. I just put up my address on the website and started manually cleaning 40 emails daily. Life was good until I started bothering this guy on eBay to send me my ATM switch 3 months after I paid for it. The day after I threatened him with legal action, and ever since, I've been receiving 1200+ Microsoft subscription-type spam daily. Long story short, that particular address has been shut down permanently, so I'm losing possibly good traffic. All of a sudden, I'm interested in Bayesian filtering and legal action against spammers. Face it. Spam is a bigger problem than the small speedbreakers in its path (like riding a motocross bike at high speed over a speedbreaker on a flat road). We all will continue to get irritated until some kind of social, legal or technical revolution fixes things for a while.
"Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky
Webmaster
// The function appends "@domain.com" to the name provided and opens the resulting "mailto:" link.
function Email(name)
{
var EmailLink = "mailto:" + name + "@domain.com";
parent.location = EmailLink;
}
//-->
email me here
I don't make predictions, and I never will.
In my mail server I redirect the random addresses to a single e-mail. Then when I get spammed, I can trace it back to an IP, and contact the hosting company or ISP that it originated from.
Visit blue.aginet.com for my other GPL'd code. Feel free to use the source code in this example. I only ask that you give me credit if it's used for a commercial purpose.
Scott Wolf Senior Software Engineer Slingpage
I have a 1 pixel transparent gif link at the very top of my page that links to /guestbook/jackhole. In my robots.txt file I have "User-agent: * Disallow: /jackhole/ Disallow: /jackhole/guestbook/". When a harvester traverses this link their IP is added to a text file via a php script that I wrote and they immediately get a 403 page.
Each page of my site checks against this text file so the mailbot gets a 403 page for almost all pages/sites that I host. To deal with false positives there is a mailto link on the 403 page that goes to a TMDA address. At the very least it saves me bandwidth.
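The trap logic described above can be sketched in a few lines. This is an in-memory stand-in for the poster's PHP script and text file, with a made-up function name; a real version would persist the list and share it across sites.

```javascript
// Create a request checker that blocks any client that ever touches the trap path.
function createTrap(trapPrefix) {
  const blocked = new Set(); // stands in for the text file of harvester IPs
  return function statusFor(ip, path) {
    if (path.startsWith(trapPrefix)) blocked.add(ip);
    return blocked.has(ip) ? 403 : 200;
  };
}

const statusFor = createTrap("/guestbook/jackhole");
console.log(statusFor("192.0.2.10", "/index.html"));
console.log(statusFor("192.0.2.10", "/guestbook/jackhole"));
console.log(statusFor("192.0.2.10", "/index.html"));
```

Well-behaved crawlers that honour robots.txt never request the trap path, so only clients that ignore it end up on the list.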
a href="mailto:joeblow@[10.0.0.1]"
Substituting your IP address, of course. Maybe spam harvesting bots would fail to treat that as a valid address.
On another note, this is a CGI thing that looks interesting: Master Spambot Buster.
Set up a CGI server that knows people's email addresses. Anytime you want to post an address, use the cgi which does a mailto:yellowpages@yourdomain.com
Subject: Please send me the email address for #TAG#
Which yellowpages@yourdomain.com answers (with the address in the body).
Maybe that's too much work, though...
I leave my address out in the open and use a spam filter...
I have a pretty simple scheme. I never, ever put my first name in a mailto: link. I use my first initial, and then my last name, usually. That way, the spambots never get my first name. Makes life a LOT easier for my spam filter (Latent Semantic Analysis, in OS X Mail.app) to do its job.
;-)
S.
I'm one of the sysadmins for a CS department in a liberal arts college. I've been working with the web content admins off and on for a couple months as they prepare a system that will execute a Perl script to generate an image that will replace the e-mail address. The project is still in its infancy, but here's the URL to the description, and here's the URL to the current version of the project, in gzip'd tarball format.
This one doesn't use Javascript at all. And it's only 4k.
/. it.....
Obfusticated Email Link Creator
It does mixed dec and hex. Creates links like this. But check the underlying code....
It's a Tripod site, so don't
For simplicity, an image is probably best. Heck, with PHP (and probably other web languages) you could probably hack up some code to automatically create the image for you (more useful if you have a large number of addresses to display).
For folks who won't be able to handle the images, you could put some human decipherable text in the "ALT" or Title text of the image- e.g. jim@_REMOVE_ALL_OF_THIS_23421232_me.com.
I definitely use JavaScript, never the same scheme twice. I'd suggest scrambling the emails into randomly sized segments (you can do this once per email, copy the contents into a DB or the actual page) for JS to put back together.
Does anyone see a vulnerability with this? I know anything can be hacked, given enough time, but I really don't see how a spammer would get around the simple javascript thing (besides executing all scripts on all pages). Any Ideas how it could be abused?
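One way the random-segment idea could look, as a sketch only. The function names, the maximum segment length, and the shape of the generated snippet are all assumptions.

```javascript
// Split an address into consecutive segments of random length (1..maxLen).
// The random source is injectable so the split can be made repeatable.
function splitRandomly(address, maxLen = 4, random = Math.random) {
  const parts = [];
  let i = 0;
  while (i < address.length) {
    const len = 1 + Math.floor(random() * maxLen);
    parts.push(address.slice(i, i + len));
    i += len;
  }
  return parts;
}

// Produce the line of script to paste into the page; it rebuilds the address
// in the variable "a" when the browser runs it.
function toScriptSource(parts) {
  return "var a = " + JSON.stringify(parts) + '.join("");';
}

console.log(toScriptSource(splitRandomly("sales@example.com")));
```

As the question above already concedes, a bot that executes every script on the page would still recover the address; this only raises the cost for bots that scan raw source.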
One suggestion I would make is to put your email address in an image. People can read it, but harvesters won't be able to harvest it (unless they download the image for OCR)...
I've tried it, and it doesn't work. I used to put my email address on my web page in this format:
my_address at domain dot com
I receive approximately 20 spam a day. Not bad, huh?
20 a day is still annoying. I wanted to reduce it, so I converted the email address into a GIF, which I call em.gif (so no one would guess it is an email address just by looking at the code), also in the my_address at domain dot com format. A few days later, I began to receive 80 spam a day.
There is no effective defense. Any technology (short of smoke signals) will be beaten.
What have we done in the US with the "no call list" but to have made the US government the distributor (free of charge) of a totally "clean" DB of all names and numbers of citizens who are perfect targets of telephone solicitations.
The very idea that the FTC will step in and act where the FCC won't (in the face of two STATUTES struck down by fed ct judges) is absurd.
We just gave the industry a nice clean list.
There are two ways to avoid this problem: eliminate all technology (defaulting to bearskin rugs and stone knives); or, ELIMINATE the PROBLEM.
I favor the second option. SUE THE BASTARDS over and over and over and over. Maybe we could tell the "Right-to-lifer's" that they are good target practice?
The world is nuts: email is way over 50% garbage and the ramp-up has been only a year or so in the making. Hurt these fools now.
I don't advise harming anybody, physically, but the effect of posting these bastards pictures in every grocery store and coffee shop could expose them to a "public shunning" unlike anything since the Puritans.
We could also pass a law barring all medical care for spammers, their family and friends. How about no police response to their calls? No power to their homes? No telco connections? No wood in the winter? How about not taking their checks or cash? Revoking their driving rights? Barring them from holding office or making political donations?
What about we simply deny them the right to BUY food? If they're out hunting for food they can't spam....
In a generation we could select for no spammers.....
already have a lot of trouble with that picture-of-the-email-address thing. it is a neat solution but it lacks portability, to state it another way.
-- There are two kind of sysadmins: Paranoids and Losers. (adapted from D. Bach)
Security through obscurity is always a bad idea.
The trick is finding the right combination of tools to automatically reduce your spam to manageable levels. If I get just one or two pieces of spam a day, I'm happy.
On my London Blog I don't use any form of obfuscation. The reason for this is I want people to contact me about my writing. I want to know what people think, and any barrier I put in the way will reduce the number of legitimate emails I get. I'm not confident that most of the Internet population would understand that they need to remove the REVOVE.THIS.TO.EMAIL.ME part of my address.
Sure, I drastically increase the number of spams I get, but popfile takes care of them all.
If you're running a mail server with, say, 250 people doing what you're doing, the spam connections might very well denial-of-service your mail server, assuming it is running on a typical T1. The expense of bandwidth, not to mention the inconvenience of less-reliable mail, is what leads sysadmins and the companies who employ them to take email obfuscation seriously.
http://tinyurl.com/4ny52
I have a unicode converter that works really well. It will put your email address into a form like:
...
& # 105;& # 032;& # 100;& # 111;& # 032;& # 105;& # 116;& # 032;& # 116;& # 104;& # 105;& # 115;& # 032;& # 119;& # 097;& # 121;
For the past three years or so, the spammers haven't caught on to this, and they are unlikely to do so given the few people who take the effort to put this measure into place.
P.S. It's not just mailto links that are being harvested here. They'll scrape anything with an @ or a "at" or
http://tinyurl.com/4ny52
I've had email sent to me via the address I posted here and /. auto obfuscates it in various ways.
It is retarded to think that "fred at sheila dot com" won't get converted.
Once one has written one's harvester, it is prudent for one to inspect the results and tweak it.
It's for profit, not fun! If it is possible to increase the yield in *any* measure it will be done by someone somewhere.
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
The problem with most encoders is their use of JavaScript, which still isn't universally implemented across all browsers or used by all Web surfers.
The problem with the non-JS-based encoders is that, well, they're based on a simplistic encoding method. Anything you can use your computer to easily encode can be just as easily decoded by a similar program. (We're talking encoding, not encryption.) So in theory, a well-written scraperbot can simply de-ASCII-fy any numerical entities it runs into (the common method of encoding without resorting to JavaScript) and then scrape the address in clear text.
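The "de-ASCII-fy" step really is that simple. Here is a minimal sketch of what a scraper would need, handling decimal and hexadecimal numeric references only; the function name is made up.

```javascript
// Decode decimal (&#64;) and hexadecimal (&#x40;) numeric character references.
function decodeNumericEntities(html) {
  return html.replace(/&#(?:x([0-9a-f]+)|(\d+));/gi, (match, hex, dec) => {
    const code = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range references untouched instead of throwing.
    return code <= 0x10ffff ? String.fromCodePoint(code) : match;
  });
}

console.log(decodeNumericEntities("user&#64;example&#x2E;com"));
```

After this one pass the scraper can apply its usual address pattern to the decoded text.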
The problem with e-mail forms is that they're a pain in the arse, and people like me who keep archives of all incoming and outgoing e-mail are rather disinclined to use them.
Right now I use solution #2, because spammers don't seem to be writing smart enough scraperbots (yet) to justify moving to either #1 or #3. Instead of moving to either of those, I'll probably end up using a CGI-based solution that does some mucking about with HTTP headers. It combines the absolute unscrapeability and universal compatibility of a form with the ease of use of an encoded address.
In Korea, long hair is for old people!
They spam:
info@yourdomain
sales@yourdomain
help@yourdomain
webmaster@yourdomain
postmaster@yourdomain
etc. etc.
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
I don't know why people suggest JavaScript tricks to hide their addresses.
In the end, the browser renders it as text which can be selected and copied. What makes you think that mail harvesters will not render the webpage as well before searching for addresses?
Also, don't munge.
Help us build a better map!
Remember that putting email addresses in pictures means that only people who can read the picture can email you. This excludes anyone who has their computer read the screen for them from contacting you. It is far better to have a contact form on your site that emails you - it will still hide the email address. If you have a domain and want to use a mailto then you can simply change the contact email address when it gets too drowned in SPAM and either bounce messages or simply delete them using your favourite mail processor.
If you want to not only filter out SPAM easily but also track who sold out then get a domain and when you sign up for something that requires an email address then put an ID in the address you use, for example user-site_url@domain.com - this way you can filter out spam from just one source and also stop reading (and thus supporting) junk mail from the offending web site. You can even sue the pants off the web site if the law permits and you really care that much.
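A small sketch of the per-site tagging, assuming the user-site_url@domain.com shape given above. The function names and the rule for turning a site name into a tag are illustrative choices.

```javascript
// Build the address to hand to a given site, e.g. user-www_example_org@domain.com.
function taggedAddress(user, site, domain) {
  const tag = site
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_") // keep the local part to safe characters
    .replace(/^_+|_+$/g, "");
  return user + "-" + tag + "@" + domain;
}

// Given the recipient of an incoming message, report which site's tag it
// carries, or null if the message went to an untagged address.
function leakSource(recipient, user) {
  const local = recipient.split("@")[0];
  const prefix = user + "-";
  return local.startsWith(prefix) ? local.slice(prefix.length) : null;
}

console.log(taggedAddress("user", "www.example.org", "domain.com"));
```

Filtering on the tag then blocks one leaked address without touching mail from every other site.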
I know there are a lot of these dynamic websites out that generate random emails and links to infinitely more pages with more emails and links... Does anyone have any evidence of these things actually working? Does anyone have any record of a spider getting caught in one of these?
Get revenge: Unsolicited Commando
The nine domains for whom my email is the catch-all address receive an average of a hundred spams a day, but I don't see them, thanks to a Bayesian filter.
Any spammer who harvests the email address in my sig just registers their latest spam so that I (and the dozen-odd other people who use the same filter) are that much less likely to see it.
The Web is like Usenet, but
the elephants are untrained.
I've seen a number of varying responses to this question and, quite frankly, none of them work. Here's a short list:
The OP wants to retain link-clickability. The only method above that would do this is the js method. But where it fails, it fails horribly.
Why is anything anything?
Cliff's suggestion of using an image for the email address doesn't take into account that not every visitor to your site is necessarily sighted. This is a bad, bad, BAD idea. Preventing mailto: harvesting by excluding people with visual impairments is not the way to go.
The best method is to use a mailto form that allows you to receive the message but doesn't give away your address. That way you leave your site open and accessible to all users, but can protect your email address.
.. try something like this instead:
http://www.wildpuma.com/steve/
See my address up there? Yup, I'm not letting a few scumbags reduce my ability to use the Internet. I filter so much that I barely notice any more.
Of course, I take more care with other people's addresses; using mailto forms, intra-site private messaging systems, one-time-only addresses, that sort of thing. I also wrote a bit of PHP to munge email addresses (phps/php), but I don't actually use it.
You XHTML users better not be using these JS "solutions" which use document.write() by the way (that's HTML DOM, not XML DOM).. in fact, the entire idea of putting content in JS is so braindead you shouldn't be doing it anyway.
I tackled this problem a year ago because I feared that addresses were being harvested off of our site. The webmaster address was on the front page and got the most traffic, but even those that were buried were affected (looking through the logs showed some artcompendium.com browser, likely a bot harvesting addresses... it hit every page).
I couldn't use images because we have rules regarding usability.
I decided to use forms and server-side scripting. You can do it with PHP or ASP and it doesn't reveal the address to the world, but it allows any browser that handles forms to send e-mail. I also captured the IP address and placed redundant checks in the code to ensure that mail would ONLY go to a single address within our organization. The last thing I wanted to do was open the door to spam abuse through our site.
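A minimal sketch of the "mail can ONLY go to one address" idea, written in Python rather than the PHP or ASP mentioned above. The recipient address, the field names, and the function name build_message are all assumptions for illustration; the poster's actual script is not shown.

```python
from email.message import EmailMessage

# Hypothetical fixed recipient; the form never supplies a destination.
FIXED_RECIPIENT = "webmaster@example.com"

def build_message(form, remote_ip):
    """Build a message from form fields, always addressed to FIXED_RECIPIENT."""
    sender = form.get("from", "").strip()
    subject = form.get("subject", "").strip()
    body = form.get("body", "")
    # Reject line breaks in header fields so extra recipients cannot be injected.
    for value in (sender, subject):
        if "\r" in value or "\n" in value:
            raise ValueError("line break in header field")
    msg = EmailMessage()
    msg["To"] = FIXED_RECIPIENT        # any "to" field in the form is ignored
    if sender:
        msg["Reply-To"] = sender
    msg["Subject"] = subject
    msg["X-Originating-IP"] = remote_ip  # record who submitted the form
    msg.set_content(body)
    return msg
```

Because the destination is a constant and header fields are checked for line breaks, a submitted "to" or "bcc" value has nowhere to go, which is the property that keeps the form from becoming a spam relay.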
I provide two options.
1) I have a mail form. It will only send to one mail address, it's not anything like formmail.pl.
2) I generate a unique email address with the IP address and time encoded in it. I actually could use spamgourmet to do this, but I've been doing things by hand because I want to collect some observations about how far a single address travels.
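One way option 2 might look, as a sketch only: the layout prefix-iphex-timehex@domain, the IPv4-only assumption, and the function names are invented here, not taken from the poster's hand-rolled setup.

```python
import ipaddress

def tagged_address(prefix, domain, ip, timestamp):
    """Encode the visitor's IPv4 address and the time into the local part."""
    ip_hex = format(int(ipaddress.IPv4Address(ip)), "08x")
    return f"{prefix}-{ip_hex}-{timestamp:x}@{domain}"

def decode_tag(address):
    """Recover (ip, timestamp) from an address made by tagged_address."""
    local = address.split("@", 1)[0]
    _prefix, ip_hex, ts_hex = local.rsplit("-", 2)
    return str(ipaddress.IPv4Address(int(ip_hex, 16))), int(ts_hex, 16)
```

When spam later arrives at one of these addresses, decode_tag tells you which visit harvested it and when, which is the kind of observation about how far an address travels that the post is after.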
Gentoo Sucks
Assuming spammers read slashdot, this story is the perfect way to find common methods to hide from the spambots and learn how to circumvent them. Wonder if this is helping the spammers or the people being spammed more.
One suggestion I would make is to put your email address in an image. People can read it
Unless they're blind! Yeah, yeah, no one cares about the blind, you insensitive clods.
As a domain name owner, I have found that our basic "webmaster@example.com" doesn't get a huge quantity of spam. Perhaps the spammers recognize that as a corporate entity or something, because it's not so bad.
But it mutates: aster@example.com, r@example.com, bob37, jenna624, etc. etc. Most of the spam we receive isn't to one of our known addresses. But we don't want to lock down all but a few (sales@, help@, webmaster@, orders@, myname@, hername@) so that we can help the poor sods who misspell "orders" (it happens).
So I've put filters on "aster" so far. Last Sunday, though, we got 500 near-identical spams (some loan scam) addressed to about 50 different names @www.example.com. I was developing a filter while my wife was manually deleting them... and then they stopped altogether, without my taking action. Some spambot burped, I guess.
Design for Use, not Construction!
It doesn't beg the question. "Begging the question" is making a logical argument that depends on the assumption of that argument's truth as a pillar of that argument.
Instead of "begging the question" it just "makes you want to ask".
Just to note, I realize OCR could be used to get around this, but I've only a few addresses anyway. In the future, I may generate "messy" images that are difficult to process with OCR.
example: http://mhs1994.com/listing/
I hate websites that have no real e-mail address to contact. Yesterday, I was trying to contact anyone at the FBI who was involved in UCR and NIBRS. I couldn't find a single e-mail address anywhere. They had mailing addresses. I don't want to send a letter. I want to make a simple e-mail inquiry. It may be nice for others, but I hate that approach. I hate forms, because you never know where that message went off to. I like having a person to contact. Would you like trying to do business with someone without their e-mail address?
problem.
The issue isn't with the emails getting harvested. The issue is with a global infrastructure that uses an old policy of sending emails.
We need a new, improved protocol that does a level of authentication at the host/ISP level to say: this is a legitimate server with an email from an acceptable user. All ISPs should be held to a spam policy under which a user who violates the policy is automatically terminated, and their name, with evidence provided of course, is sent to a spammer list system that keeps track of these individuals. If a spammer bounces around to a new ISP, the ISP can check the system and see if this person is listed.
This would require all ISPs to verify the user signing up for the account, and to place strict penalties on the user such as fines. If the right pieces are in place, we can see a substantial amount of email gone in a few years.
I simply create a PNG, JPG, or GIF with a picture of my email address. No they can't copy-paste it, but you'd have to be a really dedicated address-farmer to automatically harvest that.
Well that ain't going to happen anytime soon. And it's not a watertight solution anyway.
Believe it or not, this actually works. These days most harvester programs still don't read Unicode. Once I started doing this, I saw a great reduction in spam. It won't work forever, of course -- eventually the spambots will read Unicode, and the game will be over for this technique. But in the meantime, it's easy enough to do a search and replace of every "@" symbol.
If you want to convert your whole address, E-cloaker is a neat little free program for converting text to Unicode.
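As a rough sketch of the encoding described above (not what E-cloaker itself does internally, which isn't shown here), this helper turns every character into an HTML numeric character reference. The name entity_encode is made up for illustration.

```python
import html

def entity_encode(text):
    """Replace each character with its HTML numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)

encoded = entity_encode("user@example.com")
# A browser renders the references as the original text; html.unescape
# does the same decoding here, so the round trip can be checked.
decoded = html.unescape(encoded)
```

The page source then contains only the &#NN; codes, while the browser shows a normal address. A harvester that decodes entities before matching will still find it, as the post itself warns.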
I propose the following (somewhat complicated) software solution for generating automatically-expiring email addresses:
On the web server:
Generate all email addresses on the site dynamically, using something like:
Replace (prefix) with a unique meaningful string.
Replace (timestamp) with the UNIX timestamp (the number of seconds since 1970-01-01 at midnight GMT) at which the email address was generated (the page was served).
Replace (ctr) with a unique identifier for the address generation. (The first address should use 1, the second address should use 2, and so on.) This will make the generated address unique in case the timestamp itself is not.
Come up with a password, and replace (hash) with:
On the mail server (or perhaps at the client):
Send all email to (prefix)-*@domain to an automated utility. That utility would be configured with the same password as the web server. For the recipient address in each incoming message, it would check:
If the address passes all of these tests, and if (based on the timestamp in the address) it was generated recently, treat it as valid. If not, treat it as spam.
This won't stop a harvester finding an address and immediately sending spam to it, but it will limit the length of time for which the address is valid.
This also may be difficult to validate if the address is BCC'd, but that in itself could be an indicator of spam.
Depending on the web server's volume of traffic and your caching techniques, it may or may not be desirable or feasible to have the server regenerate these addresses for each page request. If it is feasible, then you have the added benefit of each user getting a different address. Once that address has been spammed, you could later block that specific address. If it is not feasible, you still have automatically expiring email addresses.
Note that I have not tried or tested this approach, and there may be caveats I can't think of. Caveats that I can think of include:
If I had the time, I would start an open source project for this. But I don't have the time, so I hope someone else has the time and inclination to do so.
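The address format and hash formula did not survive in the post above, so the following is only a sketch under stated assumptions: the layout prefix-timestamp-ctr-hash@domain, the use of HMAC-SHA256 truncated to 16 hex digits, the one-week lifetime, and all function names are guesses for illustration, not the poster's actual scheme.

```python
import hashlib
import hmac

def _tag(secret, prefix, timestamp, ctr):
    """Keyed hash over the visible parts of the address (assumed: HMAC-SHA256)."""
    data = f"{prefix}-{timestamp}-{ctr}".encode()
    return hmac.new(secret, data, hashlib.sha256).hexdigest()[:16]

def make_address(secret, prefix, domain, timestamp, ctr):
    """Web server side: build an address that carries its own issue time."""
    return f"{prefix}-{timestamp}-{ctr}-{_tag(secret, prefix, timestamp, ctr)}@{domain}"

def is_valid(secret, address, now, max_age=7 * 24 * 3600):
    """Mail server side: check the hash, then check the address is recent."""
    local = address.split("@", 1)[0]
    parts = local.rsplit("-", 3)
    if len(parts) != 4:
        return False
    prefix, ts, ctr, tag = parts
    if not (ts.isdigit() and ctr.isdigit()):
        return False
    expected = _tag(secret, prefix, ts, ctr)
    # Constant-time comparison on bytes, so odd input cannot raise or leak timing.
    if not hmac.compare_digest(tag.encode(), expected.encode()):
        return False
    return 0 <= now - int(ts) <= max_age
```

Both servers only need to share the password (secret); no database of issued addresses is required, because a forged or altered address fails the hash check and an old one fails the age check.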
Secondly, anything you do to obscure a user's email address will eventually be able to be harvested (without human intervention). It's just a fact in the battle against spam. I highly suggest you use graphics to your advantage. Create small images for all letters; numbers; permissible punctuation such as a period, dash, plus sign (for plus notation), and underscores if you permit userids with that (I mention this one because you might use them even though they violate RFC 2821 IIRC). Now put these in a web directory and have users create their email addresses using IMG SRC links to each of the individual characters. You could spell out your domain in a single image or you could have the users spell it out with multiple images. This could easily be scripted. One thing to note is that this will most likely violate ADA requirements. It's possible that using the ALT text in each IMG for that letter could get you around this. It also gives the spammers something to harvest though. Basically ADA requirements directly interfere with spam fighting in this case.
There are other tricks you can use such as using the HTML encoding for each of the userid characters to hide them. A web browser would decode this just fine but viewing the source will only show the code. Don't provide mailto: links, period. Consider using a list of names and their external userid on a page. Simply state that all the userids below should have @yourdomain.com appended to the end of them in a MUA. This is effective but slightly kludgy. IF you provide the userids ANYWHERE make absolutely certain that you provide an external userid such as firstname.lastname@yourcompany.com which redirects to vanityname@yourcompany.com internally. You should never give away your users' real userids. You need to make certain that the user's MUA or your MTA transposes the internal to external addresses to hide the internal addresses. Your users need to understand this as well. Mail they forward absolutely cannot contain information about internal addresses. Your users should always use the external addressing, even to mail their buddy in the next cube. Your users need to know why you're doing this and they need to be educated on the many ways this information can leak out unintentionally. Basically you need a good security policy.
Back when the 'net was young, and there was hope for stopping spam before it snowballed out of proportion, it was hoped that this naive "nip it in the bud" attitude might work. It hasn't. Spammers have proven as resilient as cockroaches, and more prolific.
Keep in mind who is paying for spamming: get-rich-quick losers. There's more than one born every minute. They're typically not successful, but there certainly are a lot of them. By the time one of them has realized they're not "getting rich quickly" and given it up, half a dozen more have started up their own "get rich quick" schemes.
Legislation, anti-spam hassling, RTBLs, threats, ISP cut-offs, they all serve to shut these fools down one at a time. But the population growth of fools far outpaces the ability to shut them up.
I agree that address munging "breaks" how things are supposed to work. The reality of spam dictates that many of us have given up how we want things to be, and instead deal with things as they are. I can't afford to fight every stinking spammer in my inbox, and those are the ones that have successfully run a couple of anti-spam gauntlets. Automated spam reporting tools proved useless to me years ago -- and now the anti-spam RTBL sites are busy collapsing. Bayesian filtering has been mostly effective for me so far, but I still find good mail in my spam box. Changing email addresses helped dramatically at home, but is not an option at work. So, if munging helps reduce the spam, it's just another useful tool in my kit. And if you think address munging prevents someone useful from contacting me, you simply have no idea of the depths of my apathy.
John
If you always replace @ with "at" then the change to the spambots is minimal and they get the address anyway. Many websites only display the beginning of the mail address, and registered users must request the full address or use a mail entry screen that forwards the email to the user without divulging the address at all, giving them privacy until they want to respond to you (your email is entered and provided as the reply-to address).
No solution is absolutely watertight, and I know it's not going to happen anytime soon, but the current suggestions mentioned will create an ongoing situation.
I create uber mailto style 1, some time passes and I find it's no longer working. I create uber mailto style 2, some time passes and I find it's no longer working. I create uber mailto style 3, some time passes and I find it's no longer working. I create uber mailto style N, some time passes and I find it's no longer working.....
The solutions here are not permanent fixes and the time and energy we spend in developing these quick fixes should be put to better use such as a more long term approach.
This works if you have your own domain. Thanks to Andrew for this idea: Put the harvestable email on your site. Also, on the same site (less conspicuously), post a similar email address, with the same domain and a similar username. Don't ever give out the second address, it's just for spam. The magnet address may be in the code but not visible, for best effect. Or make it a real mailto: link, but in invisible color and font.
Write a small filter program on your site that stores all spam coming from the spam magnet address for a week, and deletes any incoming spam on your own email address that looks too much like it. Any mail that comes to both at once is automatically deleted.
If you have a Bayesian filter, use the spammagnet address to continuously feed the filter's blacklist.
So if your address is harvey@nagila.com, post a spammagnet address on the same pages such as hardly@nagila.com. The similarity will tend to catch the spam mailers that group addressees.
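The "small filter program" could be sketched roughly as below. The post doesn't say how "looks too much like it" is measured, so the difflib similarity ratio, the 0.9 threshold, and the class and method names are all assumptions for illustration.

```python
import difflib

WEEK = 7 * 24 * 3600  # keep magnet samples for one week, as the post suggests

class MagnetFilter:
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.samples = []  # list of (received_at, body) from the magnet address

    def record_magnet_mail(self, body, received_at):
        """Store a message that arrived at the spam magnet address."""
        self.samples.append((received_at, body))

    def is_spam(self, body, now):
        """True if body closely resembles any magnet mail from the past week."""
        self.samples = [(t, b) for t, b in self.samples if now - t <= WEEK]
        return any(
            difflib.SequenceMatcher(None, body, b).ratio() >= self.threshold
            for _, b in self.samples
        )
```

Mail to the real address is passed through is_spam before delivery; anything that also showed up (nearly verbatim) at the magnet address gets dropped. Comparing every message against every sample is slow for large volumes, so this is a starting point rather than something to run on a busy server.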
That has the disadvantage that in the unlikely chance any spam with a valid return address hits this address, someone will unexpectedly get your autoresponder. Probably not a spammer, though. What spammer gives a correct return address? I'd add a line saying, "If you did not request John Doe's e-mail address, please disregard this message. It was likely generated by a spam message which had your address forged as the sender."
Start with "To contact so-and-so, click here." Have the user's name embedded in the email form, and get the second half from the server. So if the user was thomas@englishmuffin.com, the web form would have a hidden input called loosername and its value would be thomas. Call it something different than loosername, but the idea is that you don't want to just say username. When the web form gets posted you can have it read a text file (this is what EarthLink uses as cgi email; I think the program is MIT's cgi-email), and in that text file is the other half of the 'to' email address, @englishmuffin.com. Put the two together and you have the email address. This will also allow you to check that the sender has a valid email address: check for at least the @ in the email address and preferably one dot (.), as in .com, .uk, etc. This means that the sender does not know what address they are sending email to until they get a reply, and it also means that web harvesting programs are going to have a harder time figuring out what to harvest.
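A sketch of the server-side half of that scheme, with a few assumptions: the domain is held in a constant rather than the text file the post mentions, the KNOWN_USERS allow-list is an addition not in the post (it stops the form from mailing arbitrary local parts), and the function names are hypothetical.

```python
# The domain half lives only on the server (the post keeps it in a text file).
DOMAIN = "englishmuffin.com"
# Hypothetical allow-list of values accepted from the hidden form field.
KNOWN_USERS = {"thomas"}

def recipient_for(user_key):
    """Join the hidden-field value with the server-side domain."""
    if user_key not in KNOWN_USERS:
        raise ValueError("unknown recipient")
    return f"{user_key}@{DOMAIN}"

def sender_looks_valid(address):
    """Cheap sanity check: one @, something before it, a dot inside the domain."""
    local, sep, domain = address.partition("@")
    return bool(local) and sep == "@" and "@" not in domain and "." in domain.strip(".")
```

The page itself only ever contains "thomas" in a hidden field, so there is no complete address for a harvester to pick up.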
Only 'flamers' flame!
Does slashdot hate my posts?
This is a good point -- if you are just providing your email for people reading your personal home page, there's no reason to risk getting a few spam emails by using a weak masking method. By displaying your email as an image, you can probably reduce the emails you'll get from people with only a minor comment to make, or the non-tech-savvy.
If you're selling something, it's a different story.
There are only 10 types of people: those who understand decimal, those who don't, and, uh, 8 other types I forget.
You're missing the <NOSCRIPT> tag, needed for graceful degradation. You should include something like:
<NOSCRIPT>sales (at domain:) example .co_m</NOSCRIPT>
to make sure that a non-JS browser still sees something (it won't be a link, but it's good enough - use some obfuscation to prevent harvesting). It would look stupid if your web page had text like "Here's my email address: , write me for sales info" with no actual address.
it's javascript - it doesn't submit the form.
if you're worried you can save the source of the page and run it locally.
cheers
I haven't checked the stats recently but Netscape 4.x and earlier do not support Unicode. Pretty much all browsers can handle the HTML entities given in other examples. You may not care.
(And yes, I'm aware that lines that never cross are not necessarily parallel. They must be coplanar to be parallel. Lines that are not coplanar and never cross are askew. My question/answer example above that "begs the question" also happens to be a fallacious argument.)
Program Intellivision!
How about a simple layer of redirection to a form, with method "GET" instead of "POST" that is really just an HTML file with a proper mailto: link? Do spam bots chase form submission links too?
Such an approach should be effective against most bots, I'd suspect.
If you want to go a step further, just have some text somewhere on the page and a simple CGI on the other side of the form. "Enter this text into the form field below to reveal my email address."
Thoughts?
--Joe
Program Intellivision!
I for one can't stand sites that implement a mail form, and leave no other way to contact the site administrator. It's intrusive:
Between challenge-response programs, misguided filters that swallow (rather than bounce) messages that might be spam, address-to-image scripts that reduce usability for the blind or Lynx-bound, form mail scripts and (a comparatively minor annoyance:) e-mail address munging programs ("piranha at ely dot ath dot cx")... Why must people go out of their way to make others go out of their way to contact them? Ultimately, it's their choice, but we need a better solution.
My problem with mail forms is that I don't have a record of any messages sent or any information if things go wrong with the delivery. Black hole for information == bad.
That being said, if you have a copy sent to the sender as well it's not as evil.
UserAdvocate: The voice of the user
put spam in the email... so in my case make russorspam@msoe.edu or russor@spam.msoe.edu or something like that into a valid email...
Need a Catering Connection
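Put in a JavaScript function to send the email:
// Assemble the mailto: URL at click time so the full address
// never appears as one string in the page source.
function m_me(u) {
  var pre = "mail";
  var url = pre + "to:" + u;
  document.location.href = url + "@reddawn.net";
}
In your page, in place of the mailto: link, put this:
<a href="javascript:m_me('uzik')">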
-- Programming with boost is like building a house with lego. It's a cool but I wouldn't want to live in it
If using mod_perl and Apache, use something like Apache::Filter or Mason's built-in filter to apply regular expressions that change mailto: links to javascript: ones, passing in the email elements, so href="mailto:foo@bar.com" would change to href="javascript:form_email('bar.com','foo')"
You can also use String.fromCharCode(64) as a substitute for '@'.
This way you can have the actual addresses in the HTML files, and they get rewritten when served.
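The rewrite step itself is just a regular-expression substitution. Here is a sketch in Python rather than mod_perl or Mason, assuming double-quoted href attributes and addresses without quote characters; form_email is the JavaScript function named in the post, while rewrite_mailto is a made-up name.

```python
import re

# Matches href="mailto:user@domain"; quotes are excluded from both halves
# so the generated JavaScript string literals stay well-formed.
MAILTO = re.compile(r'href="mailto:([^"\'@]+)@([^"\']+)"')

def _to_js(match):
    user, domain = match.group(1), match.group(2)
    return "href=\"javascript:form_email('%s','%s')\"" % (domain, user)

def rewrite_mailto(page):
    """Replace every plain mailto: link in the page with a javascript: call."""
    return MAILTO.sub(_to_js, page)
```

Run as an output filter, this lets the files on disk keep ordinary mailto: links while visitors (and bots) only ever receive the split form.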
That is only the case if you are running an ancient, brain dead copy of the original (Matt's Script Archive) formmail.pl. But you'd be a retard for doing that and deserve everything you get. Modern formmail scripts do not allow spam through.
Looking through my server's logs, I get a lot of attempts to hijack formmail.pl and spam through it, which is a neat trick since I don't even have formmail.pl.
So I started looking around, and found a great little script which automatically reports the attempt to the spammer's ISP.
I haven't installed it yet, but it looks really great: http://home-port.net/fmreport/
Fire and Meat. Yummy.
Just put something before it that'll nuke the spambot
like
mailto:@#$%$%$%%##%#%%##%@@@@@22@@@@
then mailto:yourname@youraddress.com
not sure if it'll work though
how long until
I'm not an expert on this, but my experience (that most bots DON'T harvest html-encoded addresses) is backed up here.
There may be bots out there that do it, but for now, it seems most don't bother. My experience backs this up -- I started getting a few spams at one address, and sure enough, I'd forgotten to encode it. That bot didn't pick up any of the encoded addresses.
Obviously, things can change... if I do start getting spammed at the encoded addresses, obviously I'll have to make a new plan.
There are only 10 types of people: those who understand decimal, those who don't, and, uh, 8 other types I forget.
On one of my domains, I use a CGI script that returns mailto: links.
Then I can use, for example,
and have the address visible in plain text for copy-and-pasting (with @-to-&#64; entity replacement to foil simple-minded spambots and added links to foil slightly more sophisticated bots that nevertheless rely on an unbroken string).
mailto.cgi, then, takes the "user=fred" parameter and returns Location: mailto:fred@example.com.
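The poster's mailto.cgi isn't shown, so this is only a guess at its core logic, in Python. The fixed domain, the restriction on which characters a user name may contain, and the explicit Status lines are assumptions added here.

```python
import re
import urllib.parse

DOMAIN = "example.com"  # the only domain this script will ever redirect to

def mailto_redirect(query_string):
    """Return the CGI response headers for a query string like 'user=fred'."""
    params = urllib.parse.parse_qs(query_string)
    user = params.get("user", [""])[0]
    # Accept only plain local parts; this also rules out CR/LF header injection.
    if not re.fullmatch(r"[A-Za-z0-9._-]+", user):
        return "Status: 400 Bad Request\r\n\r\n"
    return f"Status: 302 Found\r\nLocation: mailto:{user}@{DOMAIN}\r\n\r\n"
```

The link on the page points at the script, so the clickable part never contains a mailto: URL; a bot has to follow the link and read the redirect to learn the address.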
Esli epei etot cumprenan, shris soa Sfaha.
Harvest that!
Eat at Joe's.
How about this approach, which would not require anything special on the client side?
PHP (or another suitable server-side dynamic page generator) would append strings to the email address in question before placing it on the page, (strings built from a function known only to the page author or website admin), resulting in a unique mailto address for every page load. Simultaneously the function logs the unique address in a database (e.g. mysql), with a timestamp.
So the resulting HTML might look like:
The MTA (e.g. qmail) is extended so that it examines all mail coming into the domain. If the addressee has a string on the end that fits the function, the extension checks the database for that entry. That entry is only allowed to pass 1 email through. If no email has passed through yet, and the address was "recently" registered in the db, the MTA extension marks the address as "used up" and records who it's from. Then the extension passes the email through to the intended recipient, in this example "slantymoniker@slashdot.org". If the address already was used to pass an email through, or it was registered "a long time ago" (e.g. more than an hour after the page was generated for a browser), then the MTA extension would reject the email.
This way, if a harvester gathers the address in a crawling sweep, it would have to immediately fire off a spam email to that address in order to take advantage of it (unlikely since my understanding is that harvesting happens separately from spam runs). Even if spammers re-tooled to do "collect and spam immediately", they could only send one spam per harvest, and not resell or reuse the email address later since it's only good for one shot (like many anonymous address offerings out there).
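A toy version of the bookkeeping, to make the rules concrete. It departs from the description in labeled ways: a random token stands in for the "function known only to the page author", a dict stands in for the MySQL table, and the class and method names are invented for illustration.

```python
import secrets

class OneShotAddresses:
    def __init__(self, max_age=3600):
        self.max_age = max_age   # "recently" means within an hour here
        self.issued = {}         # one-time address -> bookkeeping entry

    def issue(self, user, domain, now):
        """Page generator: mint and log a unique address for this page load."""
        address = f"{user}-{secrets.token_hex(4)}@{domain}"
        self.issued[address] = {
            "real": f"{user}@{domain}", "issued_at": now, "used_by": None,
        }
        return address

    def accept(self, address, sender, now):
        """MTA hook: return the real recipient, or None to reject the mail."""
        entry = self.issued.get(address)
        if entry is None or entry["used_by"] is not None:
            return None          # unknown, or already passed one email
        if now - entry["issued_at"] > self.max_age:
            return None          # registered too long ago
        entry["used_by"] = sender
        return entry["real"]
```

The first message to a fresh address is delivered and the address is marked used; every later message to it, and any message arriving after the hour is up, is rejected.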
I haven't seen anything quite like this implemented but would love to know if someone's already offering/doing this.
...but I've just been too busy to finish... and some Perl nut could probably finish it in 2 seconds:
An Apache plugin which will activate whenever a mail address is going to be rendered, and will do the following:
The mail database can clear itself every arbitrary period... something to make undeliverable messages meaningful.
The system could also be expanded to allow for total refusal of receipt from that server by the recipient clicking on a URL.
It's a bit hackish, and not a thorough plan, there are abuse details I've omitted, but I think it is a relatively simple system. The important part is to make the databases transparent and the system as trivial to configure as just saying "load module"... The other important part is that nobody ever sees the email address, and being a module, it even applies to dynamic pages.
Interestingly, I've yet to get spam on the account with which I post to Usenet(!) and I've been using it for at least a month now, albeit not too often. I just have my return address as: Name: remove the scientist Addy: foo@bar.einstein.com It seems to be rather effective, and I'm questioning whether any spambot would bother to parse emails for certain phrases. Though I'm probably gravely underestimating the lengths these people are willing to go through to offer me penis enlargements and bad real estate, you don't have to completely abolish all dreams I might have of a last shred of decency in the human race.
JAWSchlech "The secret to success is knowing who to blame for your mistakes." - Despair.com