Best Method For Foiling Email Harvesters?
pjp6259 writes "One of the common ways that spammers generate email mailing lists is by harvesting email addresses from websites. But in many cases you also need to make it easy for your customers to reach you. I have found three common solutions to this problem: 1.) Use an image to replace your email address. 2.) Use ASCII encodings for some/all of the characters. 3.) Use JavaScript to concatenate and/or obfuscate your email address. Which of these methods is most effective? Are email harvesters able to interpret JavaScript? What do you use?"
My two favorite methods are:
- Putting the e-mail in a distorted picture (like a captcha) - this is very difficult for spam crawlers to read
- Using a long human-readable message: "tset ta tset tod moc.reverse.each.word.prior.to.first.dot.for.addr"
In general, your best defense is to employ some method that requires human interpretation.
Crack - Free with every butt and set of boobs
Spend 10 minutes and make an HTML form for people to contact you. Be careful what you name your fields, though, as there are spam bots that can target web forms.
If people need to send you files, they can do so after you reply back to them.
As for whether the harvesters can interpret javascript, I think that it depends on the particular harvester. You could analyze the source or the created page.
I have one email that I use specifically for REPLYING to emails and that one is the one that gets the MOST Spam.
I like microcars
Use a table with 3 columns: the first with the first part of your email address, the second with @, and the third with domain.com. Simple searches of the page will have a hard time finding it, and with a border of 0 the user won't notice the table.
There exists some positive integer N that you are the Nth person to read this signature.
Hide a bogus email address in the webpage. Maybe in comments, maybe in the corner with a super tiny font which matches the background. Whatever mail gets sent to that address should be automagically blocked from reaching all your other accounts.
----
Go canucks, habs, and sens!
I've heard the following works fairly well, but haven't tried it m'self.
Put 2 email addresses on your web site: the real one, and a 'decoy' one which is hidden from normal users (e.g. white-on-white text right at the bottom of the screen).
Any email that arrives at the 'decoy' address is parsed, and the sender added to a blacklist.
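For anyone who wants to try it, here is a minimal sketch of the "parse and blacklist" step using Python's standard email module. The decoy address, the function name, and the in-memory set are all made up for illustration; a real setup would hook into whatever your mail server offers.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical decoy address, hidden somewhere on the web page.
DECOY_ADDRESS = "decoy@example.com"


def update_blacklist(raw_message: str, blacklist: set[str]) -> bool:
    """Add the sender to the blacklist if the mail was sent to the decoy.

    Returns True when the message hit the decoy address.
    """
    msg = message_from_string(raw_message)
    # parseaddr returns (display name, address); only the address is needed.
    recipient = parseaddr(msg.get("To", ""))[1].lower()
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if recipient == DECOY_ADDRESS and sender:
        blacklist.add(sender)
        return True
    return False
```

Keep in mind that the From header is trivially forged, so a blacklist built this way may end up listing innocent addresses.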
Quidquid Latine dictum sit, altum videtur (anything said in Latin sounds important)
You know when they said you were special? They were trying to tell you to just do something different than everyone else. If everyone did a table trick or wrote "blank at blank dot com" or did any other clever little thing a programmer could come along and regex the hell out of it. Be unique and make them deal with your site individually.
That being said, I don't think spammers crawl the net looking for addresses so much. Their zombies have all the addresses they need. Just try to give out your email address to people that don't have an affinity for virus infections. In my case, I protect my customers so my address hasn't been abused too heavily thus far.
My actual e-mail address, in convenient text format and as a mailto: link, is at the bottom of every single web page at my personal web sites. I really don't see why I should change that just because spammers might harvest it. My e-mail address has been up there since about 1996, so that's at least a decade's worth of harvesting. I've also used the same e-mail address on Usenet posts.
Yes, I get quite a lot of spam. But with the usual techniques (greylisting, SpamAssassin, etc.) I only actually receive maybe half a dozen spam e-mails a day. And more importantly, all my actually valid e-mail still seems to get through just fine. I'm happy with it, and I get the personal satisfaction of being able to use my e-mail address wherever I damn well like without having to cower from spammers.
Same here. I block ALL incoming mail traffic from China, Korea, Japan, etc. on my personal domains because of the volume of spam that originates from those countries. The remainder is fed through SpamAssassin which does a pretty darned good job of tagging likely spam and filtering out obvious spam.
If you run your own mailserver this is a handy option. I have my primary email address that I only give to people I trust that are not using Windows machines. Anytime I have to give my email to a "risky" place, like to submit a request for something that requires a valid email address, or to register, I create a new email alias.
This spring I was shopping for a new SUV, interested in an Escape. I went to Ford's web site and they had a "submit email address to have dealers in your area contact you". Sure that's easy enough. But I'm paranoid. Yes it's Ford but still. So I made "v1ford" forward to my main email address. I got five replies from dealers in my area and forgot about the whole thing.
SIX MONTHS LATER I started receiving spam, one per day, to v1ford. Bastards. And they waited half a year before selling me out, thinking I would not know! So that alias, which I had forgotten to delete after I got my replies, I just deleted and they "went away". It astounds me that someone that I am about to buy a $26k product from is doing things to piss me off.
Tho to be fair it was probably one of the five that replied to me, that got his PC owned by a spam virus. But still, that's not responsibly protecting the privacy of your (potential) customers. Just goes to show, you really can't trust ANYONE with your real address nowadays - even if they are reputable and have integrity, you can't count on them ALL being bright bulbs, and it only takes one to ruin it for you.
Using this system I have only received spam on a few occasions, one of which was when a large company I trusted posted my email address on their web site. (d'oh!)
I work for the Department of Redundancy Department.
I try to run any mailtos through an email obfuscator .. as the link says, a 6 month study showed that obfuscated emails "do not receive junk mail."
My theory is that harvesters have enough email addresses out there to gather and that the spammers are too lazy/have no need to write algorithms that interpret these types of mailtos.
...unfortunately no one can be told what The Mat^H^H^HGoatse is...they must experience it for themselves...
I have found that using SPAM as your username works wonders
just post it right there on the webpage or leave it as a mailto:spam@example.com
So many people use NOSPAMjohn@NOSPAMexample.com (remove the NOSPAM to reply)
or some variation of that, I tried using spam@example.com as my email address on Google Groups and previously on Usenet.
I got pretty much nothing. No spam. Not then, not now.
Since the email harvesters apparently filter out variations of addresses with SPAM, NOSPAM, DIESPAMMERS etc in them, once they filter out the "SPAM" part of spam@example.com they are left with @example.com which is not a valid email address.
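To make that theory concrete, here is a toy sketch of what such a harvester cleanup step might look like. This is purely a guess at their logic, not code from any real harvester; the token list and the validity check are assumptions.

```python
import re

# Assumed anti-spam markers a harvester might strip (longest alternatives first).
ANTI_SPAM_TOKENS = re.compile(r"(nospam|diespammers|spam)", re.IGNORECASE)
# Very rough "looks like an address" check: something@something.something
VALID_ADDRESS = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def harvester_cleanup(address: str):
    """Strip anti-spam markers, then keep the result only if it still looks valid."""
    cleaned = ANTI_SPAM_TOKENS.sub("", address)
    return cleaned if VALID_ADDRESS.match(cleaned) else None
```

With this logic NOSPAMjohn@NOSPAMexample.com comes out as john@example.com, while spam@example.com collapses to @example.com and gets thrown away.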
I like microcars
I think you hit the nail on the head. Strictly speaking, if you want to use text and don't leave a plain text version of your e-mail, you are at risk of being inaccessible.
I made a contact form for my site to avoid harvesters. While spammers do have scripts to submit contact forms, it's easier to trick a robot based on its form input than based on what the robot can parse from the page (e.g. put a hidden field called phone number and fail the form on the backend if it has a value since most spam bots will try to enter something, and make sure there is an HTTP_REFERER, or ask for the user to duplicate some text in a field that is on the page somewhere else).
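A rough sketch of those three backend checks, assuming the form fields and the CGI-style environment arrive as plain dicts. The field names phone_number and challenge are hypothetical, as is the function name; adapt them to your own form.

```python
def looks_like_bot(form: dict[str, str], environ: dict[str, str],
                   challenge_text: str) -> bool:
    """Return True if a contact form submission trips any of the bot checks."""
    # 1. Honeypot: the field is hidden from humans, so any value means a bot.
    if form.get("phone_number", "").strip():
        return True
    # 2. A browser submitting the form from the page normally sends a referer.
    if not environ.get("HTTP_REFERER", "").strip():
        return True
    # 3. The visitor must retype some text that is shown elsewhere on the page.
    if form.get("challenge", "").strip() != challenge_text:
        return True
    return False
```

None of these checks is strong alone (some privacy tools strip the referer, for instance), so treat a hit as a reason to reject the form politely, not as proof.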
A lot of these suggestions are fine for personal sites; but if you're actually in business they aren't practical.
We use Javascript. You don't want to make life more difficult for the person trying to correspond - the point is to raise the cost to the spammer. If they have to add a Javascript parser to their spider, it's going to slow them way down. It's not going to make financial sense for them to do a custom solution for each site (and if they do, the "image" methods will break down as well).
When someone writes to me and says "reply to joe at gmail dot com" (or whatever), they generally don't get a reply. Why is their time more valuable than mine?
#DeleteChrome
How about instead of entire contact form, which might not allow bullet points or attachments, etc. that people may wish to use, just use a basic email submit?
Take a form, put the email alias in the table, and write a simple HTML form control so that clicking the submit button takes the text on the page ("example"), appends the '@' sign and the domain ("example.com") in a two-step process, and spits out a "mailto:" link as the final step.
From the user's perspective, you get a little box that has your mailID and an 'Email me!' button right next to it. When they click the button, their mail client pops up and they can get straight to business. Because the address is stored in three-four chunks in the page code, the harvester isn't going to assemble it. Seems to me like that should be fairly effective.
A while ago, I set up an article on my homepage that combines all techniques without compromising usability:
http://www.thany.org/article/73/E-mail_hiding
If the spammers want email addresses so badly, why not give them some? List poisoning will sting them right in the buttocks, and will make them think twice before they even consider sending their dumb spiders to your servers again. Take a look at the following sites for more info:
http://www.monkeys.com/wpoison/
http://www.spampoison.com/
My other OS is the MCP!
That's called a challenge-response system.
Those are EVIL and should be banned from the Internet.
My personal domain has been hijacked by spammers. Despite having a valid SPF record, they still send spam with my domain forged as the sender. Consequently, when someone has a challenge-response spam filter configured, those challenge messages come to ME, despite the fact that I had nothing to do with the original message. I consider those challenge messages spam themselves, and report them to spamcop as such.
There are better ways of filtering spam. Forcing other people to filter your mail for you is extremely inconsiderate.
"The guide is definitive, reality is frequently inaccurate."
I mean yeah some of the tips and tricks may (or may not) work in the short run but eventually the spammers will get your id (not to mention the trouble to your customers if you obfuscate the id too much). It's not always how you displayed your mail id on your website or webpage that ultimately gets it harvested. More often than not, it's stupid users with your address in their contact lists who get it out in the open.
Like most people, I use multiple mail ids for different uses. Lots of them are fakes just to register on sites and such, and a couple are private ones which are used only to correspond with the closest friends and family members. Recently one of my friends told me that he has used my address to register for a gaming site since his was already being used for one account and apparently creating a new id takes ages and he may die before he gets a new one so why not use mine which is totally personal to me but who gives a damn. He actually has no idea why he should Not be doing it. And he is a CS major from one of the best colleges in the country! Now think of the regular users you may have corresponded with and how easy it is for them to fuck up every trick you have tried to evade harvester bots.
Politicians and Pedophiles: Two groups of exploitive bastards who are most dangerous when they're thinking of children.
I didn't even think of that. It seems that you would have to make a website that was readable (by a software page reader) and easily usable by the blind, but still difficult to extract the email address. Maybe you could put an audio clip of contact info, akin to a voicemail message.
What we need is for someone, instead of talking, to perform two experiments:
1. Create 10 new email addresses, and post them around the net with 10 obfuscation tricks (plenty of examples can be found in this thread). Which of these tricks actually foiled the spammers, and which did not? Obviously, spammers can theoretically get around any obfuscation, but which obfuscations are still "safe"?
2. Do an experiment to figure out how much "safer" an address is if it was never posted on the Web. Does it just cause a small delay in spam (say, you only start getting spam after a month) or does it get noticeably less spam?
The answer to #2 isn't as obvious as some may think. One important problem to consider is spamming worms which use fake "from" addresses. These worms take your friends' email addresses - potentially addresses which have never been published - and use them as the sender on spam to random people. If a spammer also receives these mails, he gets a constant stream of real email addresses which were never published on the web. Another obvious issue is dictionary attacks, which are especially practical on large domains (e.g., gmail).
Have your code produce a unique contact e-mail address on the page for each visitor, so for instance:
support-312321@example.com
Then set up a catch all on the first part of the address.
If you get any spam, just block out that one receiving address.
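A minimal sketch of that scheme, assuming the mail server hands every support-* message to a small filter. The prefix, the domain, the six-digit tag, and both function names are placeholders chosen for the example.

```python
import secrets

# Hypothetical values; the mail server needs a catch-all for "support-*".
PREFIX = "support"
DOMAIN = "example.com"


def new_contact_address(issued: set[str]) -> str:
    """Generate a fresh per-visitor address and remember that it was issued."""
    tag = f"{secrets.randbelow(10**6):06d}"  # six random digits
    address = f"{PREFIX}-{tag}@{DOMAIN}"
    issued.add(address)
    return address


def accept(recipient: str, issued: set[str], blocked: set[str]) -> bool:
    """Accept mail only for addresses that were issued and not since blocked."""
    recipient = recipient.lower()
    return recipient in issued and recipient not in blocked
```

When spam shows up on one of the addresses, add that address to the blocked set; every other visitor's address keeps working.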
Obfuscating email addresses on websites is one way of tackling the spam harvester problem. Training filters by becoming somewhat of a spam magnet is another way. The only problem herein lies in the differentiation between ham and spam. Spam is here and will be here for a long time to come because people do make (a lot of) money with it. So you could say detecting it is more sensible than avoiding it.
I've been experimenting by adding an automatically generated code to the email addresses on my page (recipientDELIMcode@domain.ext). Spammers keep on sending me spam on these addresses, and I accept it and train my mail filter this way. The only thing I have to do is add 'contaminated' email addresses to my shitlist once I've found spam being sent to them. As you might already have guessed... the shitlist is a simple forward to sa-learn.
Adding an auto whitelister based on my own address book (LDAP is sweet) tackles the problem of address book harvesters: mail from these sources will not be fed to sa-learn, even if the email address it's received on is shitlisted.
A friend of mine, who answers to the name of 'the wanker who can't keep his antivir up to date'/Paul, created the need for me to implement this feature by becoming infected by an _addressbook_leechin_virus_.
To receive even more spam to feed to my hungry sa-learn there's a set of email addresses on my site (>50% of all email addresses there are in hidden fields/autogen'd pages) which are passed thru to sa-learn by default.
I've also been thinking of combining the unique-id email address with a database in which I store served (generated) email addresses, giving each a grace period of N mins. If I receive an email within these N mins, I assume it was sent by a visitor on my site who clicked the mailto: link; the message is passed to my mailbox and the unique-id address is flagged as a non-spam source. However, if I receive mail on that address after the N mins, I assume it's a spam run and feed it to sa-learn. I'm not sure about the ROI (code time/overhead/extra server-side dependencies) of this technique, because what I have now works well enough for me.
The downside is you can't give out your email address on things like a business card (lastname@domain.ext). A possible solution to this is replacing your email address with a URL like http://lastname.domain.ext/ on which a mailto: refresh is generated with the unique id'ed email address. Or trusting the intelligence of the lean-mean-(and pretty well trained)-spamkilling-machine, which is good enough for me.
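A quick sketch of how the grace-period routing could look, with a dict standing in for the database. The names are invented, and one detail is my own assumption: once an address is flagged as a non-spam source, later mail to it keeps going to the inbox.

```python
from datetime import datetime, timedelta


def route(recipient: str, received: datetime,
          served_at: dict[str, datetime], trusted: set[str],
          grace: timedelta) -> str:
    """Decide where a message goes: 'inbox', 'sa-learn', or 'unknown'."""
    # Assumption: an address already flagged as a non-spam source stays trusted.
    if recipient in trusted:
        return "inbox"
    issued = served_at.get(recipient)
    if issued is None:
        return "unknown"  # address was never served by the site
    if timedelta(0) <= received - issued <= grace:
        # Mail arrived within N minutes of the page view: treat as a real visitor.
        trusted.add(recipient)
        return "inbox"
    # First contact after the grace period: assume a spam run.
    return "sa-learn"
```

The served_at mapping is the only extra state needed: generated address in, timestamp of the page view out.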
My 2ct.