Honeypot For Identifying Email-Harvesters
Cheese Man writes "Mark Pilgrim describes a simple way to identify email-harvesters: "In each page I serve, I include a bogus email address, encoded with the date of access as well as the host IP address ... This has allowed me to trace spam back to specific hosts and/or robots." There's even a simple one-line example done with PHP. (Thanks to BoingBoing for the links.)"
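A minimal sketch of the idea in Python. It assumes a hypothetical trap domain `spamtrap.example.com` and a made-up local-part format (`YYYYMMDD.a-b-c-d`). It is not Mark Pilgrim's actual PHP one-liner, only an illustration of encoding the access date and client IP into an address:

```python
from datetime import date

# Hypothetical domain you control; mail to it should never reach a real person.
TRAP_DOMAIN = "spamtrap.example.com"


def trap_address(client_ip: str, when: date) -> str:
    """Build a bogus address that records the visitor's IPv4 address and the access date."""
    # Dots in the IP become dashes so the local part stays a single dotted pair.
    ip_part = client_ip.replace(".", "-")
    return f"{when:%Y%m%d}.{ip_part}@{TRAP_DOMAIN}"


if __name__ == "__main__":
    # In a real page handler, client_ip would come from the request (e.g. REMOTE_ADDR).
    print(trap_address("192.0.2.17", date(2003, 5, 6)))
    # prints 20030506.192-0-2-17@spamtrap.example.com
```

Any spam that later arrives at such an address tells you when, and from which host, the address was scraped.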
I think there should be email addresses that the big companies "float" out onto spamming lists. When a mass email arrives addressed to one of these, it's a flag that it's spam, and the whole message can be blocked from entering the system. Of course, security around what those addresses are would have to be pretty tight...
Use it to build blacklists. Any email sent to addresses formatted like this can be recorded in a nice Bayesian filter as more known spam.
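As a rough sketch of feeding trap hits into a filter, assuming trap addresses follow a hypothetical `YYYYMMDD.a-b-c-d@spamtrap.example.com` format: any message sent to one is treated as known spam, and its words are added to the spam-side token counts. This shows only the training step of a Bayesian filter, not the scoring side.

```python
import re
from collections import Counter

# Hypothetical trap-address format: YYYYMMDD.a-b-c-d@spamtrap.example.com
TRAP_RE = re.compile(r"^\d{8}\.\d{1,3}(?:-\d{1,3}){3}@spamtrap\.example\.com$")
# Crude word tokenizer; real filters use far more careful tokenization.
TOKEN_RE = re.compile(r"[a-z0-9$'-]+")


def learn_if_trapped(recipient: str, body: str, spam_tokens: Counter) -> bool:
    """If the recipient is a trap address, count the body's tokens as known spam."""
    if not TRAP_RE.match(recipient.strip().lower()):
        return False
    spam_tokens.update(TOKEN_RE.findall(body.lower()))
    return True


if __name__ == "__main__":
    spam = Counter()
    learn_if_trapped("20030506.192-0-2-17@spamtrap.example.com",
                     "Cheap pills! Cheap!", spam)
    print(spam.most_common(1))  # [('cheap', 2)]
```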
I am pleasantly surprised that my anti-spam encoded email address still has not been spammed. And a recent spam study found that only ordinary, unencoded email addresses got spam.
It wouldn't take much to find and decode most of the simple spam-protected email addresses. And I don't think it would take long for spammers to detect a system such as this and bypass it, but I don't think they will bother in the current climate.
But I suspect we will soon see much cleverer email-collecting tools, and the problem will escalate into the same arms race we see between viruses and antivirus software.
What can you do with somebody's IP address (that was in the email they harvested)? Resolve it and hope email sent to abuse@theirdomain.com does something?
I wonder if maybe someone could create a network of honeypots, and feed the data into a database that could be accessed in real time by web servers, to deny access.
It would probably impose too much of a performance hit for a popular site, but maybe for smaller stuff -- your bio page, or whatever -- it would be appropriate.
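One way to picture the shared database, sketched with Python's built-in `sqlite3` module. The table name and the three functions are hypothetical. A real network of honeypots would need a shared server and some way to trust reports, and a busier site would likely cache lookups rather than query on every request.

```python
import sqlite3


def open_blocklist(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a blocklist database of harvester IPs reported by honeypots."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS harvesters ("
        " ip TEXT PRIMARY KEY,"
        " first_seen TEXT NOT NULL)"
    )
    return db


def report_harvester(db: sqlite3.Connection, ip: str, seen: str) -> None:
    """Record an IP a honeypot caught; repeat reports keep the first timestamp."""
    db.execute("INSERT OR IGNORE INTO harvesters (ip, first_seen) VALUES (?, ?)", (ip, seen))
    db.commit()


def is_blocked(db: sqlite3.Connection, ip: str) -> bool:
    """Check, at request time, whether the client IP has been reported."""
    row = db.execute("SELECT 1 FROM harvesters WHERE ip = ?", (ip,)).fetchone()
    return row is not None
```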
While there's no way to pursue email harvesters through legal channels, there are other ways this technique is useful.
In the example given, the spam harvester used a unique User-Agent string and a constant IP address for spidering. As a web site owner, you could block requests based on either of those credentials. In addition, you can publish your findings so that other web sites and networks can block the harvesters you find too.
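A minimal sketch of blocking on those two credentials, written as Python WSGI middleware. The IP and User-Agent values are placeholders, not the harvester from the article. `REMOTE_ADDR` and `HTTP_USER_AGENT` are the standard WSGI/CGI environment keys; `hello_app` is a stand-in for a real site.

```python
# Placeholder values; fill these in from your own trap logs.
BLOCKED_IPS = {"192.0.2.17"}
BLOCKED_AGENT_SUBSTRINGS = ("ExampleHarvester",)


def block_harvesters(app):
    """Wrap a WSGI app so known harvesters get 403 Forbidden instead of your pages."""

    def middleware(environ, start_response):
        ip = environ.get("REMOTE_ADDR", "")
        agent = environ.get("HTTP_USER_AGENT", "")
        if ip in BLOCKED_IPS or any(s in agent for s in BLOCKED_AGENT_SUBSTRINGS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Stand-in for your real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


application = block_harvesters(hello_app)
```

Matching on User-Agent only works until the harvester changes its string, so the IP check, or a list published by other sites, does most of the work.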
You can also complain to the harvester's ISP. Since spam is often sent through open relays, you often can't track down spammers through email headers. But by recording the IP address that harvested your email address, you know the initial source of the spam. That gives you a point of contact for complaining to ISPs and possibly tracking down the spammer's marketing site.
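To turn a spammed trap address back into the harvester's IP (the starting point for an ISP complaint), here is a small decoder. It assumes a hypothetical `YYYYMMDD.a-b-c-d@domain` format; adapt the pattern to whatever your pages actually emit.

```python
import ipaddress
import re
from datetime import date, datetime
from typing import Optional, Tuple

# Hypothetical format: YYYYMMDD.a-b-c-d@anything
TRAP_RE = re.compile(r"^(\d{8})\.(\d{1,3}(?:-\d{1,3}){3})@")


def decode_trap(address: str) -> Optional[Tuple[date, str]]:
    """Return (date harvested, harvester IP) for a trap address, or None if it isn't one."""
    m = TRAP_RE.match(address.strip())
    if not m:
        return None
    try:
        harvested_on = datetime.strptime(m.group(1), "%Y%m%d").date()
        # ipaddress rejects octets above 255, catching mangled addresses.
        ip = str(ipaddress.IPv4Address(m.group(2).replace("-", ".")))
    except ValueError:
        return None
    return harvested_on, ip
```

With the IP in hand, a whois lookup on it usually names the network owner to contact.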
These guys come like a thief in the night. They load your page like any other search engine spider. It's like knowing the face of the guy who went through your neighborhood trying every doorknob under the guise of distributing advertising flyers, and who later told other thieves, without your knowing, who is home during the day and who is not.
Yes, it's helpful in building a case, like knowing who is going through a neighborhood trying all the doors, but catching the actual guy in the act is not as easy.
Some of this spam is getting really nasty. Just two days ago I received a spam purporting to be from the fraud department of Best Buy, about CD players some guy in New York was trying to buy with my credit card. It seemed like a really professional email, except they didn't know my name, and they had apparently gotten my email address from a national credit bureau. When the links did not point where they claimed to, I really became leery. The whole thing was apparently a ruse to get me to log into their site and disclose all sorts of personal information, playing on my fear that if I did not, the fraudulent transaction would go through.
Watch out, guys. There's a lot of deception going on out there.
Any tools and techniques that help us find out who these little rascals are would be really welcome. Given that some students just got nailed for their life savings merely for sharing a few songs, I trust this same environment can be turned against those involved in internet scams, which often cost not just a few record sales but substantial, I mean really substantial, grief for the victim.
Surely the email harvester will just "learn" to remove its own IP address and possibly the date (or, even better, just increment the IP and date to generate an infinite number of email addresses).
A more advanced method would probably combine the IP with the date in a non-obvious way, but it would have to be at least a one-to-one mapping of IPs, and a "two-way hash" (i.e. reversible) so you can retrieve the IP address.
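A sketch of such a reversible, non-obvious encoding, assuming a hypothetical per-site secret. It packs the date and IPv4 address into 8 bytes, masks them with a secret-derived keystream (obfuscation, not real encryption), and appends a short HMAC tag so that addresses a harvester invents by incrementing or editing the token fail to verify.

```python
import base64
import hashlib
import hmac
import ipaddress
import struct
from datetime import date
from typing import Optional, Tuple

SECRET = b"change-me"  # hypothetical per-site secret; keep it out of your pages


def _mask(n: int) -> bytes:
    # Fixed keystream derived from the secret: hides the layout, but is NOT strong encryption.
    return hashlib.sha256(SECRET + b"|mask").digest()[:n]


def _tag(payload: bytes) -> bytes:
    # Short authentication tag so invented or edited tokens can be rejected.
    return hmac.new(SECRET, payload, hashlib.sha256).digest()[:4]


def encode_token(ip: str, when: date) -> str:
    """Pack date + IPv4 into an opaque 20-character token usable as a local part."""
    payload = struct.pack(">I", int(when.strftime("%Y%m%d"))) + ipaddress.IPv4Address(ip).packed
    masked = bytes(a ^ b for a, b in zip(payload, _mask(len(payload))))
    return base64.b32encode(masked + _tag(payload)).decode("ascii").rstrip("=").lower()


def decode_token(token: str) -> Optional[Tuple[date, str]]:
    """Invert encode_token; return None for tokens that were not issued with SECRET."""
    try:
        raw = base64.b32decode(token.upper() + "=" * (-len(token) % 8))
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None
    if len(raw) != 12:
        return None
    masked, tag = raw[:8], raw[8:]
    payload = bytes(a ^ b for a, b in zip(masked, _mask(8)))
    if not hmac.compare_digest(tag, _tag(payload)):
        return None
    (ymd,) = struct.unpack(">I", payload[:4])
    ip = str(ipaddress.IPv4Address(payload[4:]))
    return date(ymd // 10000, ymd // 100 % 100, ymd % 100), ip
```

The token would become the local part, e.g. `token@spamtrap.example.com` (a hypothetical domain). Only someone holding `SECRET` can map it back to the IP and date.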
Even storing the IP address as the Apache log line (if that's possible) would work. Real-looking addresses would always work better, but they would require a dummy domain (e.g. a dictionary of names stuck together with ".", "_" or "-"). And unless you encode the IP, you need a lookup table from your logs, which is overhead.
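The lookup-table alternative could look roughly like this: each page view gets a fresh, realistic-looking address on a dummy domain, and each issued address is recorded with the visitor's IP and timestamp. The name list, the `dummy.example` domain and the in-memory dict are all stand-ins; in practice the table would live in a database or be rebuilt from logs.

```python
import secrets

# Stand-in dictionary; a real one would be much larger so addresses look varied.
NAMES = ["alice", "bob", "carol", "dave", "erin", "frank", "grace", "heidi"]
SEPARATORS = "._-"


def issue_address(table: dict, ip: str, timestamp: str, domain: str = "dummy.example") -> str:
    """Create an unused name-based address and remember who it was shown to."""
    while True:
        local = (secrets.choice(NAMES) + secrets.choice(SEPARATORS)
                 + secrets.choice(NAMES) + str(secrets.randbelow(100)))
        address = f"{local}@{domain}"
        if address not in table:  # never hand out the same address twice
            table[address] = (ip, timestamp)
            return address


def who_harvested(table: dict, address: str):
    """Look up the (ip, timestamp) an address was issued to, or None."""
    return table.get(address.lower())
```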
Of course, this still doesn't address the real problem. The people who should be traced and punished are not the spammers but the companies that use the spammers; if the law makes spamming illegal, there will always be foreign companies willing to spam for you. Few of the spams I see are from international companies (OK, most of them are porn sites, which are probably just harvesters).
The first link in the story also had a link to Cyveillance, which keeps appearing in my SpamCop reports as a "3rd party interested in spam". Apparently they chase (suspected) copyright infringement on the web... not sure I want to help them anymore.
The only email address I have on my site is blockme@mydomain, and if anyone sends an email to that one, they get blacklisted. Easy but effective.
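A minimal sketch of that rule using Python's standard `email` package, with the trap address taken from the comment. It blacklists the From address for simplicity; since From headers are trivially forged, blacklisting the connecting relay's IP (from your mail server's logs or the Received headers) may be more reliable in practice.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr

TRAP_ADDRESS = "blockme@mydomain"  # the address from the comment; substitute your own domain


def blacklist_if_trap(raw_message: str, blacklist: set) -> bool:
    """If a message was sent to the trap address, add its From address to the blacklist."""
    msg = message_from_string(raw_message)
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    if not any(addr.lower() == TRAP_ADDRESS for _, addr in getaddresses(headers)):
        return False
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if sender:
        blacklist.add(sender)
    return True
```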
Nah, just put up a WebPoison page and spoil their ill-gotten gains by fooling the harvesters into grabbing lots of apparently valid (though very fake) email addresses. If enough of their customers get pissed off at being sold bad email lists, eventually the problem will be lessened. http://www.monkeys.com/wpoison/ "So the basic idea behind Wpoison is to trap unwary and badly engineered address harvesting web crawlers, and to fool them into adding enormous quantities of completely bogus e-mail addresses to the E-mail address data bases of the spammers, thus polluting those data bases so badly that they become essentially useless, thereby putting the spammers who are using them out of business, or at least shutting them down for a time and causing them some major headaches while they try to clean up the messes in their now-heavily-polluted e-mail address data bases." "...if one of these spammer address harvesting web crawlers is left to try to digest your entire web site, say, overnight, then within a few hours (and certainly by morning) its data base of e-mail addresses will have been well and thoroughly polluted by millions of utterly bogus e-mail addresses..."
WebPoison has been around for a while, so I wouldn't be surprised if spamware can detect and filter wpoison pages. (Barring a wpoison tweak to fool that spamware, followed by a tweak of the spamware, etc.)
One line blog. I hear that they're called Twitters now.
If they are misbehaving bots (feed them a robots.txt too), just block their IPs and don't bother being polite. (Or feed them wpoison.)
You should do what I do and set up a "tar pit" on your website, with a bunch of bogus, randomly generated e-mail addresses and links back to itself. At last count, I've handed out over 100,000 false e-mail addresses.
Michael C. Hollinger
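A minimal sketch of such a tar-pit page generator in Python. It is not Wpoison's code, and the `/tarpit/` path and counts are arbitrary choices. The bogus addresses use the reserved `.example` domain so spam sent to them can never land on a real person, and every link leads back into the pit to keep a crawler looping. Listing the path under `Disallow` in robots.txt keeps well-behaved spiders out, so only bots that ignore it fall in.

```python
import random
import string


def random_word(rng: random.Random, lo: int = 4, hi: int = 10) -> str:
    """Random lowercase word of lo..hi letters."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(rng.randint(lo, hi)))


def tarpit_page(rng: random.Random, n_addresses: int = 20, n_links: int = 5,
                base_path: str = "/tarpit/") -> str:
    """Return an HTML page of bogus mailto links plus links to more tar-pit pages."""
    parts = ["<html><body>"]
    for _ in range(n_addresses):
        # .example is reserved (RFC 2606), so these addresses can never reach anyone.
        addr = f"{random_word(rng)}@{random_word(rng)}.example"
        parts.append(f'<a href="mailto:{addr}">{addr}</a><br>')
    for _ in range(n_links):
        # Every link leads back into the tar pit, which serves another random page.
        parts.append(f'<a href="{base_path}{random_word(rng)}.html">more</a><br>')
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(tarpit_page(random.Random()))
```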
Great idea; I have a static page with thousands of random email addresses generated by a Perl script, but this wpoison is sweet; the pages seem genuine, and it would keep a robot busy for a long time.
I'd like to see millions of web sites adopt this approach; then perhaps spammers would be overwhelmed by bogus email addresses and it would cost them more money to figure out ways around it, if it's even possible.
The principle is similar to the Nigerian spam baiting that some of us engage in; if thousands of us did it, these turds would simply be overwhelmed and would have to find some other way to make a living!
it's = "it is"; its = possessive. E.g., it's flapping its wings.
It was a few years ago, but I had a typo on my car registration and title. I was going to get it fixed, but within 2 days of my registration I got mail with the same wrong name. Then I started getting sales calls. I never fixed the registration. My vehicle registration was the source of about 1/3 of my snail-mail junk.
It came from places you wouldn't expect. Siding salesmen were the worst. I was renting an apartment at the time.
I would at least convert the IP address to hex (e.g. ef0f3bad) so it's not really obvious what you're doing; it also makes the address look more "real".
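For the hex variant, Python's standard `ipaddress` module makes the round trip short. The example value ef0f3bad decodes to 239.15.59.173. Hex only disguises the IP from a casual glance; anyone who suspects the trick can reverse it just as easily.

```python
import ipaddress


def ip_to_hex(ip: str) -> str:
    """IPv4 dotted quad -> 8 lowercase hex digits."""
    return ipaddress.IPv4Address(ip).packed.hex()


def hex_to_ip(hex_str: str) -> str:
    """8 hex digits -> IPv4 dotted quad (raises ValueError on bad input)."""
    return str(ipaddress.IPv4Address(bytes.fromhex(hex_str)))


if __name__ == "__main__":
    print(ip_to_hex("239.15.59.173"))  # ef0f3bad
    print(hex_to_ip("ef0f3bad"))       # 239.15.59.173
```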