Stopping Spambots: A Spambot Trap
Neil Gunton writes "Having been hit by a load of spambots on my community site, I decided to write a Spambot Trap which uses Linux, Apache, mod_perl, MySQL, ipchains and Embperl to quickly block spambots that fall into the trap. "
Looking at my Day Job and personal web site, other than the very cool technical achievement of the trap (I'll have to see if I can rewrite this for my Checkpoint FW system), there was one thing I learned about good design from this article:
Eliminate mailto - makes sense. You should have an HTTP-based "send me a message" system - force a live person to type stuff in instead of letting a program pick out addresses.
Eliminating mailto alone would probably help with most of my spam problems (as I have my "contact me" address right on the first page).
52 Weeks, 52 Religions with John Hummel
"I have a truly marvelous demonstration of this proposition which this bandwidth is too narrow to transmit."
www.timcoleman.com is a total waste of your time. Never go there.
Why on Earth would you like to block a spambot? So it doesn't get any more useful addresses? /give/ it a next page. With a nicely formatted word1word2num1num2@word1word2.com, where words and nums are random.
No way, man.
If you realize you're serving to a bot, go on serving. Each time the bot follows the "next page" link, you
Give it thousands, millions of addresses this way.
The dude fell in his own trap. :-D
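For what it's worth, a page generator along those lines could look roughly like this in Python. It's only a sketch: the word list, the function names and the `/trap?page=` link format are made up for illustration and not taken from anyone's actual trap.

```python
import random

# Hypothetical word list; any dictionary file would do.
WORDS = ["blue", "river", "stone", "maple", "cloud", "iron", "north", "pixel"]


def fake_address(rng):
    """Build one word1word2num1num2@word1word2.com style address."""
    local = (rng.choice(WORDS) + rng.choice(WORDS)
             + str(rng.randint(0, 9)) + str(rng.randint(0, 9)))
    domain = rng.choice(WORDS) + rng.choice(WORDS) + ".com"
    return local + "@" + domain


def trap_page(page_number, rng, count=50):
    """Return HTML with `count` fake mailto links and a "next page" link."""
    lines = ["<html><body>"]
    for _ in range(count):
        addr = fake_address(rng)
        lines.append('<a href="mailto:%s">%s</a><br>' % (addr, addr))
    # The /trap?page= URL is an assumption; use whatever path the trap lives at.
    lines.append('<a href="/trap?page=%d">next page</a>' % (page_number + 1))
    lines.append("</body></html>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(trap_page(1, random.Random()))
```

Every page links to the next one, so a bot that keeps following links never runs out of addresses to collect.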
As it turns out, I really haven't received that much mail to this address. About the only mail I've ever received to it is someone from trafficmagnet.net, who tells me that I'm not listed on a few search engines and that I can pay them to have my site listed. I need to send her a nasty reply saying that I don't care about being listed on Bob's Pay-Per-Click Search Engine, and that if she had actually read the page, she would have noticed that she was sending mail to an invalid address. Besides, the web server is for my inline skate club and we don't have a $10/month budget to pay for search engine placement.
I think I've received more spam from my Usenet posting history, from my other web site, and from my WHOIS registrations than I've received from the skate club web site.
From the website:
The Problem: Spambots Ate My Website
s/Spambots/Slashdot/
My PHP spider-trap - See an infinity of email addresses and links in action!
Removing mailto: links is a bad solution to the problem. It might be the only solution, but it is bad.
I hate the editor in my web browser. No spell check (and a quick read of this message will prove how disastrous that is to me), no good editing ability, and other problems. By contrast my email client has an excellent editor, and a spell checker. Let me pull up a real mail client when I want to send email, please!
In addition, I want people to contact me, and not everyone is computer literate. I hang out in antique iron groups, I expect people there to be up on the latest in hot tube ignition technology, not computer technology. To many of them computers are just a tool, and they don't have time to learn all the tricks to make it work, they just learn enough to make it do what they want, and then ignore the rest. Clicking on a mailto: link is easy and does the right thing. Opening up a mail client, and typing in some address is error prone at best.
Removing mailto: links might be the only solution, but I hope not. So I make sure to regularly use spamcop.
This isn't such a good idea - for every random (non-existent) domain that you generate, a root DNS server will be queried when an email is sent to this address, which increases the load on the root servers, which is generally a bad thing. How about instead, returning pages with the email address abuse@domain-that-spambot-is-coming-from all over them...
After the Battle Creek incident with ORBZ, the maintainer changed the way it worked: instead of proactively checking for open relays, he now runs a honeypot-like system built around a unique email address that isn't directly visible on the site but can still be harvested by a spam bot. Any server that sends email to that address is automatically added to The List. Mail server admins who believe that they should not be on this list can argue their case to have their server removed.
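The listing step is simple enough to sketch. This is only my guess at the general shape, not ORBZ's actual code; the trap address and the function name are invented for the example.

```python
# Hypothetical trap address; it would appear only in hidden page markup.
TRAP_ADDRESS = "trap-7f3a@example.org"


def record_delivery(recipient, sender_ip, blocklist):
    """Add sender_ip to blocklist if the mail was addressed to the trap.

    Returns True when the sender was listed, False otherwise.
    """
    if recipient.strip().lower() == TRAP_ADDRESS:
        blocklist.add(sender_ip)
        return True
    return False


if __name__ == "__main__":
    listed = set()
    record_delivery("trap-7f3a@example.org", "192.0.2.10", listed)
    print(sorted(listed))
```

Since no human ever sees the address, anything delivered to it almost certainly came from a harvested list.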
"Pinky, you've left the lens cap of your mind on again." - P&TB
"I can see my house from here!" - ST:
Superior Labs spambot_trap mirror
-Spack
Here's a tip for those of you writing spambot traps... How about not blindly responding to the faked Return-Path address?
Now that should be illegal. You people whine about your 10 spams a day, try 10,000 from 2000 different email addresses. Idiot postmasters should be caught and jailed.
formmail itself (even the most recent version) can still be abused by spammers to use your webserver as a bulk mail relay - see the advisory at
http://www.monkeys.com/anti-spam/formmail-advisory.pdf
It's a shame he didn't suggest the more robust formmail replacement at nms which is maintained, and attempts to close all the known bugs and insecurities.
Add a couple of sleep(20); into the cgi script that generates the bot fodder. The bot will still stay busy waiting for your webserver's response, but your script will exactly consume zero resources.
For additional kicks, set up a DNS teergrube.
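In Python the sleeping part might look something like the sketch below. The function name and the injectable `sleep` argument are my own additions (the latter just makes it easy to try without actually waiting).

```python
import time


def slow_fodder(lines, delay_seconds=20, sleep=time.sleep):
    """Yield bot fodder one line at a time, pausing before each line.

    While the process sleeps it is blocked in the kernel and uses no CPU.
    """
    for line in lines:
        sleep(delay_seconds)
        yield line


if __name__ == "__main__":
    # Short delay here so the demo finishes quickly.
    for chunk in slow_fodder(["a@example.com", "b@example.com"], delay_seconds=1):
        print(chunk)
```

A CGI script would write and flush each yielded line so the bot keeps the connection open between pauses.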
Say no to software patents.
I've found that a lot of people just won't send email if there's not a link to facilitate it. I've become rather fond of using javascript to write the address to the page. Spambots read the source so they don't piece the address together but *most* browsers will still do it right. Just use something like:
<script>document.write("<A CLASS=\"link\" HREF=\"mailto:" + "myname" + String.fromCharCode(64) + "mydomain" + "\">mail me</A>");</script>
Seems to work fine. Anyone know of any reason it shouldn't, or have any other way to keep down spam without totally removing the Mailto: ? I know this won't work with *every* browser, but it beats totally removing mail links. And I don't think spammers can get it without having a human actually look at the page...
do not read this line twice.
My setup (catches some of the more commonly used spambots) uses mod_rewrite to send spammers to a trap.
Setup details at http://www.bero.org/NoSpam/isp.php
This message is provided under the terms outlined at http://www.bero.org/terms.html
Have your page linked on slashdot! Page gets slashdotted, problem solved.
1) Put a link such as: mailto:dedicatedaddress@wherever.com?Subject= [Question] About your site (or whatever)
2) Trash any email sent to dedicatedaddress that doesn't have the [Question] tag in the subject.
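Both steps are easy to script. A rough Python sketch, where the function names are mine and the tag is the `[Question]` one from above:

```python
from urllib.parse import quote

TAG = "[Question]"


def tagged_mailto(address, subject_rest):
    """Build a mailto: link whose subject starts with the tag."""
    subject = TAG + " " + subject_rest
    # Brackets and spaces must be percent-encoded inside a URL.
    return "mailto:" + address + "?subject=" + quote(subject)


def keep_message(subject):
    """Return True only for mail that still carries the tag."""
    return TAG in (subject or "")


if __name__ == "__main__":
    print(tagged_mailto("dedicatedaddress@wherever.com", "About your site"))
    print(keep_message("Re: [Question] About your site"))
```

A harvested address arrives without the subject, so untagged mail can be dropped without reading it.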
Hope this helps.
-- B.
This sig does in fact not have the property it claims not to have.
Why is this a bad thing? They are owned by Verisign.
How about instead, returning pages with the email address abuse@domain-that-spambot-is-coming-from all over them...
This is also a good idea. In fact, I have a script which does a traceroute to the IP of the bot, and then looks up the admin contact using whois for the last couple of hops, and returns these. Oh, and for additional fun, throw in a couple of addresses of especially loved "friends"...
Say no to software patents.
Write some of your email address using html code for the ascii characters, like &#114; for "r".
(Yes, I've posted about this before, but it does work for me.) Browsers render it so users get the address they want, but spambots try to grab it from the raw html and get something meaningless.
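If you'd rather not look up the codes by hand, a few lines of Python will do the encoding. Just a sketch; the function name is invented, and it encodes every character rather than only some of them.

```python
import html


def entity_encode(address):
    """Replace every character with its decimal HTML entity."""
    return "".join("&#%d;" % ord(ch) for ch in address)


if __name__ == "__main__":
    encoded = entity_encode("me@example.com")
    print(encoded)
    # A browser decodes the entities back to the plain address:
    print(html.unescape(encoded))
```

Paste the encoded string into both the href and the link text; the raw HTML then contains no literal "@" for a naive harvester to match on.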
Add a couple of sleep(20); into the cgi script that generates the bot fodder. The bot will still stay busy waiting for your webserver's response, but your script will exactly consume zero resources.
Zero resources, except for memory.
A much better solution would be to point the bot at a set of "servers" with IP addresses where you're running a stateless tarpit.
Tarsnap: Online backups for the truly paranoid
The page is already slashdotted. Here is a little script that traps bots (and others) that use your robots.txt to find directories to look through. Requires an .htaccess file with mod_rewrite turned on.
robots.txt
#################
User-agent: *
Disallow: /dont_go_here
Disallow: /images
Disallow: /cgi-bin
dont_go_here/index.php
############
<?php
// Timestamp and client details for the ban record.
$now = date("h:ia m/d/Y");
$IP = getenv("REMOTE_ADDR");
$host = getenv("REMOTE_HOST");
$your_email_address = "you@whatever";
// Rule appended to .htaccess. The IP is escaped and anchored at both ends
// so that a ban on 1.2.3.4 does not also match 1.2.3.40.
$ban_code =
"\n".
'# '."$host banned $now\n".
'RewriteCond %{REMOTE_ADDR} ^'.preg_quote($IP).'$'."\n".
'RewriteRule ^.*$ denied.html [L]'."\n\n";
$fp = fopen("/path/to/.htaccess", "a");
fwrite($fp, $ban_code);
fclose($fp);
// Let the admin know somebody walked into the trap.
mail($your_email_address, "Spambot Whacked!", "$host banned $now\n");
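One thing worth watching with rules like this: the RewriteCond pattern is a regular expression, so a bare address with unescaped dots and no end anchor matches more than the one IP. A quick Python sketch of the difference (the `ban_rule` helper is made up for illustration, and 10.1.2.3 is just a sample address):

```python
import re


def ban_rule(ip):
    """Return mod_rewrite lines that match exactly one IPv4 address."""
    pattern = "^" + re.escape(ip) + "$"
    return ("RewriteCond %{REMOTE_ADDR} " + pattern + "\n"
            "RewriteRule ^.*$ denied.html [L]\n")


if __name__ == "__main__":
    # Loose pattern: also matches a different host, 10.1.2.30.
    print(bool(re.match("^10.1.2.3", "10.1.2.30")))
    print(ban_rule("10.1.2.3"))
```

With the escaped and anchored form, banning one host can't accidentally lock out its neighbours.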
AdFuel
From the website: Wpoison is a free tool that can be used to help reduce the problem of bulk junk e-mail on the Internet in general, and at sites using Wpoison in particular.
It solves the problems of trapped spambots sucking up massive bandwidth/CPU time, as well as sparing legitimate spiders (say, google) from severe confusion.
Actually, I've done this w/a bot trap on my site at home. It's a perl script that generates a bunch of weird-sounding text w/some fake email addresses at the bottom and a bunch of database-query-looking links back to the original page.
The bots don't fall for it anymore. Some dorks in Washington state decided to make a couple requests a second to it once, but in the two years I've had it up, they're the only ones.
A pretty good article, but being able to install modules into Apache may not be the best situation for everyone who wants to stop Spambots..
Shameless plug, but I've got an ongoing series in the Apache section of /. that deals with easy ways that administrators *and* regular users can keep Spambots off their sites:
Stopping Spambots with Apache
and
Stopping Spambots II - The Admin Strikes Back
Just some more options and choices to help people out!
I agree. And, come on, how much technology do you need?
This is my solution to stopping spambots. It's written as a Java servlet and I am posting it here to prevent my company's site from being slashdotted. It does not prevent the spammer from harvesting emails, it just slows them down.... a lot :) If everyone had a script like this, spambots would be unusable.
Feel free to use the code in any way you please (LGPL like and stuff)
Put robots.txt in your root folder. Content:
User-agent: *
Disallow:
Put StopSpammersServlet.java in WEB-INF/classes/com/parsek/util:
package com.parsek.util;
import java.io.File;
import java.io.StringWriter;
import javax.servlet.ServletContext;
import java.net.URL;
import java.util.Enumeration;
import java.lang.reflect.Array;
public class StopSpammersServlet extends javax.servlet.http.HttpServlet {
private static String[] names = { "root", "webmaster", "postmaster", "abuse", "abuse", "abuse", "bill", "john", "jane", "richard", "billy", "mike", "michelle", "george", "michael", "britney" };
private static String[] lasts = { "gates", "crystal", "fonda", "gere", "crystal", "scheffield", "douglas", "spears", "greene", "walker", "bush", "harisson" };
private String[] endns = new String[7];
private static long getNumberOfShashes(String path) {
int i = 1;
java.util.StringTokenizer st = new java.util.StringTokenizer(path, "/");
while(st.hasMoreTokens()) { i++; st.nextToken(); }
return(i);
}
public void doGet (javax.servlet.http.HttpServletRequest request,
javax.servlet.http.HttpServletResponse response)
throws javax.servlet.ServletException, java.io.IOException {
response.setContentType("text/html; charset=UTF-8");
java.io.PrintWriter out = response.getWriter();
try {
ServletContext servletContext = getServletContext();
endns[0] = "localhost";
endns[1] = "127.0.0.1";
endns[2] = "2130706433";
endns[3] = "fbi.gov";
endns[4] = "whitehouse.gov";
endns[5] = request.getRemoteAddr();
endns[6] = request.getRemoteHost();
String query = request.getQueryString();
String path = request.getPathInfo();
out.println("<html>");
out.println("<head>");
out.println("<title>Members area</title>");
out.println("</head>");
out.println("<body>");
out.println("<p>Hello random visitor. There is a big chance you are a robot collecting mail addresses and have no place being here.");
out.println("Therefore you will get some random generated email addresses and some random links to follow endlessly.</p>");
out.println("<p>Please be aware that your IP has been logged and will be reported to proper authorities if required.</p>");
out.println("<p>Also note that browsing through the tree will get slower and slower and gradually stop you from spidering other sites.</p>");
response.flushBuffer();
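// getPathInfo() returns null when the request has no extra path, so guard against it.
long sleepTime = (long) Math.pow(3, getNumberOfShashes(path == null ? "" : path));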
do {
String name = names[ (int) (Math.random() * Array.getLength(names)) ];
String last = lasts[ (int) (Math.random() * Array.getLength(lasts)) ];
String endn = endns[ (int) (Math.random() * Array.getLength(endns)) ];
String email= "";
double a = Math.random() * 15;
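// The original chain of comparisons was stripped in posting; the three
// address shapes below are an assumed reconstruction, not the author's list.
if (a < 5) email = name;
else if (a < 10) email = name + "." + last;
else email = name.charAt(0) + last;
email = email + "@" + endn;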
out.print("<a href=\"mailto:" + email + "\">" + email + "</a><br>");
response.flushBuffer();
Thread.sleep(sleepTime);
} while (Math.random() < 0.95); // assumed odds; the original comparison was lost in posting
out.print("<br>");
do {
int a = (int) (Math.random() * 1000);
out.print("<a href=\"" + a + "/\">" + a + "</a> ");
Thread.sleep(sleepTime);
response.flushBuffer();
} while (Math.random() < 0.95); // assumed odds; the original comparison was lost in posting
out.println("</body>");
out.println("</html>");
} catch (Exception e) {
out.write("<pre>");
out.write(e.getMessage());
e.printStackTrace(out);
out.write("</pre>");
}
out.close();
}
}
Put this in your WEB-INF/web.xml
<servlet>
<servlet-name>stopSpammers</servlet-name>
<servlet-class>com.parsek.util.StopSpammersServlet</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>stopSpammers</servlet-name>
<url-pattern>/members/*</url-pattern>
</servlet-mapping>
Here you go. No PHP, no Apache, no MySQL, no Perl, just one servlet container.
Ciao
boky
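To get a feel for how quickly the delay in that servlet grows, here is the same arithmetic as a small Python sketch. The function names are mine; the base of 3 and the counter starting at 1 are read off the Java above.

```python
def count_segments(path):
    """Count the non-empty pieces between slashes; None counts as empty."""
    return len([piece for piece in (path or "").split("/") if piece])


def sleep_millis(path, base=3):
    """Delay between output lines: base ** (segments + 1) milliseconds.

    The servlet starts its counter at 1, hence the + 1.
    """
    return base ** (count_segments(path) + 1)


if __name__ == "__main__":
    for depth in range(0, 10):
        path = "/" + "/".join(str(n) for n in range(depth))
        print(depth, sleep_millis(path), "ms per line")
```

Two levels down the pause is 27 ms per line; nine levels down it is 59049 ms, so roughly a minute for every address or link the bot receives.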
However, the instructions for installing Wpoison more or less assume that one has a single website to protect. I have around 20 virtual hosts. So instead of creating a renamed cgi-bin in every DocumentRoot, I added a single
ScriptAlias /runme/ "/var/www/cgi-bin/"
to httpd.conf and then linked it like this:
<A HREF="/runme/addresses.ext"><IMG SRC="pixel.gif" BORDER=0></A>
I also added a single transparent pixel to the link to keep it invisible but still fool the spiders. Add the runme directory as excluded in the robots.txt and you should be on your way. Muhahahah, and so on.
Money for nothing, pix for free
There's a spam-blacklist, so how about a spambot-blacklist?
You'd have a standardized spambot trap (like the one described in the article) on various webservers. The new spambot info could go into a "New SpamBots" database (which wouldn't be blocked). Once a day, the webserver would connect up with a central database and submit the new spambot info it's obtained. Then the server would download a mirror of the updated "SpamBots" database which it would use to block spambots.
The centralized SpamBots database would take all of the new SpamBot info every day and analyze them in some manner as to detect abuse of the system (ensuring that only true spambots are entered). E-mails could be fired off to the abuse/postmaster/webmaster for the offending IP address. Finally, the new SpamBot info would be integrated into the regular SpamBot database.
This way you'd be able to quickly limit the effectiveness of the Spambot-traps across many websites.
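The central "analyze for abuse" step could start out as something very simple. A sketch in Python, assuming (my assumption, nothing more) that a bot only counts as confirmed once several independent servers have reported it:

```python
from collections import defaultdict


def confirmed_bots(reports, min_reporters=3):
    """Return the bot IPs reported by at least `min_reporters` servers.

    reports: iterable of (reporting_server, bot_ip) pairs.
    Duplicate reports from the same server count only once.
    """
    seen = defaultdict(set)
    for server, bot_ip in reports:
        seen[bot_ip].add(server)
    return {ip for ip, servers in seen.items() if len(servers) >= min_reporters}


if __name__ == "__main__":
    daily = [
        ("site-a", "192.0.2.7"), ("site-b", "192.0.2.7"), ("site-c", "192.0.2.7"),
        ("site-a", "198.51.100.9"), ("site-a", "198.51.100.9"),
    ]
    print(confirmed_bots(daily))
```

Requiring several reporters makes it harder for one malicious or misconfigured site to get an innocent address blocked everywhere.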
My sci-fi novel, Ghost Thief, is now available from Amazon.com.
Especially loved "friends"...
Like hotline@mpaa.org, cdreward@riaa.org, senator@hollings.senate.gov for example?
Dear Spambot Authors,
Thanks again for your interest. I hope that we were able to help you write the spambots of the future that will be able to detect and sidestep as many of the above protection schemes as possible. We tried to work all of our knowledge into one convenient thread for your development team to peruse.
Thanks for your interest in SlashDot, home of too much information.
------
Why on Earth would you like to block a spambot? So it doesn't get any more useful addresses? /give/ it a next page. With a nicely formatted word1word2num1num2@word1word2.com, where words and nums are random.
No way, man.
If you realize you're serving to a bot, go on serving. Each time the bot follows the "next page" link, you
Give it thousands, millions of addresses this way.
This would be good to do with known bad addresses, but random addresses only add more unknowing people to the list. You may add 1000 email addresses to the list and slow them down, but if even 10 of those email addresses are real, you've added to the problem. The bad addresses will be taken out as they are found to be bad, and the good ones will be left in. You've signed JoeRandomUser@RandomDomain.com up for all the spam he can handle, even if he has gone to great lengths to keep his email address off the spam lists. In theory this sounds like a great idea, until you're the guy getting your email address randomly fed to the bots.
"Information wants to be expensive" - Stewart Brand, the same guy who said "Information wants to be free"
Take a look at these two bits of code from http://www.slickhosting.com/contact.shtml :
<A HREF="mailto:hosting%40slickhosting.com"
onMouseOver="window.status='mailto:hosting&#64;slickhosting.com';return true;"
onMouseOut="window.status='';">hosting&#64;slickhosting.com</A>
<!-- Spam trap
abuse@ (your domain) HREF="mailto:abuse@ (your domain) "
root@ (your domain) HREF="mailto:root@ (your domain) "
postmaster@ (your domain) HREF="mailto:postmaster@ (your domain) "
uce@ftc.gov HREF="mailto:uce@ftc.gov"
-->
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
postmaster@127.0.0.1 and abuse@127.0.0.1
Good idea, but I'm sure spam software has been rejecting 127.0.0.1 for many years.
How about a few people volunteering real FQDNs that all resolve to 127.0.0.1? I realize that people would be volunteering horsepower and bandwidth for DNS lookups, but it would be in the name of dramatically reducing spam. Then, keep a list of all the "loopback FQDN's" and let the rest of us feed those FQDN's into spam-trap generators. Eventually, there would be so many real-looking spam trap email addresses that the spam software wouldn't be able to keep up with the list of loopback FQDN's.
To take it to the next level, you could hide the list of "loopback FQDN's" by making a reverse DNS lookup against a couple of volunteered IP addresses return a random FQDN from the list of loopback FQDN's at the time that the spamtrap page is dynamically generated.
Spammers would never know the entire list of FQDN's that resolve to loopback.
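On the trap-generator side this would be nearly trivial. A Python sketch, with made-up hostnames under example.net standing in for the volunteered FQDNs and invented function names:

```python
import ipaddress
import random


def is_loopback(ip_text):
    """True if the address is in the loopback range (127.0.0.0/8 or ::1)."""
    return ipaddress.ip_address(ip_text).is_loopback


def trap_address(local_part, loopback_fqdns, rng):
    """Pair a local part with a randomly chosen loopback FQDN."""
    return local_part + "@" + rng.choice(loopback_fqdns)


if __name__ == "__main__":
    # Placeholder names; a real list would come from the volunteers.
    fqdns = ["mail.lo1.example.net", "mx.lo2.example.net"]
    print(trap_address("postmaster", fqdns, random.Random()))
    print(is_loopback("127.0.0.1"), is_loopback("192.0.2.1"))
```

Before adding a volunteered name to the list you'd resolve it and run the result through `is_loopback`, so a bad entry can't point the spam at somebody's real mail server.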
Intelligent Life on Earth
Way too much work. Here's similar Escapade [escapade.org] code:
<QUIET ON>
<html><head><title>Members area</title></head><body>
<p>Hello random visitor. There is a big chance you are a robot collecting mail
addresses and have no place being here.
Therefore you will get some random generated email addresses and some random links
to follow endlessly.</p>
<p>Please be aware that your IP has been logged and will be reported to proper
authorities if required.</p>
<DBOPEN "SpamFood", "localhost", "login", "password">
<FOR I=1 TO 100 STEP 1>
<SQL select * from names order by rand() limit 1>
<LET FN="$Name">
</SQL>
<SQL select * from lasts order by rand() limit 1>
<LET LN="$Last">
</SQL>
<SQL select * from addresses order by rand() limit 1>
<LET AD="$Address">
</SQL>
<a href="mailto:$FN.$LN@$AD">$FN.$LN@$AD</a> <br>
</FOR>
</body>
</html>
-- Ed Carp, N7EKG erc@pobox.com PGP KeyID: 0x0BD32C9B What I'm up to: http://intuitives.mine.nu
I don't stop spambots, I feed them. I feed them phony email addresses and addresses of spammers (gathered from places such as my fake /cgi-bin/formmail.pl). I use
http://www.devin.com/sugarplum/, mentioned before on /. to dish it out!
is that some of the fake emails it generates will be real.
$5 / month hosted VPS on linux = awesome!
Speaking of spam, I've come across this new program called mailwasher. You can check your mail while it's still on the server, and then - get this - fake a bounced message. There are probably other programs that do this, but this is the first one I've heard of.
Anyway, AFAIK, it's WinBlows only, and available at http://www.mailwasher.com, although right now it seems the site is down, all I get is a 404!
...so that you can leave them out of your HTML source:
http://artificeeternity.com/includes/linkwrite.js
Instructions for use are included in comments. The script fragment that replaces mailto: links in the page will actually shorten your code -- it only requires entering the username and domain once. Also, the @ sign is added in by the script, so the address itself never appears in your HTML.
Way too much work. Here's similar Escapade [escapade.org] code:
Not similar enough. That makes 300 queries per hit against your database, and I don't think you even used prepared statements. His code slowed their software to a crawl by sleeping. Yours will slow your software to a crawl by excessive database traffic.
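One way to avoid the per-hit queries altogether is to read the three tables once at startup and pick from memory afterwards. A rough Python sketch; the class name is mine and the lists stand in for whatever the names, lasts and addresses tables hold.

```python
import random


class AddressPool:
    """Holds the name, surname and domain lists in memory."""

    def __init__(self, names, lasts, domains):
        # In practice these would be loaded once, with one query per table.
        self.names = list(names)
        self.lasts = list(lasts)
        self.domains = list(domains)

    def page(self, rng, count=100):
        """Return `count` first.last@domain addresses with no database access."""
        return [
            "%s.%s@%s" % (rng.choice(self.names), rng.choice(self.lasts),
                          rng.choice(self.domains))
            for _ in range(count)
        ]


if __name__ == "__main__":
    pool = AddressPool(["john", "jane"], ["smith", "jones"], ["example.com"])
    for addr in pool.page(random.Random(), count=5):
        print(addr)
```

That brings the cost down from 300 queries per hit to three queries per server start.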
It's actually http://www.mailwasher.net/.
If you're not messing with DNS, though, there are lots of addresses that can cause trouble:
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
And somewhere out there is a far nastier variant on a teergrube that can keep a typical SMTP session up for hours with only a few kilobits/minute, using tricks like setting TCP windows very small, NAKing lots of packets so TCP retransmits them, etc. (It basically works by saying "No, SMTP/TCP/IP isn't a set of protocol drivers in my Linux kernel, it's a definition of a set of messages and there's no reason I should use a bunch of well-tuned efficient reliable kernel routines when I can send raw IP packets myself designed for maximal ugliness.")
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks