Slashdot Mirror


Stopping Spambots: A Spambot Trap

Neil Gunton writes "Having been hit by a load of spambots on my community site, I decided to write a Spambot Trap which uses Linux, Apache, mod_perl, MySQL, ipchains and Embperl to quickly block spambots that fall into the trap."

304 comments

  1. 1st Spam by Anonymous Coward · · Score: 0, Informative

    FS !!

    1. Re:1st Spam by Anonymous Coward · · Score: 0

      I think the answer to spambots and the servers that run them is not blockage. To truly solve the problem I think they should be hunted down and annihilated. Like a hacker group called Spam Busters! Yeah, that would get the job done. Those bastards are the reasons I get phone calls all hours of the night and day.. automated pieces of shit. Aight, I'm done ranting for now.

  2. Elements of good design I'd missed by Dark+Paladin · · Score: 4, Informative

    Looking at my Day Job and personal web site, other than the very cool technical achievement of the trap (I'll have to see if I can rewrite this for my Checkpoint FW system), there was one thing I learned about good design from this article:

    Eliminate mailto - makes sense. You should have an http based "send me a message system" - force a live person to type stuff in instead of letting a program pick out addresses.

    Eliminating mailto alone would probably help with most of my spam problems (as I have my "contact me" address right on the first page).

    1. Re:Elements of good design I'd missed by hagardtroll · · Score: 5, Interesting

      I put my email address in a jpeg image. Haven't found a spambot yet that can decipher that.

    2. Re:Elements of good design I'd missed by DickPhallus · · Score: 1

      If that technique became commonplace, I'm sure optical character recognition software could be used... but for now you're safe.

      --
      Some weasel took the cork out of my lunch.
    3. Re:Elements of good design I'd missed by Anonymous Coward · · Score: 1, Insightful
      I put my email address in a jpeg image. Haven't found a spambot yet that can decipher that.

      But neither could blind internet users...

    4. Re:Elements of good design I'd missed by Dark+Paladin · · Score: 2, Informative

      Good point - some sites (I think AOL did once) can get sued if you're a large enough business and don't make your site accessible to the blind. (Americans with Disabilities Act thing.)

    5. Re:Elements of good design I'd missed by carm$y$ · · Score: 3, Insightful

      Eliminating mailto alone would probably help with most of my spam problems

      You're 100% right. And fighting against spambots by relying on UserAgent is akin to... well.... security thru obscurity, albeit somehow in reverse.

      What also looks strange is that he doesn't consider that one can get a link directly to a page on the n-th level: as human browsers don't usually download robots.txt either, sounds like he's gonna ban some poor guys who got a link from a friend...

      --
      -- No sig today
    6. Re:Elements of good design I'd missed by Anonymous Coward · · Score: 0

      Excellent Idea!!! Will be using that in the near future....

    7. Re:Elements of good design I'd missed by cholokoy · · Score: 1

      Then put in an audio file, since I would think that blind people can still hear.

      --
      Return the bells of Balangiga.
    8. Re:Elements of good design I'd missed by Anonymous Coward · · Score: 0

      What? That is ridiculous. I'm sorry, but if they don't want to support the blind, why should they? Have you ever had to walk a blind person through anything over the phone? It takes 2-3x the amount of time, which costs $$$$. fuck em

    9. Re:Elements of good design I'd missed by blibbleblobble · · Score: 1

      I don't think blind people would be -that- interested in a skating club...

      Of course, that's just an assumption

    10. Re:Elements of good design I'd missed by jonbrewer · · Score: 2

      I've found a text file works as well. Spambots don't seem to bother loading "contact.txt".

    11. Re:Elements of good design I'd missed by British · · Score: 2

      AOL's personal ads (not that I visit them) do that already. They just use a GIF that looks just like a regular string of text. Very clever. I'm assuming there's a module out there that can do this easily on the fly?

    12. Re:Elements of good design I'd missed by Permission+Denied · · Score: 2, Informative

      I put my email address in a jpeg image. Haven't found a spambot yet that can decipher that.

      But neither could blind internet users...


      Add an alt tag that describes how to email you. Eg, "The first part of my email address is 'username' and the second part is 'host.com' - the two parts are separated by an '@' sign." I've been doing the jpeg thing for three years; works great.

    13. Re:Elements of good design I'd missed by dattaway · · Score: 2, Funny

      but...but...a blind AND deaf internet user couldn't read your webpage.

      I'm sure you don't want THAT kind of lawsuit.

    14. Re:Elements of good design I'd missed by Technician · · Score: 3, Informative

      I like the way geocaching.com handles the problem. To email a user, you have to click on a link containing the user profile. A link in the profile provides a contact user option which provides a form to fill out - if you are also a registered user of the site. If you are not a user of the site, then you are prompted to log in or become a user. If you are a user and contacting another user, there is a checkbox which, if checked, will also send your real address to the user you are contacting, so that, with his permission, contact may then be made via regular mail. This is useful for sending graphics and attachments. The best part is your address is not given out unless you specifically permit it on a case-by-case basis. I love it.

      --
      The truth shall set you free!
    15. Re:Elements of good design I'd missed by Anonymous Coward · · Score: 0

      What? This has nothing to do with complying with the ADA- there's just the simple fact that if you rely on images, you are making your site inaccessible to a portion of the users of the Internet. Aside from being plain rude, it's also shortsighted to cut yourself off from possible contact with arbitrary groups of people.

    16. Re:Elements of good design I'd missed by Anonymous Coward · · Score: 0

      What if they are deaf and blind?

      Won't someone please think of the children!

    17. Re:Elements of good design I'd missed by nathanm · · Score: 2
      I'm assuming there's a module out there that can do this easily on the fly?
      Yes
    18. Re:Elements of good design I'd missed by BlueUnderwear · · Score: 2
      I don't think blind people would be -that- interested in a skating club...

      Dunno about skating, but blind people do ski. They are preceded by a guide who shouts them directions (or uses a wireless intercom, in order to not disturb the other skiers). Have seen such pairs several times at 2 Alpes. It must still be a helluva difficult, but they manage to do it anyways.

      --
      Say no to software patents.
    19. Re:Elements of good design I'd missed by Anonymous Coward · · Score: 0

      Dunno about skating, but blind people do ski. They are preceded by a guide who shouts them directions (or uses a wireless intercom, in order to not disturb the other skiers). Have seen such pairs several times at 2 Alpes. It must still be a helluva difficult, but they manage to do it anyways.

      If you're wondering what this is like, those of you in the sighted community can get roughly the same experience by sitting in a shopping cart and hitching yourself to a truck on the freeway in rush-hour traffic with your eyes closed.

    20. Re:Elements of good design I'd missed by prizog · · Score: 2

      Or a blind person.

    21. Re:Elements of good design I'd missed by soloport · · Score: 1

      What about a SlashBotted site? Like the one featured in this article? (It's down at the moment.)

    22. Re:Elements of good design I'd missed by LiENUS · · Score: 1

      Oddly enough, it's called "fly". I can't find the URL anymore, though.

    23. Re:Elements of good design I'd missed by evil_one · · Score: 2

      It's called Braille.
      A Freshmeat search turns up quite a bit of information about using it on posix OSs.
      Here's the Linux braille tty driver.

      --
      Desperation is a stinky cologne
    24. Re:Elements of good design I'd missed by evil_one · · Score: 1

      Damn right!

      --
      Desperation is a stinky cologne
    25. Re:Elements of good design I'd missed by Bender+Unit+22 · · Score: 1

      But neither could blind internet users...

      No they can't, and isn't that a shame. I too use that concept and while I agree it sucks, it clearly had a good effect on the rate of spam mails. Old, now unused, email adr. that I had on webpages, got up to 30 spam mails every day. It was almost as bad as a Hotmail account.

    26. Re:Elements of good design I'd missed by Guignol · · Score: 1

      And fighting against spambots by relying on UserAgent is akin to... well.... security thru obscurity, albeit somehow in reverse
      Yeah.. except it's not what's being done there.(the trap doesn't rely on it, that is)

      What also looks strange is that he doesn't consider that one can get a link directly to a page on the n-th level: as human browsers don't usually download robots.txt either, sounds like he's gonna ban some poor guys who got a link from a friend...
      That's not true...
      I suspect you only checked the first few lines of the article so that you could promptly have some insightful comments about it (can't believe it actually worked *congratulations*) but didn't care about what was actually done at all

      Join The (Hopefully) Great Slashdot Blackout! April 21-27
      Oh.. now I understand you are in fact focused on more interesting matters like the unthinkable lack of respect and recognition of the value of your insightful comments.. Of course.. well thank you for this peaceful week then

    27. Re:Elements of good design I'd missed by Anonymous Coward · · Score: 0

      Ugh, why JPEG? JPEG is horrible for text. Are you saying that spambots are intelligent enough to decipher GIF or PNG images?

    28. Re:Elements of good design I'd missed by Anonymous Coward · · Score: 0

      Just make the ALT tag for the image say "my_email AT blah blah DOT com". there, no valid email, but any speech recognition thing should say it out loud just fine.

    29. Re:Elements of good design I'd missed by Webmoth · · Score: 2

      I put my email address in a jpeg image. Haven't found a spambot yet that can decipher that.

      The flaw: OCR.

      Try ASCII art next time. And never use the @:

      \/\/ e |3 /\/\ () + |-| (a) \/\/ e |3 /\/\ () + |-| * ( 0 /\/\

      Warning: if you spam me, you WILL be blocked. We proactively block spammers at our mail server through either the use of ipchains rules or header parsing. Our ipchains are already blocking at least a million addresses in China (only 1,277,730,500 to go).

      --
      Give me my freedom, and I'll take care of my own security, thank you.
    30. Re:Elements of good design I'd missed by Anonymous Coward · · Score: 0

      Perhaps a very well trained seeing-eye-dog...

      Oh, shit... I'm gonna burn in hell.

    31. Re:Elements of good design I'd missed by packeteer · · Score: 1

      tell me... how often do deaf/blind people use the internet?... im serious here no troll or flame... and im not being a jerk either... im sure they would have someone with them helping them along so i think that is something you dont hafta worry about

      --
      unzip; strip; touch; finger; mount; fsck; more; yes; unmount; sleep
    32. Re:Elements of good design I'd missed by HD+Webdev · · Score: 0

      sounds like he's gonna ban some poor guys who got a link from a friend...

      Not to mention those of us who crontab mirror our web links over a network AND between work/home so that we are always using ONE book mark file.

      --
      This is not a dream, not a dream...we are transmitting from the year 1-9-9-9.
    33. Re:Elements of good design I'd missed by Anonymous Coward · · Score: 0


      Golly, I had to look really closely to read that your email address is webmoth@webmoth.com

      Good one! That seems like a really good way to stop spambots from harvesting your address like they would if it were in plaintext.

      ..... oops.

    34. Re:Elements of good design I'd missed by Eil · · Score: 2

      - If you "hide" the links to those pages and make it obvious enough to users, then the "friend" will not have gotten the link in the first place.

      - And if a normal user accidentally gets banned, they can send an email to get unbanned.

      - And if they don't want to send an email, oh well, the occasional moron can't visit your site for a full 24 hours.

      - And there's no danger of a search engine finding those pages either, if they follow robots.txt

    35. Re:Elements of good design I'd missed by chuqui · · Score: 1

      > Eliminate mailto - makes sense. You
      > should have an http based "send me a
      > message system" - force a live person to
      > type stuff in instead of letting a program pick
      > out addresses

      and if your site breaks, how does someone send you e-mail to tell you all your CGIs are failing?

      --
      Chuq Von Rospach, Internet Gnome = When his IQ reaches 50, he should sell
    36. Re:Elements of good design I'd missed by Anonymous+DWord · · Score: 2

      Put in an alt="whatever@here.com" tag - it'll get read.

      --
      "If he thinks he can hide and run from the United States and our allies, he's sorely mistaken." Bush on bin Laden
    37. Re:Elements of good design I'd missed by Anonymous Coward · · Score: 0

      I stand corrected, thx.

    38. Re:Elements of good design I'd missed by Anonymous Coward · · Score: 0

      that was great.. thanks for the laugh

    39. Re:Elements of good design I'd missed by Anonymous Coward · · Score: 0

      Guide the blind skiers... Sounds like some kind of Dom Jolly concept.... What about deaf and blind skiers? - I dont think braille on the trees is gonna help!

      Anyway - if it's that much of a problem, can't the site encode Braille as some kind of image format instead of ASCII? I would like to see spambots get that - and you can still sort out your deaf and blind surfers. Unfortunately I know little of Braille systems, but I can imagine they use straight ASCII and a bunch of movable pins.
      Couldn't an upgrade of the concept allow for a much better experience anyway - like a tablet that Braille, or just embossed images, could be rendered to?

    40. Re:Elements of good design I'd missed by Firefly1 · · Score: 1

      Amusing, true; however, it is also incomprehensible to the average person, and - happening as it does to fall under the category of 'leetspeak' - irritating to many people regardless of whether or not they can read it.

      --
      - White Knight of the Order of Mihoshi Enthusiasts
  3. /.ed by Anonymous Coward · · Score: 2, Funny

    Looks like you should've written some code to handle an overload from slashdot too!

    1. Re:/.ed by HiQ · · Score: 3, Funny

      The dude fell in his own trap. :-D

  4. Slashbot by Ctrl-Z · · Score: 3, Funny


    "I have a truly marvelous demonstration of this proposition which this bandwidth is too narrow to transmit."

    --
    www.timcoleman.com is a total waste of your time. Never go there.
    1. Re:Slashbot by msquadrat · · Score: 1

      Hope it won't take another 400 years to solve this one...

  5. but can you ... by filtrs · · Score: 1

    You can stop a SpamBot, but can you stop a /.'ing?

    --
    My mother always used to tell me: If you can't find anything nice to say, say something bad about Windows.
  6. Okay... by zaren · · Score: 1

    Well, his idea of removing "mailto:"s is an obvious one...

    I dunno, most of this stuff sounds like common sense work for someone who's got a well-trafficked web site. The badhosts_loop looks like an interesting addition, though...

    On the surface, it almost looks like this system could be built up to act like a SPEWS for web servers.

    Aww, FSCK!

    --
    Come to the University of Mars! Classes starting soon!
    1. Re:Okay... by DNS-and-BIND · · Score: 2
      You cannot defeat that which you do not understand. I think that you really can't talk about spam prevention unless you have one-on-one familiarity with programs like Atomic Harvester. Spammers certainly do (many, many other programs can be found with a google search for email harvester). Without knowledge of who the developers of these programs are, what kind of work they do, their track record in other projects, etc, it's pretty pointless to talk about spam-blocking in an educated manner.

      Matter of fact, I think it'd be a good idea to have an open-source email harvester. . . it'd give the good guys an idea of what works and what doesn't, and of course the open-source version would be free, polite to webservers, and best of all would steal thousands of sales from the real bad guys, the fellows who write spambots. (ObPipeDream) With any luck one of them would steal the code and resell it, and the GPL could get a slam-dunk court test.

      --
      Shutting down free speech with violence isn't fighting fascism. It IS fascism!
    2. Re:Okay... by Anonymous Coward · · Score: 0

      An open source email harvester?

      You, sir, are a significantly undeveloped life form.

      Someone had to say it.

    3. Re:Okay... by Izanagi · · Score: 1

      This follows my idea. It needs an email harvester that will get .gov addresses. Then a server safe from the US government (off-shore, say goats.cx, HEHE). Next, set up an address that forwards all email to the .gov addresses. This way we can forward our SPAM to our lawmakers, forcing them to pass a national ANTI-SPAM law and making lawsuits easier for the average joe.

      The server and/or address would need to change often to make sure it is not being blocked. Maybe setup a mailing list or site to inform us of the weekly or daily changes.

      Would this work, fellow /.ers? I know it can't stop SPAM, but can only fight it.

      --
      SCO (noun.)- A Slimy Corporate Ogre. Often seeks free money.
  7. Re:Not fp, but still a wide page! by aozilla · · Score: 0, Offtopic

    Looks fine in Mozilla 0.9.9, too...

    --
    ok then your [sic] infringing on my copyright! Could you as [sic] me next time before STEALING my comments for your own?
  8. Block? Are you kidding? by Anonymous Coward · · Score: 5, Interesting

    Why on Earth would you like to block a spambot? So it doesn't get any more useful addresses?
    No way, man.
    If you realize you're serving to a bot, go on serving. Each time the bot follows the "next page" link, you /give/ it a next page. With a nicely formatted word1word2num1num2@word1word2.com, where words and nums are random.
    Give it thousands, millions of addresses this way.
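A minimal sketch of the address generator this comment describes, assuming a small made-up word list and assuming the domain part draws fresh random words rather than reusing the ones in the local part (the comment does not say which). `WORDS`, `fake_address` and `fake_page` are hypothetical names for illustration only.

```python
import random

# Hypothetical word list; any list of plain lowercase words would do.
WORDS = ["apple", "river", "stone", "cloud", "maple", "tiger"]


def fake_address(rng):
    """Build one address shaped like word1word2num1num2@word1word2.com."""
    local = (
        rng.choice(WORDS)
        + rng.choice(WORDS)
        + str(rng.randint(0, 9))
        + str(rng.randint(0, 9))
    )
    # Assumption: the domain uses its own random words, not the local part's.
    domain = rng.choice(WORDS) + rng.choice(WORDS) + ".com"
    return local + "@" + domain


def fake_page(rng, count=20):
    """Return a list of bogus addresses to embed in one generated page."""
    return [fake_address(rng) for _ in range(count)]


if __name__ == "__main__":
    for address in fake_page(random.Random()):
        print(address)
```

Each "next page" request would call `fake_page` again, so the bot never runs out of addresses to collect.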

  9. http-referrer by sofar · · Score: 2


    hmm, just a wild guess, but does this technique involve using the http-referrer to see if there are too many clients coming from just a particular address (which would obviously be a *bad* thingy), and subsequently block them too?

    might explain why we can't see it no more :-(

    I want it too!!! it seems to work pretty good!

    1. Re:http-referrer by cheekymonkey_68 · · Score: 1

      Wouldn't it block search engine bots like the googlebot as well, if so bang goes all your hard work on SEO...

      For instance when you get "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

      turning up in your logs ?

    2. Re:http-referrer by DutchSter · · Score: 2, Interesting

      No. The point the author made was that good bots follow the 'robots.txt' standard. A versatile program like this can differentiate. If a robot comes in and plays by the rules on robots.txt, it's welcomed. OTOH, if one comes in and just starts grabbing at everything, it will quickly find itself blocked.

      I believe the exact quote in regards to why robots.txt should still be used is: "Most bad spambots don't even check the robots.txt file, so this is mainly for protection of the good bots."
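The idea can be sketched roughly as follows, assuming the trap lives under a path that robots.txt disallows, so only clients that ignore robots.txt ever request it. This is an in-memory stand-in, not the article's implementation (which is described as using MySQL and ipchains); `TRAP_PREFIX`, `BLOCK_SECONDS` and `handle_request` are hypothetical names and values.

```python
import time

TRAP_PREFIX = "/trap/"        # hypothetical path, listed under Disallow in robots.txt
BLOCK_SECONDS = 24 * 60 * 60  # hypothetical ban length (one day)

blocked_until = {}  # maps client IP -> time at which its block expires


def handle_request(ip, path, now=None):
    """Return 'blocked' or 'ok' for a single request."""
    now = time.time() if now is None else now
    # A client that is still inside its ban window stays blocked.
    if blocked_until.get(ip, 0) > now:
        return "blocked"
    # Well-behaved robots never fetch the trap path, so any hit is suspicious.
    if path.startswith(TRAP_PREFIX):
        blocked_until[ip] = now + BLOCK_SECONDS
        return "blocked"
    return "ok"
```

A robot that honors robots.txt never reaches `TRAP_PREFIX` and is never blocked; one that grabs every link blocks itself on its first trap hit.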

      Another thing I find appealing is that on a large enough system the DB could be shared amongst several servers to provide common protection for all. I've always taken a don't put an address on the page approach, but it's cool to see someone looking at how these bots operate from a technical standpoint.

      Some ISPs (like mine) have policies against SPAM that stipulate that in addition to not actually spamming people, using their resources to prepare/collect addresses to SPAM is just as bad. The advantage the database gives you is that you can track the most recent offenders. A quick lookup to who owns the address, with hard evidence of one of their subscribers abusing both your system, and their policy will, if nothing else, cause the cost of spamming to rise. The reason SPAM is so popular is because it is VERY cheap to do. Once its costs approach those of 'traditional' marketing, things might get a bit more selective rather than sending my three year old '1-3 inches in 6 weeks!','Stop paying for cable', or 'Get out of debt now!' messages. Hardly directed.

      (Now I don't want anyone marketing to my three year old, but I know it will happen so I'd like to at least think they would be reasonable things, perhaps a bit relevant)

    3. Re:http-referrer by Anonymous Coward · · Score: 0

      No, he was talking about http-referer, not agent...

    4. Re:http-referrer by DutchSter · · Score: 1

      In my crunch for time, I forgot to add the following line to the top: "Referer and agent can both be forged by the client." The author said that he noticed that they were often coming into his site right in the middle, as if they have a list of all his pages. If you blindly type in a URL, or come from a bookmark, there will be no referer. I've seen this on my site before: you get a lot of legit people with no referer because they either have the URL written down somewhere, or they bookmarked a page. At the same time, I also get inexplicable hits on the site that don't seem to be people because they don't follow any logical transition from subject. The problem is that this requires human analysis.

      Robots.txt is useful for now until the spambots are smart enough to figure out that following it will tell them how to avoid the trap. It's a neverending game of catch up.

    5. Re:http-referrer by Robert+The+Coward · · Score: 1

      Yes, but your robots.txt will keep the spambots away from your email address as well. Things like guest lists should be in a robots.txt anyway, so I don't do a search and come up with 300 guest books.

  10. How I track spammers using PHP by Elkman · · Score: 5, Interesting
    I did something rather low-tech: I created a "Contact Us" page on my web server that has an automatically-generated address at the bottom. It says, "Note: The address spamtest.1018617636@example.com is not a valid contact address. It's just here to catch spammers." The number is actually the current UNIX timestamp, so I know exactly who grabbed this mail address and sent me mail.
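A short sketch of the timestamped trap address, written in Python rather than the poster's PHP. The `spamtest.` prefix and `example.com` come from the comment; the function names are hypothetical. Matching the recovered time against the web server log to find who fetched the page is left out.

```python
import re
import time
from datetime import datetime, timezone

DOMAIN = "example.com"  # placeholder domain, as used in the comment


def trap_address(now=None):
    """Build the trap address for a page served at time `now` (UNIX seconds)."""
    ts = int(time.time() if now is None else now)
    return f"spamtest.{ts}@{DOMAIN}"


def harvest_time(address):
    """Recover the UTC time a trap address was generated, or None if it is not one."""
    match = re.fullmatch(r"spamtest\.(\d+)@" + re.escape(DOMAIN), address)
    if match is None:
        return None
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
```

When spam arrives at a trap address, `harvest_time` gives the moment the page was served, which is the value to look up in the access log.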

    As it turns out, I really haven't received that much mail to this address. About the only mail I've ever received to it is someone from trafficmagnet.net, who tells me that I'm not listed on a few search engines and that I can pay them to have my site listed. I need to send her a nasty reply saying that I don't care about being listed on Bob's Pay-Per-Click Search Engine, and that if she had actually read the page, she would have noticed that she was sending mail to an invalid address. Besides, the web server is for my inline skate club and we don't have a $10/month budget to pay for search engine placement.

    I think I've received more spam from my Usenet posting history, from my other web site, and from my WHOIS registrations than I've received from the skate club web site.

    1. Re:How I track spammers using PHP by chuqui · · Score: 1

      > trafficmagnet.net, who tells me that I'm not
      > listed on a few search engines and that I
      > can pay them to have my site listed. I need
      > to send her a nasty reply saying

      don't bother. It'll just generate more spam from them.

      I just took and blackholed those domains...

      --
      Chuq Von Rospach, Internet Gnome = When his IQ reaches 50, he should sell
  11. Hammered already.... by cswiii · · Score: 5, Funny

    From the website:
    The Problem: Spambots Ate My Website

    s/Spambots/Slashdot/

    1. Re:Hammered already.... by Technician · · Score: 2

      OK, who is the turkey hunting and clicking the 1 pixel graphic?

      --
      The truth shall set you free!
    2. Re:Hammered already.... by Requiem · · Score: 1

      s/Spambots/Mr. T/
      s/Website/Balls/

  12. mod_perl!!! I can hardly contain myself!! by cscx · · Score: 1, Flamebait

    Hold back the excitement, people, it's another episode of story recycling.

    This site is pretty handy, now that I'm on the topic. Also make sure to check out RobotCop. Out for Apache now, coming soon for IIS and Zeus!

  13. re: spidertrap by blibbleblobble · · Score: 4, Interesting

    My PHP spider-trap - See an infinity of email addresses and links in action!

  14. Re:problem with not giving an email address ... by wmoore · · Score: 2, Insightful


    The only problem with the idea of using entirely http based "send me a message systems" is that some people, like myself, would much rather have an actual email address to use instead of having to use 50 different layouts and 50 different configurations and 50 different methods of communicating with someone or a company. Every html based contact system has its own quirks and problems; I'd rather just need to learn my email program's issues instead.

  15. removing mailto: a bad solution by bluGill · · Score: 5, Interesting

    Removing mailto: links is a bad solution to the problem. It might be the only solution, but it is bad.

    I hate the editor in my web browser. No spell check (and a quick read of this message will prove who diasterious that is to me), not good editing ability, and other problems. By contrast my email client has an excellent editor, and a spell checker. Let me pull up a real mail client when I want to send email, please!

    In addition, I want people to contact me, and not everyone is computer literate. I hang out in antique iron groups, I expect people there to be up on the latest in hot tube ignition technology, not computer technology. To many of them computers are just a tool, and they don't have time to learn all the tricks to make it work, they just learn enough to make it do what they want, and then ignore the rest. Clicking on a mailto: link is easy and does the right thing. Opening up a mail client, and typing in some address is error prone at best.

    Removing mailto: links might be the only solution, but I hope not. So I make sure to regualrly use spamcop.

    1. Re:removing mailto: a bad solution by ichimunki · · Score: 1

      I like SpamAssassin myself. It's pretty accurate in tagging spam. I agree, obscuring your address gets to be a pain. And it's usually not going to keep your more common addresses from getting passed around at some point. I get most of my spam as a result of shopping online, eBay, or just plain having a registered domain name.

      As to web browsers, wouldn't it be a great plugin that could transform a text field into a mini-WP, complete with a limited function spell-checker, and minimal HTML-compliant formatting (for sites like Slashdot where it would be nice not to have to compose HTML to do things like bold or italics or blockquoting)? You know, you select the word you want bold, hit the bold tool in the toolbar, and on submit the correct tags are added? No more forgetting to close tags!

      --
      I do not have a signature
    2. Re:removing mailto: a bad solution by DrProton · · Score: 1

      No worries, mate. Just use the simple cgi email handler. It's very simple to install and it works great (requires unix, apache, and cgi script support).


      --
      "Mit der Dummheit kaempfen Goetter selbst vergebens." - Schiller
    3. Re:removing mailto: a bad solution by Anonymous Coward · · Score: 0

      You don't have to remove the links. Just use JavaScript to generate them. Then the browsers see them, but crawlers (which don't execute JavaScript) don't. Yeah, some people disable JavaScript, but for them you can use images instead.

      For an example, look at the HTML source at http://abuse2.com/members/Jeremy_Scott/

      Jonathan

    4. Re:removing mailto: a bad solution by Arandir · · Score: 0, Offtopic

      I hate the editor in my web browser. No spell check, not good editing ability, and other problems.

      Hmmm, my HTML editor is Emacs, and I don't have these problems at all.

      --
      A Government Is a Body of People, Usually Notably Ungoverned
    5. Re:removing mailto: a bad solution by Anonymous Coward · · Score: 0

      oh my... arn't you clever...

  16. Take this tool... by SkyLeach · · Score: 0, Troll

    And install it in Hawaii. Those Somoans even eat that sh*t at nice restaurants!

    Yuck

    Eww

    QUICHE!?

    Cookbook!?

    --
    My $0.02 will always be worth more than your €0.02, so :-p
  17. Re:Not fp, but still a wide page! by loply · · Score: 1

    Fine in Konqueror. Why, is it somehow broken in other browsers?

  18. Re:Block? Are you kidding? by cholokoy · · Score: 1

    At first glance this might be a good idea, but it will be a resource burden on your system.

    Not a good way to stop spammers.

    --
    Return the bells of Balangiga.
  19. Re:Not fp, but still a wide page! by aozilla · · Score: 1, Troll

    Those other browsers must suck

    --
    ok then your [sic] infringing on my copyright! Could you as [sic] me next time before STEALING my comments for your own?
  20. Re:Block? Are you kidding? by f3lix · · Score: 5, Interesting

    This isn't such a good idea - for every random (non-existent) domain that you generate, a root DNS server will be queried when an email is sent to this address, which increases the load on the root servers, which is generally a bad thing. How about instead, returning pages with the email address abuse@domain-that-spambot-is-coming-from all over them...

  21. Similar to how the new ORBZ works? by Masem · · Score: 4, Interesting

    After the Battle Creek incident with ORBZ, the maintainer changed the way it worked: instead of proactively checking for open relays, he now has a 'honeypot'-like system built around a unique email address that isn't directly visible on the site but may still be harvested by a spam bot. Any server that sends email to that address is automatically added to The List. Mail server admins who believe that they should not be on this list can argue their case to have their server removed.
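A rough sketch of the honeypot listing logic as described here, not ORBZ's actual code. `TRAP_ADDRESS` and the function names are hypothetical, and the list is kept in memory purely for illustration.

```python
TRAP_ADDRESS = "trap@example.com"  # hypothetical honeypot address

listed = set()  # IPs of mail servers that delivered mail to the trap


def record_delivery(server_ip, recipient):
    """Add the sending server to the list if it mailed the trap address."""
    if recipient.lower() == TRAP_ADDRESS:
        listed.add(server_ip)


def request_removal(server_ip):
    """Take a server off the list; it is re-added on its next trap delivery."""
    listed.discard(server_ip)


def is_listed(server_ip):
    """Report whether a server is currently on the list."""
    return server_ip in listed
```

Because the trap address is never published for humans, any delivery to it is taken as evidence that the sender relays harvested mail.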

    --
    "Pinky, you've left the lens cap of your mind on again." - P&TB
    "I can see my house from here!" - ST:
    1. Re:Similar to how the new ORBZ works? by Slash+Veteran · · Score: 1
      can argue their case to remove their server

      Actually, there's no arguing involved. Just submit your IP, and you're removed -- until the next time your mailer sends mail to the trap address.

    2. Re:Similar to how the new ORBZ works? by toupsie · · Score: 4, Interesting
      he now has a 'honeypot' like system where a unique email address that isn't directly visible on the site but still may be harvested by a spam bot. Any server that sends email to that address is automatically added to

      This is the same method I have been using for a while. I have an e-mail account called "cannedham" that I had posted on several web sites as a mailto: anchor on a 1x1 pixel graphic. Any e-mail sent to that address updates my Postfix header_checks file to protect the rest of my accounts. It works like a charm.

      --
      Strange women lying in ponds distributing swords is no basis for a system of government.
    3. Re:Similar to how the new ORBZ works? by doorbot.com · · Score: 1

      I have an e-mail account called "cannedham" that I had posted on several web sites as a mailto: anchor on a 1x1 pixel graphic. Any e-mail sent to that address updates my Postfix header_checks file to protect the rest of my accounts. It works like a charm.

      That sounds like a fantastic idea... care to post your methods for configuring postfix for that?

  22. Re:Huh? by loply · · Score: 1

    What's wrong with MySQL? It does everything the website claims it does.

  23. Now, let's fake the other end. by iggly_iguana · · Score: 1

    This gives me an idea for a spam version of a roach motel (Spam gets in, but it never gets out).

    I wonder what it would take to create an open relay server that would fool spammers into using it.

    Ideas would be welcome. This could be just the revenge I've been looking for!!!

    Sig: "That's not a duck!"

    1. Re:Now, let's fake the other end. by Technician · · Score: 2

      I want a mail relay that refuses to process more than 10 mails from any single IP in a 24 hour period. It would be usable for home residential mail, but useless for bulk mail. As an added benefit it would severely restrict the impact of the latest MS Outlook exploit.
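
      One way such a limit could be tracked is a per-IP sliding window. The following is only a Python sketch of the idea, not the configuration of any real mail server; the class name RelayLimiter and its parameters are made up for illustration, and timestamps are plain seconds supplied by the caller.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 24 * 60 * 60   # the 24 hour period from the comment
MAX_MAILS = 10                  # the per-IP limit from the comment


class RelayLimiter:
    """Sliding-window counter of accepted mails per client IP (hypothetical)."""

    def __init__(self, max_mails=MAX_MAILS, window=WINDOW_SECONDS):
        self.max_mails = max_mails
        self.window = window
        self._accepted = defaultdict(deque)  # ip -> timestamps of accepted mails

    def allow(self, ip, now):
        """Return True and record the mail if `ip` is under its quota at `now`."""
        stamps = self._accepted[ip]
        # Forget mails that have aged out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.max_mails:
            return False
        stamps.append(now)
        return True
```

      A real relay would also have to persist these counters and cope with many users behind one IP address, which this sketch ignores.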

      --
      The truth shall set you free!
  24. Re:problem with not giving an email address ... by Luyseyal · · Score: 2

    But, if you send him the message once with your return address, he'll know you're for real and when he replies you can use your regular mailer.

    $0.02USD,
    -l

    --
    Help cure AIDS, cancer, and more. Donate your unused computer time to worldcommunitygrid.org. Join Team Slashdot!
  25. Take a look in the mirror by Spackler · · Score: 5, Informative
  26. Re:Block? Are you kidding? by Anonymous Coward · · Score: 0

    I think the spammer will filter abuse@ ...

  27. A tip by anthony_dipierro · · Score: 5, Informative

    Here's a tip for those of you writing spambot traps... How about not blindly responding to the faked Return-Path address?

    Now that should be illegal. You people whine about your 10 spams a day, try 10,000 from 2000 different email addresses. Idiot postmasters should be caught and jailed.

    1. Re:A tip by RollingThunder · · Score: 2

      That's not usually the spambot trap, it's usually the MTA, when the spammer sends to an invalid address.

      Although, the MTA would be looking at the envelope sender if it's any good, but most of the time those are faked too.

    2. Re:A tip by anthony_dipierro · · Score: 2
      Count Microsoft in that list of crappy MTAs...

      Received: from cpimssmtpa20.msn.com ([65.31.179.139]) by cpimssmtpa41.msn.com with Microsoft SMTPSVC(5.0.2195.4905);
      Thu, 11 Apr 2002 15:10:19 -0700
      Message-Id: <5n5k5movjtde50vvgss.io387p65vg1v88f1@cpimssmtpa20.msn.com>
      From: [NOT MY ADDRESS]@usa.net
      Date: Thu, 11 Apr 2002 15:08:40 -0800
      Subject: PERSCRIPTIONS! CHEAP AND PRIVATE!!
      Content-Type: text/html;
      charset="iso-8859-1"
      Content-Transfer-Encoding: 8BIT
      X-Mailer: Mozilla 4.08 [en] (Win98; I)
      To: [NOT MY ADDRESS]@xmail.com
      CC: [NOT MY ADDRESS]@msn.com
      Return-Path: [MY ADDRESS!]
      X-OriginalArrivalTime: 11 Apr 2002 22:10:20.0586 (UTC) FILETIME=[AB7A58A0:01C1E1A5]

      Freaking mke-65-31-179-139.wi.rr.com is sending the mail (I hid the rest of the addresses, which are most likely innocent), and I get the damn (well over 10,000) bounces. That's what I get for being publicly against spam laws on slashdot, I guess. I wonder how hard it would be to subpoena the name and address of the original sender...

  28. he suggests formmail, another spam tool by nwc10 · · Score: 5, Informative
    Interestingly within the article he suggests hiding your e-mail addresses by making a feedback page. One of the programs that he suggests is formmail, and he links to Matt's original version.

    formmail itself (even the most recent version) can still be abused by spammers to use your webserver as a bulk mail relay - see the advisory at
    http://www.monkeys.com/anti-spam/formmail-advisory.pdf

    It's a shame he didn't suggest the more robust formmail replacement at nms which is maintained, and attempts to close all the known bugs and insecurities.

    1. Re:he suggests formmail, another spam tool by nwc10 · · Score: 1

      I believe he has now updated the page to link to the nms version, but as the server is now thoroughly slashdotted, I'm unable to reload the page to see what has been changed.

    2. Re:he suggests formmail, another spam tool by KjetilK · · Score: 2
      Yeah, I had a funny incident where my address was put in the From:-field of a pr0n-spam sent using a formmail exploit. I quickly made an autoresponder to the people who complained to me, but it turned out to be just a handful of people (who I then took the opportunity of educating about many things).

      But there are later versions of formmail that are patched, aren't there?

      --
      Employee of Inrupt, Project Release Manager and Community Manager for Solid
    3. Re:he suggests formmail, another spam tool by Chagrin · · Score: 2

      Yes! It's becoming a popular target for spammers. If you have formmail in a common location (like mysite.com/cgi-bin/formmail.pl) it will be eventually scanned for and picked up.

      I've seen it happen to sites I administer a number of times in the past, where individuals apparently using some sort of AOL name harvesting tool were using the formmail.pl scripts to send mass messages. Looking at the User-Agent headers, it looks like there's a VB script out there designed specifically to automate this exploit.

      --

      I/O Error G-17: Aborting Installation

  29. Speed by egon · · Score: 1

    If he really wants to make the thing run faster, turn those varchars into regular chars. And index index index!

    --
    Give a man a match, you keep him warm for an evening.
    Light him on fire, he's warm for the rest of his life
    1. Re:Speed by Anonymous Coward · · Score: 0

      And if he wanted to do it really quickly, he would use mod_idsa and idsad - http://jade.cs.uct.ac.za/

      Involves no perl and no busy loop.

  30. Re:Block? Are you kidding? by BlueUnderwear · · Score: 3, Insightful
    At first glance this might be a good idea but this will be resource burden on your system.

    Add a couple of sleep(20); into the cgi script that generates the bot fodder. The bot will still stay busy waiting for your webserver's response, but your sleeping script will consume next to no resources.

    For additional kicks, set up a DNS teergrube.

    --
    Say no to software patents.
  31. Suicidal by Captain+Large+Face · · Score: 0, Redundant

    Wow, this guy slashdotted himself..

    Stopping Spambots: A Spambot Trap

    Using Linux, Apache, mod_perl, Perl, MySQL, ipchains and Embperl

    Copyright 2002 by Neil Gunton

    This document describes my experiences with spambots on my websites, and the techniques I have developed to stop them dead. I assume the reader has basic familiarity with Linux, Apache, mod_perl, Perl, MySQL and firewall rules using ipchains - each of these topics could fill a book, so I won't talk about installation or basic configuration. I will, however, provide full scripts and instructions on using these within the context of these tools. If you'd like some basic pointers on getting set up using these tools, then you could take a look at my short series of three Linux Network Howto articles.

    Contents

    • The Problem: Spambots Ate My Website
    • Overview of the Spambot Trap
    • Banishing 'mailto:'
    • MySQL
    • BlockAgent.pm
    • ipchains
    • badhosts_loop
    • spambot_trap/ Directory
    • robots.txt
    • Your HTML Files
      • Embperl
    • httpd.conf
    • Monitoring
    • Conclusions
      • Strengths
      • Weaknesses
      • Possible future enhancements

    The Problem: Spambots Ate My Website

    Spambot: (noun) - A software program that browses websites looking for email addresses, which it then "harvests" and collects into large lists. These lists are then either used directly for marketing purposes, or else sold, often in the form of CD-ROMs packed with millions of addresses. To add insult to injury, you may receive a spam email which is asking you to buy one of these lists yourself. Spambots (and spam) are a pestilence which needs to be stamped out wherever it is found.

    I have a website, http://www.crazyguyonabike.com, which has bicycle tour journals, message boards and guestbooks. I started noticing around the end of 2001 that the site was getting hit a lot by spambots. You can spot this sort of activity by looking for very rapid surfing, strange request patterns, and non-browser User-Agents.

    Another distinctive behavior was that the spambots would follow only those links which had certain keywords which would seem promising if you're looking for email addresses: "guestbook", "journal", "message", "post" and so on. On each of the pages in my site there were many other links in the navbars, but only links with these keywords were being followed. Also, robots.txt was never even being read, let alone followed. Moreover, the bot would come in, scan pages rapidly for maybe a few seconds, and then stop for a while. So it was obviously making at least some attempt to circumvent blocks based on frequency/quantity of requests.

    This was very annoying. For one thing, these things were picking off email addresses from my website (at that point, I was letting people who posted on my message boards decide for themselves whether they wanted their email addresses to be visible or not). But quite apart from that, it was taking up resources, and was just plain rude. I hate spam. I resent my webserver having to play host to people whose obvious goal is to cynically exploit the co-operative protocols of the internet to their own selfish, antisocial gain. So, I decided to do something about it.

    The first thing I did was to look at the User-Agent fields which were being used by the bots. There were a variety, including variations on the following:

    • DSurf15a 01
    • PSurf15a VA
    • SSurf15a 11
    • DBrowse 1.4b
    • PBrowse 1.4b
    • UJTBYFWGYA (and other strings of random capital letters)

    I searched the internet for references to these strings, but all I found was a slew of website statistics analysis logs. This meant that these particular spambots obviously got around. It was also discouraging, because there was no mention anywhere of what these things actually were. I was surprised that there seemed to be no discussion whatsoever of something that seemed to be pandemic. Then I found a couple of other websites with guestbooks that had actually been defiled by these spambots: (if you follow these links and you don't see a lot of empty messages left by the above user agents, then that means the webmaster of the site has finally found a way to stop it, so good for them...)

    • http://www.virtualglasgow.com/guestbook.html
    • http://www.donotenter.com/guestbook/gbook.html

    I reckon the spambots didn't really intend to leave empty messages. They just tend to want to follow links with the keyword 'post'. So if the guestbook posting form has no preview or confirmation page, then the spambot would leave a message simply by following this link! My guestbooks and message boards have a preview page, which is probably why I hadn't had any of this.

    Anyway, I started thinking about what kind of program this thing was. First of all, it comes from all kinds of different IP addresses. I couldn't quite believe that this many different IP addresses were all intentionally using the same software, of which I could find absolutely no mention anywhere on the Web. This made me think it might be some kind of virus/trojan/worm or whatever that silently installed itself on people's computers, and then used the CPU and bandwidth to surf the Web without the owner being aware of it. I thought that if this was the case, then it must be sending the results somewhere - and if we could find out where, then we could go about shutting the operation down.

    But I have had no luck at all in getting any help from the sysadmins at ISPs I have contacted. A typical exchange was the one with a guy at Cox internet, which was where a persistent offending IP address was sourced. He just couldn't be bothered, and eventually told me that spidering was not against the law, or their terms of service. I asked whether actions which were blatantly obviously geared toward the generation of spam were against their terms of use, but he never replied to that. I had no more luck anywhere else: nobody had heard of this thing. I even sent an email to CERT, but got no response.

    So, I turned instead to thinking about how I could erase these pests from my life as much as possible. This document is about my quest to stop spambots (not just this one, but ALL spambots) from abusing my website. Hopefully it will be useful to you.

    Overview of the Spambot Trap

    There are three main parts to the technique which I outline here:

    • Banish visible email addresses from your websites altogether, or else obfuscate them so they can't be harvested. Examples of how to do this are given. This is your fail-safe, in case the spambots figure out a way around your other defences. Even if they manage to cruise your website on their very best behavior, they still should not be able to harvest email addresses!
    • Block known spambots: Certain User-Agents are just known to be bad, so there's no reason to let them come on your site at all. True, spambots could in theory spoof the User-Agent, but the simple reality is that a lot of them don't. We use an enhanced version of the BlockAgent.pm module from the O'Reilly mod_perl book. This extension adds offending IP addresses to a MySQL (or other relational) database, which is picked up by the third part of our cunning system...
    • Set a Spambot Trap, which blocks hosts based on behavior. We set a trap for spambots, which normal users with browsers and well-behaved spiders should not fall into. If the bot falls in the trap, then its IP address is quickly blocked from all further connections to the webserver. This works using a persistent, looping Perl script called badhosts_loop, which checks every few seconds for additions to a 'badhosts' database. This script then adds 'DENY' rules for each bad host to the ipchains firewall. Blocks have an expiry, which is initially set to one day. If a host falls in the trap again after the block expires, then that IP is blocked again - and the expiration time is doubled to 2 days. And so on. This algorithm ensures that the worst offenders get progressively more blocked, while one-time offenders don't stick around in our firewall rules eating up resources.
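
    The doubling rule for block expiry can be illustrated with a few lines of code. This is a Python sketch written for this explanation only (the article's own scripts are Perl and are linked rather than shown); the function name next_block and its arguments are hypothetical.

```python
from datetime import datetime, timedelta

INITIAL_EXPIRE_DAYS = 1  # the first block lasts one day, per the article


def next_block(previous_expire_days, now):
    """Return (expire_days, expiry) for a host that just fell into the trap.

    previous_expire_days is None for a first offence, otherwise the length
    in days of the host's most recent block.
    """
    if previous_expire_days is None:
        expire_days = INITIAL_EXPIRE_DAYS
    else:
        expire_days = previous_expire_days * 2   # repeat offenders wait twice as long
    return expire_days, now + timedelta(days=expire_days)
```

    A host caught for the first time gets one day; a host whose last block was four days gets eight the next time it is caught.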

    There are various components to the Spambot Trap, including the badhosts_loop Perl script, the BlockAgent.pm module, ipchains config, MySQL database, httpd.conf, robots.txt, and your HTML files. These are all covered in the sections below.

    Banishing 'mailto:'

    The first and most urgent thing you need to do is to get email addresses off your website altogether. This means, unfortunately, banishing the venerable mailto: link. It's a real shame that perfectly good mechanisms should be removed because of abuse, but that's just the way the world is these days. You need to be defensive, and assume that the spammers will try to take advantage of your resources as much as possible.

    It's an arms race

    The important thing that you need to realize is that no matter what blocks we put in place, this game is an arms race. Eventually the spambot writers will develop smarter bots which circumvent our techniques. Therefore you want to have a failsafe, which will prevent email addresses from getting into the hands of the spambot even if all else fails. The only real way to do that is to completely remove all email addresses from your website.

    Contact forms

    You should replace the mailto: links with links to a special form where people can type their name, email address and message. A CGI can then deliver the email, and your email address never has to be disclosed. There are a number of different mailer scripts out there - just be careful to check for vulnerabilities which could allow malicious users to use the form to send email to third parties (i.e. spam, ironically enough) using your server. The formmail script is popular, but an earlier version had such a vulnerability (since fixed). The Embperl package has a simple MailFormTo command to send an email from a form.
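
    To make the vulnerability concrete: the danger is a form handler that lets the visitor influence who receives the mail. The following Python sketch (not formmail, not Embperl's MailFormTo) shows one defensive shape, assuming a single fixed recipient. The address owner@example.com and the function name build_contact_message are placeholders invented for this example.

```python
from email.message import EmailMessage

# Hypothetical fixed recipient; visitors never see it and cannot change it.
SITE_OWNER = "owner@example.com"


def build_contact_message(name, reply_to, body):
    """Build a message to the site owner from untrusted form fields."""
    for field in (name, reply_to):
        # CR/LF inside a header field is the classic way to smuggle in extra
        # recipients, so reject it outright.
        if "\r" in field or "\n" in field:
            raise ValueError("line breaks are not allowed in header fields")
    if reply_to.count("@") != 1:
        raise ValueError("reply address does not look like an email address")

    msg = EmailMessage()
    msg["To"] = SITE_OWNER          # never taken from the form
    msg["From"] = SITE_OWNER        # the site itself is the sender
    msg["Reply-To"] = reply_to
    msg["Subject"] = "Website contact from " + name
    msg.set_content(body)
    return msg
```

    Actually delivering the message (for example through a local mail server) is left out; the point is only that the recipient is hard-coded and header fields are checked before use.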

    Since I have seen guestbooks out there which have been extensively defiled by spambots, I would add that you should have a preview screen on your contact forms. This will ensure that an email doesn't get fired off simply by a spambot following the 'post' or 'contact' link (which it will likely try to do).

    Alternatives to totally banishing mailto:

    There are alternatives to completely removing email addresses, but they all depend on the stupidity of the spambot, and so could be compromised by a new generation of pest. These include:

    • Write out email addresses in a non-email format, e.g. instead of writing 'username@domain.com' you would write 'username at domain dot com', or something similar. It would only take some spambot with a little more intelligence to be able to scan these patterns and pick up "likely" addresses, so this strategy is a little risky. Any consistent method you choose to write out email addresses could in theory be analyzed and decoded by a savvy bot.
    • Add stuff to the email address to make it invalid, but so that a human could easily know what to do to make it work. An example of this is writing 'username@_NO_SPAM_domain.com'. You need to remove the "_NO_SPAM_" part to make the email address valid. You can have some kind of explanation to make it clear what people have to do to use the address. Personally, I don't like this - you're depending on a level of sophistication on the part of your users which is risky. In my experience, there are a lot of very 'novice' level users out there, who only know how to click on a link. They don't know how to edit an email address. Heck, I've had people come to my site by typing the URL into Google, rather than the 'Location' box of their browser. Also, people don't read instructions.
    • Make graphics images which contain the email address. Spambots usually don't download graphics, and even if they did, they probably couldn't decode the bits to get the text. However, they could do it in theory, since software for doing OCR (optical character recognition, getting text from scanned documents) has been around for a while. A downside to this approach is that the user has to manually copy down the email address, since it can't be cut'n'pasted. Also, you can't put a mailto: link on the image, otherwise you're back to square one. But you could put a link to a contact form, with an argument in the link telling your server internally what email address to use. For example, the link could say "contact.cgi?to=23", where '23' is some database key to the actual email address. But the downside here is that you still need to generate the image, which is a bit of a pain in the ass if you have a lot of them. You can do it automatically, if you're willing to put the work in and write the scripts. There are some very nice graphics generation packages out there on CPAN for Perl. Here's an example of an email address presented as an image:

    MySQL

    Download badhosts MySQL database dump

    We need to set up a MySQL database, where we store records of the hosts which are to be blocked. This doesn't have to be MySQL, but I use it because it's extremely fast, and very appropriate for this kind of application. You need to create a new database, called 'badhosts'. You then create a table, again called 'badhosts', with the following structure:

    Field       | Type                          | Comment
    ip_address  | varchar(20) not null, indexed | The IP address of the host to be blocked
    user_agent  | varchar(255) not null         | The HTTP User-Agent of the spambot, for reference
    expire_days | int unsigned not null         | How many days is this block for. Doubled every time a new block has to be created for a particular IP address
    created     | datetime not null             | When this block was created
    expiry      | datetime not null, indexed    | When this block expires

    You could use the dump provided above to load directly into your database:

    shell> mysqladmin create badhosts
    shell> mysql badhosts < badhosts.dump

    That's about it! The fields which are marked as 'indexed' are the only ones which need indexes, because they are searched on to see if a particular IP address has been previously blocked, and also to see which blocks should be removed because they've expired. If you have access privileges set on your MySQL databases, then you need to allow the Apache user (usually 'nobody') access. The other script that will require access is badhosts_loop, which runs as root.
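
    For readers who want to experiment with the table layout without a MySQL server, here is a Python sketch that uses the standard-library sqlite3 module as a stand-in. The column names follow the article; the SQL types are SQLite approximations, and the index names and helper functions are assumptions made for this example, not part of the downloadable dump.

```python
import sqlite3

# SQLite stands in for MySQL so the sketch runs anywhere.
SCHEMA = """
CREATE TABLE badhosts (
    ip_address  TEXT    NOT NULL,
    user_agent  TEXT    NOT NULL,
    expire_days INTEGER NOT NULL,
    created     TEXT    NOT NULL,   -- 'YYYY-MM-DD HH:MM:SS'
    expiry      TEXT    NOT NULL
);
CREATE INDEX idx_badhosts_ip     ON badhosts (ip_address);
CREATE INDEX idx_badhosts_expiry ON badhosts (expiry);
"""


def open_db():
    """Create an in-memory database holding an empty badhosts table."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def active_blocks(db, now):
    """IP addresses whose block has not yet expired at `now` (same text format)."""
    rows = db.execute(
        "SELECT DISTINCT ip_address FROM badhosts "
        "WHERE expiry > ? ORDER BY ip_address",
        (now,),
    )
    return [row[0] for row in rows]
```

    The two indexes mirror the two lookups the article describes: by IP address (has this host been blocked before?) and by expiry (which blocks are still in force?).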

    Next, we look at the script that populates this database.

    BlockAgent.pm

    Download BlockAgent.pm

    Download bad_agents.txt

    The BlockAgent.pm Apache/mod_perl module is taken from the excellent book "Writing Apache Modules with Perl and C" by Lincoln Stein & Doug MacEachern (O'Reilly). This script basically acts as an Apache authentication module which checks the HTTP User-Agent header against a list of known bad agents. If there's a match, then a 403 'Forbidden' code is returned. The script compiles and caches a list of subroutines for doing the matches, and automatically detects when the 'bad_agents.txt' file has changed. I have found that it has no noticeable impact on the performance of the webserver. This script is useful in the case where you know for certain that a certain User-Agent is bad; there's no point in letting it go anywhere on your site, so it's a good first line of defense. We'll cover how to add this module to your website a little later, along with the rest of the configuration settings in the section on httpd.conf.

    Of course, one of the first arguments you'll see with regard to this method of blocking spambots is that it's easy to circumvent, by simply passing in a User-Agent string which is identical to the major browsers out there. This is perfectly true, but don't ask me why the spambot writers haven't done this - maybe it's a question of pride or ego, they want to see their baby out there on record in Web server logs. I honestly don't know. The main point is that at present, the User-Agent header CAN be used very effectively to block most bad agents. But, I have added more features so that we can also block agents which look ok, but behave badly by going somewhere they shouldn't - the Spambot Trap. More on that soon.

    You'll notice that the bad_agents.txt file which I have supplied here is very comprehensive. A good strategy here is probably to save the full version somewhere (perhaps as bad_agents.txt.all), and just keep the ones you actually encounter in the bad_agents.txt file. Then you keep the list shorter, and more relevant to what actually hits you. For example, my bad_agents.txt file currently has the following lines in it, because these are the spambots that I see most frequently:

    • [A-Z]+$
    • .Browse\s
    • .Eval
    • EO Browse
    • .Surf
    • Microsoft.URL
    • ^Mozilla\/3.0.+Indy Library
    • Zeus.*Webster

    You'll notice from this that BlockAgent.pm is very flexible, being able to take full advantage of the excellent regular expression capabilities of Perl. This means you can capture a lot of different agents with just one line. For example, the very first line catches all the variations of the agent which passes in random strings of capital letters, e.g. FHASFJDDJKHG or UYTWHJVJ. The spambot obviously thinks it's being pretty smart by looking different each time, but by using an easily identifiable pattern, it shoots itself in the foot. Hah.
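
    To see how such a pattern list behaves, here is a small Python sketch that applies the patterns listed above to a User-Agent string. It is not BlockAgent.pm: the real module is Perl, and this sketch assumes the patterns are case-sensitive and anchored at the start of the string, which may not match the module's actual behavior. The function name is_bad_agent is hypothetical.

```python
import re

# Patterns copied from the list in the article.
BAD_AGENT_PATTERNS = [
    r"[A-Z]+$",
    r".Browse\s",
    r".Eval",
    r"EO Browse",
    r".Surf",
    r"Microsoft.URL",
    r"^Mozilla\/3.0.+Indy Library",
    r"Zeus.*Webster",
]

# Assumption: case-sensitive, matched from the start of the User-Agent.
_COMPILED = [re.compile(p) for p in BAD_AGENT_PATTERNS]


def is_bad_agent(user_agent):
    """True if any pattern matches at the beginning of the User-Agent."""
    return any(rx.match(user_agent) for rx in _COMPILED)
```

    Under these assumptions, agents like "DSurf15a 01" and "UJTBYFWGYA" are rejected while an ordinary MSIE User-Agent passes, because it neither starts with one of the named prefixes nor consists solely of capital letters.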

    The original version of the BlockAgent.pm script is well explained in the O'Reilly book, but I've added an extra hook that checks to see whether the client is accessing any of the spambot trap directories. If it is, then we add an entry to the MySQL database (you could use another relational database if you want, as long as it's accessible from Perl DBI).

    The first time an IP address is blocked, an expiry of one day is set. If the same host subsequently comes in and falls into the trap again, then the expiry time is doubled. And so on. This way, the block gets longer and longer, in proportion to how persistently the spambot revisits our website. Once the IP address is blocked, the spambot can't even connect to our web server, since we use 'Deny' in the ipchains rule. This means that no acknowledgement is given to any packets coming in from the badhost, and as far as they know, our server has just gone away. Hopefully, after this happens for long enough, our server will be taken off the spambot's "visit" list. Another nice little side-effect of this is that the spambot will probably have to wait for a while before giving up each connection attempt. Anything that makes them waste more time is ok by me!

    BlockAgent.pm notifies the badhosts_loop script that something has happened by touching a file called /tmp/badhosts.new. The badhosts_loop script checks this file every few seconds, and if it has changed then it knows that a new record has been added to the database and that it needs to re-generate the blocks list.

    The BlockAgent.pm script is our alarm system. It's what tells us that something happened. In order to act on this information, we need to be able to add rules to the ipchains firewall. We'll cover this next.

    ipchains

    Download sample ipchains config file

    The ipchains module (here's the HOWTO doc) is a very nice way of providing a good level of basic network security to your server. If you haven't already set it up (or its successor, iptables), then you really should. It's a very easy way to configure who can and cannot have access to your machine. A good resource for learning about this is "Building Linux and OpenBSD Firewalls", by Wes Sonnenreich and Tom Yates (Wiley). This is where I learned about ipchains, and it's on their excellent explanations and examples that I based my own config file. Another is "Linux Firewalls" by Ziegler (New Riders), which seems to have a more recent 2nd edition that covers iptables too.

    The example ipchains config file given here is complete, but the bit which is most important to us is that we create a chain called 'blocks'. This is our own custom chain, which we can then add rules to. The badhosts_loop script will flush this chain and build it back up whenever a spambot falls in your trap. Once the spambot's IP address is on the blocks list, that host cannot connect to your server at all.
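
    The flush-and-rebuild step might look roughly like the following. This Python sketch only builds the command lines and does not run them; it assumes the 'blocks' chain has already been created and hooked into the input chain by the config file, and the function name rebuild_commands is made up for this example. The real badhosts_loop script is Perl and may do this differently.

```python
import ipaddress

CHAIN = "blocks"  # the custom chain named in the article


def rebuild_commands(bad_ips):
    """Return the ipchains command lines that flush and refill the chain.

    The commands are only built, not executed; running them needs root.
    """
    commands = [["ipchains", "-F", CHAIN]]          # flush the existing rules
    for ip in sorted(set(bad_ips)):
        ipaddress.IPv4Address(ip)                   # raises ValueError if malformed
        commands.append(["ipchains", "-A", CHAIN, "-s", ip, "-j", "DENY"])
    return commands
```

    Validating each address before it reaches a root-level command is a cheap safeguard, since the addresses ultimately come from data written by the web server.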

    Remember to restart ipchains after you've changed the config file. Next, we'll look at the script that actually adds the firewall rules.

    badhosts_loop

    Download badhosts_loop script

    You run this script in the background, as root. It has to be run as root, because only root has the ability to add rules to the firewall. The script spends most of its time sleeping. It wakes up every five seconds or so and does a quick check on /tmp/badhosts.new. If this file has been changed since the last time it looked, then it goes and re-generates the firewall blocks list with all the current (non-expired) blocks. If nothing else happens, then the script will automatically do this at least once a day, to ensure that blocks really do expire even if there is no new activity.
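
    The "has the file been touched?" check is simple to express in code. Here is a Python sketch of that one step, assuming the modification time is the signal; the class name TouchFileWatcher is hypothetical, and the real script is Perl.

```python
import os


class TouchFileWatcher:
    """Remember a file's modification time and report when it changes."""

    def __init__(self, path):
        self.path = path
        self._last_mtime = self._mtime()

    def _mtime(self):
        try:
            return os.stat(self.path).st_mtime_ns
        except FileNotFoundError:
            return None  # the file has not been created yet

    def changed(self):
        """True if the file was created or touched since the previous call."""
        current = self._mtime()
        if current != self._last_mtime:
            self._last_mtime = current
            return True
        return False
```

    A looping script would call changed() on /tmp/badhosts.new, sleep for five seconds, and repeat, rebuilding the blocks list whenever it returns True or a day has passed.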

    You should probably add the following line to your /etc/rc.local file (or equivalent), so that the script is automatically started up on reboot:

    /path/to/badhosts_loop --loop &

    This will start the script looping in the background. The script automatically checks to see if it is already running, by attempting to lock /var/lock/badhosts_loop.lock. If the file is already locked then the script will exit with an error message. If you want to just run the script once, without looping, then just omit the '--loop' option. This can be useful for testing.
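
    The single-instance check via a lock file is a common Unix idiom. A Python sketch of it, using a non-blocking flock, is shown below; this is an illustration of the idea rather than the script's actual Perl code, it is Unix-only, and the function name is an assumption.

```python
import fcntl
import os


def acquire_single_instance_lock(path):
    """Try to take an exclusive, non-blocking lock on `path`.

    Returns the open file descriptor on success (keep it open for the life
    of the process) or None if another holder already has the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

    Because the lock is released automatically when the process exits, a crashed script does not leave a stale lock behind, which is the main advantage over simply checking whether the file exists.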

    Logging is done to /var/log/badhosts_loop.log by default. Every time the script generates the blocks list, it writes a list of all the blocks to the log. This is a good place to monitor if you're interested in what hosts are being blocked. Here's an example of the log output:

    EDITOR: SNIPPED

    Thu Apr 11 16:09:07 2002: Flushing blocks chain: Generating blocks list:

    Adding 63.148.99.247 (1) 2002-04-11 11:16:11 to 2002-04-12 11:16:11 Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)

    The log shows the IP address which is being added, then (in brackets) the number of days the block is effective for (doubling each time), then the start and end dates of this block, and finally the name of the User-Agent which committed the crime. This can be useful for quickly seeing whether you need to add a new one to the bad_agents.txt file.
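
    If you want to summarize this log with a script, the "Adding" lines are regular enough to parse. The following Python sketch is based solely on the single sample line shown above, so the exact format is an assumption; the function name parse_block_line is hypothetical.

```python
import re

# Matches lines shaped like the sample:
# Adding <ip> (<days>) <start datetime> to <end datetime> <User-Agent>
LOG_LINE = re.compile(
    r"^Adding (?P<ip>\S+) \((?P<days>\d+)\) "
    r"(?P<start>\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) to "
    r"(?P<end>\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) "
    r"(?P<agent>.*)$"
)


def parse_block_line(line):
    """Return a dict of fields for an 'Adding ...' line, or None otherwise."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["days"] = int(fields["days"])
    return fields
```

    Counting how often each User-Agent appears in the parsed records is one quick way to spot candidates for bad_agents.txt.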

    This is a pretty stable script that should just sit there and chug quietly, not taking up much in the way of resources. Checking for a file being changed every five seconds is not a big deal in Unix, so you shouldn't even notice it.

    Now you have to create the trap itself - the spambot_trap directory.

    spambot_trap/ Directory

    Download gzipped tarball of sample spambot_trap directory

    View the sample directory

    You can create this directory anywhere on your server. We will create an alias in httpd.conf to access it. I put mine in /www/spambot_trap/. The point is, this doesn't have to be a real directory under your webserver directory root. If you use the Alias directive, then multiple websites can access the same spambot_trap directory, potentially through different aliases. You can use the sample tarball as a starting point; it has subdirectories and links which the spambots I have seen find irresistible. You should create your own image file for the unblock_email.gif file, to have a valid email address of your own.

    The spambot_trap and spambot_trap/guestbook/ directories are not used directly to spring the trap. This is because I wanted to have a warning level, a lead-in, where real users would be able to realize they are getting into dangerous waters and could then back out. You're going to be placing hard-to-click links on your web pages which lead into the real trap, and there's always a chance that a real user will accidentally click on one of these. So, some of the links will point into the warning level. I have made a GIF image which contains a warning text. Why an image? Mainly because spambots can't understand images, and I didn't want to give big clues like "WARNING!!! DO NOT ENTER" in plain text. So, the user sees the warning, the spambots don't. If the spambot proceeds into any of the subdirectories (email, contact, post, message), then the trap is sprung and the host is blocked.
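
    The two-level layout (a warning level, then the trap proper) boils down to a small decision on the requested path. Here is a Python sketch of that decision, assuming the '/squirrel' alias used in the article's examples and assuming that any path segment named email, contact, post or message springs the trap. The function name classify is hypothetical, and the real check lives in the Perl module.

```python
TRAP_ALIAS = "/squirrel"                      # alias used in the article's examples
TRIGGER_DIRS = {"email", "contact", "post", "message"}


def classify(path):
    """Return 'trap', 'warning', or None for a requested URL path."""
    if path != TRAP_ALIAS and not path.startswith(TRAP_ALIAS + "/"):
        return None                           # outside the trap entirely
    segments = [s for s in path[len(TRAP_ALIAS):].split("/") if s]
    if any(s in TRIGGER_DIRS for s in segments):
        return "trap"                         # spring the trap and block the host
    return "warning"                          # lead-in level: warning image only
```

    Only a 'trap' result would lead to a database entry; a 'warning' result just serves the warning page so that a human who strayed there can back out.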

    You also need to try to stop good spiders (e.g. google) from falling into the spambot trap and being blocked. To do this, we utilize the robots.txt file.

    robots.txt

    Download sample robots.txt

    This should allow good robots (such as google) to surf your site without falling into the spambot trap. Most bad spambots don't even check the robots.txt file, so this is mainly for protection of the good bots.

    You'll see that we list a bunch of directories under '/squirrel'. This could be anything; you'll set an alias later in httpd.conf. In fact, you may even want this to be dynamically generated (see later, under Embperl), so that you can quickly change the name of the spambot trap directory if the spambots adapt and start avoiding it. At present, a static setup should work just fine, however.
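
    To see what a well-behaved robot does with such a file, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content below is an assumption (a single Disallow rule for the trap alias); the real sample file lists several directories and is not reproduced here. The bot name ExampleBot is also made up.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content, standing in for the downloadable sample.
ROBOTS_TXT = """\
User-agent: *
Disallow: /squirrel/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite spider asks before fetching, so it never enters the trap.
trap_allowed = parser.can_fetch(
    "ExampleBot", "http://www.yourdomain.com/squirrel/guestbook/post/"
)
home_allowed = parser.can_fetch(
    "ExampleBot", "http://www.yourdomain.com/index.html"
)
print(trap_allowed, home_allowed)
```

    A spambot that never reads robots.txt performs no such check, which is exactly why it walks into the trap.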

    Next, we need to look at the bait - links within your HTML files which lead the spambot into the trap.

    Your HTML Files

    Download sample HTML code

    Download sample transparent 1 pixel image for hiding the trap

    Here's an example of HTML with links into the spambot trap:

    <HTML>
    <BODY BGCOLOR="beige">
    <A HREF="/squirrel/guestbook/message/"></A> <A HREF="/squirrel/guestbook/post/"><IMG SRC="/guestbook.gif" WIDTH=1 HEIGHT=1 BORDER=0></A>
    Body of the page here
    <TABLE WIDTH=100%> <TR>
    <TD ALIGN=RIGHT> <A HREF="/squirrel/guestbook/"> <SMALL><FONT COLOR="beige">guestbook</FONT></SMALL> </A></TD>
    </TR>
    </TABLE>
    </BODY>
    </HTML>

    Spambots tend to be stupid. You'd think they would check for empty links (which don't show up in a real browser), but they don't seem to. Sure, they may get smarter, but meantime you might as well pick the low hanging fruit. So, the very first thing in the body of your HTML should be an empty link which goes straight into the trap proper - not the warning level, but the actual trap itself. This is because there is no way for someone using a real browser to click on this link, and good spiders will ignore it anyway because it's in the robots.txt file.

    We also use a one pixel big transparent GIF (a favorite web bug technique) to anchor a link to the trap, just in case the spambot is smart enough to avoid empty links. If we put this as the very first thing in the body, then it'll be pretty hard for a real user to click on, since it's only one pixel in size. But a spambot will quite happily go there!

    Finally, there is an example of a non-graphic, text based link. This will be placed on the right side of the screen by the table, and the text will appear in the same color as the background (in this example, beige). The link does not go straight into the trap, but into the warning level, because with this one there is a bigger chance that real people could click on it accidentally. The link may be invisible, but it's still there, and someone could find it. So, they get to see a nice warning, and they should back off from there. But the spambot won't. By the way, we have the link going to /squirrel/guestbook/ rather than just /squirrel/ because some of the spambots seem to specifically follow links with certain keywords, e.g. 'guestbook', 'message', 'post', etc.

    You can sprinkle these links all around your HTML files. I put them in every single one, since I use Embperl templates which make that sort of thing very easy.

    Embperl

    Download sample dynamic robots.txt using Embperl

    Download sample dynamic HTML code using Embperl

    The point of this is to make it easier to change the spambot trap directory without having to edit a whole bunch of files. We pass an environment variable to Perl from httpd.conf (see below), which says what the trap directory is called. We then use this in Embperl to substitute into the HTML and robots.txt files at request time. Thus if we wanted to change the name of the trap from 'squirrel' to 'badger', then we only need to change httpd.conf, restart apache, and we're done. All the links in the HTML are dynamic, as is robots.txt (see the samples above).
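
    The same idea, one configured name substituted into every generated file, can be sketched outside Embperl. The snippet below uses Python's string.Template purely as an illustration; the template strings and the render helper are hypothetical, and the environment variable lookup only mimics the SPAMBOT_TRAP_DIR setting passed in from httpd.conf.

```python
import os
from string import Template

# Stand-in for the value that httpd.conf passes in; defaults to 'squirrel'.
trap_dir = os.environ.get("SPAMBOT_TRAP_DIR", "squirrel")

# Hypothetical templates; $trap is filled in at request time.
ROBOTS_TEMPLATE = Template("User-agent: *\nDisallow: /$trap/\n")
LINK_TEMPLATE = Template('<A HREF="/$trap/guestbook/message/"></A>')


def render(template: Template, trap: str) -> str:
    """Substitute the current trap directory name into a template."""
    return template.substitute(trap=trap)


print(render(ROBOTS_TEMPLATE, trap_dir))
print(render(LINK_TEMPLATE, "badger"))  # renaming needs no template edits
```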

    Now, we bring it all together in the Apache configuration file.

    httpd.conf

    Download sample httpd.conf directives

    Download sample startup.pl script (used in httpd.conf)

    You need to have mod_perl installed before you can use BlockAgent.pm. You should take a look at the sample given above, and integrate these directives into your own virtual hosts. The most important lines are:

    Alias /squirrel /www/spambot_trap
    PerlSetEnv SPAMBOT_TRAP_DIR squirrel

    You should set the 'squirrel' name to whatever you'd like for your website; you'll then access the trap using a URL something like http://www.yourdomain.com/squirrel/guestbook/message. This will spring the trap. You also need to set up the BlockAgent.pm access handler:


    PerlAccessHandler Apache::BlockAgent
    PerlSetVar BlockAgentFile /www/conf/bad_agents.txt

    This ensures that all accesses to your website will go through BlockAgent.pm first. You should choose your own location for the bad_agents.txt file.

    Finally, you might want to install Embperl so that you can embed Perl into your HTML code (always executed on the server side, never seen on the client side):

    # Set EmbPerl handler for main directory

    # Handle HTML files with Embperl

    SetHandler perl-script
    PerlHandler HTML::Embperl
    Options ExecCGI

    # Handle robots.txt with Embperl

    SetHandler perl-script
    PerlHandler HTML::Embperl
    Options ExecCGI

    That about does it. You should now have the setup which will allow you to block spambots. You'll probably be interested in monitoring what happens...

    Monitoring

    Download sample script for monitoring web server logs

    This simple script just tails the badhosts_loop log. You'll have fun (I do) seeing what comes on your site and promptly falls into the trap, and then SPLAT. No more spambot. Heh heh heh.

    Conclusions

    This setup works pretty well for me at the moment. I've no doubt there are flaws in my design, but it seems stable and is "good enough" for the time being. If you can see any improvements then I'd love to hear about them. To finish up, here's a summary of the strengths and potential weaknesses of the Spambot Trap system.

    Strengths

    • Does not rely exclusively on the HTTP User-Agent header, but at the same time allows us to block agents which we know to be bad.
    • Does not rely on the spambot abusing the robots.txt file. Many spambots don't even load it. But the robots.txt file will protect "good" robots from falling into the spambot trap. So, for example, googlebot will be just fine.
    • The blocks happen based on behavior, rather than trusting anything the spambot tells us about itself (e.g. User-Agent). Thus we don't rely on any prior knowledge of the spambots in order to block them; an entirely new one that we've never seen before will still fall in the trap and be duly blocked.
    • Once a spambot is blocked, then it cannot connect to your server again at all for the duration of the block. If it tries to connect, it won't even get a 'connection refused' error, because the firewall rule just quietly drops all the packets from the bad hosts. The ipchains firewall is very effective, and more efficient at blocking hosts than anything you could put together with Apache. So, you save on server resources. If you're wondering whether the block lists might get large, I have found that with the constant expiring of one day blocks, the active block list has never been more than about 20 IP addresses at a time, out of a list (so far) of 100 distinct hosts.
    • The blocks initially expire after one day. This means that one-off offenders are quickly removed from the firewall rules. On the other hand, repeat offenders get progressively longer and longer blocks (doubled each time). This means that the more abusive a host is, the more it will be blocked. It also means that if a bot is coming in from multiple IP addresses (through a proxy), then each of the individual IP addresses will probably not go on to be blocked for too long. Thus you won't be blocking everyone in AOL. On the other hand, if you continue to get hit from the same network, then it's obviously a source of trouble and should be blocked. If it's a major network like AOL, which you really don't want to block, then you need to take the IP addresses and times of the abuse, and send it to the sysadmin at the ISP concerned. There's really not a lot else you can do. I haven't seen this in reality, though. In my experience, the spambots come in from all sorts of different IP addresses, and the ones that are very persistent over time are mostly static IPs from DSL and small ranges of IPs from cable modems. These are the people with the always-on, high bandwidth capabilities which are needed for large scale email harvesting.
    • The system uses a relational database to manage the blocks, and so it is very scalable, and potentially you could share the database between multiple servers. If any one server gets a spambot, then the offending IP address can automatically also be blocked at all the other servers. Also, the fact that we don't delete expired blocks means that we can keep track of the history of the blocks, and perhaps perform analyses which would lead to more permanent ipchains blocks of entire subnets, if desired.

    Weaknesses

    • It would be possible for the spambots to get wise, and start following the robots.txt file rules. Then the spambot could in theory surf your entire site (or at least the bits allowed by robots.txt) without falling into the trap. However this also means that you can control where the spambot goes, which is the whole point of robots.txt. If you want, you can allow google into one part of the site, but exclude all others. Still, you should remove all email addresses from your site as the fail-safe.
    • It's possible that a spambot could come in through a proxy such as AOL, which means you'll be blocking multiple AOL IP addresses. This is not very nice, and I'm not sure what the solution is at the moment. All I can say is that it hasn't happened yet, and the worst offenders on my site all have static IPs. They seem to come in from cable and DSL connections mostly.
    • I don't know how feasible this would be, but it may be possible to conduct a "denial of service" type attack on your webserver by making many requests to the spambot trap directory from different IP addresses. I think, however, that you actually need to have those IP addresses (rather than spoofing them) in order to set up a real TCP connection with the web server. I don't know how likely this is, but it comes more under the "attack" category than spambots. If someone tries this on your site, then it's definitely something that can be pursued with legal means. It's no longer just a petty annoyance, but rather a hostile action which must be chased down. Also, the motivation is totally different - the spammers don't want to do this kind of thing. They just want their email addresses. The DDOS attacks are notoriously difficult to track, but I think in the couple of years that have passed since the first ones brought down Amazon and Yahoo!, there has been some progress made. Anyhow, I just wanted to bring the idea into the light of day. If anyone has any clues about it then I'd be glad to know.

    Possible Future Enhancements

    • Spot large numbers of blocks occurring on a particular subnet, and automatically consolidate blocks into a single one which blocks the entire subnet (e.g. 128.123.31.0/24).
    • More interactive tools to allow removal of blocks
    • Analysis tools which can tell us something about patterns of abuse from particular networks.
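
    The first enhancement, consolidating blocks into subnets, might look something like the sketch below. It uses Python's standard ipaddress module; the function name busy_subnets and the threshold of three hosts per /24 are illustrative assumptions, not anything the article specifies.

```python
from collections import Counter
from ipaddress import ip_network


def busy_subnets(blocked_ips, threshold=3, prefix=24):
    """Return /prefix networks containing at least `threshold` blocked hosts.

    Both `threshold` and `prefix` are illustrative defaults, not values
    taken from the article.
    """
    counts = Counter(
        ip_network(f"{ip}/{prefix}", strict=False) for ip in blocked_ips
    )
    return sorted(net for net, n in counts.items() if n >= threshold)


blocked = ["128.123.31.4", "128.123.31.77", "128.123.31.200", "10.0.0.1"]
for net in busy_subnets(blocked):
    print(net)  # candidate for a single subnet-wide block
```

    Any real version would want a human to review the candidates, since a whole subnet may include innocent hosts.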

    If you can think of any more potential problems (or unrecognised strengths!) then I'd be happy to hear about them. I'd also like to hear any comments on this document.

  32. Removing the Mailto: may not be the best plan.. by liquidsin · · Score: 5, Interesting

    I've found that a lot of people just won't send email if there's not a link to facilitate it. I've become rather fond of using javascript to write the address to the page. Spambots read the source so they don't piece the address together but *most* browsers will still do it right. Just use something like:

    <script>var addr = "myname" + String.fromCharCode(64) + "mydomain"; document.write("<A CLASS=\"link\" HREF=\"mailto:" + addr + "\">" + addr + "</A>");</script>

    Seems to work fine. Anyone know of any reason it shouldn't, or have any other way to keep down spam without totally removing the Mailto: ? I know this won't work with *every* browser, but it beats totally removing mail links. And I don't think spammers can get it without having a human actually look at the page...

    --
    do not read this line twice.
    1. Re:Removing the Mailto: may not be the best plan.. by SuperCal · · Score: 1

      Awesome... great idea. It's so simple that I'm kicking myself for not thinking of it myself. It will keep working until everyone does it and spambot writers figure it out. Ironically, I'm changing all my websites' mailto tags to the JavaScript format... sorry

      --
      Business News and Resources: www.usasource.net
    2. Re:Removing the Mailto: may not be the best plan.. by bero-rh · · Score: 2

      This also makes it invisible to anyone who disabled JavaScript, and anyone using a browser that doesn't do JavaScript (lynx, links, etc.)

      --
      This message is provided under the terms outlined at http://www.bero.org/terms.html
    3. Re:Removing the Mailto: may not be the best plan.. by Bob9113 · · Score: 1

      It seems likely to me that those who would disable javascript are also those who would be willing to hand type an email address from a graphic image that shows the email address. So you could have the javascript generating the mailto: link attached to a graphic showing your email address, and the lost contacts would be minimal.

    4. Re:Removing the Mailto: may not be the best plan.. by Cynikal · · Score: 1

      good plan! perhaps an equally good plan would be to embed the link into a flash button, as far as i know spambots cant decode flash, right?

      but i like that idea as well

    5. Re:Removing the Mailto: may not be the best plan.. by Hieronymus+Howard · · Score: 1

      Except that if they're using a text-based browser such as Lynx or Links, they won't be able to see the images. You could embed the email address into the ALT attribute, but then the spambots might find it.

      HH

    6. Re:Removing the Mailto: may not be the best plan.. by liquidsin · · Score: 4, Interesting

      hell, go one step further:

      <img src="myemailaddress.jpg" alt="me at domain dot com">

      that way people who use browsers that speak (ie. the blind) would still hear your address correctly, so long as spambots don't start to pick up on the spelling out of "at" and "dot".

      --
      do not read this line twice.
    7. Re:Removing the Mailto: may not be the best plan.. by Kiaser+Zohsay · · Score: 2

      In the description of the trap, the author has a warning page just in case a real user hits one of the bogus links. That page would also benefit from a handy javascript history.go(-1). You might consider an HTTP redirect header, but the bot might be smart enough to follow that.

      --
      I am not your blowing wind, I am the lightning.
    8. Re:Removing the Mailto: may not be the best plan.. by mamba-mamba · · Score: 1

      Please mod this up to track root parent (currently 4) since it is the best solution of the bunch, IMHO.

      MM
      --

      --
      By including this sig, the copyright holders of this work or collection unreservedly place it in the public domain.
    9. Re:Removing the Mailto: may not be the best plan.. by Anonymous Coward · · Score: 0

      Until the mail harvester adds this to its list of harvest rules. And since you just Open Sourced it, they already have... Oh well.

    10. Re:Removing the Mailto: may not be the best plan.. by e_n_d_o · · Score: 3, Interesting

      On my company's Web site we've had success with this technique. The addresses posted on the Web site have not received any significant amount of spam. I have yet to see a single spam message that hits all four of the addresses on our contact page at once, which I believe would be a likely indicator we've been hit by a spambot.

      We embed this JavaScript code on each page that needs mailtos:

      <script type="text/javascript" language="JavaScript1.3">
      // Anti e-mail address harvester script.
      function n_mail(n_user) {
      self.location = "mailto:" + n_user + "@" + "yourdomain" + "." + "com";
      }
      </script>

      And then make email address links of this form:

      <a href="javascript:n_mail('foo');">foo<!-- antispam -->@<!-- antispam -->yourdomain<!-- antispam -->.<!-- antispam -->com<!-- antispam --></a>

      Our addresses even show up correctly in lynx, but are "clickable" only in JavaScript-enabled browsers.

      Of course, it's probably only a matter of time before spambots can compensate for this code. A more secure approach would be to put email addresses "components" in borderless cells of tables, or as a previous poster suggested, in images.

    11. Re:Removing the Mailto: may not be the best plan.. by liquidsin · · Score: 2

      that's what I did for my company's site as well. I have a linked src file with variables (orders = "steve" or what have you) and then just use a document.write( eval(orders) + String.fromCharCode(64)... and so on as I stated before. I definitely like your idea for borderless tables though. Between javascript, borderless tables, images, and the alt property, we should be able to keep harvesting bots confused. I'm sure there's a way to use the summary property of table tags for more fun, but I'm too tired to figure it out right now (it's 4pm on a friday...)

      --
      do not read this line twice.
    12. Re:Removing the Mailto: may not be the best plan.. by Permission+Denied · · Score: 1
      Please mod this up to track root parent (currently 4) since it is the best solution of the bunch, IMHO.


      What about my suggestion, posted forty minutes before the above?


      Not whoring, just pointing out that moderators aren't doing their jobs correctly (when I mod, I read -1 threshold, flat, newest first). If you don't have the time to mod, turn off the option. Don't want to sound like I'm whining, but there's lots of good stuff floating around at 1 or 2 whereas a mod point toward a 4 does little good.

    13. Re:Removing the Mailto: may not be the best plan.. by Requiem · · Score: 1

      Spambots have been doing this for years, along with automatically removing "SPAM" and "removeme" and other common anti-spam phrases from e-mail addresses.

      It's always been a case of trying to out-smart spammers, and then trying again once the current popular method has been noticed and worked around.

    14. Re:Removing the Mailto: may not be the best plan.. by mamba-mamba · · Score: 1

      Hmmm. I'm assuming that most of your comment is mostly not directed at me personally, since I very seldom moderate, and when I do I set newest first, threaded, -1. Next time I'll try flat. I'm not that big of a slashdot person.

      And when I said "best solution of the bunch," I meant for this thread only. But since you probably can't read minds, you didn't know that ;-)

      Anyway, I certainly agree that there are problems with moderation. Sometimes I see sarcastic but topical and cogent comments at -1, and I see too many 5's and 4's. Comments at 2 or 3 very seldom deserve further upward moderation. A 5 should be profound or hilarious, or whatever.

      MM
      --

      --
      By including this sig, the copyright holders of this work or collection unreservedly place it in the public domain.
  33. mirror by loraksus · · Score: 0, Redundant

    looks like /. ate his website, not spambots :)

    The Problem: Spambots Ate My Website
    Spambot: (noun) - A software program that browses websites looking for email addresses, which it then "harvests" and collects into large lists. These lists are then either used directly for marketing purposes, or else sold, often in the form of CD-ROMs packed with millions of addresses. To add insult to injury, you may receive a spam email which is asking you to buy one of these lists yourself. Spambots (and spam) are a pestilence which needs to be stamped out wherever it is found.

    I have a website, http://www.crazyguyonabike.com, which has bicycle tour journals, message boards and guestbooks. I started noticing around the end of 2001 that the site was getting hit a lot by spambots. You can spot this sort of activity by looking for very rapid surfing, strange request patterns, and non-browser User-Agents.

    After looking at the server logs, I realized a couple of things: Firstly, the spambots came from many different IP addresses, so this precluded the simple option of adding the source IP to my firewall blocks list. Secondly, there seemed to be a common behavior between the bots - even if this was the first visit from a particular IP address (or even a particular network, so no chance of just being a different proxy) they would come straight into the middle of my website, at a specific page rather than the root. This means that the spambots obviously had some kind of database of pages, which had presumably been built up from previous visits, before I'd noticed the activity, and this database was being shared between a large number of different hosts, each of which was apparently running the same software.

    Another distinctive behavior was that the spambots would follow only those links which had certain keywords which would seem promising if you're looking for email addresses: "guestbook", "journal", "message", "post" and so on. On each of the pages in my site there were many other links in the navbars, but only links with these keywords were being followed. Also, robots.txt was never even being read, let alone followed. Moreover, the bot would come in, scan pages rapidly for maybe a few seconds, and then stop for a while. So it was obviously making at least some attempt to circumvent blocks based on frequency/quantity of requests.

    This was very annoying. For one thing, these things were picking off email addresses from my website (at that point, I was letting people who posted on my message boards decide for themselves whether they wanted their email addresses to be visible or not). But quite apart from that, it was taking up resources, and was just plain rude. I hate spam. I resent my webserver having to play host to people whose obvious goal is to cynically exploit the co-operative protocols of the internet to their own selfish, antisocial gain. So, I decided to do something about it.

    The first thing I did was to look at the User-Agent fields which were being used by the bots. There were a variety, including variations on the following:

    DSurf15a 01
    PSurf15a VA
    SSurf15a 11
    DBrowse 1.4b
    PBrowse 1.4b
    UJTBYFWGYA (and other strings of random capital letters)
    I searched the internet for references to these strings, but all I found was a slew of website statistics analysis logs. This meant that these particular spambots obviously got around. It was also discouraging, because there was no mention anywhere of what these things actually were. I was surprised that there seemed to be no discussion whatsoever of something that seemed to be pandemic. Then I found a couple of other websites with guestbooks that had actually been defiled by these spambots: (if you follow these links and you don't see a lot of empty messages left by the above user agents, then that means the webmaster of the site has finally found a way to stop it, so good for them...)

    http://www.virtualglasgow.com/guestbook.html
    http://www.donotenter.com/guestbook/gbook.html
    I reckon the spambots didn't really intend to leave empty messages. They just tend to want to follow links with the keyword 'post'. So if the guestbook posting form has no preview or confirmation page, then the spambot would leave a message simply by following this link! My guestbooks and message boards have a preview page, which is probably why I hadn't had any of this.

    Anyway, I started thinking about what kind of program this thing was. First of all, it comes from all kinds of different IP addresses. I couldn't quite believe that this many different IP addresses were all intentionally using the same software, of which I could find absolutely no mention anywhere on the Web. This made me think it might be some kind of virus/trojan/worm or whatever that silently installed itself on people's computers, and then used the CPU and bandwidth to surf the Web without the owner being aware of it. I thought that if this was the case, then it must be sending the results somewhere - and if we could find out where, then we could go about shutting the operation down. But I have had no luck at all in getting any help from the sysadmins at ISPs I have contacted. A typical exchange was the one with a guy at Cox internet, which was where a persistent offending IP address was sourced. He just couldn't be bothered, and eventually told me that spidering was not against the law, or their terms of service. I asked whether actions which were blatantly geared toward the generation of spam were against their terms of use, but he never replied to that. I had no more luck anywhere else: Nobody had heard of this thing. I even sent an email to CERT, but got no response. So, I turned instead to thinking about how I could erase these pests from my life as much as possible. This document is about my quest to stop spambots (not just this one, but ALL spambots) from abusing my website. Hopefully it will be useful to you.

    Overview of the Spambot Trap
    There are three main parts to the technique which I outline here:

    Banish visible email addresses from your websites altogether, or else obfuscate them so they can't be harvested. Examples of how to do this are given. This is your fail-safe, in case the spambots figure out a way around your other defences. Even if they manage to cruise your website on their very best behavior, they still should not be able to harvest email addresses!

    Block known spambots: Certain User-Agents are just known to be bad, so there's no reason to let them come on your site at all. True, spambots could in theory spoof the User-Agent, but the simple reality is that a lot of them don't. We use an enhanced version of the BlockAgent.pm module from the O'Reilly mod_perl book. This extension adds offending IP addresses to a MySQL (or other relational) database, which is picked up by the third part of our cunning system...

    Set a Spambot Trap, which blocks hosts based on behavior. We set a trap for spambots, which normal users with browsers and well-behaved spiders should not fall into. If the bot falls in the trap, then its IP address is quickly blocked from all further connections to the webserver.
    This works using a persistent, looping Perl script called badhosts_loop, which checks every few seconds for additions to a 'badhosts' database. This script then adds 'DENY' rules for each bad host to the ipchains firewall. Blocks have an expiry, which is initially set to one day. If a host falls in the trap again after the block expires, then that IP is blocked again - and the expiration time is doubled to 2 days. And so on. This algorithm ensures that the worst offenders get progressively more blocked, while one-time offenders don't stick around in our firewall rules eating up resources.
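
    The doubling rule is simple enough to sketch in a few lines. The Python below is an illustration of the arithmetic only, not the badhosts_loop script itself; the function name next_block and the example date are made up.

```python
from datetime import datetime, timedelta

INITIAL_DAYS = 1  # the first block lasts one day, as described above


def next_block(previous_expire_days, now):
    """Return (expire_days, expiry) for a new block.

    `previous_expire_days` is the length of the host's most recent block,
    or None if the host has never been blocked before.
    """
    if previous_expire_days is None:
        days = INITIAL_DAYS
    else:
        days = previous_expire_days * 2  # repeat offender: double it
    return days, now + timedelta(days=days)


now = datetime(2002, 1, 1)  # arbitrary example timestamp
days = None
for offence in range(1, 5):
    days, expiry = next_block(days, now)
    print(offence, days, expiry.date())
```

    Because the length grows geometrically, a host that keeps coming back is soon blocked for weeks, while a one-off visitor drops out of the firewall rules after a day.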

    There are various components to the Spambot Trap, including the badhosts_loop Perl script, the BlockAgent.pm module, ipchains config, MySQL database, httpd.conf, robots.txt, and your HTML files. These are all covered in the sections below.

    Banishing 'mailto:'
    The first and most urgent thing you need to do is to get email addresses off your website altogether. This means, unfortunately, banishing the venerable mailto: link. It's a real shame that perfectly good mechanisms should be removed because of abuse, but that's just the way the world is these days. You need to be defensive, and assume that the spammers will try to take advantage of your resources as much as possible.
    It's an arms race
    The important thing that you need to realize is that no matter what blocks we put in place, this game is an arms race. Eventually the spambot writers will develop smarter bots which circumvent our techniques. Therefore you want to have a failsafe, which will prevent email addresses from getting into the hands of the spambot even if all else fails. The only real way to do that is to completely remove all email addresses from your website.
    Contact forms
    You should replace the mailto: links with links to a special form where people can type their name, email address and message. A CGI can then deliver the email, and your email address never has to be disclosed. There are a number of different mailer scripts out there - just be careful to check for vulnerabilities which could allow malicious users to use the form to send email to third parties (i.e. spam, ironically enough) using your server. The formmail script is popular, but an earlier version had such a vulnerability (since fixed). The Embperl package has a simple MailFormTo command to send an email from a form.
    Since I have seen guestbooks out there which have been extensively defiled by spambots, I would add that you should have a preview screen on your contact forms. This will ensure that an email doesn't get fired off simply by a spambot following the 'post' or 'contact' link (which it will likely try to do).
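
    As a rough sketch of the two safety points above (the recipient is never taken from the form, and the form cannot be abused to mail third parties), here is a hypothetical handler core in Python. The RECIPIENTS table, the key '23', the addresses and the function name build_message are all invented for the example; it only builds the message, on the assumption that sending happens after the user confirms the preview screen.

```python
from email.message import EmailMessage

# Hypothetical key-to-address table kept on the server, never sent to clients.
RECIPIENTS = {"23": "webmaster@example.com"}


def build_message(to_key, sender_name, sender_email, body):
    """Build an email for a contact form, or raise ValueError on bad input."""
    if to_key not in RECIPIENTS:
        raise ValueError("unknown recipient key")
    # Reject line breaks in header fields so the form cannot be used to
    # inject extra recipients or headers.
    for value in (sender_name, sender_email):
        if "\r" in value or "\n" in value:
            raise ValueError("line break in header field")
    msg = EmailMessage()
    msg["To"] = RECIPIENTS[to_key]
    msg["Reply-To"] = sender_email
    msg["Subject"] = f"Contact form message from {sender_name}"
    msg.set_content(body)
    return msg


msg = build_message("23", "Alice", "alice@example.org", "Hello!")
print(msg["To"])
```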

    Alternatives to totally banishing mailto:
    There are alternatives to completely removing email addresses, but they all depend on the stupidity of the spambot, and so could be compromised by a new generation of pest. These include:

    Write out email addresses in a non-email format, e.g. instead of writing 'username@domain.com' you would write 'username at domain dot com', or something similar. It would only take some spambot with a little more intelligence to be able to scan these patterns and pick up "likely" addresses, so this strategy is a little risky. Any consistent method you choose to write out email addresses could in theory be analyzed and decoded by a savvy bot.

    Add stuff to the email address to make it invalid, but so that a human could easily know what to do to make it work. An example of this is writing 'username@_NO_SPAM_domain.com'. You need to remove the "_NO_SPAM_" part to make the email address valid. You can have some kind of explanation to make it clear what people have to do to use the address. Personally, I don't like this - you're depending on a level of sophistication on the part of your users which is risky. In my experience, there are a lot of very 'novice' level users out there, who only know how to click on a link. They don't know how to edit an email address. Heck, I've had people come to my site by typing the URL into Google, rather than the 'Location' box of their browser. Also, people don't read instructions.

    Make graphics images which contain the email address. Spambots usually don't download graphics, and even if they did, they probably couldn't decode the bits to get the text. However, they could do it in theory, since software for doing OCR (optical character recognition, getting text from scanned documents) has been around for a while. A downside to this approach is that the user has to manually copy down the email address, since it can't be cut'n'pasted. Also, you can't put a mailto: link on the image, otherwise you're back to square one. But you could put a link to a contact form, with an argument in the link telling your server internally what email address to use. For example, the link could say "contact.cgi?to=23", where '23' is some database key to the actual email address. But the downside here is that you still need to generate the image, which is a bit of a pain in the ass if you have a lot of them. You can do it automatically, if you're willing to put the work in and write the scripts. There are some very nice graphics generation packages out there on CPAN for Perl. Here's an example of an email address presented as an image:
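
    To show why the first two alternatives are risky, here is a minimal sketch of what a slightly smarter harvester could do. It is written in Python for illustration; the regular expression and the function names are assumptions, and a real bot would presumably try many more patterns and markers.

```python
import re

# Matches patterns such as "username at domain dot com".
SPELLED_OUT = re.compile(
    r"\b([\w.+-]+)\s+at\s+([\w-]+(?:\s+dot\s+[\w-]+)+)\b", re.IGNORECASE
)


def decode_spelled_out(text):
    """Return addresses recovered from 'user at domain dot tld' patterns."""
    found = []
    for user, domain in SPELLED_OUT.findall(text):
        dotted = re.sub(r"\s+dot\s+", ".", domain, flags=re.IGNORECASE)
        found.append(user + "@" + dotted)
    return found


def strip_no_spam(address, marker="_NO_SPAM_"):
    """Undo the 'add stuff to the address' trick for one known marker."""
    return address.replace(marker, "")


print(decode_spelled_out("Write to username at domain dot com for details."))
print(strip_no_spam("username@_NO_SPAM_domain.com"))
```

    Neither trick takes more than a few lines to defeat, which is why the article treats them as stopgaps rather than a fail-safe.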

    MySQL
    Download badhosts MySQL database dump
    We need to set up a MySQL database, where we store records of the hosts which are to be blocked. This doesn't have to be MySQL, but I use it because it's extremely fast, and very appropriate for this kind of application. You need to create a new database, called 'badhosts'. You then create a table, again called 'badhosts', with the following structure:

    Field        Type                            Comment
    ip_address   varchar(20) not null, indexed   The IP address of the host to be blocked
    user_agent   varchar(255) not null           The HTTP User-Agent of the spambot, for reference
    expire_days  int unsigned not null           How many days is this block for. Doubled every time a new block has to be created for a particular IP address
    created      datetime not null               When this block was created
    expiry       datetime not null, indexed      When this block expires
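
    As a rough illustration of the table described above, here is a minimal Python sketch. It uses the standard-library sqlite3 module purely as a stand-in for MySQL so that it runs without a server; the index names and the active_blocks helper are hypothetical and are not taken from the downloadable dump.

```python
import sqlite3

# Stand-in for the MySQL 'badhosts' table. sqlite3 is used only so the
# sketch runs without a database server; index names are hypothetical.
SCHEMA = """
CREATE TABLE badhosts (
    ip_address  VARCHAR(20)  NOT NULL,
    user_agent  VARCHAR(255) NOT NULL,
    expire_days INTEGER      NOT NULL,
    created     DATETIME     NOT NULL,
    expiry      DATETIME     NOT NULL
);
CREATE INDEX idx_badhosts_ip ON badhosts (ip_address);
CREATE INDEX idx_badhosts_expiry ON badhosts (expiry);
"""

def create_badhosts(conn):
    """Create the table and the two indexes the article calls for."""
    conn.executescript(SCHEMA)

def active_blocks(conn, now):
    """Return IP addresses whose block has not yet expired at time `now`."""
    rows = conn.execute(
        "SELECT DISTINCT ip_address FROM badhosts "
        "WHERE expiry > ? ORDER BY ip_address",
        (now,),
    )
    return [row[0] for row in rows]
```

    Timestamps are passed as 'YYYY-MM-DD HH:MM:SS' strings, which compare correctly as plain text.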

    You could use the dump provided above to load directly into your database:

    shell> mysqladmin create badhosts
    shell> mysql badhosts < badhosts.dump

    That's about it! The fields which are marked as 'indexed' are the only ones which need indexes, because they are searched on to see if a particular IP address has been previously blocked, and also to see which blocks should be removed because they've expired. If you have access privileges set on your MySQL databases, then you need to allow the Apache user (usually 'nobody') access. The other script that will require access is badhosts_loop, which runs as root.
    Next, we look at the script that populates this database.

    BlockAgent.pm
    Download BlockAgent.pm
    Download bad_agents.txt
    The BlockAgent.pm Apache/mod_perl module is taken from the excellent book "Writing Apache Modules with Perl and C" by Lincoln Stein & Doug MacEachern (O'Reilly). This script basically acts as an Apache authentication module which checks the HTTP User-Agent header against a list of known bad agents. If there's a match, then a 403 'Forbidden' code is returned. The script compiles and caches a list of subroutines for doing the matches, and automatically detects when the 'bad_agents.txt' file has changed. I have found that it has no noticeable impact on the performance of the webserver. This script is useful in the case where you know for certain that a certain User-Agent is bad; there's no point in letting it go anywhere on your site, so it's a good first line of defense. We'll cover how to add this module to your website a little later, along with the rest of the configuration settings in the section on httpd.conf.
    Of course, one of the first arguments you'll see with regard to this method of blocking spambots is that it's easy to circumvent, by simply passing in a User-Agent string which is identical to the major browsers out there. This is perfectly true, but don't ask me why the spambot writers haven't done this - maybe it's a question of pride or ego, they want to see their baby out there on record in Web server logs. I honestly don't know. The main point is that at present, the User-Agent header CAN be used very effectively to block most bad agents. But, I have added more features so that we can also block agents which look ok, but behave badly by going somewhere they shouldn't - the Spambot Trap. More on that soon.

    You'll notice that the bad_agents.txt file which I have supplied here is very comprehensive. A good strategy here is probably to save the full version somewhere (perhaps as bad_agents.txt.all), and just keep the ones you actually encounter in the bad_agents.txt file. Then you keep the list shorter, and more relevant to what actually hits you. For example, my bad_agents.txt file currently has the following lines in it, because these are the spambots that I see most frequently:

    ^[A-Z]+$
    ^.Browse\s
    ^.Eval
    ^EO Browse
    ^.Surf
    ^Microsoft.URL
    ^Mozilla\/3.0.+Indy Library
    ^Zeus.*Webster

    You'll notice from this that BlockAgent.pm is very flexible, being able to take full advantage of the excellent regular expression capabilities of Perl. This means you can capture a lot of different agents with just one line. For example, the very first line catches all the variations of the agent which passes in random strings of capital letters, e.g. FHASFJDDJKHG or UYTWHJVJ. The spambot obviously thinks it's being pretty smart by looking different each time, but by using an easily identifiable pattern, it shoots itself in the foot. Hah.
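
    To make the pattern matching concrete, here is a small Python sketch of the same idea. It is not the BlockAgent.pm code (which is Perl and is not reproduced here); it only applies the patterns listed above to a User-Agent string, and the function name is hypothetical.

```python
import re

# Patterns copied from the bad_agents.txt excerpt above.
BAD_AGENT_PATTERNS = [
    r"^[A-Z]+$",
    r"^.Browse\s",
    r"^.Eval",
    r"^EO Browse",
    r"^.Surf",
    r"^Microsoft.URL",
    r"^Mozilla\/3.0.+Indy Library",
    r"^Zeus.*Webster",
]

# Compile once and reuse, in the spirit of the module's cached matchers.
COMPILED = [re.compile(pattern) for pattern in BAD_AGENT_PATTERNS]

def is_bad_agent(user_agent):
    """Return True if the User-Agent matches any known-bad pattern."""
    return any(rx.search(user_agent) for rx in COMPILED)
```

    A server-side handler would answer 403 'Forbidden' whenever is_bad_agent returns True.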
    The original version of the BlockAgent.pm script is well explained in the O'Reilly book, but I've added an extra hook that checks to see whether the client is accessing any of the spambot trap directories. If it is, then we add an entry to the MySQL database (you could use another relational database if you want, as long as it's accessible from Perl DBI).

    The first time an IP address is blocked, an expiry of one day is set. If the same host subsequently comes in and falls into the trap again, then the expiry time is doubled. And so on. This way, the block gets longer and longer, in proportion to how persistently the spambot revisits our website. Once the IP address is blocked, the spambot can't even connect to our web server, since we use 'Deny' in the ipchains rule. This means that no acknowledgement is given to any packets coming in from the badhost, and as far as they know, our server has just gone away. Hopefully, after this happens for long enough, our server will be taken off the spambot's "visit" list. Another nice little side-effect of this is that the spambot will probably have to wait for a while before giving up each connection attempt. Anything that makes them waste more time is ok by me!
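
    The doubling rule described above can be sketched in a few lines of Python. The function names and the dictionary layout are hypothetical; only the "one day first, then double" behavior comes from the text.

```python
from datetime import datetime, timedelta

def next_expire_days(previous_days):
    """One day for a first offence, otherwise double the previous block."""
    return 1 if previous_days is None else previous_days * 2

def new_block(previous_days, now):
    """Build the values for a new block record starting at `now`."""
    days = next_expire_days(previous_days)
    return {
        "expire_days": days,
        "created": now,
        "expiry": now + timedelta(days=days),
    }
```

    A host that keeps coming back therefore gets blocks of 1, 2, 4, 8, ... days.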

    BlockAgent.pm notifies the badhosts_loop script that something has happened by touching a file called /tmp/badhosts.new. The badhosts_loop script checks this file every few seconds and if it has changed then it knows that a new record's been added to the database, and it needs to re-generate the blocks list.
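
    This touch-and-poll handshake might look roughly like the following Python sketch. The helper names are hypothetical, and the real scripts are Perl; the point is only that one side bumps the file's modification time and the other compares it with the last value it saw.

```python
import os

def touch(path):
    """Create the flag file if needed and bump its modification time."""
    with open(path, "a"):
        pass
    os.utime(path, None)

def changed_since(path, last_mtime):
    """Return (changed, current_mtime); a missing file counts as unchanged."""
    try:
        mtime = os.stat(path).st_mtime_ns
    except FileNotFoundError:
        return False, last_mtime
    return mtime != last_mtime, mtime
```

    The polling side would call changed_since every few seconds and rebuild the blocks list whenever the first element of the result is true.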

    The BlockAgent.pm script is our alarm system. It's what tells us that something happened. In order to act on this information, we need to be able to add rules to the ipchains firewall. We'll cover this next.

    ipchains
    Download sample ipchains config file
    The ipchains module (here's the HOWTO doc) is a very nice way of providing a good level of basic network security to your server. If you haven't already set it up (or its successor, iptables), then you really should. It's a very easy way to configure who can and cannot have access to your machine. A good resource for learning about this is "Building Linux and OpenBSD Firewalls", by Wes Sonnenreich and Tom Yates (Wiley). This is where I learned about ipchains, and it's on their excellent explanations and examples that I based my own config file. Another is "Linux Firewalls" by Ziegler (New Riders), which seems to have a more recent 2nd edition that covers iptables too.
    The example ipchains config file given here is complete, but the bit which is most important to us is that we create a chain called 'blocks'. This is our own custom chain, which we can then add rules to. The badhosts_loop script will flush this chain and build it back up whenever a spambot falls in your trap. Once the spambot's IP address is on the blocks list, that host cannot connect to your server at all.
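
    As a sketch of the flush-and-rebuild step, the Python function below only builds the command lines; it does not run them. The -F, -A, -s and -j options and the DENY target are standard ipchains usage, but the exact rules the author's script issues are not shown in the article, so treat this as an assumption.

```python
def blocks_chain_commands(ip_addresses, chain="blocks"):
    """Build argv lists that flush the chain and re-add one DENY rule per host."""
    commands = [["ipchains", "-F", chain]]  # flush the custom chain first
    for ip in ip_addresses:
        # Silently drop every packet coming from this source address.
        commands.append(["ipchains", "-A", chain, "-s", ip, "-j", "DENY"])
    return commands
```

    A root-owned script could hand each list to subprocess.run; keeping the command building separate makes it easy to inspect the rules before applying them.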

    Remember to restart ipchains after you've changed the config file. Next, we'll look at the script that actually adds the firewall rules.

    badhosts_loop
    Download badhosts_loop script
    You run this script in the background, as root. It has to be run as root, because only root has the ability to add rules to the firewall. The script spends most of its time sleeping. It wakes up every five seconds or so and does a quick check on /tmp/badhosts.new. If this file has been changed since the last time it looked, then it goes and re-generates the firewall blocks list with all the current (non-expired) blocks. If nothing else happens, then the script will automatically do this at least once a day, to ensure that blocks really do expire even if there is no new activity.
    You should probably add the following line to your /etc/rc.local file (or equivalent), so that the script is automatically started up on reboot:

    /path/to/badhosts_loop --loop &

    This will start the script looping in the background. The script automatically checks to see if it is already running, by attempting to lock /var/lock/badhosts_loop.lock. If the file is already locked then the script will exit with an error message. If you want to just run the script once, without looping, then just omit the '--loop' option. This can be useful for testing.
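
    The "am I already running?" check can be sketched with an advisory file lock. This Python version is an assumption about the general technique, not a transcription of badhosts_loop; it is Unix-only, and acquire_lock is a hypothetical name.

```python
import fcntl

def acquire_lock(path):
    """Return an open, exclusively locked file, or None if another holder exists."""
    handle = open(path, "w")
    try:
        # Non-blocking: fail immediately instead of waiting for the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle
```

    The caller keeps the returned file object open for as long as it runs; the lock disappears when the file is closed or the process exits.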
    Logging is done to /var/log/badhosts_loop.log by default. Every time the script generates the blocks list, it writes a list of all the blocks to the log. This is a good place to monitor if you're interested in what hosts are being blocked. Here's an example of the log output:

    Thu Apr 11 16:09:07 2002:
    Flushing blocks chain:
    Generating blocks list:
    Adding 68.5.99.89 (8) 2002-04-04 14:08:11 to 2002-04-12 14:08:11 DSurf15a 01
    Adding 24.234.28.85 (8) 2002-04-07 10:43:42 to 2002-04-15 10:43:42 DBrowse 1.4b

    The log shows the IP address which is being added, then (in brackets) the number of days the block is effective for (doubling each time), then the start and end dates of this block, and finally the name of the User-Agent which committed the crime. This can be useful for quickly seeing whether you need to add a new one to the bad_agents.txt file.
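
    If you want to post-process the log, the "Adding" lines shown above are regular enough to parse. The following Python sketch assumes the format is exactly as in the sample output; the field names are hypothetical.

```python
import re

# Matches lines like:
# Adding 68.5.99.89 (8) 2002-04-04 14:08:11 to 2002-04-12 14:08:11 DSurf15a 01
LOG_LINE = re.compile(
    r"^Adding (?P<ip>\S+) \((?P<days>\d+)\) "
    r"(?P<start>\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) to "
    r"(?P<end>\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) (?P<agent>.*)$"
)

def parse_block_line(line):
    """Return a dict for an 'Adding ...' line, or None for any other line."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["days"] = int(entry["days"])
    return entry
```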
    This is a pretty stable script that should just sit there and chug quietly, not taking up much in the way of resources. Checking for a file being changed every five seconds is not a big deal in Unix, so you shouldn't even notice it.

    Now you have to create the trap itself - the spambot_trap directory.

    spambot_trap/ Directory
    Download gzipped tarball of sample spambot_trap directory
    View the sample directory
    You can create this directory anywhere on your server. We will create an alias in httpd.conf to access it. I put mine in /www/spambot_trap/. The point is, this doesn't have to be a real directory under your webserver directory root. If you use the <Alias> directive, then multiple websites can access the same spambot_trap directory, potentially through different aliases. You can use the sample tarball as a starting point; it has subdirectories and links which the spambots I have seen find irresistible. You should create your own image file for the unblock_email.gif file, to have a valid email address of your own.
    The spambot_trap and spambot_trap/guestbook/ directories are not used directly to spring the trap. This is because I wanted to have a warning level, a lead-in, where real users would be able to realize they are getting into dangerous waters and could then back out. You're going to be placing hard-to-click links on your web pages which lead into the real trap, and there's always a chance that a real user will accidentally click on one of these. So, some of the links will point into the warning level. I have made a GIF image which contains a warning text. Why an image? Mainly because spambots can't understand images, and I didn't want to give big clues like "WARNING!!! DO NOT ENTER" in plain text. So, the user sees the warning, the spambots don't. If the spambot proceeds into any of the subdirectories (email, contact, post, message), then the trap is sprung and the host is blocked.

    You also need to try to stop good spiders (e.g. google) from falling into the spambot trap and being blocked. To do this, we utilize the robots.txt file.

    robots.txt
    Download sample robots.txt
    This should allow good robots (such as google) to surf your site without falling into the spambot trap. Most bad spambots don't even check the robots.txt file, so this is mainly for protection of the good bots.
    You'll see that we list a bunch of directories under '/squirrel'. This could be anything; you'll set an alias later in httpd.conf. In fact, you may even want this to be dynamically generated (see later, under Embperl), so that you can quickly change the name of the spambot trap directory if the spambots adapt and start avoiding it. At present, a static setup should work just fine, however.
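
    To see how a well-behaved robot would treat such a file, here is a Python sketch using the standard-library robots.txt parser. The two Disallow lines are an assumption for illustration; the actual sample robots.txt is only linked above, not reproduced.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the real sample file may list more paths.
ROBOTS_TXT = """\
User-agent: *
Disallow: /squirrel/guestbook/message/
Disallow: /squirrel/guestbook/post/
"""

def allowed(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if a robot honoring robots.txt may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)
```

    A spider that runs this kind of check never requests the trap URLs, so it never gets blocked; a spambot that ignores robots.txt walks straight in.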

    Next, we need to look at the bait - links within your HTML files which lead the spambot into the trap.

    Your HTML Files
    Download sample HTML code
    Download sample transparent 1 pixel image for hiding the trap
    Here's an example of HTML with links into the spambot trap:
    <HTML>

    <BODY BGCOLOR="beige">
    <A HREF="/squirrel/guestbook/message/"></A>
    <A HREF="/squirrel/guestbook/post/"><IMG SRC="/guestbook.gif" WIDTH=1 HEIGHT=1 BORDER=0></A>

    Body of the page here

    <TABLE WIDTH=100%>
    <TR>
    <TD ALIGN=RIGHT>
    <A HREF="/squirrel/guestbook/">
    <SMALL><FONT COLOR="beige">guestbook</FONT></SMALL>
    </A>
    </TD>
    </TR>
    </TABLE>

    </BODY>

    </HTML>

    Spambots tend to be stupid. You'd think they would check for empty links (which don't show up in a real browser), but they don't seem to. Sure, they may get smarter, but meantime you might as well pick the low hanging fruit. So, the very first thing in the body of your HTML should be an empty link which goes straight into the trap proper - not the warning level, but the actual trap itself. This is because there is no way for someone using a real browser to click on this link, and good spiders will ignore it anyway because it's in the robots.txt file.
    We also use a one pixel big transparent GIF (a favorite web bug technique) to anchor a link to the trap, just in case the spambot is smart enough to avoid empty links. If we put this as the very first thing in the body, then it'll be pretty hard for a real user to click on, since it's only one pixel in size. But a spambot will quite happily go there!

    Finally, there is an example of a non-graphic, text based link. This will be placed on the right side of the screen by the table, and the text will appear in the same color as the background (in this example, beige). The link does not go straight into the trap, but into the warning level, because with this one there is a bigger chance that real people could click on it accidentally. The link may be invisible, but it's still there, and someone could find it. So, they get to see a nice warning, and they should back off from there. But the spambot won't. By the way, we have the link going to /squirrel/guestbook/ rather than just /squirrel/ because some of the spambots seem to specifically follow links with certain keywords, e.g. 'guestbook', 'message', 'post', etc.

    You can sprinkle these links all around your HTML files. I put them in every single one, since I use Embperl templates which make that sort of thing very easy.

    Embperl
    Download sample dynamic robots.txt using Embperl
    Download sample dynamic HTML code using Embperl
    The point of this is to make it easier to change the spambot trap directory without having to edit a whole bunch of files. We pass an environment variable to Perl from httpd.conf (see below), which says what the trap directory is called. We then use this in Embperl to substitute into the HTML and robots.txt files at request time. Thus if we wanted to change the name of the trap from 'squirrel' to 'badger', then we only need to change httpd.conf, restart apache, and we're done. All the links in the HTML are dynamic, as is robots.txt (see the samples above).
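
    The substitution idea is independent of Embperl. As a language-neutral illustration, this Python sketch fills the trap name into a link and a robots.txt line from an environment mapping; the templates and the render function are hypothetical, and only the SPAMBOT_TRAP_DIR variable name comes from the configuration described in this article.

```python
import os
from string import Template

# Hypothetical templates; $trap is replaced at request time.
ROBOTS_TEMPLATE = Template("User-agent: *\nDisallow: /$trap/guestbook/post/\n")
LINK_TEMPLATE = Template('<A HREF="/$trap/guestbook/message/"></A>')

def render(template, environ=os.environ):
    """Substitute the trap directory name, defaulting to 'squirrel'."""
    trap = environ.get("SPAMBOT_TRAP_DIR", "squirrel")
    return template.substitute(trap=trap)
```

    Renaming the trap then means changing one setting rather than editing every page.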
    Now, we bring it all together in the Apache configuration file.

    httpd.conf
    Download sample httpd.conf directives
    Download sample startup.pl script (used in httpd.conf)
    You need to have mod_perl installed before you can use BlockAgent.pm. You should take a look at the sample given above, and integrate these directives into your own virtual hosts. The most important lines are:
    Alias /squirrel /www/spambot_trap
    PerlSetEnv SPAMBOT_TRAP_DIR squirrel

    You should set the 'squirrel' name to whatever you'd like for your website; you'll then access the trap using a URL something like http://www.yourdomain.com/squirrel/guestbook/message. This will spring the trap. You also need to set up the BlockAgent.pm access handler:
    <Location />
    PerlAccessHandler Apache::BlockAgent
    PerlSetVar BlockAgentFile /www/conf/bad_agents.txt
    </Location>

    This ensures that all accesses to your website will go through BlockAgent.pm first. You should choose your own location for the bad_agents.txt file.
    Finally, you might want to install Embperl so that you can embed Perl into your HTML code (always executed on the server side, never seen on the client side):

    # Set EmbPerl handler for main directory
    <Directory "/www/vhosts/www.yourdomain.com/htdocs/">

    # Handle HTML files with Embperl
    <FilesMatch ".*\.html$">
    SetHandler perl-script
    PerlHandler HTML::Embperl
    Options ExecCGI
    </FilesMatch>

    # Handle robots.txt with Embperl
    <FilesMatch "^robots.txt$">
    SetHandler perl-script
    PerlHandler HTML::Embperl
    Options ExecCGI
    </FilesMatch>

    </Directory>

    That about does it. You should now have the setup which will allow you to block spambots. You'll probably be interested in monitoring what happens...
    Monitoring
    Download sample script for monitoring web server logs
    This simple script just tails the badhosts_loop log. You'll have fun (I do) seeing what comes on your site and promptly falls into the trap, and then SPLAT. No more spambot. Heh heh heh.
    Conclusions
    This setup works pretty well for me at the moment. I've no doubt there are flaws in my design, but it seems stable and is "good enough" for the time being. If you can see any improvements then I'd love to hear about them. To finish up, here's a summary of the strengths and potential weaknesses of the Spambot Trap system.
    Strengths
    Does not rely exclusively on the HTTP User-Agent header, but at the same time allows us to block agents which we know to be bad.

    Does not rely on the spambot abusing the robots.txt file. Many spambots don't even load it. But the robots.txt file will protect "good" robots from falling into the spambot trap. So, for example, googlebot will be just fine.

    The blocks happen based on behavior, rather than trusting anything the spambot tells us about itself (e.g. User-Agent). Thus we don't rely on any prior knowledge of the spambots in order to block them; an entirely new one that we've never seen before will still fall in the trap and be duly blocked.

    Once a spambot is blocked, then it cannot connect to your server again at all for the duration of the block. If it tries to connect, it won't even get a 'connection refused' error, because the firewall rule just quietly drops all the packets from the bad hosts. The ipchains firewall is very effective, and more efficient at blocking hosts than anything you could put together with Apache. So, you save on server resources. If you're wondering whether the block lists might get large, I have found that with the constant expiring of one day blocks, the active block list has never been more than about 20 IP addresses at a time, out of a list (so far) of 100 distinct hosts.

    The blocks initially expire after one day. This means that one-off offenders are quickly removed from the firewall rules. On the other hand, repeat offenders get progressively longer and longer blocks (doubled each time). This means that the more abusive a host is, the more it will be blocked. It also means that if a bot is coming in from multiple IP addresses (through a proxy), then each of the individual IP addresses will probably not go on to be blocked for too long. Thus you won't be blocking everyone in AOL. On the other hand, if you continue to get hit from the same network, then it's obviously a source of trouble and should be blocked. If it's a major network like AOL, which you really don't want to block, then you need to take the IP addresses and times of the abuse, and send it to the sysadmin at the ISP concerned. There's really not a lot else you can do. I haven't seen this in reality, though. In my experience, the spambots come in from all sorts of different IP addresses, and the ones that are very persistent over time are mostly static IPs from DSL and small ranges of IPs from cable modems. These are the people with the always-on, high bandwidth capabilities which are needed for large scale email harvesting.

    The system uses a relational database to manage the blocks, and so it is very scalable, and potentially you could share the database between multiple servers. If any one server gets a spambot, then the offending IP address can automatically also be blocked at all the other servers. Also, the fact that we don't delete expired blocks means that we can keep track of the history of the blocks, and perhaps perform analyses which would lead to more permanent ipchains blocks of entire subnets, if desired.
    Weaknesses
    It would be possible for the spambots to get wise, and start following the robots.txt file rules. Then the spambot could in theory surf your entire site (or at least the bits allowed by robots.txt) without falling into the trap. However this also means that you can control where the spambot goes, which is the whole point of robots.txt. If you want, you can allow google into one part of the site, but exclude all others. Still, you should remove all email addresses from your site as the fail-safe.

    It's possible that a spambot could come in through a proxy such as AOL, which means you'll be blocking multiple AOL IP addresses. This is not very nice, and I'm not sure what the solution is at the moment. All I can say is that it hasn't happened yet, and the worst offenders on my site all have static IPs. They seem to come in from cable and DSL connections mostly.

    I don't know how feasible this would be, but it may be possible to conduct a "denial of service" type attack on your webserver by making many requests to the spambot trap directory from different IP addresses. I think, however, that you actually need to have those IP addresses (rather than spoofing them) in order to set up a real TCP connection with the web server. I don't know how likely this is, but it comes more under the "attack" category than spambots. If someone tries this on your site, then it's definitely something that can be pursued with legal means. It's no longer just a petty annoyance, but rather a hostile action which must be chased down. Also, the motivation is totally different - the spammers don't want to do this kind of thing. They just want their email addresses. The DDOS attacks are notoriously difficult to track, but I think in the couple of years that have passed since the first ones brought down Amazon and Yahoo!, there has been some progress made. Anyhow, I just wanted to bring the idea into the light of day. If anyone has any clues about it then I'd be glad to know.

    Possible Future Enhancements
    Spot large numbers of blocks occurring on a particular subnet, and automatically consolidate blocks into a single one which blocks the entire subnet (e.g. 128.123.31.0/24).

    More interactive tools to allow removal of blocks

    Analysis tools which can tell us something about patterns of abuse from particular networks.
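
    The first enhancement in the list above, subnet consolidation, could be prototyped along these lines. This Python sketch is only one possible approach: the /24 prefix matches the example given, but the threshold of three hosts and the function name are assumptions.

```python
import ipaddress
from collections import Counter

def consolidate(ip_addresses, prefix=24, threshold=3):
    """Replace hosts with their subnet once `threshold` of them share it."""
    nets = [
        ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        for ip in ip_addresses
    ]
    counts = Counter(nets)
    result = []
    for ip, net in zip(ip_addresses, nets):
        # Busy subnets collapse to one entry; lone hosts stay as they are.
        entry = str(net) if counts[net] >= threshold else ip
        if entry not in result:
            result.append(entry)
    return result
```

    The output can be fed to the firewall as-is, since a rule source may be either a single address or a network in address/prefix form.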

    If you can think of any more potential problems (or unrecognised strengths!) then I'd be happy to hear about it. I'd also like to hear about any comments on this document.

    --
    1q2w3e4r5t6y7u8i9o0pqawsedrftgthyjukilo;p'azsxdcfv gbhnjmk,l.;/
  34. Similar setup without SQL requirements by bero-rh · · Score: 4, Interesting

    My setup (catches some of the more commonly used spambots) uses mod_rewrite to send spammers to a trap.
    Setup details at http://www.bero.org/NoSpam/isp.php

    --
    This message is provided under the terms outlined at http://www.bero.org/terms.html
  35. Pollute their database by Steev · · Score: 1, Redundant

    I think a better idea was one that I heard a while back. This guy set up a script to constantly create new pages with randomly created garbage email addresses and links to new random pages with new random garbage email addresses, ad infinitum. Sure, you'll get a few more hits from the spambot, but it'll keep crawling your script-based hierarchy and keep polluting its database with email addresses that don't exist!

    1. Re:Pollute their database by nochops · · Score: 2, Insightful

      This helps, but not much...

      Think about it. With the scarcity of domain names lately, chances are that while the garbage email addresses may not be valid, more than a few domain names would be valid.

      So then the spammer fills his database with these non-existent addresses on existing domain names. He then sends his spam to these addresses, and their mail servers not only have to process the message to determine that it's an invalid address, but they also have to bounce the message back as undeliverable.

      IMO this is going to use twice the bandwidth, since you now have to consider the bandwidth used by all of those bounces.

      You could always use some non-existent domain names for the garbage email addresses, but the spammer could just as easily check a domain name's validity before sending spam to it, making it trivial to remove all of the trash from his database.

      Remember, the spammer couldn't care less about sending mail to bad addresses, as long as the good addresses are spammed as well. It's left to the poor sysadmin to clean up the mess.

      --
      "A terrorist is someone who has a bomb but doesn't have an air force." -William Blum
    2. Re:Pollute their database by Steev · · Score: 2

      Not if the TLD isn't .com, .net, or .org! There's almost NO chance that it's valid if the TLD is also random.

      Remember, the spammer couldn't care less about sending mail to bad addresses, as long as the good addresses are spammed as well.

      True, but their address lists will depreciate in value because the authenticity of most of the addresses would be in doubt.

    3. Re:Pollute their database by Gorgonzola · · Score: 1

      You mean TLD's like as, au, ca, uk, nl, de, fr, sa, se, no, de, it, il, tw, jp, kr, in, ch, cn?

      Sure, there is almost no chance that they are valid.

      --
      -- Spelling and grammar errors tend to be a sign of erroneous thinking.
    4. Re:Pollute their database by Steev · · Score: 2

      Ok, just make sure the TLD is longer than say, 5 characters and you can be almost certain that randomly created ones don't exist.

      I am fully aware of the non-com/net/org TLDs...just look at *mine* :)

    5. Re:Pollute their database by Sir+Tristam · · Score: 2
      So then the spammer fills his database with these non-existent addresses on existing domain names. He then sends his spam to these addresses, and their mail servers not only have to process the message to determine that it's an invalid address, but they also have to bounce the message back as undeliverable.
      So? The right answer is to make sure that the domain names you are giving out will be valid. Just go to http://www.spamhaus.org/ and compile a list of valid domains of current active spammers. Randomly select from these when generating bogus email addresses. It doesn't get rid of the network bandwidth issue, but it does chew up the resources of the spammers' servers since they will have to process the incoming messages. Let the spammers DOS each other.

      Chris Beckenbach

  36. Another way to stop spambots by PanBanger · · Score: 3, Funny

    Have your page linked on slashdot! Page gets slashdotted, problem solved.

  37. Removing email addresses by Mr_Silver · · Score: 2
    I used a very nifty bit of javascript which masks your mailto address. Provided the person has javascript on (and let's face it, nearly everyone who doesn't read /. does) then it works well.

    You can generate the code for your own email address here or, if you want some source code, then you can find an implementation of it here.

    --
    Avantslash - View Slashdot cleanly on your mobile phone.
  38. Simple solution! by Balinares · · Score: 3

    1) Put a link such as: mailto:dedicatedaddress@wherever.com?Subject= [Question] About your site (or whatever)
    2) Trash any email sent to dedicatedaddress that doesn't have the [Question] tag in the subject.

    Hope this helps.

    --

    -- B.
    This sig does in fact not have the property it claims not to have.
    1. Re:Simple solution! by c=sixty4 · · Score: 3, Insightful
      1. Put a link such as: mailto:dedicatedaddress@wherever.com?Subject= [Question] About your site (or whatever)
      2. Trash any email sent to dedicatedaddress that doesn't have the [Question] tag in the subject.
      Congratulations. You just ensured you can't be emailed by anyone not running Internet Explorer.
      --
      "The good die first." "Most of us are morally ambiguous, which explains our random dying patterns." --- MST3K
    2. Re:Simple solution! by Anonymous Coward · · Score: 0

      Why would you ever want to talk to such people?

    3. Re:Simple solution! by fanatic · · Score: 3, Informative

      Congratulations. You just ensured you can't be emailed by anyone not running Internet Explorer.

      This seems to work fine (the window comes up with the right email address in the to: line and the '[Question]' tag in the subject: line) in Netscape 4.76

      and Lynx Version 2.8.3rel.1

      and Mozilla 0.9.7, which implies Netscape 6.x, and Galeon will work as well, though I haven't tested these.

      --
      "that's not encryption - it's a new perl script that I'm working on..." - from some Matrix parody
    4. Re:Simple solution! by Anonymous Coward · · Score: 0

      Hehe, dumbass, better be sure you know what you're talking about before spouting off.

      The "mailto" thing referred to works on older browsers and non-IE browsers. I know for a fact Netscape 3.04 is fine with it. Can't remember about earlier versions...

    5. Re:Simple solution! by Tablizer · · Score: 1

      (* Congratulations. You just ensured you can't be emailed by anyone not running Internet Explorer. *)

      Older versions of Eudora Lite also do not like URL parameters after the address.

    6. Re:Simple solution! by Anonymous Coward · · Score: 0


      (Score: -42, Wrong)

  39. Re:Block? Are you kidding? by BlueUnderwear · · Score: 5, Funny
    - for every random (non-existent) domain that you generate, a root DNS server will be queried when an email is sent to this address, which increases the load on the root servers, which is generally a bad thing.

    Why is this a bad thing? They are owned by Verisign.

    How about instead, returning pages with the email address abuse@domain-that-spambot-is-coming-from all over them...

    This is also a good idea. In fact, I have a script which does a traceroute to the IP of the bot, and then looks up the admin contact using whois for the last couple of hops, and returns these. Oh, and for additional fun, throw in a couple of addresses of especially loved "friends"...

    --
    Say no to software patents.
  40. A better solution: obfuscate the mailto: link by rsidd · · Score: 5, Insightful

    Write some of your email address using html code for the ascii characters, like &#114; for "r".
    (Yes, I've posted about this before, but it does work for me.) Browsers render it so users get the address they want, but spambots try to grab it from the raw html and get something meaningless.
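
    For anyone who wants to try this, a rough Python sketch of the encoding follows. The function name is made up for illustration; html.unescape is the standard-library decoder and shows what a browser (or a bot with an HTML library) recovers from the entities.

```python
import html

def encode_entities(address):
    """Encode every character as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in address)

def decode_entities(markup):
    """Decode character references back to plain text."""
    return html.unescape(markup)
```

    Encoding only some of the characters, as suggested above, works the same way: leave the rest of the string untouched.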

    1. Re:A better solution: obfuscate the mailto: link by fulgan · · Score: 1

      Except that writing a parser for such things takes a good 90 seconds if you're doing it from scratch and exactly 0 if you're using a good library.

    2. Re:A better solution: obfuscate the mailto: link by Anonymous Coward · · Score: 1, Informative

      Paul Gregg has posted a PHP deal he's written that'll allow you to generate obscured mailto:s - can be configured to generate obfuscation that is or isn't JavaScript-dependent.

      Find it at: http://www.pgregg.com/projects/encode/htmlemail.php

      A usable page for those without access to their own php aware servers as well as source code.

    3. Re:A better solution: obfuscate the mailto: link by Sangui5 · · Score: 5, Interesting

      Some spambots will render that correctly. They are less likely, though, to render an email address that has had this done to it: it's encrypted through JavaScript.

      It is a rather impressive piece of work. Uses honest-to-god RSA.

      You could also encrypt all email addresses, and then in your spambot trap, put really really CPU intensive javascript. You'll win either way: either the spambot doesn't do javascript, and it won't get your addresses, or it does do javascript, and they've just spent an eternity wasting time. It would work the same way as a tarpit, but it wouldn't eat nearly so many resources on your end.

      If you're really clever, you could have the javascript do useful work, and then have the results of that work encoded into links in the page. You could then retrieve the results when the spider follows the link.

      There was an idea called hashcash floating around a while back. The idea was that an SMTP server would refuse to deliver email if the sender didn't provide a hash collision of so many bits to some given value. The sender has to expend asymmetrically more resources to generate the collision than it takes the receiver to check it. That way one can impose a cost on sending a lot of email. It's not so much as to be a burden on ordinary users, but if you need to send thousands of emails, it will add up.
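
      The asymmetry can be sketched in a few lines of Python. This is an illustration of the idea only, not the real hashcash stamp format (which carries more fields); the `resource:counter` layout and the function names are assumptions. The sender searches for a counter whose SHA-1 hash starts with a given number of zero bits, and the receiver checks it with a single hash:

```python
import hashlib
from itertools import count

def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def mint(resource: str, bits: int) -> str:
    """Sender side: try counters until the hash has `bits` leading zero bits."""
    for counter in count():
        stamp = f"{resource}:{counter}"
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp

def verify(stamp: str, resource: str, bits: int) -> bool:
    """Receiver side: one hash is enough to check the stamp."""
    if not stamp.startswith(resource + ":"):
        return False
    return leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits
```

      Each extra bit doubles the expected minting work while verification stays at one hash, which is where the per-message cost for bulk senders comes from.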

    4. Re:A better solution: obfuscate the mailto: link by Tablizer · · Score: 1

      (* Write some of your email address using html code for the ascii characters, like &#114; for "r". ....Browsers render it so users get the address they want, but spambots try to grab it from the raw html and get something meaningless.*)

      Like somebody said, it is trivial to write spamware that figures it out. However, most spammers won't bother unless you are a huge site or if more sites do it.

      Another solution may be a graphic (.gif, .jpeg, .png) of the email address. That way there is no text to parse, and no spammer is going to bother to write OCR just to get it. However, make sure it is an easy-to-spell address, since copy-n-paste won't work.

    5. Re:A better solution: obfuscate the mailto: link by WhaDaYaKnow · · Score: 1
      A couple of problems with the encrypted email though:

      a) it requires JavaScript to run on the client-side. There are better solutions than this, if you don't mind that being a requirement.

      b) he requires the copyright message to stay intact. Can you say 'signature'?

      c) If they enhanced the parser of the bot with JavaScript (and this is not THAT difficult), he writes the email plain to the document.

      So some improvements:

      - Forget about requirements of leaving certain comments in the code. These serve as signatures, so if the solution ever became widespread, the bots could be modified to recognize them.
      - Require user interaction to decrypt the address. For example, use the onclick() handler. It's orders of magnitude more difficult for a bot to simulate all possible onclick()s than it is to execute script that runs when the page is loaded.

      Below is the code generated, and as you can see in the last lines, the email address is written plainly to the document:
      <SCRIPT LANGUAGE='JAVASCRIPT'>
      // <!--Quick! Hide the java!
      // Speaking of Java, this particular script is (C) Copyright 2002 Jim Tucek
      // If you wish to use my Email Encryption script, these comments must be left
      // alone! That is all.

      // Visit www.jracademy.com/~jtucek/ for script information and a bit of help
      // setting it up, or www.jracademy.com/~jtucek/email.html for contact
      // information.

      // A brief history of this script can be found (and it's rather entertaining)
      // at www.jracademy.com/~jtucek/eencrypt.html

      function goForth() {
        var c = '665 511 420 45 39 48 280 39 175 529 511 147 635 511 665 730';
        var n = 779;
        var d = 103;
        c += ' ';
        var length = c.length;
        var number = 0;
        var bar = 0;
        var answer = '';
        // Parse each space-terminated decimal number and decrypt it to one character.
        for (var i = 0; i < length; i++) {
          number = 0;
          bar = 0;
          while (c.charCodeAt(i) != 32) {
            number = number * 10;
            number = number + c.charCodeAt(i) - 48;
            i++;
          }
          answer += String.fromCharCode(decrypt(number, n, d));
        }
        return answer;
      }

      function decrypt(c, n, d) {
        var foo, bar;
        // Split exponents up: multiply by c*c (mod n) d/2 times,
        // starting from 1 for even d and from c for odd d.
        if (d % 2 == 0) {
          bar = 1;
          for (var i = 1; i <= d / 2; i++) {
            foo = (c * c) % n;
            bar = (foo * bar) % n;
          }
        } else {
          bar = c;
          for (var i = 1; i <= d / 2; i++) {
            foo = (c * c) % n;
            bar = (foo * bar) % n;
          }
        }
        return bar;
      }
      emailAddress = goForth();
      //emailAddress is the decrypted version of your email address, ie none@none.com
      document.write('<A HREF=mailto:' + emailAddress + '>Email me at: ' + emailAddress + '</A>');
      // Stop hiding the script -->
      </SCRIPT>
      There are even better ways to obfuscate things, but, I leave that as an exercise ;)
      Sorry for the strange formatting,- the fucking lameness filters demanded it...
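
      For comparison, the same decryption can be sketched in Python, where three-argument `pow` does the modular exponentiation directly. Only n = 779 and d = 103 come from the script above; the public exponent e = 7 is derived here (779 = 19 * 41, and 7 * 103 = 721, which is 1 modulo 18 * 40 = 720), and the function names are assumptions. A modulus this small can be factored by hand, so the scheme is obfuscation rather than real encryption:

```python
N, D, E = 779, 103, 7   # n and d from the script; e = 7 is derived, not from the post

def decrypt(c: int, n: int, d: int) -> int:
    """Modular exponentiation c**d mod n."""
    return pow(c, d, n)

def go_forth(ciphertext: str, n: int, d: int) -> str:
    """Decrypt a space-separated list of numbers into a string."""
    return "".join(chr(decrypt(int(tok), n, d)) for tok in ciphertext.split())

def encrypt(text: str) -> str:
    """Server side: one number per character, separated by spaces."""
    return " ".join(str(pow(ord(ch), E, N)) for ch in text)
```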
    6. Re:A better solution: obfuscate the mailto: link by Pathwalker · · Score: 2

      I use a little trick that combines both of those techniques.
      It's a little block of RXML that defines a tag called cloak. You use it like this:

      <cloak email='foo@pathwalker.org' />

      If Roxen determines that the client is a robot, or it can't identify what the client is, then they get a graphic.

      If they are detected as a normal webbrowser, then they get a partially entity encoded address.

      If anyone uses Roxen as their server it might be of some use.

    7. Re:A better solution: obfuscate the mailto: link by Sangui5 · · Score: 1

      Well, if you really hate JavaScript, then I guess you won't be getting my email address, then. The really good part about this is forcing computation on the client side.

      The point isn't necessarily to be perfectly secure. The point is to make it expensive for the spammers.

      So what if they have a signature to the code? The idea is that this will be used to obfuscate real email addresses, and (as a bonus) bogus ones too. With or without a signature, they still have to spend cycles to decrypt, and the spambot author has to add JavaScript support. There are two ways this could make it more expensive: cycles and manpower.

      Again, the signature doesn't matter. My door plainly has a lock on it. Thieves can look at my door, and they plainly see the lock. But that doesn't keep the lock from helping. So the spammers have an easy way to ID obfuscated emails: that doesn't reduce the cost of deobfuscation. Most other obfuscation schemes are trivial, and therefore impose little cost.

      True, the script is really just a proof of concept, and therefore doesn't impose a lot of cost. To be truly useful, it has to force the user to expend more cycles to get the decryption. This means a set of JavaScript large-number routines would have to be written. Also, one could include the MD5 sum of the plaintext email, and fail to include all of the bits of the keys. So rather than taking a relatively few milliseconds per address, it could take any desired amount of time, as the script has to run through lots of possible keys, and check if it gets a correct output.
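
      A rough sketch of that withheld-key-bits idea, in Python for brevity. It reuses the toy key n = 779, d = 103 from the script quoted earlier in the thread, with e = 7 derived from it (7 * 103 = 721, which is 1 modulo 720). The function names and the choice to hide the low bits of d are assumptions for illustration. The page would publish the ciphertext, an MD5 of the plaintext, and only the high bits of d, so the client has to try candidates until the hash matches:

```python
import hashlib

N, D, E = 779, 103, 7   # toy key: n and d from the quoted script, e derived from them

def publish(address, hidden_bits):
    """Server side: ciphertext, MD5 of the plaintext, and d with its low bits removed."""
    cipher = [pow(ord(ch), E, N) for ch in address]
    digest = hashlib.md5(address.encode()).hexdigest()
    return cipher, digest, D >> hidden_bits

def recover(cipher, digest, d_high, hidden_bits):
    """Client side: try every value of the missing bits until the MD5 matches."""
    for low in range(1 << hidden_bits):
        d = (d_high << hidden_bits) | low
        text = "".join(chr(pow(c, d, N)) for c in cipher)
        if hashlib.md5(text.encode()).hexdigest() == digest:
            return text
    return None
```

      Each hidden bit doubles the worst-case number of trial decryptions, so the cost per address can be tuned; with a key this small it is still cheap, which is why larger numbers would be needed in practice.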

      One certainly doesn't want the renderer to have to slog through that, though. Your onclick() suggestion has a lot of merit there. If it takes a minute to decode one address, the utility of a spambot plummets. Especially if one adds a lot of fake addresses to the mix. But to real people, spending a minute to send 1 email isn't a big deal. The same idea applies to tarboxes (SMTP servers purposely configured to be reeaallyy reeaally sllooww) and hashcash. It doesn't unduly inconvenience legitimate users, but it costs a spammer lots of resources. This has the added benefit of not eating up server side resources (tarboxes) or requiring major changes to existing specs (hashcash). All the additional work is on the client (except, I suppose, for generating the script in the first place), and most everybody already has JavaScript.

      So yes, the script itself, as is, isn't perfect. But the concept is very sound.

  41. Re:Block? Are you kidding? by cperciva · · Score: 4, Interesting

    Add a couple of sleep(20); into the cgi script that generates the bot fodder. The bot will still stay busy waiting for your webserver's response, but your script will consume exactly zero resources.

    Zero resources, except for memory.

    A much better solution would be to point the bot at a set of "servers" with IP addresses where you're running a stateless tarpit.

  42. linux? by Anonymous Coward · · Score: 0

    How does your solution require linux? And why in gods name would you want to run a webserver using linux and mysql... do you just want a slow webserver or what?

    1. Re:linux? by Anonymous Coward · · Score: 0

      Better than Molasses OS (aka Nintendows) running IIS. Besides, why waste the money on that trash when you can have something worlds better (and faster) for free (that doesn't crash often)? The only other real option is Slowlaris and Oracle. Better yet Linux and Oracle, and even better yet BSDi (or xBSD) and Oracle. I guess what it comes down to is that if you're not Toyota, CNN or Slashdot, why the hell do you need anything faster? This coward must be one of those damn gear heads that put the computing equivalent of a Porsche in for the job of a skateboard. "Look at my Athlon 1.4g box!!! Its soo much faster than yours!! I also have XP and now I'm quicker than a ray of light!!!" Etard, more code != faster, ever. Fixed code == faster, and never forget it!

  43. my spambot trap by romco · · Score: 4, Informative

    The page is already slashdotted. Here is a little script that traps bots (and others) that use your robots.txt to find directories to look through. It requires an .htaccess file with mod_rewrite turned on.

    robots.txt
    #################

    User-agent: *

    Disallow: /dont_go_here
    Disallow: /images
    Disallow: /cgi-bin

    dont_go_here/index.php
    ############

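    <?php
    // When the trap was hit and by whom.
    $now = date("h:ia m/d/Y");
    $IP = getenv("REMOTE_ADDR");
    $host = getenv("REMOTE_HOST");
    $your_email_address = "you@whatever";

    // Rule that sends this exact address to denied.html from now on
    // (dots escaped and the pattern anchored so 1.2.3.4 does not also match 1.2.3.40).
    $ban_code =
    "\n".
    '# '."$host banned $now\n".
    'RewriteCond %{REMOTE_ADDR} ^'.str_replace('.', '\.', $IP).'$'."\n".
    'RewriteRule ^.*$ denied.html [L]'."\n\n";

    // Append the rule to the site's .htaccess.
    $fp = fopen("/path/to/.htaccess", "a");
    fwrite($fp, $ban_code);
    fclose($fp);

    mail($your_email_address, "Spambot Whacked!", "$host banned $now\n");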
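
    The rule-building step can also be sketched on its own, here in Python. The function name `ban_rule` and the sample values are hypothetical. The point of the sketch is that the address should be regex-escaped and anchored at both ends, since a pattern like ^1.2.3.4 with bare dots and no end anchor would also match 1.2.3.40:

```python
import re
from datetime import datetime

def ban_rule(ip: str, host: str, when: datetime) -> str:
    """Build the .htaccess lines that send one exact IP address to denied.html."""
    stamp = when.strftime("%I:%M%p %m/%d/%Y")
    pattern = "^" + re.escape(ip) + "$"      # escape the dots, anchor both ends
    return (
        f"\n# {host} banned {stamp}\n"
        f"RewriteCond %{{REMOTE_ADDR}} {pattern}\n"
        "RewriteRule ^.*$ denied.html [L]\n\n"
    )

# Hypothetical values for illustration.
rule = ban_rule("10.1.2.3", "bot.example.net", datetime(2002, 4, 1, 9, 30))
```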

    --
    AdFuel
    1. Re:my spambot trap by Captain+Large+Face · · Score: 2

      How about rewriting denied.html each time to contain a list of e-mail addresses in the format:

      abuse@banned_host

      That way, the spammers might actually spam their own ISP's abuse account. Now THAT would be funny! :-)

    2. Re:my spambot trap by romco · · Score: 2

      "How about rewriting denied.html each time to contain a list of e-mail addresses in the format:
      abuse@banned_host"

      I do something like that..

      Denied!

      --
      AdFuel
  44. Re:Huh? by Anonymous Coward · · Score: 0

    Nothing, MySQL and mSQL are dope as shite! I've never used anything but. Don't get me wrong, Oracle is great for massive databases I'm sure, however I don't want to pay 80k for an Oracle server that's so damn complicated I'd have to spend another 5 - 10k on schooling. It doesn't need to be that complicated, however it justifies the 70 - 80k per year you have to spend on your developer. All in all, not worth it. If ya can't do it with MySQL or mSQL you're a poor programmer.

  45. Re:Block? Are you kidding? by Ralp · · Score: 3, Informative
    Wpoison does this.

    From the website: Wpoison is a free tool that can be used to help reduce the problem of bulk junk e-mail on the Internet in general, and at sites using Wpoison in particular.

    It solves the problems of trapped spambots sucking up massive bandwidth/CPU time, as well as sparing legitimate spiders (say, google) from severe confusion.

  46. Re:Block? Are you kidding? by gclef · · Score: 3, Interesting

    Actually, I've done this w/a bot trap on my site at home. It's a perl script that generates a bunch of weird-sounding text w/some fake email addresses at the bottom and a bunch of database-query-looking links back to the original page.

    The bots don't fall for it anymore. Some dorks in Washington state decided to make a couple requests a second to it once, but in the two years I've had it up, they're the only ones.

  47. Other options.. by primetyme · · Score: 4, Informative

    A pretty good article, but being able to install modules into Apache may not be the best situation for everyone who wants to stop Spambots..

    Shameless plug, but I've got an ongoing series in the Apache section of /. that deals with easy ways that administrators *and* regular users can keep Spambots off their sites:
    Stopping Spambots with Apache
    and
    Stopping Spambots II - The Admin Strikes Back

    Just some more options and choices to help people out!

  48. Re:Not fp, but still a wide page! by Anonymous Coward · · Score: 0

    fine in windows ie5 / ie6 /ns3 and even webTV !

  49. Re:Not fp, but still a wide page! by Anonymous Coward · · Score: 0

    no, it's broken on ie6

  50. Re:Block? Are you kidding? by liquidsin · · Score: 2

    I like that idea...look up the originating host, and make links back to abuse@, root@, webmaster@, and whatever else you can think of. Clog their mailservers. The problem is, it would be simple enough (if it's not already in place) to have your spam bot ignore addresses for your own domain.

    --
    do not read this line twice.
  51. If I were a spambot author... by Anonymous Coward · · Score: 0

    It would be possible for the spambots to get wise, and start following the robots.txt file rules. Then the spambot could in theory surf your entire site (or at least the bits allowed by robots.txt) without falling into the trap. However this also means that you can control where the spambot goes, which is the whole point of robots.txt. If you want, you can allow google into one part of the site, but exclude all others.

    I'd read robots.txt and just go where google was allowed to go...

  52. burp by Anonymous Coward · · Score: 1, Interesting
    We had an Evil Harvestor Robot irritator on our web site back in 1996. It worked rather well. It didn't hit legitimate spiders by using an appropriate robot directive. It also gave the harvester a whole heap of nonsense addresses to add to its database.

    None of that Perl nonsense, either. All in pure C on a BSD host, with a damn good attention to potential overflows. That was also the site which had my own custom MTA (I only knew sendmail, so it seemed a wise decision), demanded full W3C compliance (we would test it on about 10 platforms), and got used as evidence in the DoJ case against Microsoft.

    Sigh, those were the days. Now, all I see is rehashing of old ideas. So I view this news as 6 years old -- perhaps even a record for Slashdot?

    1. Re:burp by Voivod · · Score: 1

      Yeah? So put your source up on a website and let us benefit from your ideas! How does it benefit anyone that you solved this 6 years ago? As it stands, your work was wasted.

    2. Re:burp by Anonymous Coward · · Score: 0

      I assure you, I would if I had the agreement of the organisation to release it for general usage.
      I fell out with them about 3 years ago (political, not technical, issues), so it won't be happenin'.

  53. Re:Block? Are you kidding? by Martin+S. · · Score: 2

    Give it thousands, millions of addresses this way.

    Liberally sprinkled with postmaster@127.0.0.1 and abuse@127.0.0.1.

  54. using images is bad for people with text browsers by hsenag · · Score: 2, Insightful

    If you use images for email addresses, what are people using text browsers supposed to do? Even worse is using them on the "warning" pages - someone with a text browser would have no idea what the image said and therefore nothing to stop them falling into the trap and getting firewalled.

    And of course if he uses ALT text for the images, then he has the same problem he was trying to avoid, of creating something the spambots can read.

  55. Re:Block? Are you kidding? by boky · · Score: 5, Interesting

    I agree. And, come on, how much technology do you need?

    This is my solution to stopping spambots. It's a Java servlet, and I am posting it here to prevent my company's site from being slashdotted. It does not prevent the spammer from harvesting emails, it just slows them down.... a lot :) If everyone had a script like this, spambots would be unusable.

    Feel free to use the code in any way you please (LGPL like and stuff)

    Put robots.txt in your root folder. Content:

    User-agent: *
    Disallow: /members/

    Put StopSpammersServlet.java in WEB-INF/classes/com/parsek/util:

    package com.parsek.util;
    import java.io.File;
    import java.io.StringWriter;
    import javax.servlet.ServletContext;
    import java.net.URL;
    import java.util.Enumeration;
    import java.lang.reflect.Array;
    public class StopSpammersServlet extends javax.servlet.http.HttpServlet {
    private static String[] names = { "root", "webmaster", "postmaster", "abuse", "abuse", "abuse", "bill", "john", "jane", "richard", "billy", "mike", "michelle", "george", "michael", "britney" };
    private static String[] lasts = { "gates", "crystal", "fonda", "gere", "crystal", "scheffield", "douglas", "spears", "greene", "walker", "bush", "harisson" };
    private String[] endns = new String[7];
    private static long getNumberOfShashes(String path) {
    int i = 1;
    java.util.StringTokenizer st = new java.util.StringTokenizer(path, "/");
    while(st.hasMoreTokens()) { i++; st.nextToken(); }
    return(i);
    }
    // Respond to HTTP GET requests from browsers.
    public void doGet (javax.servlet.http.HttpServletRequest request,
    javax.servlet.http.HttpServletResponse response)
    throws javax.servlet.ServletException, java.io.IOException {
    // Set content type for HTML.
    response.setContentType("text/html; charset=UTF-8");
    // Output goes to the response PrintWriter.
    java.io.PrintWriter out = response.getWriter();
    try {
    ServletContext servletContext = getServletContext();
    endns[0] = "localhost";
    endns[1] = "127.0.0.1";
    endns[2] = "2130706433";
    endns[3] = "fbi.gov";
    endns[4] = "whitehouse.gov";
    endns[5] = request.getRemoteAddr();
    endns[6] = request.getRemoteHost();
    String query = request.getQueryString();
    String path = request.getPathInfo();
    out.println("<html>");
    out.println("<head>");
    out.println("<title>Members area</title>");
    out.println("</head>");
    out.println("<body>");
    out.println("<p>Hello random visitor. There is a big chance you are a robot collecting mail addresses and have no place being here.");
    out.println("Therefore you will get some random generated email addresses and some random links to follow endlessly.</p>");
    out.println("<p>Please be aware that your IP has been logged and will be reported to proper authorities if required.</p>");
    out.println("<p>Also note that browsing through the tree will get slower and slower and gradually stop you from spidering other sites.</p>");
    response.flushBuffer();
    long sleepTime = (long) Math.pow(3, getNumberOfShashes(path));

    do {
    String name = names[ (int) (Math.random() * Array.getLength(names)) ];
    String last = lasts[ (int) (Math.random() * Array.getLength(lasts)) ];
    String endn = endns[ (int) (Math.random() * Array.getLength(endns)) ];
    String email= "";

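    double a = Math.random() * 15;
    // The original chain of thirteen "if(a < ...)" branches lost everything after each "<" in posting;
    // the three branches below are an assumed stand-in, not the author's code.
    if (a < 5) email = name;
    else if (a < 10) email = name + "." + last;
    else email = name + last;
    email = email + "@" + endn;

    out.print("<a href=\"mailto:" + email + "\">" + email + "</a><br>");
    response.flushBuffer();

    Thread.sleep(sleepTime);

    } while (Math.random() < 0.9); // threshold lost in the original post; 0.9 is an assumed value
    out.print("<br>");
    do {
    int a = (int) (Math.random() * 1000);
    out.print("<a href=\"" + a + "/\">" + a + "</a> ");
    Thread.sleep(sleepTime);
    response.flushBuffer();
    } while (Math.random() < 0.9); // threshold lost in the original post; 0.9 is an assumed value
    out.println("</body>");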
    out.println("</html>");

    } catch (Exception e) {
    // If an Exception occurs, return the error to the client.
    out.write("<pre>");
    out.write(e.getMessage());
    e.printStackTrace(out);
    out.write("</pre>");
    }
    // Close the PrintWriter.
    out.close();
    }
    }

    Put this in your WEB-INF/web.xml

    <servlet>
    <servlet-name>stopSpammers</servlet-name>
    <servlet-class>com.parsek.util.StopSpammersServlet</servlet-class>
    </servlet>
    <servlet-mapping>
    <servlet-name>stopSpammers</servlet-name>
    <url-pattern>/members/*</url-pattern>
    </servlet-mapping>

    Here you go. No PHP, no Apache, no MySQL, no Perl, just one servlet container.
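
    How quickly the delay grows can be checked with a small Python sketch that mirrors getNumberOfShashes and the Math.pow(3, ...) call in the servlet. The function name `sleep_ms` is made up. The servlet's counter starts at 1 and adds one per non-empty path segment, and the result is the pause in milliseconds after each emitted link:

```python
def sleep_ms(path_info: str) -> int:
    """Per-link delay in ms: 3 ** (1 + number of non-empty path segments)."""
    segments = [part for part in path_info.split("/") if part]
    return 3 ** (1 + len(segments))

# Delay per link at increasing depth under /members/.
delays = [sleep_ms("/" + "1/" * depth) for depth in range(10)]
```

    Nine levels down the pause is already about 59 seconds per link, so a bot that keeps following the numbered links stalls quickly while a shallow visit costs almost nothing.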

    Ciao

    --
    boky
  56. SAUCE by Anonymous Coward · · Score: 0
  57. Re:Block? Are you kidding? by Desco · · Score: 1

    Yeah, the spambot would probably (or could probably) filter abuse... But why not auto-generate an email to abuse@spammers.isp.com and send the appropriate logs that prove the use?

  58. Re:Block? Are you kidding? by richie2000 · · Score: 3, Informative
    Wpoison basically does that; it serves a page with bogus addresses and adds a nasty delay between pages, keeping the spider occupied.

    However, the instructions for installing Wpoison more or less assume that one has a single website to protect. I have around 20 virtual hosts. So instead of creating a renamed cgi-bin in every DocumentRoot, I added a single

    ScriptAlias /runme/ "/var/www/cgi-bin/"

    to httpd.conf and then linked it like this:

    <A HREF="/runme/addresses.ext"><IMG SRC="pixel.gif" BORDER=0></A>

    I also added a single transparent pixel to the link to keep it invisible but still fool the spiders. Add the runme directory as excluded in the robots.txt and you should be on your way. Muhahahah, and so on.
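
    Why the robots.txt exclusion spares legitimate spiders can be sketched with Python's standard urllib.robotparser, which is roughly what a well-behaved crawler does before following a link. The robots.txt content below is an assumed example matching the /runme/ alias above:

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt for the setup described above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /runme/
"""

def allowed(path: str, agent: str = "*") -> bool:
    """Return True if a crawler honoring robots.txt may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)
```

    A spider that runs this check never requests /runme/addresses.ext, so only bots that ignore robots.txt end up in the poisoned pages.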

    --
    Money for nothing, pix for free
  59. Re:Not fp, but still a wide page! by Anonymous Coward · · Score: 0

    Win XP with IE 6 has problems.
    Can't get it to work. Damn Klerck!

  60. Another Method by Captain+Large+Face · · Score: 2

    How about sending a parameter to a page which redirects to the mailto: protocol?

    For example:

    index.html

    <a href="filename.php?x=info">E-Mail Me</a>

    filename.php

    <?php
    header("Location: mailto:" . $_GET['x'] . "@mydomain.tld");
    ?>

    1. Re:Another Method by Anonymous Coward · · Score: 0

      Spambots are spiders, they traverse links. In fact, their penchant for following silly links is used in the article.

      The bot would follow your e-mail link, and grab the mailto:

  61. Re:Not fp, but still a wide page! by Anonymous Coward · · Score: 0

    Jamie will never "fix" anything, unless you call using a patch someone sends him "fixing". Trouble is, he wants to understand the fix before he applies it, and it could take forever for the slashdot perl weenies to understand a few lines of code. Let me be clear, this is not a shot at perl - even if it is a crappy language.

  62. How to beat the Slashdot Junk filter? by er0ck · · Score: 1

    I had the text too, but I kept getting blocked by the Slashdot filter when I tried to repost it. I even tried using Striff Tummel. What's your trick to weed out the stuff that triggers the Slashdot filter?

    1. Re:How to beat the Slashdot Junk filter? by Captain+Large+Face · · Score: 1

      Well, it took me a hell of a long time. Bit of a pity since I got marked down -3 redundant, but there you are...

    2. Re:How to beat the Slashdot Junk filter? by er0ck · · Score: 1

      Any patterns you used, or just manual editing? I try to opt for the Perl oneliners myself; it's a good way to get more experience with them. Here's a sample:

      #replace a tab with spaces
      perl -p -i.orig -e 's{\t}{ }g' spambot_trap.txt

      #replace lines containing only spaces with nothing. (the hex A0 is for nbsp conversions (ASCII char 160)
      perl -p -i.orig -e 's{^[\s \xA0]+$}{}g' spambot_trap.txt

      #lastly, remove multiple newlines. the -00 switches Perl to paragraph mode, so each record ends at a run of blank lines and you can match on the newline character. The other way you see this a lot is -0777, which slurps the whole file because no character has that value.
      perl -p -i.orig -00 -e 's{\n\n}{\n}g' spambot_trap.txt
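
      Roughly the same three steps, sketched as one Python function for anyone without Perl handy. The name `tidy` is made up, and the last step collapses any run of newlines rather than reproducing Perl's paragraph mode exactly:

```python
import re

def tidy(text: str) -> str:
    """Tabs to spaces, whitespace-only lines emptied, runs of newlines collapsed."""
    text = text.replace("\t", " ")
    # Lines made only of spaces, tabs, or non-breaking spaces (0xA0) become empty.
    text = re.sub(r"^[ \t\xa0]+$", "", text, flags=re.MULTILINE)
    # Any run of two or more newlines becomes a single newline.
    return re.sub(r"\n{2,}", "\n", text)
```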

  63. Re:Block? Are you kidding? by fean · · Score: 1

    do a reverse-dns lookup on the host, and do word1word2@reverse.lookup

    that'd be great!!!!!!!!

  64. Use HTML character references instead by Anonymous Coward · · Score: 0

    With syntax like &#D;, where D is the decimal number of the desired character. The following URL has a few other syntaxes to thwart spambots.

    http://www.w3.org/TR/html401/charset.html#entities

    For example, &#64; is @ and &#46; is . It's really a very easy method to obfuscate addresses from spambots which don't parse the HTML anyway. Normal web browsers always parse the pages and use the parsed addresses for additional processing (like mailto: links).

  65. Take this one step further... by Jason+Levine · · Score: 4, Interesting

    There's a spam-blacklist, so how about a spambot-blacklist?

    You'd have a standardized spambot trap (like the one described in the article) on various webservers. The new spambot info could go into a "New SpamBots" database (which wouldn't be blocked). Once a day, the webserver would connect up with a central database and submit the new spambot info it's obtained. Then the server would download a mirror of the updated "SpamBots" database which it would use to block spambots.

    The centralized SpamBots database would take all of the new SpamBot info every day and analyze them in some manner as to detect abuse of the system (ensuring that only true spambots are entered). E-mails could be fired off to the abuse/postmaster/webmaster for the offending IP address. Finally, the new SpamBot info would be integrated into the regular SpamBot database.

    This way you'd be able to quickly limit the effectiveness of the Spambot-traps across many websites.
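
    One possible shape for the central "analyze for abuse" step, sketched in Python. The rule used here (only list an IP once several distinct servers have reported it) is an assumption for illustration, as are the function name and the sample data; the post does not say how the analysis would work:

```python
from collections import defaultdict

def merge_reports(reports, min_reporters=2):
    """reports: iterable of (reporting_server, offending_ip) pairs.

    Return the set of IPs reported by at least `min_reporters` distinct servers,
    so a single misconfigured or malicious reporter cannot blacklist an address.
    """
    seen = defaultdict(set)
    for server, ip in reports:
        seen[ip].add(server)
    return {ip for ip, servers in seen.items() if len(servers) >= min_reporters}

# Hypothetical daily submissions from two trap servers.
daily = [
    ("a.example", "192.0.2.1"),
    ("b.example", "192.0.2.1"),
    ("a.example", "192.0.2.7"),
    ("a.example", "192.0.2.7"),
]
blocked = merge_reports(daily)
```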

    --
    My sci-fi novel, Ghost Thief, is now available from Amazon.com.
  66. seems like a lot of tools involved by Anonymous Coward · · Score: 0

    just to write a simple spam trap

  67. Re:Block? Are you kidding? by blibbleblobble · · Score: 3, Funny

    Especially loved "friends"...

    Like hotline@mpaa.org, cdreward@riaa.org, senator@hollings.senate.gov for example?

  68. You Make the Problem Worse by jhunsake · · Score: 1

    By using real domains, you're doing a real disservice to those who host them.

    1. Re:You Make the Problem Worse by Anonymous Coward · · Score: 0

      Yeah I'm gonna add blibbleblobble.co.uk to my spambot trap now. See how it feels buddy.

  69. Attn Spambot Authors by NiftyNews · · Score: 5, Interesting

    Dear Spambot Authors,

    Thanks again for your interest. I hope that we were able to help you write the spambots of the future that will be able to detect and sidestep as many of the above protection schemes as possible. We tried to work all of our knowledge into one convenient thread for your development team to peruse.

    Thanks for your interest in SlashDot, home of too much information.

    1. Re:Attn Spambot Authors by fulgan · · Score: 1

      You know: security by obscurity doesn't work, we all know that (or, at least, should know it).

    2. Re:Attn Spambot Authors by chris\ · · Score: 1

      Your use of the nifty catch phrase is out of context here, and just doesn't make sense. In fact it's almost *gasp* off topic.

      A closer analogy here would be comparing this to a bunch of sysadmins who posted a list of ports/protocols they allow through their firewalls, what kind of firewall software they use, and what logging/tracking methods they use... it's just not a good idea, regardless of what your views of public disclosure are.

      NiftyNews has a point.

    3. Re:Attn Spambot Authors by Biggles_the_pilot · · Score: 0

      Let's examine the logic of your cynical little post from a different perspective:

      Women should not be informed of techniques to avoid rape. As long as information regarding ways of avoiding potential rapists - for example, not walking alone at night, carrying pepper spray etc. - is available to women, it is available to all. In which case, it is also available to rapists. Any new idea we come up with and disseminate to help women protect themselves from rapists will simply be rendered obsolete. Therefore, the best way to protect women from rape is to not talk about it.

      --
      I have no sig
    4. Re:Attn Spambot Authors by Eil · · Score: 2


      Well this is one of the highest scoring trolls I've seen in awhile.

      So, what, website admins like myself are just supposed to sit back and let spambots a) harvest email addresses without consent b) eat up costly downstream bandwidth, memory and resources c) blatantly violate robots.txt directives?

      I'm patiently awaiting to hear your opinions on how to stop spambots without actually telling the web server administrators about any of it.

      I suppose next you'll argue that nobody should ever discuss ways to keep from being carjacked or mugged, because you just know that criminals are going to be looking for this thread so they can watch out for said tips when they actually go do their dirty work.

      You know, maybe Apache could release some precompiled binaries with their own techniques for avoiding spambots and keep the source to themselves so the spambot authors can't see exactly what precautions have been coded in. You can't exploit a closed system, right? Just ask Microsoft.

  70. Re:Block? Are you kidding? by dirk · · Score: 3, Insightful

    Why on Earth would you like to block a spambot? So it doesn't get any more useful addresses?
    No way, man.
    If you realize you're serving to a bot, go on serving. Each time the bot follows the "next page" link, you /give/ it a next page. With a nicely formatted word1word2num1num2@word1word2.com, where words and nums are random.
    Give it thousands, millions of addresses this way.

    This would be good to do with known bad addresses, but random addresses only add more unknowing people to the list. You may add 1000 email addresses to the list and slow them down, but if even 10 of those email addresses are real, you've added to the problem. The bad addresses will be taken out as they are found to be bad, and the good ones will be left in. You've signed JoeRandomUser@RandomDomain.com up for all the spam he can handle, even if he has gone to great lengths to keep his email address off the spam lists. In theory this sounds like a great idea, until you're the guy getting your email address randomly fed to the bots.
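
    One way around that risk, sketched in Python: generate the word1word2num@word1word2 pattern under the reserved .invalid top-level domain (set aside by RFC 2606), so no generated address can belong to a real person. The word list and function name are made up for illustration:

```python
import random

# Arbitrary filler words for the sketch.
WORDS = ["amber", "brick", "cedar", "delta", "ember", "flint"]

def fake_address(rng: random.Random) -> str:
    """Build word1word2NN@word1word2.invalid from random words and digits."""
    local = rng.choice(WORDS) + rng.choice(WORDS) + str(rng.randrange(100))
    domain = rng.choice(WORDS) + rng.choice(WORDS) + ".invalid"
    return f"{local}@{domain}"

rng = random.Random(0)
addresses = [fake_address(rng) for _ in range(50)]
```

    The trade-off is that a harvester can drop .invalid addresses with a one-line filter, so this protects bystanders at the cost of poisoning the list less effectively.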

    --

    "Information wants to be expensive" - Stewart Brand, the same guy who said "Information wants to be free"
  71. Re:Block? Are you kidding? by nathanm · · Score: 2

    Try out the Book of Infinity. It's a CGI that generates an infinite trail of gibberish links. It could easily be modified to add gibberish e-mail addresses to each page.
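
A rough sketch of that kind of endless trail, in Python rather than as a CGI. This is not the Book of Infinity's code; the function name, the hash-based naming, and the use of the reserved `.invalid` TLD (which never resolves, so no real person gets listed) are all assumptions made for illustration. Each page is derived from the requested path, so nothing has to be stored on the server.

```python
import hashlib


def gibberish_page(path, links=5):
    """Build a trap page whose links are derived from a hash of the path.

    Hypothetical helper: supports up to 8 links per page, because the
    64-character SHA-256 hex digest is cut into 8-character pieces.
    """
    digest = hashlib.sha256(path.encode("utf-8")).hexdigest()
    words = [digest[i * 8:(i + 1) * 8] for i in range(links)]
    base = path.rstrip("/")
    lines = ["<html><body>"]
    for word in words:
        # Each link points one level deeper, so the trail never ends.
        lines.append('<a href="%s/%s">%s</a><br>' % (base, word, word))
        # Gibberish address at a reserved TLD that can never be delivered.
        fake = "%s@%s.invalid" % (word, word[::-1])
        lines.append('<a href="mailto:%s">%s</a><br>' % (fake, fake))
    lines.append("</body></html>")
    return "\n".join(lines)
```

The same path always yields the same page, so a bot that revisits sees a stable "site", while every link it follows produces a new page.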

  72. Crazy Guy on a Bike by Anonymous Coward · · Score: 0

    The author of the Spambot traps alludes to his website as www.crazyguyonabike.com. I recently ran across this site and found it quite interesting. He has a journal of his cross-country bicycle trip from New York City to the state of Washington.


    It's well written and a quite humorous read.

  73. wonder what this means.. by SethJohnson · · Score: 2
    I was just checking out one of the email harvesting products and saw this in the description:

    Automatically avoids spam trap pages.

    I wonder if this is a lie.. I also think it's funny because the rest of the product literature doesn't refer to it as a spam tool, but then this blurb is straight-up admitting it.

    Here's another funny 'feature'--

    Resume at the same place it left off even if your computer crashes.


    Doesn't exactly instill confidence in the stability of this product..
  74. Re:Elements of good design I'd missed - P.Solution by skaldrom · · Score: 2, Interesting

    There is another solution: Usually these SpamBots are not able to execute JavaScript...
    As described at http://www.joemaller.com/js-mailer.shtml you can combine JavaScript and images to protect your mail. I've had very good experiences with this one....

    But, as stated on the Website: this game is an arms race...

  75. Re: SpamBots: PHP Code by blibbleblobble · · Score: 2

    function SeedFakeEmail($Email)
    {
    echo "\n<font size=\"-5\" style=\"display:none\"><a
    href=\"mailto:$Email\"> Please don't email $Email</a></font>";
    }

    SeedFakeEmail("uce@ftc.gov");
    SeedFakeEmail("listme@dsbl.org");
    SeedFakeEmail("hotline@mpaa.org");
    SeedFakeEmail("cdreward@riaa.org");
    SeedFakeEmail("senator@hollings.senate.gov");

    Put that in your pageheader and smoke it!

  76. Re:Block? Are you kidding? by Anonymous Coward · · Score: 0

    No, they're not...what are you some kind of fucking idiot?!?! Don't overload the root nameservers. This is a Bad Thing. See, this is what slashdot readers get...no idea of how things work.

  77. I use two methods on my site.... by Rahga · · Score: 2

    On rahga.com, I use a custom perl script with a html-based form that is programmed only to send messages to me. Here it is.

    On stuff like my FAQs, I use igPay Latin Encoded Email: ahgaray atyay ahgaray otday omcay
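
For anyone who wants to automate that, here is a minimal Python sketch of the encoding. It is a guess at the rules based on the example above, not the poster's actual script: words starting with a vowel get "yay" appended, everything else moves its leading consonants to the end and adds "ay". The function names are made up.

```python
VOWELS = "aeiou"


def pig_latin(word):
    """Encode one word; assumed rules, inferred from the example address."""
    word = word.lower()
    if not word:
        return word
    if word[0] in VOWELS:
        return word + "yay"
    for i, ch in enumerate(word):
        if ch in VOWELS:
            # Move the leading consonant cluster to the end.
            return word[i:] + word[:i] + "ay"
    return word + "ay"  # no vowels at all


def encode_address(address):
    """Spell out '@' as 'at' and '.' as 'dot', then encode every word."""
    local, domain = address.split("@", 1)
    words = [local, "at"]
    for i, label in enumerate(domain.split(".")):
        if i:
            words.append("dot")
        words.append(label)
    return " ".join(pig_latin(w) for w in words)
```

A human decodes it in a second; a harvester looking for an `@` sign finds nothing to grab.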

  78. Re:Huh? by Anonymous Coward · · Score: 0

    I guess if you're some pussy who's not man enough to use a real database manager then MySQL might seem usable.

    You probably "program" in perl, too.

  79. Re:Block? Are you kidding? by AntiNorm · · Score: 2

    How about instead, returning pages with the email address abuse@domain-that-spambot-is-coming-from all over them...

    Most spambots know better than to send their crap to email addresses containing things like abuse, root, postmaster, .edu, or .gov.

    Also, in regard to the problem of root servers being queried every time a @randomdomain.com is looked up, could you not just use random IP addresses?

    --

    I pledge allegiance to the flag...
    of the Corporate States of America...
  80. Note to self by underclocked · · Score: 2, Funny

    Before announcing new useful project to Slashdot community, create Freshmeat/Sourceforge page first, thereby eliminating the need for my host to shut me down for excessive bandwidth.

  81. What I use by Phroggy · · Score: 3, Interesting

    Take a look at these two bits of code from http://www.slickhosting.com/contact.shtml :

    <A HREF="mailto:hosting%40slickhosting.com"
    onMouseOver="window.status='mailto:hostingslickhosting.com';return true;"
    onMouseOut="window.status='';">hostingslickhosting.com</A>

    <!-- Spam trap
    abuse@
    (your domain) HREF="mailto:abuse@ (your domain) "
    root@
    (your domain) HREF="mailto:root@ (your domain) "
    postmaster@
    (your domain) HREF="mailto:postmaster@ (your domain) "
    uce@ftc.gov HREF="mailto:uce@ftc.gov"
    -->

    --
    $x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
    $x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
    1. Re:What I use by doorbot.com · · Score: 1

      It seems to me that it would be better to have:

      abuse@[ spammerIP ]

      That way things might actually get delivered.

      For me, I deny deliveries to someguy@myIP, but sendmail accepts someguy@[myIP]

      Fortunately, it's easy to add both!

  82. other solution: flash 8) by leuk_he · · Score: 2

    Make a big Macromedia Flash site. Let the bots eat that: this is the thing a lot of companies do.

    Don't worry, Google will adapt. They even read PDF and .doc files.

    new thought: make a site written in .doc format.

    1. Re:other solution: flash 8) by tomknight · · Score: 2
      I really hope that's a joke....

      Tom.

      --
      Oh arse
    2. Re:other solution: flash 8) by gaudior · · Score: 1

      Why? It seems a reasonable suggestion to me.

    3. Re:other solution: flash 8) by Phibian · · Score: 1

      My site is written in .rtf (www.phibian.com), and .doc will follow in due course. Of course, the site also uses a nifty web-server add-on that converts the files (as well as .pdf files, any ODBC accessible databases, text, html, etc) on the fly to a browser-viewable format.

      And I post my email address all over the place - for a business, it's important that people can contact you... However, I rarely get spam; we have a good spam filter... (and yes, false positives are also extremely rare)

      The way it works: We have a list of phrases that we consider to be spam-matches (you know: "this is not spam"... ). If the sender is not on the allowed sender list (currently my email contact list) and there are spam matches, then the email is not delivered. Instead, it is dumped to a cache, and the subject/sender/date are added to the log file.

      A designated employee reads through the log roughly once a day - determining from the subject if it is a valid spam message or not. And then the messages are deleted (or rarely, bounced back to the person they were intended for, and the filter adjusted to avoid that type of false positive...).

      The important part is the "allowed through" concept, as this way I could still receive breast cancer newsletters if I wanted to (often cited as the drawback for email filters, for some reason...)
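
The filter described above boils down to something like the Python sketch below. This is only an illustration of the stated rules (allowed-sender list first, then phrase matching, then quarantine plus a log entry); the function name, the data shapes, and the sample phrases are assumptions, not the poster's actual software.

```python
def filter_message(sender, subject, body, date, allowed_senders, spam_phrases, log):
    """Return 'deliver' or 'quarantine' for one message.

    allowed_senders: set of lowercase addresses that always get through.
    spam_phrases: lowercase phrases treated as spam matches.
    log: list that collects (subject, sender, date) for human review.
    """
    if sender.lower() in allowed_senders:
        # The "allowed through" rule wins, whatever the message says.
        return "deliver"
    text = (subject + "\n" + body).lower()
    if any(phrase in text for phrase in spam_phrases):
        # Held in the cache rather than delivered; only the summary is logged.
        log.append((subject, sender, date))
        return "quarantine"
    return "deliver"
```

Because the sender check runs first, a newsletter from a listed sender is delivered even if it trips every phrase on the list.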

    4. Re:other solution: flash 8) by tomknight · · Score: 2
      Right, I don't want to have to suffer any more bloody flash sites than I already do.

      And no, it's not just me. Have you thought about people who can't view flash sites? If you can't think of anyone who might have a problem here, search google for web accessibility and see if you get the picture.

      When I find a "commercial" site that I can't view without flash or images I email the marketing and sales guys and tell them why I have a problem with their site. None of them really seem to give a fuck, but one day it might make a difference.

      Tom.

      --
      Oh arse
  83. Alas, not practical... by OmniGeek · · Score: 2

    Odds are high that this system, should it become sufficiently widespread to be useful, would be vulnerable to poisoning by spammers spoofing spambot traps and causing legitimate IPs (such as Googlebot or large blocks of Net users) to be incorrectly blocked. There are countermeasures against this, but my guess is that the resulting arms race would not result in an adequately-usable system for enough of the time to be worth it. (Remember, the blacklist must update with reasonable frequency for both additions AND expirations, and must have a VERY low rate of false-positives). The authentication of "legitimate" submitters is a serious weakness of such a system. Nice thought, though...
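
To make the "additions AND expirations" point concrete, here is a small Python sketch of a blocklist whose entries lapse after a fixed time and which refuses to block a set of protected addresses. The class name, the TTL policy, and the `never_block` set are assumptions for illustration; it does nothing about the harder problem of authenticating who is allowed to submit entries.

```python
class ExpiringBlocklist:
    """Blocklist where every entry expires ttl_seconds after it was added."""

    def __init__(self, ttl_seconds, never_block=()):
        self.ttl = ttl_seconds
        self.never_block = set(never_block)  # e.g. known-good crawlers
        self._expires = {}                   # ip -> expiry timestamp

    def block(self, ip, now):
        """Add or refresh an entry; protected addresses are ignored."""
        if ip in self.never_block:
            return False
        self._expires[ip] = now + self.ttl
        return True

    def is_blocked(self, ip, now):
        expiry = self._expires.get(ip)
        if expiry is None:
            return False
        if now >= expiry:
            # Lapsed entries are dropped so a stale report cannot linger.
            del self._expires[ip]
            return False
        return True
```

Passing `now` in explicitly (for example from `time.time()`) keeps the expiry logic easy to test.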

    --

    "My strength is as the strength of ten men, for I am wired to the eyeballs on espresso."
    1. Re:Alas, not practical... by fizbin · · Score: 1

      Actually, what would happen is that spambots would start pulling out of google's cache. That way, the bad addresses they return will be traceable only back to google's spider.

  84. Re:Not fp, but still a wide page! by Anonymous Coward · · Score: 0

    Win 2k with IE 6 as well

  85. Re:Block? Are you kidding? by LinuxHam · · Score: 3, Interesting

    postmaster@127.0.0.1 and abuse@127.0.0.1

    Good idea, but I'm sure spam software has been rejecting 127.0.0.1 for many years.

    How about a few people volunteering real FQDNs that all resolve to 127.0.0.1? I realize that people would be volunteering horsepower and bandwidth for DNS lookups, but it would be in the name of dramatically reducing spam. Then, keep a list of all the "loopback FQDN's" and let the rest of us feed those FQDN's into spam-trap generators. Eventually, there would be so many real-looking spam trap email addresses that the spam software wouldn't be able to keep up with the list of loopback FQDN's.

    To take it to the next level, you could hide the list of "loopback FQDN's" by making a reverse DNS lookup against a couple of volunteered IP addresses return a random FQDN from the list of loopback FQDN's at the time that the spamtrap page is dynamically generated.

    Spammers would never know the entire list of FQDN's that resolve to loopback.

    --
    Intelligent Life on Earth
  86. Re:Block? Are you kidding? by erc · · Score: 4, Informative

    Way too much work. Here's similar Escapade [escapade.org] code:

    <QUIET ON>
    <html><head><title>Members area</title></head><body>
    <p>Hello random visitor. There is a big chance you are a robot collecting mail
    addresses and have no place being here.
    Therefore you will get some random generated email addresses and some random links
    to follow endlessly.</p>
    <p>Please be aware that your IP has been logged and will be reported to proper
    authorities if required.</p>
    <DBOPEN "SpamFood", "localhost", "login", "password">
    <FOR I=1 TO 100 STEP 1>
    <SQL select * from names order by rand() limit 1>
    <LET FN="$Name">
    </SQL>
    <SQL select * from lasts order by rand() limit 1>
    <LET LN="$Last">
    </SQL>
    <SQL select * from addresses order by rand() limit 1>
    <LET AD="$Address">
    </SQL>
    <a href="mailto:$FN.$LN@$AD">$FN.$LN@$AD</a> <br>
    </FOR>
    </body>
    </html>

    --
    -- Ed Carp, N7EKG erc@pobox.com PGP KeyID: 0x0BD32C9B What I'm up to: http://intuitives.mine.nu
  87. Don't stop spambots, feed them with Sugarplum by dananderson · · Score: 3, Interesting

    I don't stop spambots, I feed them. I feed them phony email addresses and addresses of spammers (gathered from places such as my fake /cgi-bin/formmail.pl). I use http://www.devin.com/sugarplum/, mentioned before on /. to dish it out!

  88. Problem with wpoison... by wideangle · · Score: 3, Informative

    is that some of the fake emails it generates will be real.

  89. Able to block spam by Anonymous Coward · · Score: 0

    but not /.

  90. Better yet, use a Spam Troll-box by samhart · · Score: 2, Interesting

    We've recently set up a Spam Troll-box using Vipul's Razor on our new Tux4Kids dev server (you can find our troll box here).

    A troll-box gives Spam-bots a place to send their spam. When this box intercepts the spam, it reports it to the Vipul's Razor network, and everyone else on this network becomes aware of that spam (if they are also using Vipul's Razor to filter, which, chances are they are, it will filter that spam if they get it).

    If Vipul's Razor isn't enough, one can even use something like SpamAssassin in conjunction with Vipul's Razor to get even better results.

    Of course, this isn't cutting off Spam-bots at their source... but if enough sites were to cut them off at their source, then I'd imagine the Spam-bot authors would get wise to this and devise a way around it. Whereas with something like a Spam Troll-box, the Spam-bots still seem to be working as far as the people running them can tell ;-)

  91. Let's feed the serpent its own tail by Crash+Culligan · · Score: 2, Interesting
    This morning, after finding a junk fax on the office's voice mail system, I called the removal number (in little text at the bottom of the fax) and reached an automated voice system that would either 1) remove an inputted number, 2) add a new number, or 3) talk to a representative about their service.

    Well, I didn't trust (1), and (3) just got me a voice mail box instead of a person I could chew out, which I didn't use. That left (2), and I had a wicked idea:

    I hit 2, and input the number that I should call if I was interested in the fax (which appeared in BIG text right above the little text). Their own response number should start eventually getting faxes from them or, as I tend to experience, hangups.

    Cute story, I know, but what does this have to do with defeating spambots?

    I went to the page indicated...

    I was just checking out one of the email harvesting products and saw this [getyoursoftware.com]

    And I scrolled to the bottom, and looked at the source code, and noted two faaaaaascinating things:

    First, the HTML on that page is rather clean; I can see no evidence of anti-spambot code on their page.

    And second, the "Contact Us" link at the bottom is a mailto:.

    By all appearances, their page is vulnerable to their own spambot.

    So I had the thought... what if those generated-random-email-address pages were geared to produce not-so-random email addresses? What if the email addresses on those generated-page traps were geared to generate random email addresses at the domains of the various spambot-- (err, I mean) harvester producing companies? Let them see what it's like when less than discerning spammers use their software for evil. Hundreds of Viagra-substitutes! Thousands of hangover cures! Tens of thousands of opportunities to refinance their home mortgage!

    This is just an off-the-top-of-my-head idea. Opinions?

    --
    You cannot truly appreciate Dilbert until you read it in the original Klingon.
  92. Thanks for the pointer to nms by Leomania · · Score: 1

    I've used some of Matt's code on my personal site, and never thought to ask the question "Gee, are these things just an exploit waiting to happen?"

    I don't have much traffic, but that's certainly not the point. I really appreciate knowing about the existence of nms.

    - Leo

    --
    You don't use science to show that you're right, you use science to become right.
  93. An old, old bot trap by Anonymous Coward · · Score: 0

    An old (by web standards) trap can be found at http://spiders.must.die.net/h/b/e/index.htm, although I've never found its real beginning myself.

    The prose on those pages also makes good beer drinking reading. :)

  94. Consistency is the key to spammer success ;) by ziriyab · · Score: 1
    in the description [of a spam tool]:Automatically avoids spam trap pages. ...the rest of the product literature doesn't refer to it as a spam tool

    It is at least internally consistent: It's not a spambot, so it doesn't fall into spam traps :)

  95. Good, but why always on the defensive? by Tigerfoot · · Score: 1

    Stopping spambots is fantastic, but this is a defensive measure. Aren't there offensive measures people can use? What about a 'honeypot' approach? Perhaps you set up a bogus site with zillions and zillions of easy-to-find but totally bogus email addresses. Then let the spammers download 10-15GB of worthless addresses that will (hopefully) choke their email pipe. Make it "ugly" enough out there and just maybe a few of the less dedicated might decide it's not worth it.

    Any other offensive measures possible?

  96. Sleep blocks don't work. by Anonymous Coward · · Score: 0

    I tried this, and it doesn't work. Although it slows down the 'bot, on a standard linux system the sleep(1) call [in Perl] takes up an enormous amount of CPU time, leading me to believe that it is implemented as a spin lock. Also, the serving process would not disappear even if the socket connection is closed on the client end (ie - getting out of the browser and rebooting my machine), so eventually a large number of CPU-hungry processes accumulate and suck up all your resources.

    My ISP was none too happy about this when I tried it. If any kind sysadmin can figure this out and tell me how to fix it, I'd be grateful...

  97. the danger of mailing lists.. esp. SuSE user list by SethJohnson · · Score: 3, Informative


    Another way your e-mail address can be susceptible to spambots is if you participate in any mailing list. If the administrator decides to archive the list on a website, in many cases the email addresses of the participants will be there in plain text. I found this out after doing a google search for my own email address and having it turn up on the SuSE web site. I sent an e-mail asking that they do a regsub on the archive to substitute the '@' with [at] or something similar. That was more than six months ago and the SuSE website admin still hasn't done it.
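
For what it's worth, the substitution asked for is only a few lines. A minimal Python sketch, assuming the archive is plain text and that a fairly loose pattern for addresses is good enough; the regex and function name are illustrative, not anything SuSE uses.

```python
import re

# Loose pattern: local part, '@', then a domain with at least one dot.
EMAIL_RE = re.compile(r"\b([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})\b")


def munge_archive(text):
    """Replace the '@' in anything that looks like an address with ' [at] '."""
    return EMAIL_RE.sub(r"\1 [at] \2", text)
```

Run once over the existing archive and again on each new message before it is published.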
  98. My contribution to the anti-spam cause... by purpledinoz · · Score: 1

    Here's a list of non-existent e-mail addresses for those damn spam bots. GO GET EM BOYS!

    Please don't /. it :) Don't go here

    1. Re:My contribution to the anti-spam cause... by kaimiike1970 · · Score: 1

      Hey, my email address is on those pages! Bastard! =)

      Seriously, how did you decide they were 'non-existent'?

      --


      Do a google search before posting.
    2. Re:My contribution to the anti-spam cause... by purpledinoz · · Score: 1

      Good point. But I generated them randomly. Now who uses random letters for e-mail other than spammers?? So I think the probability of generating a valid e-mail address that someone actually uses (other than spammers) is very small. Now I'm not going to calculate that probability! (I just had a probability final exam)

    3. Re:My contribution to the anti-spam cause... by pdiaz · · Score: 1

      Here's mine. And here is the script that generated the mails (python)

      --
      Make It Secret . Free JavaScript implementation of AES for your browser
  99. Re:Block? Are you kidding? by Anonymous Coward · · Score: 0

    This would be good to do with known bad addresses, but random addresses only add more unknowing people to the list....You've signed JoeRandomUser@RandomDomain.com up for all the spam
    A solution to this is to generate only hotmail.com addresses.


    "I've got them on the list! They'll none of them be missed!"

  100. What about a Terms of Service page by splattertrousers · · Score: 2, Interesting

    What about requiring all of your users to go through a terms of service page before accessing any parts of your site?

    The page could have a form with "Accept TOS" and "Reject TOS" buttons. I wonder how many spambots would submit a form?

    And to catch spambots that did submit the form, your TOS could have some clauses that make it a violation for evil spiders (ones that don't honor "robots.txt") to use the site. Maybe you could make||lose a few bucks suing the spambotters who go through the TOS and still harvest your email addresses.

  101. New Program - Mailwasher by Peale · · Score: 4, Interesting

    Speaking of spam, I've come across this new program called mailwasher. You can check your mail while it's still on the server, and then - get this - fake a bounced message. There are probably other programs that do this, but this is the first one I've heard of.

    Anyway, AFAIK, it's WinBlows only, and available at http://www.mailwasher.com, although right now it seems the site is down, all I get is a 404!

    1. Re:New Program - Mailwasher by mbbac · · Score: 1

      Apple's Mail application that comes with Mac OS X will fake a bounced message for you since it works with IMAP servers.

      --

      mbbac

  102. detection & obfuscation by RonBurk · · Score: 0

    Didn't spot two of my favorite techniques (although they're probably somewhere in that pile).

    • Only display mailto: links to paying customers who are logged on. This is not too hard if you already generate all pages dynamically and happen to maintain a database of paying customers. Can be viewed as a reward for logging on.
    • Detect the nasty bots by placing a juicy link on each page that spam bots can see but humans can't. E.g., an href that surrounds a 1-pixel dot that is the same color as the background.
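
One way the invisible-link trick could be turned into a list of offenders: scan the access log for anyone who requested the hidden URL. The Python sketch below assumes Apache-style Common Log Format lines and a made-up trap path of `/trap/`; both are assumptions, and the regex is deliberately simple.

```python
import re

# Client address at the start of the line, then the quoted request line.
REQUEST_RE = re.compile(r'^(\S+) .*?"(?:GET|HEAD|POST) (\S+)')


def trap_visitors(log_lines, trap_path="/trap/"):
    """Return the set of client addresses that requested the hidden link."""
    offenders = set()
    for line in log_lines:
        match = REQUEST_RE.match(line)
        if match and match.group(2).startswith(trap_path):
            offenders.add(match.group(1))
    return offenders
```

Humans never see the link, so anything in the returned set is either a bot or someone reading the page source.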

    But I still wish someone would make an Apache mod that lets you devote a single process to tying up a specified number of spambot connections.

  103. A friendlier solution. by Fweeky · · Score: 2

    Rather than filling the spider with a whole bunch of (potentially valid) addresses and loading your server with bogus clients you don't want, just make it difficult for them to extract the addresses.

    I wrote a bit of PHP a few months ago that applies some spamproofing a la Slashdot (only a bit less aggressive) that some might find useful.

    Highlighted Source

    Raw Source

    It performs the following munging, depending on what you specify:

    freaky@aagh.net

    freaky (at) aagh (dot) net

    freaky@aagh.N0SPAM.net.SPAMN0

    f&#114;&#101;aky@&#97;&#97;g&#104;&#46;n&#101;&#116;

    random one of the above

    random with entity encoding

    all of the above
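
The linked PHP isn't reproduced here, so the following is a Python approximation of the three basic styles listed above, written from the sample output alone. The function names are invented, and the entity version takes a `random.Random` so that which characters get encoded is the caller's choice.

```python
import html
import random


def munge_at_dot(address):
    """freaky@aagh.net -> freaky (at) aagh (dot) net"""
    return address.replace("@", " (at) ").replace(".", " (dot) ")


def munge_nospam(address):
    """freaky@aagh.net -> freaky@aagh.N0SPAM.net.SPAMN0"""
    local, domain = address.split("@", 1)
    head, tld = domain.rsplit(".", 1)
    return "%s@%s.N0SPAM.%s.SPAMN0" % (local, head, tld)


def munge_entities(address, rng):
    """Replace roughly half the characters (never '@') with HTML entities."""
    out = []
    for ch in address:
        if ch != "@" and rng.random() < 0.5:
            out.append("&#%d;" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)
```

A browser renders the entity version as the plain address, which is easy to check with `html.unescape(munge_entities("freaky@aagh.net", random.Random()))`.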

  104. How about trying this by SnarfQuest · · Score: 2, Interesting

    There are "scanner" traps that start up a session and just drop it (not telling the scanner), which ties it up until the scanner software times out.

    How about writing something for these spambots using a special web server that slowly responds to its requests (sends out a small packet every 10 seconds) so it won't time out and won't consume much cpu time, and just feeds it a line or two of junk with each packet. Have it randomly generate a never ending supply of useless information to keep the spambot happy. While it's busy with the useless site, it's not bothering other people nor is it getting any real addresses.
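
The core of such a server is just a loop that waits, then emits a little junk. Here is a hedged Python sketch of that loop as a generator; the name `drip`, the 40-character line length, and the injectable `sleep` are assumptions. It is not a complete server: to keep CPU and memory low with many trapped bots, the chunks would need to be written from an event loop or similar rather than one blocked process per connection.

```python
import random
import time

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def drip(delay_seconds=10.0, sleep=time.sleep, rng=None):
    """Yield an endless stream of junk lines, pausing before each one.

    sleep is injectable so the pacing can be tested without waiting.
    """
    rng = rng or random.Random()
    while True:
        sleep(delay_seconds)  # small packet every delay_seconds
        junk = "".join(rng.choice(ALPHABET) for _ in range(40))
        yield junk + "\n"
```

A handler would write each yielded line to the client socket and flush, giving up as soon as the write fails.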

    --
    Who would win this election: Andrew Weiner vs Andrew Weiner's weiner.
  105. A simple fact that _everyone_ here overlooked... by Anonymous Coward · · Score: 0

    All of these methods for removing spambots from collecting email addresses do nothing to prevent humans from collecting addresses. Now, if you can find a way to protect your mail address from a human who enters it into a database, you have a solution.

    Nothing will prevent a packet sniffer from grabbing your email address from an unencrypted SMTP connection...

    There are plenty of clever ways to protect your email address, but there are far more clever ways to defeat anti-spam measures. Imagine an internet worm which sent your addressbook to a spam collector. That would be a serious mail collecting agent. It wouldn't be legal, but it would work.

    I can devise worse ways at the drop of a hat.

    So don't waste my time talking about spam, or else the irony will be that all we are talking about is spam. If you don't get that, you should watch the original Monty Python skit.
    -Mike

  106. Better than a honeypot.. by multipartmixed · · Score: 2

    ..howabout a glue trap?

    1. Publish false mailto: addresses on your web pages in the same colour font as your background

    2. Change them to visible, valid addresses by munging them with DHTML properties and a JavaScript include file (sorry, Lynx users)

    3. When a recognizable spam-bot comes in, refuse to load the javascript include file. mod_setenvif and mod_rewrite should help out here.

    4. When a probable spam-bot comes in, serve up the page reaalllly slowly, don't close the connection until it goes in CLOSE_WAIT. This ties up sockets on the remote machine and reduces its ability to troll OTHER sites. You can do this by writing a handler for your base directory, checking the browser, and returning DECLINED for friendly people. That should be in, I think the "post read" phase.

    5. When a recognized bad address comes through to your mail server (from step 1), slooooow the SMTP transaction down as much as you can (same idea as step 4), and throw an error at the end of the 354 DATA section a few times (to force him to come back!), etc. (Some sendmail internals hacking required here, although it would be much easier to hack if you don't have any real mail and just ran a script from inetd.)

    6. Those fake email addresses. Make them all point to a common MX or group of MXes that you control the DNS for. Make sure those MX records aren't used by anything legitimate. Slooooow your in.named down for requests to that domain. A cool side effect, besides tying up sockets on the spammer's end: IIRC some OSs can only make one resolver request at a time -- this'll effectively block all of his outbound spam traffic while he's trying to look up your MX record! Also, make sure the TTL is set to about 10 seconds, just to make sure he comes back to the glue trap very often.

    How's *that* for spam countermeasures? I wish I had time to write it. :-)

    --

    Do daemons dream of electric sleep()?
  107. KMail can bounce mail, too by anno1602 · · Score: 1

    KDE's KMail can bounce mail, too, manually or via filter.

  108. Re:Block? Are you kidding? by realdpk · · Score: 2

    Memory plus an Apache child. Any solution which causes Apache to be put to sleep artificially can and likely will be used as a very effective DoS against your site. Unfortunately.

  109. Re:Block? Are you kidding? by Anonymous Coward · · Score: 0

    dci@cia.gov
    tridge@fbi.gov
    gbush@whitehouse.gov...

  110. Re:Block? Are you kidding? by Continental+Drift · · Score: 1

    I think the bots don't fall for your fake pages and mail addresses because not enough people link to your page. If you had a little more traffic there, more bots would fall for it. Write something interesting and get slashdotted, would you?

  111. Re:Block? Are you kidding? by evil_one · · Score: 2

    You don't need to do that.
    MX records do that for you.
    You can actually have email@mydomain.com when you don't have a box providing an ip for mydomain.com
    MX records say "hey, you, all the email for [domain] is handled by [host]" - as such, you could easily tell your DNS provider to set the MX for any number of hosts to 127.0.0.1

    --
    Desperation is a stinky cologne
  112. Re: Block? Are you kidding? by elemental23 · · Score: 1

    On the other hand, you'd be surprised at just how much spam is delivered to security@ I thought they'd be smart enough to avoid obvious admin addresses until I started seeing it come in.

    --
    I like my women like my coffee... pale and bitter.
  113. Re:Block? Are you kidding? by 56ker · · Score: 2

    Or billgates@microsoft.com

  114. Re:Block? Are you kidding? by Prof.+Pi · · Score: 1
    How about instead, returning pages with the email address abuse@domain-that-spambot-is-coming-from


    How about the email addresses of everyone in
    Congress, plus all the politicians in Russia,
    Korea, and the other countries with lots of open
    relays? (Perhaps excluding those who have tried
    to do something about spam.)

  115. Bzzt. by gblues · · Score: 2
    No soup for you!

    The mailto:address@foo.com?Subject=bar syntax was introduced by Netscape 2.0.

    Nathan

  116. Re:Block? Are you kidding? by Fluid+Truth · · Score: 1

    By random I don't think they mean JoeRandomUser@RandomDomain.com. I think they mean random like [output of crypt]@[output of crypt].com. It's pretty unlikely that a legitimate address is going to look like kjd73i3h@3hvcfh93.com (which was just me pushing keys). Spambots probably don't care that an address doesn't "look" like a legitimate address; they're just there to harvest everything.

    --
    Apparently, of the rich, by the rich, for the rich.
  117. Here's a Javascript that writes mailto: links... by Artifice_Eternity · · Score: 3, Informative

    ...so that you can leave them out of your HTML source:

    http://artificeeternity.com/includes/linkwrite.js

    Instructions for use are included in comments. The script fragment that replaces mailto: links in the page will actually shorten your code -- it only requires entering the username and domain once. Also, the @ sign is added in by the script, so the address itself never appears in your HTML.

  118. Re:Block? Are you kidding? by dousette · · Score: 1

    That's a really great idea, returning email addresses with the "abuse@..." address for the domain they are coming from. Can you post the script you use to do this?

  119. Don't use a mailto and uce@ftc.gov by macdaddy · · Score: 2
    Many of the suggestions above say to put a mailto link on a hidden IMG that goes to uce@ftc.gov. THAT'S BAD! Or better put, that doesn't gain you anything more often than not. The reason I say this is because many of the spambots I've gotten my hands on lately automagically search for and remove that specific address. Many also remove all *.gov addresses. The best thing you can do as a server admin to seed an address that goes to uce@ftc.gov is to create a simple mail alias or user with a .forward on your mail server that forwards to uce@ftc.gov. That way these "smart" spambots won't detect that seeded address and remove it.

    Ideally you would actually create a spam trap account for this task and use a procmail recipe to briefly explain what you're doing in the forwarded message. That way the raw forwarded headers can't be misinterpreted as your server sending the spam.

    I do this very thing and have had great luck with it. I seed multiple addresses on key pages so that uce@ftc.gov is guaranteed to receive a number of these pieces of spam. I also send this spam to the newsgroup bot for news.admin.net-abuse.sightings, a newsgroup filled with forwarded spam LARTs for us anti-spammers to search for patterns or previous spamming evidence. You just add "nanas-sub@cybernothing.org" to your recipient list and prepend the forwarded subject line with "(email)". That's it!

  120. Re:Block? Are you kidding? by mengel · · Score: 1
    This is the idea behind Wpoison Which I've had a version of installed for a long time.

    Now however, I have changed the URL I use to link to it to be:
    /cgi-bin/spambot_trap/guestbook/journal/message
    so that all the spambots he mentions will follow it :-).

    --
    - "History shows again and again how nature points out the folly of men" -- Blue Oyster Cult, 'Godzilla'
  121. Re:Block? Are you kidding? by slamb · · Score: 3, Insightful

    Way too much work. Here's similar Escapade [escapade.org] code:

    Not similar enough. That makes 300 queries per hit against your database, and I don't think you even used prepared statements. His code slowed their software to a crawl by sleeping. Yours will slow your software to a crawl by excessive database traffic.
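
One way around the query load, sketched in Python rather than Escapade: read the three tables once at startup (three queries in total) and do the random picking in memory on each hit. The function name and the sample lists are made up for illustration, and this addresses only the database traffic, not the deliberate slowdown the original trap relies on.

```python
import random


def make_addresses(first_names, last_names, domains, count=100, rng=None):
    """Build `count` fake first.last@domain addresses from preloaded lists.

    The three lists stand in for the names/lasts/addresses tables and are
    assumed to have been loaded once, not queried per request.
    """
    rng = rng or random
    return [
        "%s.%s@%s" % (rng.choice(first_names), rng.choice(last_names), rng.choice(domains))
        for _ in range(count)
    ]
```

Each page view then costs a hundred list lookups instead of three hundred `ORDER BY rand()` queries.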

  122. Robots Exclusion is usually honored by spambots by asackett · · Score: 2
    At least, in my experience, that's the case. I've got bot bait on my site that's been there for many months now, and it has yet to be crawled. I get lots of hits on robots.txt from agents I believe to be harvester bots, but none has yet ventured in. Most of the hits on the bait come from curious slashdotters.

    Just to make sure it gets said: The email address that's listed here on /. is a spamtrap. Don't use it! My user name in my domain is the same as my user name here. I didn't intend for that address to become a spamtrap, but it was soaking up so much spam it seemed wise to put it to good use.

    --

    Warning: This signature may offend some viewers.

  123. Re:Not fp, but still a wide page! by Anonymous Coward · · Score: 0

    A situation that forced me to install Mozilla.

    Can defeating MS really be *that* easy?

  124. Elcomsoft (remember them) sells spamware! by Convergence · · Score: 2

    Hey... Here's something I found out a few days ago:

    http://www.mailutilities.com/aee/

    Elcomsoft, who are the makers of the Advanced eBook Processor (remember Sklyarov?), also make various email utilities. Although some look like they might have legitimate uses, at least one looks to have *no* legitimate use. (When a tool is designed to scan web pages for email addy's, and DESIGNED to pull out real names&email from web forums...)

    Read the above URL and the rest of the site yourself and draw your own conclusion.

  125. Re:Block? Are you kidding? by Anonymous Coward · · Score: 0

    Yes it is... Compile a list of email addresses of congress/senate/judges ("law makers"/"law enforcers") and let the spambots eat them up!

  126. Re:Block? Are you kidding? by Anonymous Coward · · Score: 0

    Compile a list of email addresses of congress/senate/judges ("law makers"/"law enforcers") and let the spambots eat them up!

  127. Re:Block? Are you kidding? by Sniser · · Score: 1

    You mean like this?

  128. Re:Block? Are you kidding? by kryptkpr · · Score: 1

    My personal favorite:

    krypt@mars:~$ ping warez.dal.net
    PING warez.dal.net (127.0.0.1): 56 octets data
    64 octets from 127.0.0.1: icmp_seq=0 ttl=255 time=0.4 ms

    --
    DJ kRYPT's Free MP3s!
  129. Buffer Overflow? by xkenny13 · · Score: 2
    How about things like buffer overflows? Worms/hackers have been exploiting them for years ... would:
    • <a HREF="mailto:abc(insert 1000 characters here)@blahblahblah.com">
    have any detrimental effect?
  130. Re:I see page widening is back by Anonymous Coward · · Score: 0

    "This form has been used already 0 seconds ago. You can not use a form and hit the back button to use it again."

    Is slashdot falling apart or what?

  131. http://www.mailwasher.net/ by jasonk3 · · Score: 3, Informative
    1. Re:http://www.mailwasher.net/ by Peale · · Score: 2

      Whoops! Thanks!

  132. Re:Block? Are you kidding? by BlueUnderwear · · Score: 2
    Can you post the script you use to do this?

    I'd really like to, but unfortunately, I can't get the script past that lame lameness filter... Yes, I know, I shouldn't have used Perl... If any of the editors are reading this, please consider making that filter less strict. Thanks!

    --
    Say no to software patents.
  133. Build up the mailto with javascript by maddugan · · Score: 2, Informative

    Here is what I do on my website to protect my email address:

    Javascript:
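    function sendmail()
    {
        // Assemble the URL from fragments so the full address never appears as one literal in the page source.
        var address = 'mail';
        address += 'to:';
        address += 'webmaster';
        address += '@';
        address += 'domain';
        address += '.com';
        // Navigate to the mailto: URL; the browser hands it to the user's mail client.
        window.location.href = address;
    }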

    Usage:
    <a href="javascript:sendmail()">webmaster</a>

    This could be expanded to pass in the values needed to build up the email address.
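The expansion suggested above, passing the pieces in instead of hard-coding them, might look roughly like this. It is only a sketch: the name `buildMailto` and its three parameters are made up for illustration and are not from the original post.

```javascript
// Hypothetical helper (not from the original post): assembles a mailto: URL
// from fragments supplied by the caller, so no complete address appears as a
// single literal in the page source.
function buildMailto(user, domain, tld) {
  return ['mail', 'to:', user, '@', domain, '.', tld].join('');
}

// Browser-only usage, assuming it is called from a click handler:
//   window.location.href = buildMailto('webmaster', 'domain', 'com');
```

A harvester that actually executes scripts would still see the result, so this only raises the bar for bots that scan raw HTML.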

  134. At what point can spam be considered a DoS attack? by Mustang+Matt · · Score: 2

    Can I claim that all the spam these jerks send me is an attempt at a DoS attack?

    --
    The man who trades freedom for security does not deserve nor will he ever receive either. - Benjamin Franklin
  135. Thank you all very much by Anonymous Coward · · Score: 0, Funny

    Now that I've read all your countermeasures, I have created the ultimate Spam Bot to get by these traps. ;]

  136. Slashbot? by Webmoth · · Score: 2

    OK, so we've got spambot prevention. Now we need some effective form of "Slashbot" protection. I envision a webserver that will detect a high number of referrals from Slashdot and put the server into "low bandwidth" mode, serving pages stripped of formatting and graphics (with links to graphics, of course) in order that content may be delivered in an efficient manner.

    --
    Give me my freedom, and I'll take care of my own security, thank you.
    1. Re:Slashbot? by Eil · · Score: 2


      This has actually already been done. I went to one web page linked directly from slashdot the other day that had the heading (to paraphrase):

      "Welcome Slashdot visitors. You have clicked through a slashdot link to access this page and in an effort to cut down on bandwidth expenses, you have been referred to this page. It is devoid of ads, script, and graphics but otherwise contains the same content. You can click here [link] to view the original page."

      I'm surprised I hadn't seen something like this earlier, but then I guess most people never expect to be slashdotted until it actually happens...
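The referrer-based switch discussed in this thread could be sketched along these lines. Everything here is an assumption for illustration: the names `isSlashdotReferer` and `ReferralGauge`, the threshold, and the sliding-window approach are not taken from any site mentioned above.

```javascript
// Returns true when the Referer header points at slashdot.org or a subdomain.
function isSlashdotReferer(refererHeader) {
  if (!refererHeader) return false;
  try {
    const host = new URL(refererHeader).hostname.toLowerCase();
    return host === 'slashdot.org' || host.endsWith('.slashdot.org');
  } catch (e) {
    return false; // malformed header: treat as an ordinary visitor
  }
}

// Hypothetical gauge: counts Slashdot referrals inside a sliding time window
// and answers 'lite' once the count exceeds the threshold.
class ReferralGauge {
  constructor(threshold, windowMs) {
    this.threshold = threshold;
    this.windowMs = windowMs;
    this.hits = []; // timestamps of recent Slashdot referrals
  }

  // Record one request and return which page variant to serve for it.
  record(refererHeader, nowMs) {
    this.hits = this.hits.filter((t) => nowMs - t < this.windowMs);
    if (!isSlashdotReferer(refererHeader)) return 'full';
    this.hits.push(nowMs);
    return this.hits.length > this.threshold ? 'lite' : 'full';
  }
}
```

In this sketch only click-throughs from Slashdot get the stripped page, and only once the flood has started; ordinary visitors keep the full version. Since the Referer header is optional and easily dropped, this is a bandwidth saver, not a guarantee.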

  137. Re:Block? Are you kidding? by Anonymous Coward · · Score: 0

    Cute, except your choice of Java opens you up to a trivial DoS.

    Just start opening exceptionally deep URLs in parallel. Thanks to Java's low-powered IO system, you can suck up tens of thousands of threads that way, clogging your scheduler.

    This would be better implemented as a second server, listening on a different port, written in a language that lets you write event-based state-machine IO. This will cut your memory usage per tar-pitted connection down by 20-30 TIMES, and won't put stress on your scheduler.

    NOTE: hacks like Weblogic's native performance pack won't help you here, since you sleep.

  138. ipchains by Anonymous Coward · · Score: 0

    i pee chains

    that would hurt!

  139. iptables by Anonymous Coward · · Score: 0

    that would hurt even more!

  140. Re:Block? Are you kidding? by Fëanáro · · Score: 2, Interesting
    How about a few people volunteering real FQDNs that all resolve to 127.0.0.1? I realize that people would be volunteering horsepower and bandwidth for DNS lookups, but it would be in the name of dramatically reducing spam. Then, keep a list of all the "loopback FQDN's" and let the rest of us feed those FQDN's into spam-trap generators. Eventually, there would be so many real-looking spam trap email addresses that the spam software wouldn't be able to keep up with the list of loopback FQDN's.
    Slashdot has been doing that for years with warez.slashdot.org. Try it; it resolves to 127.0.0.1.
    I always enter postmaster@warez.slashdot.org in spam forms.
  141. Re:Not fp, but still a wide page! by Anonymous Coward · · Score: 0

    I switched to mozilla because of slashdot too, but it was to get rid of the BFAs.

  142. Infinite loop generating fake e-mail addresses by Anonymous Coward · · Score: 1

    What if a specific page was actually a script that would forever generate fake e-mail addresses?
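For what it's worth, an endless supply of fake addresses is easy to sketch with a generator. The names `fakeAddresses` and `takeAddresses` are hypothetical, and using the `.invalid` top-level domain (reserved by RFC 2606) is an assumption made here so that nothing generated can ever land in a real mailbox.

```javascript
// Infinite generator of syntactically valid but undeliverable addresses.
// Assumption: ".invalid" is used because RFC 2606 reserves it, so no real
// mailbox can be hit. The random source is injectable for testing.
function* fakeAddresses(random = Math.random) {
  const letters = 'abcdefghijklmnopqrstuvwxyz';
  const word = (len) => {
    let s = '';
    for (let i = 0; i < len; i++) {
      s += letters[Math.floor(random() * letters.length)];
    }
    return s;
  };
  while (true) {
    yield `${word(6)}@${word(8)}.invalid`;
  }
}

// A trap page would pull only as many addresses as it wants per request.
function takeAddresses(n, random) {
  const gen = fakeAddresses(random);
  const out = [];
  for (let i = 0; i < n; i++) out.push(gen.next().value);
  return out;
}
```

The page itself never needs to loop forever: each request renders a batch from `takeAddresses` plus a link back to the same script, and the bot supplies the infinity.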

  143. Better than Loopback - Feed Them Open Relays by billstewart · · Score: 2
    Spamware that sends its own mail probably rejects 127.0.0.1, but spamware that abuses open relays won't notice, because it'll be the abused relay that resolves the domain name. Sendmail may be bright enough not to freak out with 127.0.0.1, though you could have fun with 127.0.0.2. And don't give them whitehouse.gov, but you might give them a FQDN that resolves to whitehouse.gov's IP address.

    But you can do better than that - Give them FQDNs that resolve to Open Relay sites, and use Round-Robin DNS if you can. If you've got your own domain, you can spare plenty of FQDNs, like mail2.mydomain.com.

    • The spammer or open relay will send spam to mail2.mydomain.com, which resolves to [relay1.school1.kr].
    • Relay1.school1.kr will relay it to mail2.mydomain.com, which your round-robin resolves to [relay2.school2.kr].
    • Relay2.school2.kr will send it to mail2.mydomain.com, which your round-robin resolves to [somebox.cn.net].
    • somebox.cn.net will send it to mail2.mydomain.com which ..... ad nauseam, ad erroneum.

    Depending on how you set up the round-robin, and where the relay machines get their DNS resolution done, you may be able to make them run in a tight little loop around the Korean broadband, or burn expensive international bandwidth between China and Sweden.

    Or you could give them random names at various spammer and spamhaus sites, or FQDNs that resolve to the addresses of spammers or spamhausen, or remove-me addresses of other spammers. They may filter out their own, and don't give them obvious addresses like abuse@ or postmaster@, but surely they won't recognize most of them, especially the latest Corrupt Nigerian Official trying to launder embezzled money.

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  144. conclusions... by onShore_Jake · · Score: 1
    Not a big deal but he sez:

    they would come straight into the middle of my website, at a specific page rather than the root. This means that the spambots obviously had some kind of database of pages, which had presumably been built up from previous visits

    That's possible I guess, but mayhap a search engine done told em where to go.

    the bot would come in, scan pages rapidly for maybe a few seconds, and then stop for a while. So it was obviously making at least some attempt to circumvent blocks based on frequency/quantity of requests.

    Or it's processing...

    I guess I'll actually finish reading it all now... sorry... pet peeve, distracted, must stop picking


  145. Better Addresses To Feed Spiders by billstewart · · Score: 3, Informative
    I've posted a separate article about fun tricks with round-robin DNS to feed spammers FQDNs that resolve to open relays, which will forward to other open relays. And if you know machines running Teergrubes, they're excellent addresses to feed spiders.*

    If you're not messing with DNS, though, there are lots of addresses that can cause trouble:

    • sales@spammerdomain.com, where the domain may be your spammer (if you customize your spidertrap) or a random spammer. They'll probably reject abuse@ and other obvious administrators, but names like "sales" and "purchasing" and "marketing" and anything that might get a real user is good.
    • randomjunkuser@spammerdomain.com. If they're not verifying the list before using it, this is good.
    • randomjunkuser@randomjunksubdomain.spammerdomain.com
    • randomjunkuser@spamhausdomain.com, at some site that encourages spammer customers.
    • randomjunkuser@randomjunksubdomain.spammers-ISP.net - does the spammer's ISP check for bad DNS hits?
    • randomjunkuser@othercustomer-of-spammers-hosting-ISP.net. Your mission is to get the spammer's ISP to throw off the spammer. If you want to be much ruder, you can use real-presidents-name@othercustomer-of-spammers-hosting-ISP.net, but both of those attacks require more customization to hit spammers you're having ongoing problems with, as opposed to shotgunning them all.
    • unsubscribeme-address@unsubscribemedomain.com - anything not immediately recognizable as "remove@". Give some other spammer's list builder a bunch of addresses to work with.
    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  146. Re:Not fp, but still a wide page! by CmdrStkFjta · · Score: 0

    What? Code some thing that works better than, but not well, with any thing Microsoft. That whole native format thing. Silly isn't it.

    --


    *SRU
  147. Teergrubes and other traps for spammers by billstewart · · Score: 3, Informative
    Teergrubes are tarpits to stick spammers in. They look like perfectly correct SMTP servers, e.x.c.e.p.t. t.h.e.y. a.n.s.w.e.r. v..e..r..y.. s..l..o..w..l..y.. and maybe generate lots of error messages requiring repetition, and basically they leave the spammer's machine tied up for a long time with very little effort. A legitimate mailing list server that encounters a teergrube will normally survive, because it's usually multithreaded, or at least has almost all its recipients as legitimate users, so an occasional few minutes of one thread stuck in a trap isn't a major problem. But a spammer who's encountering a large number of teergrubes (especially if he picked them all up at once from a spidertrap) will have lots of threads tied up for a long time and may not have enough spare capacity to bother real targets. There are a number of implementations around.

    And somewhere out there is a far nastier variant on a teergrube that can keep a typical SMTP session up for hours with only a few kilobits/minute, using tricks like setting TCP windows very small, NAKing lots of packets so TCP retransmits them, etc. (It basically works by saying "No, SMTP/TCP/IP isn't a set of protocol drivers in my Linux kernel, it's a definition of a set of messages and there's no reason I should use a bunch of well-tuned efficient reliable kernel routines when I can send raw IP packets myself designed for maximal ugliness.")

    • Spamido is an automated tool for collecting spammers' addresses so they can be fed back to other spammers.
    • Wpoison and Sugarplum are spidertraps that generate lots of fake addresses for a long time.
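The slow-answering part of a teergrube, as described at the top of this comment, can be sketched as a drip schedule: cut each reply into tiny pieces and release them one at a time. The names `dripSchedule` and `drip` and the chunk and delay values are hypothetical; this is not how any particular teergrube implementation works, only an illustration of the idea.

```javascript
// Hypothetical planner: split a reply into small chunks, each released
// delayMs after the previous one.
function dripSchedule(reply, chunkSize, delayMs) {
  if (chunkSize < 1) throw new RangeError('chunkSize must be at least 1');
  const plan = [];
  for (let i = 0, step = 0; i < reply.length; i += chunkSize, step++) {
    plan.push({ atMs: step * delayMs, chunk: reply.slice(i, i + chunkSize) });
  }
  return plan;
}

// Feeds the plan to anything with a write(string) method, such as a socket.
// Timers cost the server almost nothing while the client sits and waits.
function drip(writable, plan) {
  for (const { atMs, chunk } of plan) {
    setTimeout(() => writable.write(chunk), atMs);
  }
}
```

With a one-byte chunk and a delay of several seconds, even a short SMTP greeting keeps the sending side's connection occupied for minutes, which is the asymmetry the comment describes.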

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  148. Skr1p7-K1dd13@2600.com ? by Anonymous Coward · · Score: 0

    There are a whole bunch of script kiddies who wouldn't like to get any spam and might break somebody's electronic kneecaps if they get too annoyed. You wouldn't wanna do that, it'd be rude. Don't bug Emmanuel Goldstein himself, and he and many of the other people there are good guys, and surely if you've got any pretenses to being 31337 you can go hang out on the hacker irc channels and find Usual Suspekts :-)

  149. Re:Block? Are you kidding? by HD+Webdev · · Score: 0

    Spambots tend to avoid web*|admin*|root|support|help@somedotDOTorg

    --
    This is not a dream, not a dream...we are transmitting from the year 1-9-9-9.
  150. Re:using images is bad for people with text browse by Eil · · Score: 2


    Golly gee, let's see here. Ways to thwart the spambots.

    You can URL-encode and un-mailto your address.

    But spambots can still read most plaintext email addresses from the text itself...

    Then encode your email address into a piece of javascript.

    But many normal users don't have javascript turned on...

    Then write your email address into a GIF or PNG.

    But certain types of disabled people and lynx users won't be able to view those images...

    This author would argue that those two are one and the same. But still, you can also obfuscate your address for the user to figure out, providing directions on how to unobfuscate it. (NOSPAM.bob@NOSPAM.hoser.com)

    But there are many users who are too dumb to unobfuscate the address...

    Then write a web page with a form for sending the message... the email address remains hidden.

    But this is insecure / stupid / not fully supported by Mosaic 0.13beta...

    Then whoever can't use one of the above methods can go sod off. I plan to use most of these, grouped together into one contact.html page on my personal web site. If there are a couple of users in the world out of thousands who can't contact me due to technical or mental limitations, then dang them to heck for all I care.

    You see, it's a balancing act of preferences. Would you prefer to let (literally) a couple users slip through the cracks, or would you rather get bombed by potentially hundreds of spambots? Your choice...
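As a rough illustration of the first option in that list, URL-encoding an address, here is a sketch that percent-encodes every character. The function name `percentEncodeAll` is made up, and the input is assumed to be plain ASCII. The built-in `encodeURIComponent` is not used for the encoding step because it leaves letters and digits as they are, which would defeat the purpose.

```javascript
// Hypothetical helper: percent-encode every character of an ASCII string,
// including letters and digits, so the address is not readable in the source.
// Assumption: input is plain ASCII (code points below 128).
function percentEncodeAll(text) {
  return Array.from(text, (ch) =>
    '%' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, '0')
  ).join('');
}

// Example: percentEncodeAll('a@b') gives '%61%40%62'.
```

Any bot that bothers to URL-decode what it finds will see straight through this, so it fits the comment's point that each trick only filters out some of the harvesters.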

  151. Re:Block? Are you kidding? by boky · · Score: 1

    Me again.

    HTML ate some code. If using this, change those weird ifs to this:

    if(a 3) { email = name; } else
    if(a 4) { email = last; } else
    if(a 5) { email = name.charAt(0) + "." + last; } else
    if(a 6) { email = name + "." + last; } else
    if(a 7) { email = name + "." + last.charAt(0); } else
    if(a 8) { email = last + "." + name; } else
    if(a 9) { email = last + name; } else
    if(a 10) { email = name + last; } else
    if(a 11) { email = name.charAt(0) + last; } else
    if(a 12) { email = last + name.charAt(0); } else
    if(a 13) { email = name + last.charAt(0); } else
    if(a 14) { email = last.charAt(0) + name; } else
    if(a 15) { email = last + name.charAt(0); }
    email = email + "@" + endn;

    Sorry. :)

    --
    boky
  152. Re:Block? Are you kidding? by boky · · Score: 1

    Ufff...

    It's not my day; change those weird if(a ...) to if(a < ...), of course.

    Thank you.

    --
    boky
  153. Re:Block? Are you kidding? by Anonymous Coward · · Score: 0

    next time, post it in MIME or uuencode.
    for example:

    _=_
    _=_ Part 001 of 001 of file test.zip
    _=_

    begin 666 test.zip
    M4$L#!!0````(`*VIC2P%90R7\````#H!```1``` `=&5S=%]U= 65N8V]D92YT
    M>'1-C$UK@T`41?>"_V$J+Z)5,I!%*9U.DW57 @\657U@ =R:#5EQFE*6)_>VU"
    M:'EPN9Q[>`?!#79JI#1TMG$4Q7%:/4 8..]1_/.VU;*OR2, ZJCPDEG9:":'D'6:F=X*S;1UP=C2@$SK3F@OW1"A\A@KE/\R"` )!Y'102P4&``````$`
    ,`0`_````'P$`````
    `
    end

  154. Re:Block? Are you kidding? by Anonymous Coward · · Score: 0

    (The original is off of http://perl.plover.com/obfuscated/)
    Try to get it past the lameness filter:
    @P=split//,".URRUU\c8R";@d=split//,"\nrekcah xinU / lreP rehtona tsuJ";sub p{
    @p{"r$p","u$p"}=(P,P);pipe"r$p","u$p";++$p;($q*=2)+=$f=!fork;map{$P=$P[$f^ord
    ($p{$_})&6];$p{$_}=/ ^$P/ix?$P:close$_}keys%p}p;p;p;p;p;map{$p{$_}=~/^[P.]/&&
    close$_}%p;wait until$?;map{/^r/&&<$_>}%p;$_=$d[$q];sleep rand(2)if/\S/;print
    Hey! It worked. wow.

  155. Re:Block? Are you kidding? by Anonymous Coward · · Score: 0

    Why don't you try using the PREVIEW button next time?
    Jerkoff.

  156. Re:Block? Are you kidding? by mixbsd · · Score: 1

    Good idea for ISPs that stick to the abuse@ standard, but not much good for ISPs like Clueless & Witless, who ignore abuse@ and use spamcomplaints@ instead. You could use a script to query rfc-ignorant.org for the right abuse@ address, but that would waste CPU and bandwidth. In any case, most spambots will ignore addresses ending in .gov and .mil, and a lot will not follow links onto .cgi pages, so I use Wpoison.cgi as a "virtual" inside a php page on my site.

  157. Blackholes? by Hyperhaplo · · Score: 1


    Since we're discussing filtering and dropping packets into the ether.. whatever happened to Blackhole (ing) software?
    The last I heard (years ago) was that a company was actively using it, and being sued by other companies for blocking their mail.
    Anyone?

    --
    You have a sick, twisted mind. Please subscribe me to your newsletter.