Domain: spamassassin.org
Stories and comments across the archive that link to spamassassin.org.
Comments · 240
-
Weapons against Spammers:
Some useful links for reducing spam income:
For People with an *nix Account:
- Spamassassin: Ruleset-based mail analyzer. Detects spam quite well, especially if you enable access to Razor and realtime blacklists. Newest release includes a Bayesian filter.
- bogofilter: My favourite Bayesian spam filter. Pro: Very good detection rates after proper training. Con: Needs to be trained.
- Use Mozilla Mail: The up-to-date Mozilla release includes a Bayesian spam filter which can be easily trained by marking spam messages. Very good detection rate after reasonably low training effort.
- Find your favourite bayesian filter here
-
Run spamd/spamc version of SpamAssassin
SpamAssassin can run as a daemon (see here) so it doesn't have to start up the perl interpreter for each message. This is the preferred mode for large installations.
People report processing times in the range of 0.2 to 0.5 seconds per message with basic tests (no pyzor 2). Get a fast machine with dual processors, plenty of RAM, a caching DNS server, set spamd/spamc to have an appropriate number of child processes, and you should be good to go.
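To put those timings in perspective, here is a rough capacity estimate. Only the 0.2 to 0.5 second range comes from the reports above; the child count, the per-user volume and the peak factor are assumptions picked for illustration.

```python
# Back-of-the-envelope spamd capacity estimate.
# The 0.2-0.5 s per-message range is from the comment above; the child
# count, per-user volume and peak factor are illustrative assumptions.

SECONDS_PER_DAY = 86_400


def messages_per_second(children: int, seconds_per_message: float) -> float:
    """Upper bound on throughput when every spamd child is always busy."""
    return children / seconds_per_message


def busy_children(messages_per_day: float, seconds_per_message: float,
                  peak_factor: float = 1.0) -> float:
    """Average number of children kept busy by a given daily volume."""
    arrivals_per_second = messages_per_day / SECONDS_PER_DAY * peak_factor
    return arrivals_per_second * seconds_per_message


if __name__ == "__main__":
    for secs in (0.2, 0.5):
        rate = messages_per_second(children=10, seconds_per_message=secs)
        print(f"{secs} s/msg with 10 children: up to {rate:.0f} msgs/s")
    # Hypothetical load: 50,000 users receiving 40 messages a day each,
    # with peaks five times the daily average.
    need = busy_children(50_000 * 40, 0.5, peak_factor=5.0)
    print(f"children busy at peak: {need:.1f}")
```

This ignores DNS latency and network tests, which is why the caching DNS server matters as much as the CPU count.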
It's certainly going to be cheaper than the sexual harassment lawsuit that one of those 50,000 users is going to file for being forced to look at pornographic material (we require employees to read their e-mail, don't you?). -
Re:a really bad idea
The admin of the mailing list can set any price he desires to post to it. Since an ML is a recipient multiplier and a fat target for spam, they might choose $0.20 or $0.50 for a message
I don't think it's a legally tenable idea to make a mailing list charge for messages sent to it. For starters, by charging someone a fee, you implicitly accept a number of responsibilities for the product or service.
You also cannot claim that someone is abusing your service when they paid you the money you asked for. Thus, bulk mailers no longer have to masquerade behind forged headers and such, they just proudly send mail to the largest 10 mailing lists they can find, pay their $2 fee and count their revenue stream....
Very bad, do not do.
And $0.25 isn't too much for a non-subscriber who wants driver help from linux-kernel
No, not at all, but who's going to pay $0.25 to ANSWER? No one of course, so only private replies will be sent, but even then it's going to cost SOMETHING.
Also, you're being US-centric. There are many countries that take part in US-based free software development lists and many other beneficial discussion venues who would be crushed by the exchange rate on a $0.25-per-message fee! In some countries that would be enough to buy a meal, and I'm going to answer a technical question or contribute a patch... AND PAY FOR IT?!
Fee-based sending would be the death of electronic mail as we know it, and there are better solutions -
A GREAT IDEA!
As long as there's a header that I can set to say "I will not pay the tax, please reject my mail," I'm a happy camper!
I will continue to send email to/from my friends who all run their own servers, and ISPs can go fly. If the IRS thinks that I'm going to pay them for mail that I send from my MTA to my friend's MTA that are both located in our own personal machine rooms, I'll set up UUCP mail and let them figure out if that counts....
Yeah, so sarcasm aside, this idea fits into a class that you should be looking for. Let me quote:
a tax would be an affront to some mythic libertarian "spirit of the internet"
No, a tax would be an unworkable mess that would have so many problems you cannot possibly measure them! Yes, the spirit of the Net is a network of peers who exchange packets at the IP level and let applications decide what to encapsulate, so there's some basic problems there, but then you get into What is mail? Should I tax ICMP? ICMP isn't even IP, it's a sister protocol, but if ICMP isn't charged for then I could just write SMTP/ICMP and encapsulate the protocol in ICMP datagrams (yuck, but it would work). If you tax ICMP, then you're charging me for things like my Linksys firewall rejecting network probes!
The Internet should turn into a penny post, with a levy of 1 cent per letter.
Define post. Define letter. Define pay (if I live in a community in India that's mostly barter-based and there's an email kiosk in the center of town, set up by volunteers....) Assume that the vast majority of my mail comes from a private residence and goes to private residences and businesses, not to public ISPs (which is the case) and try to figure out how we go about collecting a "tax".
I pay a tax for my connectivity, it's called ISP fees. If the US government wants to charge a tax to ISPs, they'll have to talk to the ISPs, but I assure you AOL will lobby against it pretty seriously, so you'd better have your facts in line detailing exactly how it will prevent spam from Russia while also not hurting the consumer.
Rather than punditing, I'm actually contributing to the solution. Please keep your "there oughta be a law" reactionary drivel out of my Internet. -
Spamassassin plugin
I've been using Spamassassin along with the Razor and DCC plugins and it works very well: 99% of the spam that enters my Inbox is clearly labeled as such. However, does anyone know of a piece of software that will automatically add the IP address of the mail server that sent the spam to my sendmail access.db reject list? If there isn't such a thing already, I could probably write one myself, but I don't want to go through the effort if it's already been done.
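For what it's worth, here is a rough sketch of what such a tool might look like, not an existing program. It assumes the message was already tagged by SpamAssassin with "X-Spam-Flag: YES", that the topmost Received header was written by your own server in sendmail's usual "[a.b.c.d]" form, and that a plain "address REJECT" line is what you want appended to the access source file. The function names are made up.

```python
import email
import re
from typing import Optional

# Matches the bracketed IPv4 address found in "Received:" lines such as
# "from mail.example.com (mail.example.com [192.0.2.10])".
BRACKETED_IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def sending_relay_ip(raw_message: str) -> Optional[str]:
    """Return the IP in the topmost Received header, or None.

    The topmost header is assumed to be the one added by our own server,
    so it is the one the sender could not have forged.
    """
    msg = email.message_from_string(raw_message)
    received = msg.get_all("Received", [])
    if not received:
        return None
    match = BRACKETED_IP.search(received[0])
    return match.group(1) if match else None


def access_entry(raw_message: str,
                 flag_header: str = "X-Spam-Flag") -> Optional[str]:
    """Build an access-map line for a message that was tagged as spam."""
    msg = email.message_from_string(raw_message)
    if msg.get(flag_header, "").strip().upper() != "YES":
        return None
    ip = sending_relay_ip(raw_message)
    return f"{ip}\tREJECT" if ip else None
```

The lines it returns would still need to be appended to the access source file and the map rebuilt. Blocking on a single tagged message is risky, since the relay may be a legitimate ISP smarthost, so counting several hits per IP before rejecting is probably wiser.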
-
Spam is dead
Get used to a mailbox full of... whatever you want, including nothing.
Spam tools are currently at the point that detection of spam is a near-certainty and the probabilities for false positives (i.e. good mail getting called spam) are measured in the 0.00n-0.0n% range (that is, n in 100,000 to n in 10,000), which can almost always be improved on locally by the user through various means that are anti-spam-tool independent.
SpamAssassin is currently my tool of choice. It's very flexible, can be used with any UNIXish mailer and is just getting frighteningly better over time.
SA's recent addition of Razor2, a Bayesian filter and improved handling of DNS blacklists (which SA weights so you can apply them without worrying about slicing large and useful parts of the Internet out of your field of view) has reduced many concerns that folks had in the past about active abuse of SA's rule-base. The speed with which this system applies hundreds of tests to a message is also quite stunning, and a major counter to Perl's tacit reputation as a "slow" language.
The biggest problem with SA right now is probably the inability to scale up to the mid-range ISPs and medium-sized businesses without SERIOUS hardware allocation due to the heavyweight nature of its testing. That's my personal mission for SA over the next year or so. My goal is to make SA a reasonable option for anyone that has to process orders of magnitude more mail than your average ISP (e.g. AOL).
When the upcoming 2.54 comes out, I HIGHLY recommend checking it out. You can install SA on most UNIX-like systems, as long as they have Perl installed, by typing (as root):
    perl -MCPAN -e shell
following the configuration process if you have not done so for Perl before, and then typing:
    install Mail::SpamAssassin
After that it's just a matter of how you want to configure your MTA to talk to SA. I recommend using SA in "spamd" mode with sendmail and procmail. If you already use sendmail with procmail delivery, you just have to change your .procmailrc by adding rules to invoke SA, and there are good examples of that on the SA site. You can also use qmail (officially qmail doesn't support this kind of thing, but if you use the standard set of patches that almost everyone has to apply, it's reported to work fine) and postfix (though postfix has some complexity when it comes to setting up any kind of uni-directional filtering).
Good luck! -
Not capable of completely dealing with the plague?
Fair enough if they don't think it can be completely eliminated, but it would be nice if the article would mention a few tools like http://spamassassin.org
-
Maybe not completely...
...but SpamAssassin in combination with Razor and Distributed Checksum Clearinghouse works quite well on most mail servers I've seen.
-
Re:Why does filtering work for me?
You're right, spamassassin does seem to be pretty heavyweight. I hadn't ever bothered timing it on the command line before. It can be run in daemon mode, which would at least eliminate perl startup/script compilation costs.
I suppose on a really busy mail server SA would peg cpu. But I'm guessing most installations have plenty of cpu to spare for SA.
-
Re:It's right there in my email
No, there is only one killer app everyone really wants and needs. It's the killer app that kills spam...
Yes, and it's called SpamAssassin. -
Let's see here
Spamassassin running: Check
Add the following to ~/.procmailrc: Check
:0:
* ^Subject.*\[SPAM\]
! webmaster@emarketersamerica.org
Justice Served: Check
Now, to get taco to do the same so he'd stop complaining about the spam he gets.... -
I'm down to two a week now
I was getting 500 spam a day. Hot damn, that is a lot. I have a bunch of URLs and I was promiscuous with my e-mail address(es). I had them up in newsgroups, message boards (even slashdot), I subscribed to crap, I bought things online, I registered at countless sites... and never with a condom. I have a paypal account, and I have registered at a few casinos (not to play, but to look for security holes - but that doesn't mean they don't still spam the hell out of me). And then my friends and I go through periods of signing each other up for things when we are asked to fill out forms - so it is hard to say how much of that has happened.
The bulk of what I was getting was from the URLs that I have registered - those URLs were set up to forward all mail at that address that didn't have an actual e-mail address to my address. So I disabled that feature to some extent, and it dropped my daily spam count down to a little over 120 or so a day.
So I then got curious and went through and "unsubscribed" from a bunch of them just to see what happened. My spam went down to about 30 a day. Hot damn, it worked.
But then it came back up over time - not sure if the unsubscribing just got my name on other lists, or if it just grew over time.
So I installed spamassassin. At the time 2.5 was in devel, so I used that. Some builds were better than others, and it got me down to about 1 or 2 spam that snuck through every day.
Since then I have installed 2.6 and haven't kept up with the development builds as often since the changelog wasn't... well, wasn't changing much over the time that I was watching it.
I run it as the standalone perl script, not the faster spamd daemon with its C client (spamc). I am on a shared server and scripts have to time out after 30 seconds of cpu time. So if the perl script is doing a lot of stuff, it gets killed, and the mail gets sent through.
So that was the bulk of the spam I was getting - not that spamassassin mistagged it - but that it was dying and letting it through that way.
So I went in and changed my settings. I disabled the Razor checks (score RAZOR_CHECK 0 and score RAZOR2_CHECK 0). I raised the autolearning threshold so that it would learn less frequently. I have my good contacts on a whitelist. I set required_hits to 3.5 instead of the default 5. I set the 90% bayes score to 3.5 and the 99% score to 4. I skipped the rbl checks and set the max attempts on anything that retries after a failure to be low (1-2).
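To see what those numbers do together, here is a toy model of the scoring, not SpamAssassin's actual code. It assumes a message is tagged once the summed scores of the rules it hits reach required_hits. The Bayes rule names and the "stock" Bayes scores below are placeholders, not values taken from any real release.

```python
# Toy model of SpamAssassin-style scoring: a message is tagged when the
# summed scores of the rules it hits reach the required_hits value.
# BAYES_90/BAYES_99 and the STOCK scores are illustrative placeholders.

def is_spam(hit_rules, scores, required_hits):
    """Return (tagged, total) for the rules a message matched."""
    total = sum(scores.get(rule, 0.0) for rule in hit_rules)
    return total >= required_hits, total


# Settings described in the comment above.
TUNED = {"BAYES_90": 3.5, "BAYES_99": 4.0, "RAZOR_CHECK": 0.0, "RAZOR2_CHECK": 0.0}
TUNED_REQUIRED = 3.5

# Hypothetical "stock" setup for comparison: threshold of 5, weaker Bayes.
STOCK = {"BAYES_90": 2.5, "BAYES_99": 3.0, "RAZOR2_CHECK": 2.0}
STOCK_REQUIRED = 5.0

if __name__ == "__main__":
    hits = ["BAYES_90"]
    print("tuned:", is_spam(hits, TUNED, TUNED_REQUIRED))
    print("stock:", is_spam(hits, STOCK, STOCK_REQUIRED))
```

The point of the tuning is visible here: with the threshold and the 90% Bayes score both at 3.5, a confident Bayes verdict is enough on its own, so the slow network tests can be switched off without losing much.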
As a result, it rarely kills the process now unless the server is under a lot of load - and now I get about 1 or 2 spam in a week instead of in a day.
I am a very big fan of spamassassin. -
What would have helped...
This is a consumer document meant to tell folks how to stop getting as much spam.
Useful insofar as it goes, but what would be much more helpful is an objective take on how spam gets to the end-system. It's very hard to generate this information. You can come up with the list of final-hop relays, but that's not as useful as you might think, since most of the really crappy spam software out there finds open relays dynamically and routes through them.
Slightly smarter software is now making it out there that performs some simple testing to determine how / if a given relay of choice can reach other sites. So for example, AOL's recent blocking of Comcast customers will help them in the short term, but over time they'll find that spammers simply stop using those relays and start using the ones that can get through. As new relays pop up, they will be used... eventually you would have to simply stop accepting mail in order to correctly prevent spam.
Like I say, it would have been useful to have the data on where spam is actually originating, but even without it, you can block spam with a very high degree of certainty based on the sender and relays with a much lower false positive (failure) rate than any of the bogus blacklist schemes out there. I'm about to add a module to SA to do just this, so stay tuned.... -
Re:bouncing mail to postmaster?
A status of 550 should only be sent in response to a command, not to connection.
Correct, and what's more they issue that 550 ending with "550 Goodbye" and then a connection reset (TCP RST) packet, which is also in violation of the RFC.
If you run SpamAssassin, I highly recommend adding:
    score RCVD_IN_RFCI 0 3 0 3
to your /etc/mail/spamassassin/local.cf. If everyone on the net does this, it won't block AOL's mail (or any other RFC-ignorant site), but it will mean that you have a much lower level of tolerance for spam-like mail from them.
It's not punitive so much as showing them the right way to have solved this problem. Yes, AOL gets a lot of mail; yes, filtering spam out of it is hard; but if they simply weighted blacklists based on how accurate they are (as SA does) and then combined the results of several lists from dynips to rfci to relays with those weights, then they could make an accurate assessment and inform the sites that are blacklisted appropriately (in conformance with the RFC).
Ultimately, even after issuing that 554, if someone pushes on with a "RCPT To: postmaster@aol.com", they should accept it so that the site has a usable route for delivering mail to assert that the problem has been solved, but that would be a rare occurrence if the lists were public and used/maintained correctly.
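The weighting idea can be sketched in a few lines. The list names echo the ones mentioned above (dynips, rfci, relays), but the weights and the threshold are invented for illustration and are not SA's or anyone else's real values.

```python
# Weighted blacklist combination: each list adds a weight reflecting how
# far it is trusted, instead of any single listing causing a rejection.
# Weights and threshold are made-up illustration values.

LIST_WEIGHTS = {"dynips": 1.5, "rfci": 3.0, "relays": 2.5}


def blacklist_score(listed_on, weights=LIST_WEIGHTS):
    """Sum the weights of every list the sending host appears on."""
    return sum(weights.get(name, 0.0) for name in listed_on)


def verdict(listed_on, content_score=0.0, threshold=5.0):
    """Combine blacklist weight with any content-based score."""
    total = content_score + blacklist_score(listed_on)
    return "reject" if total >= threshold else "accept"


if __name__ == "__main__":
    print(verdict(["rfci"]))                       # one listing alone
    print(verdict(["rfci", "relays"]))             # two listings together
    print(verdict(["dynips"], content_score=4.0))  # weak listing plus spammy content
```

A site on one list still gets its mail through unless the message itself looks like spam, which is the "lower tolerance" behaviour described above rather than an outright block.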
Bah. -
For those of you who think this is okay . .
Let me just point out a few things:
1) Although I've never used my ISP's mailservers for outgoing mail, my friends have -- and mail is constantly lost, or delivered hours late.
2) Likewise, my ISP's incoming mail servers are frequently down, losing mail, and full of spam (the address was either harvested or sold, I don't know which. I have evidence of it, but that's another thread). A couple of my own local accounts suffer from spam as well, but I managed to install Spamassassin, which must be too difficult for my ISP.
3) Privacy is a concern with me, and I'd prefer to handle mail transactions myself.
4) I like the reassurance of looking through my Sendmail logs, knowing that an important message was delivered, and if it wasn't, the reason why.
5) Although this is unrelated, my friends often complain of outages when my service is fine. The reason? My ISP's DNS servers are constantly screwed up, yet I run my own.
6) I run Majordomo to host a small mailing list of 20 or so members (that moves perhaps 500 messages a month); that's not enough traffic to justify having it hosted somewhere else, and Yahoogroups butchers messages with advertisements. Luckily none of its members use AOL.
7) I check my mail logs often (to make sure nothing out of the ordinary is going on), and do not allow relaying.
Many of us run mail servers simply because our ISPs are unreliable. Many ISPs can't even host a measly 5 MB of web space adequately, so I feel wary letting them handle important E-Mails. I wish Speakeasy was available in my area; it would be a no-brainer switch.
You've probably heard the saying, "tolerating excesses in order to preserve freedoms." Well, Spam is an excess -- a very horrible excess. At the same time, enough people use home mail servers for justifiable reasons that outlawing them, or blocking mail from them isn't a logical decision.
And besides, there's other ways to prevent spam without making anyone unhappy. Spamassassin, once configured correctly, nails just about all spam. My university filters spam on my POP account, and I receive maybe one (if that) a month; couple that with Mail App's built in filtering and I haven't actually seen a Spam message in months. The best way to get rid of spammers is to implement solutions that make their efforts ineffective on ANY level, not just by killing off one of their hundreds of other options (AOL's method). -
Re:Good move
A better move would be to use a content analysis tool such as SpamAssassin in conjunction with Vipul's Razor to check the mail for recognisable spam. Basically you get Procmail to check if each mail is on Razor, which is an online spam database. If it's on there, it gets deleted from your mail queue or if you wish, dumped into a quarantine folder. If it isn't there, SpamAssassin checks for various Spam elements like 419 scams, testimonials from "my wife Jody" etc. If it scores over a certain threshold, it gets reported to Vipul's Razor as spam, and deleted/quarantined. Should a spam get through all that, you can manually report it to Razor so next time it will get intercepted.
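The flow described above can be sketched as follows. Nothing here talks to the real Razor network or runs real SpamAssassin rules: the shared database is a plain set, the digest is an exact hash, and the scoring function is a stub with invented phrases and scores.

```python
import hashlib
from typing import Callable, Set


def digest_of(body: str) -> str:
    """Exact-match stand-in for a shared spam signature."""
    return hashlib.sha1(body.encode("utf-8")).hexdigest()


def toy_score(body: str) -> float:
    """Stub content test; phrases and scores are made up."""
    phrases = {"my wife jody": 3.0, "urgent business proposal": 3.0}
    text = body.lower()
    return sum(score for phrase, score in phrases.items() if phrase in text)


def filter_message(body: str, known_spam: Set[str],
                   score: Callable[[str], float] = toy_score,
                   threshold: float = 5.0) -> str:
    """Check the shared database first, then fall back to content scoring."""
    digest = digest_of(body)
    if digest in known_spam:
        return "quarantine"
    if score(body) >= threshold:
        known_spam.add(digest)  # stands in for reporting the message
        return "quarantine"
    return "deliver"


def report_manually(body: str, known_spam: Set[str]) -> None:
    """A spam that slipped through gets added so it is caught next time."""
    known_spam.add(digest_of(body))
```

Quarantining rather than deleting is the safer default, since both the shared database and the content tests can be wrong.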
-
Re:Bad Addresses
How is this different from the open-source Vipul's Razor, Pyzor or DCC, all of which are already in wide use through their easy integration with SpamAssassin?
Clearly a proprietary system just won't be as good because it needs, by its very nature, a lot of subscribers to be effective. Having said this, Cloudmark seems to do alright by using Razor's network. -
Geeks asleep at the wheel
I don't see why so many people at /. cheer Gov't getting involved in the spam problem. I have been using CRM-114 and SpamAssassin for several months and the result is: it works. I get something like 4-5 times as much spam as non-spam, and *VERY* rarely does a spam message find its way into my inbox now.
Before we cheer legal solutions (which will have their fair share of downsides) maybe more people should take technological measures.
Also have a look here:
Annoying spammers with OpenBSD's pf
Slides explaining how Bayesian email filtering is successful
PS: I know people might say, but what about the economic cost of spam, blah blah blah. Read the slides. If no one ever gets spam, people will stop sending it, and the economic cost goes away.
Good luck!
-
SpamAssassin is really free and multiplatform
Not only is SpamAssassin free with no hidden strings attached, but you can run it on Windows (not just Linux and other Unix systems).
- If you have Perl on Windows (ActiveState, Cygwin), then the standard SpamAssassin will run fine.
- Open Source Windows client for POP3: SAproxy (disclosure: I'm one of the developers)
- Commercial: Spamnix for Eudora
- Deersoft made Exchange and Outlook versions, but they are being revamped since Deersoft was acquired, so they're not being sold for a few months.
- and more...
Not to mention all the reasons why challenge-response filtration systems are alienating to the rest of the world. Sure, you will get almost no spam, but you'll also lose a lot of legitimate email from disgruntled people who don't like being challenged. (My standard reply to TMDA challenges is to... not. I find it very obnoxious when I reply to someone, answer a question, or heck, just email them for any legitimate reason, that I have to prove that I'm a human. It basically sends the message that "my time is more important than your time".)
Thankfully, there are some strong anti-spam methods that are being developed which don't require challenge-response, opt-out lists, patented crypto, or any of the other dumb ideas I keep reading about.
-
SA still works
I've been using SpamAssassin for about a year now. It started out good, and got better. Now it's actually a little frightening how good it is.
If you want to try it out, you will (most likely) need your own machine handling mail (if you're a broadband or DSL user, this is easy enough, I'll assume you've made that step...)
Now, make sure Perl is installed.
Now, as root, type "perl -MCPAN -e shell" and follow the instructions to set up Perl's configuration system.
In that shell, type "install Mail::SpamAssassin".
Exit that shell and type "/etc/init.d/spamassassin start"
You will want to do whatever your OS prefers for making sure this starts at boot time; under Red Hat Linux, that's "/sbin/chkconfig --level 35 spamassassin on"
Exit your root shell, and do the rest as your user account.
Assuming you use sendmail with procmail (see the SpamAssassin site for other MTA configuration steps), put:
    :0fw
    | spamc -f
into your .procmailrc.
SpamAssassin is now doing its job. It just marks messages that it thinks are spam. See the example procmailrc on spamassassin.org for more information on how you can move the mail to another folder, delete it, or even more complex things. Also, there's a procmail bug that the example config can help you work around.
If you're doing this on a busy site, I recommend adding "-m 20" or so to your spamd command-line to throttle periods of intense mail delivery.
You can also configure SpamAssassin to do lots of useful stuff just the way you like it. There's a FAQ on the SA site that will walk you through it, but after the first time spamd handles mail for you, it will create a ".spamassassin/user_prefs" file that has good comments in it that guide you through common configuration needs (like whitelisting users).
-
Why Pay?
Why pay for some type of filter when SpamAssassin is free (as in speech)?
-
Re:Telemarketers
Really, this is ON-topic... just not till the last point i guess :-/
This filter suggestion you have:
Beef up your filters and accept it.
is good. Your logic about the marketers needing 30 days is also reasonable. But since this is a board for nerds, I think it warrants something more involved... you want to maintain control over your mailing addresses, and whether or not you receive mail sent to them. The solutions are out there- you just need to take a few minutes to put the pieces together.
I just started using a new account for my main email address, and I'm taking this opportunity to try to break the chain of spam that I developed over 6 or 7 years of using my last address at a .edu domain. What steps am I taking? (note- of course, this is a linux-centric view. If you're using hotmail/outlook/AOL, and you're really concerned about the spam you get, my only suggestion is "find something else.")
1. Set up Procmail. If you're root, it's a little more involved... if you're not root, odds are procmail is already running somewhere on your system. "man procmail", "man procmailrc", "man procmailex" should be enough to get you going.
2. Use Spamassassin. Once again, if you're the only user on your domain, it's more work because you have to dl/install/configure the SA program. Lucky for me, i don't have root on my mail domain, and my friendly new sysadmin had it running already- so all I had to do was set up a new procmail recipe like this one. In fact, i think i used that one, exactly.
3. Use sneakemail to generate new email addresses for any public post/contact information. Point the sneakemail account you set up to your real address. Don't ever list your actual REAL address ANYWHERE that a bot can pick it up off the web. Don't give it out to anyone on the phone. Don't use it to send email to anyone at hotmail. Don't list it in the text on your resume or write it out in your .signature. Don't fill it in on warranty registration postcards.
#3 is the really important one- which is why i brought it up in an earlier post in this thread. You probably have another account that is getting a lot of spam right now, which is why you've read this far. So you .forward that address to your new address, where everything that comes in gets run thru procmail and SA just like any new mail. Procmail lets you set up separate delivery folders for mailing lists, so if you use Sneakemail every time you join a new mailing list, or give your address to another company online, you can direct mail coming to that address into its own folder, because sneakemail tags the "From:" headers with information as to which address someone is sending mail to. SO- to take this particular case in point, you make an "audiogalaxy" sneakemail address, and when you get spam from Sprint on the audiogalaxy address, you know that audiogalaxy sold you out. So you call them up, complain, AND THEN YOU LOG INTO SNEAKEMAIL AND TURN THEM OFF.
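The per-address sorting can be sketched like this. The addresses, folder names and the "disabled" set are hypothetical, and for simplicity the sketch keys on the To: header; how Sneakemail actually tags forwarded mail is not shown here, so adjust the header to whatever your forwarded messages really carry.

```python
import email
from email.utils import parseaddr

# Hypothetical disposable addresses, one per place they were handed out.
ALIASES = {
    "ag123@sneakemail.example": "audiogalaxy",
    "lk456@sneakemail.example": "linux-kernel",
}

# Aliases that have been switched off after being sold or leaked.
DISABLED = set()


def folder_for(raw_message: str, aliases=ALIASES, disabled=DISABLED) -> str:
    """Pick a delivery folder from the disposable address that was used."""
    msg = email.message_from_string(raw_message)
    _, addr = parseaddr(msg.get("To", ""))
    addr = addr.lower()
    if addr in disabled:
        return "/dev/null"
    label = aliases.get(addr)
    return f"lists/{label}" if label else "inbox"
```

When mail from an unrelated sender shows up in lists/audiogalaxy, you know who passed the address on, and adding that alias to the disabled set mirrors turning it off at the forwarding service.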
-
Much higher percentage, probably
It wasn't until I set up a spam filtering mail relay for my home network, using a FreeBSD server running Postfix and SpamAssassin, that it really hit home just how much spam I was getting on a daily basis. Postfix is using RBLs and header filtering criteria, and that kills a lot of the spam outright. That which passes Postfix is analyzed by SpamAssassin and flagged as spam in the subject line. My MUA filters my mail and moves flagged messages to a designated SPAM folder for review before I delete it (because I will never trust an automated process like this 100%). Now that my legitimate mail is nicely sorted from my junk mail, the percentage is staggeringly obvious. I get 4 to 5 times the amount of junk mail as legitimate mail, and that is with Postfix kicking a large portion of the inbound mail before it ever hits SpamAssassin! I don't have precise figures on how much Postfix kicks, but my mail log is flooded with Postfix reject messages. And you can add to that the fact that I firewall access to my mailserver from all of Latin America and Asia because of the high volume of spam and network attacks sourced from those regions.
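The "4 to 5 times" figure converts to a share of mail with simple arithmetic, worked here as a small sketch. Only the 4x and 5x ratios come from the paragraph above; the 0.90 and 0.95 inputs are example values for going the other way.

```python
def spam_share(spam_per_legit: float) -> float:
    """Fraction of mail that is spam when there are r spams per legit message."""
    return spam_per_legit / (spam_per_legit + 1.0)


def spam_per_legit(share: float) -> float:
    """Inverse: spams per legitimate message for a given spam fraction."""
    return share / (1.0 - share)


if __name__ == "__main__":
    for ratio in (4, 5):
        print(f"{ratio}x junk -> {spam_share(ratio):.1%} of what reaches SpamAssassin")
    for share in (0.90, 0.95):
        print(f"{share:.0%} overall -> about {spam_per_legit(share):.0f} spams per legitimate message")
```

So the mail that survives Postfix is already 80% or more junk, and every message Postfix rejects up front pushes the true inbound share higher still.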
Based on my guesstimation, I'd say that 90-95% of my inbound email is spam. And given the fact that bandwidth and CPU power keep getting faster, cheaper, and more available, I can only see the spam problem getting worse. -
Re:Two points - not quite, IMO
Either you like to live dangerously, or you've found a miracle recipe against nasty porn spam...
SpamAssassin, I get one or two spams a week now, down from over a hundred a day (yes, seriously) before I implemented the filters.
Al. -
Summary of IETF ASRG discussions
Four days ago, when this was mentioned on Slashdot, I posted the following summary of what had been discussed. Sadly, this summary is still pretty complete.
What I take from all this discussion is that the only "solution" to spam is to do the types of things that we have been doing for years, but to do more of it and quicker: use well-run DNS blacklists (Spamhaus SBL, ordb, dsbl, etc.), good content filters (bayesian filters, etc.), bulk mail detectors (DCC, Vipul's Razor, etc.), and per-user whitelists and blacklists.
Or, combine all of the above techniques by using SpamAssassin
--
I've been subscribed to the list since near the beginning and have been following it fairly closely. Much of the discussion has been rehashes of old topics such as "what exactly is spam?", "make the sender pay something, either money or CPU", etc.
The most interesting discussions that I've seen so far are:
- Mail transfer programs (MTA) such as sendmail, exim, qmail, etc., should keep track of sender-recipient pairs. The first time the sender-recipient pair shows up, sendmail (or whatever) should issue a "temporary delivery failure". This will force the sending mail transfer program to queue the mail and resend it later. This is completely backwards compatible and doesn't require end users to do anything.
Most spam specific programs will not queue and retry, and thus the spam will be dropped.
Spammers that use real mail transfer programs or open relays will need to be able to hold all their outgoing spam for a while, increasing the spammer's costs and slowing down the delivery of spam. Legitimate email will not be thrown out, it will only be delayed and only for the first time.
Of course, you don't really want the databases to remember every sender-recipient pair forever, nor do you want to remember pairs that were added by spam so this really isn't a "first time" database, but it is close.
Apparently the "canit" program already does this, but I had not heard of this technique before.
- Spam filtering really needs to be done while the email is being received. Sendmail can already do this with the milter filter, but other MTAs should also. Most mail servers are I/O bound, not CPU bound so this really isn't much of a burden on the server.
If you filter during the email receive process, you can make the sending MTA do the bounce. This means that you will not have to deal with spammers forging "from" and "reply-to" headers. You won't have to clean up bounces that never succeed, nor will you be responsible for bouncing spam to another victim that the spammer selected for the "from" or "reply-to" headers.
Also, senders of false positives will receive a bounce message instead of having their mail just disappear. This reduces the danger of important email being lost.
- There are also several proposals to deal with ways of verifying that email being sent from a given IP address and claiming to be from a certain domain is actually authorized to send email claiming it is from that domain.
Right now, there are DNS records that tell you which IP addresses are valid to try to send email to for a given domain (the MX records), but many ISPs have different machines for sending and receiving email. There are currently no DNS records to tell you which IP addresses a domain will send email from.
The problem with this kind of proposal is that there are many people who think they have legitimate reasons to forge "from" or "reply-to" addresses. It also forces ISPs to make sure that every time they add a new outgoing mail server, they need to update the list of valid IP addresses. If they forget to do this, then only bleeding edge spam filters will detect a problem.
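The first idea in the list (temporarily failing the first delivery attempt for each sender-recipient pair) can be sketched as follows. This is a minimal in-memory illustration, not how any particular MTA or the "canit" program implements it; the class name, the retry delay, and the entry lifetime are all assumed values, and a real deployment would need persistent storage and a way to avoid remembering pairs created by spam.

```python
import time

RETRY_DELAY = 300             # assumed: seconds a new pair must wait before acceptance
ENTRY_LIFETIME = 30 * 86400   # assumed: forget a pair 30 days after it was first seen

class Greylist:
    """Tracks sender-recipient pairs and defers the first attempt for each pair."""

    def __init__(self):
        self._first_seen = {}  # (sender, recipient) -> time of first attempt

    def check(self, sender, recipient, now=None):
        """Return an SMTP-style reply code: 451 (try again later) or 250 (accept)."""
        now = time.time() if now is None else now
        key = (sender.lower(), recipient.lower())
        first = self._first_seen.get(key)
        if first is None or now - first > ENTRY_LIFETIME:
            # Unknown (or expired) pair: remember it and ask the sender to retry.
            self._first_seen[key] = now
            return 451
        if now - first < RETRY_DELAY:
            # Retried too quickly; a real MTA would normally wait longer.
            return 451
        return 250
```

A sending program that queues and retries gets through on a later attempt; one that never retries is simply never accepted.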
-
Re:Simple Solution
I'm not bothered which is the best solution, the point is there's no mailserver side tricks being employed at the moment.
Sure there are! Everything from RBLs and phony IP addresses to source address and header analysis. Believe me, sysadmins are definitely into punishment!
You think "scumbag" is a bit harsh
Not at all ;-)
Nobody will risk the stability of their servers by kludging together 4 or 5 spam systems together now will they?
Actually, I wasn't necessarily referring to combining systems, but rather to combining algorithms. Even so, some of the systems I've seen are extensible: Tarpit, for example, will let you hook up your own classifier. SpamAssassin can work with a Vipul's Razor system and now also includes bayesian filtering.
Also, with a flexible system like qmail it's pretty easy to modify the part of the system that receives messages without risking the stability of your server. It should be pretty straightforward to hack together something that does combine several approaches, including realtime throttling of suspected spammers. In fact, I've lately been considering doing something like that on my company's mailserver. -
Re:Is spamass-milter ready for prime time yet?
Yep, a while back they fixed the problem with over 250K messages hanging the milter. It is quite solid in my experience.
I'm running Debian 3.0, sendmail 8.12.3-4, spamassassin 2.50, and spamass-milter 0.1.3a. -
I don't buy spam costs estimates either
I'm getting quite fed up with all the anti-spam rhetoric around the 'Net. All kinds of figures fly around as to the cost and magnitude of the spam problem, but most of them are obviously biased and the methodology by which they are obtained is generally fuzzy at best. It reminds one of the figures quoted by the BSA for software piracy, or the figures quoted by the RIAA for music piracy: that is, they factor in all kinds of "intangible" costs, are based on questionable assumptions, and are impossible to verify.
It is clear that spam is a nuisance. But spam filters work miracles, and they don't have to be fashionable Bayesian classifiers either. Simple threshold- or trigger-based filters, such as junkfilter or SpamAssassin, work extremely well for individual mail accounts.
Now some people will argue that filters don't solve the problem: by the time the mail arrives in somebody's inbox, the damage has been done, the network resources have been wasted and the CPU time has been spent. But that argument is meaningless without a means to quantify the costs. And again, where are the figures? How can we even reliably estimate the figures?
It stands to reason that many people benefit from inflating the costs of spam. Meanwhile nobody questions the figures because everybody hates spam. Notice how Barry manages to almost, but not quite, evade question #7 in this interview.
Spam: the non-issue that everyone loves to hate. -
Re:Spam, Spam, Go Away, Come out ANOTHER day.
Install Spamassassin. Install, use, and report spam with Vipul's Razor, Distributed Checksum Clearinghouse and Pyzor so that only a few people have to read a message before the rest can skip it.
If nothing else, get a new email address and start telling all your friends. Once you are filtering out the spam, it's kind of fun to see how accurate you can get it...
I wrote a quick plugin for Becky! (my mail client most of the time) that connects to my imap server and empties my spam folder, sending it all through Spamassassin's reporting mechanism. So I just check my mail, scan for false-positives (none to date), move any spams that were missed, and run the plugin.
-Elentar
-
Same idea, different approach.
I'm thinking of using spamassassin along with qmail-qfilter and a small perl script to tie them together, one that invokes a sleep() loop for every spam-like message. It could easily be used to do the same thing, because spamassassin kicks back a score for the message's likelihood of being spam...
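A rough sketch of the score-to-delay idea, written in Python rather than the perl the post has in mind. Everything here is an assumption for illustration: the threshold, the seconds-per-point factor, the cap, and the function names. It does not show the qmail-qfilter wiring, only the delay calculation.

```python
import time

SPAM_THRESHOLD = 5.0      # assumed cutoff above which a message counts as spam-like
SECONDS_PER_POINT = 2.0   # assumed: extra delay per point of score
MAX_DELAY = 60.0          # assumed cap so one message cannot stall the filter forever

def tarpit_delay(score: float) -> float:
    """Return how long to stall the sender, growing with the spam score."""
    if score < SPAM_THRESHOLD:
        return 0.0
    return min((score - SPAM_THRESHOLD + 1) * SECONDS_PER_POINT, MAX_DELAY)

def handle(score: float, sleep=time.sleep) -> float:
    """Stall for the computed delay (sleep is injectable for testing)."""
    delay = tarpit_delay(score)
    if delay:
        sleep(delay)
    return delay
```

Clean mail passes with no delay, while the spammiest messages tie up the sending connection for up to a minute each.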
cheers.. -
Re:The problem with content filtering
You're right, it's not easy. SpamAssassin does indeed have measurements that suggest a mail is not spam (the full list of tests is here; see the negative ones), but there are very few.
I worked on spam filtering for a project in University, based on the computational linguistics field of genre classification. I first ran tons of spam and good email through a decision tree maker (using many statistics: word length, verb tense, number of possessive pronouns, and more) Once I let the _computer_ decide which statistics were important, I could run any mail through this decision tree and find out two things: the probability that it was spam, but also the probability that it was good email. This proved very effective, as does the simpler keyword-based prediction of bayesian filtering.
Determining statistics for good email is harder than determining statistics for spam, but it is possible.
-
Re:Wow this article isn't what I expected.
One word: SpamAssassin.
I installed this on my server and am now able to filter 95%+ of my junk mail. If you don't have access to installing software on your servers... ask your ISP! -
Re:Unfortunately, posting to /. can generate spam.
Moral: spammers hoover slashdot, so don't post your email here, ever.
Screw that. I refuse to hide or obfuscate my email address. I've been using the Internet for 15 years. I remember the time when the Internet was mostly spam-free, and people rarely forged email addresses even though everyone knew how to.
My real email address is deven@ties.org -- this is my primary personal email address, not a spam-trap address. I know that the spammers are harvesting addresses from Slashdot and everywhere else. I don't care. Let them have the address. I've never hidden it, and I never will. I'm stubborn that way. (It's akin to refusing to change your lifestyle in response to terrorism, even when you know you're at risk...)
Of course, since I don't hide my email address, I get tons of spam, along with "Joe job" bounces/replies for spams forged in my name, plus more bounces copied to postmaster, since I receive postmaster mail for several domains. Bring it on! It just provides me with a larger corpus of bogus email to use for Bayesian filtering, or whatever other technique I may experiment with...
I firmly believe that a technical solution will be required to solve the spam problem. Legislation won't prevent the virtually-untraceable international spams, and may not even prevent local ones if it's not zealously enforced. Social controls haven't been effective. We need to prevent the spam from being delivered in the first place, or at least mark it as suspicious so legitimate mail doesn't drown in the noise so easily.
Beyond basic filtering like SpamAssassin and Bayesian filtering, there are other technical solutions worth exploring. Human validation techniques like TMDA might help. Finding a way to punish spammers and drive up their costs, such as E-Stamps or selling interrupt rights (original paper: HTML or PDF), might be effective. (But likely a higher barrier to legitimate mail.) Some sort of PGP-style Web of Trust might be very effective if done well, but it would be difficult to build. Perhaps some "soundness" principles could be borrowed from Usenet II to create a similar system for email...
Let's cross our fingers and hope to find a truly effective solution (or combination of solutions) in the near future! -
I get odd errors every now and then
my server will tell me that "I" am trying to access it in an inappropriate way (sounds like this girl I knew in high school). I think it is usually generated from various automated scripts trying to find ways to send out stuff - I'm glad it doesn't work.
Hell - I've gotten enough nasty e-mails just from other people I know getting viruses... virii? the kind where one person gets it and then it randomly picks a name in the addressbook to send things out as and then e-mails everyone else in the addressbook.
Anyway - again, anytime anyone has spam issues, I just have to blurt out SPAMASSASSIN and then do a little dance. *dancing* -
"Do-Not-Call" lists really work...
Prior to signing up for the Missouri "Do Not Call" list about a year ago, we would get 1-2 telemarketing calls each evening, usually during dinner. Now we get none.
I'm not sure if a "Do-Not-Email" list would be as effective, but if it were, I'd be the first person to sign up. I'm now getting close to 75 spam emails each day. Fortunately, thanks to tools like Spamassassin and the new Mozilla email client with built-in junk email filtering, at least I don't have to look at them! -
May work for US entities which follow laws...
... but seeing as how most of my SPAM is from out of the country... oh well. This is a good start to get American business SPAM out of my inbox, I'll have to rely on procmail and SpamAssassin for the rest of it, I guess.
-
What a Sick Sales Plug!!!
I hate to do this because it's only partially complete. But I have a concept worked out on how to handle spam that works extremely well and removes the chance of false positives, especially from Real People.
It's not a money-making scheme, but it is prior-art <grin>.
The idea is a hybridization of SpamAssassin and tmda (tagged message delivery agent) wherein you accept all email into your inbox and the spam goes into a spam mailbox. Nothing New...
The cool part comes in when you start automating the spam_mail similar, at least conceptually, to what I have on my website. Shameless plug here
The idea is that you send out an email confirmation, similar to tmda, for only that email which is considered spam (by SpamAssassin). This means that most of your regular communications would go unhindered. But it would also make casual contact via email the easy and simple function that it is supposed to be.
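The decision at the heart of this hybrid can be sketched as a small Python function. This is not the code from the poster's website: the function name, the threshold, and the idea of keeping a set of already-confirmed senders (borrowed from how challenge-response systems generally work) are assumptions for illustration.

```python
def triage(score: float, sender: str, confirmed: set, threshold: float = 5.0) -> str:
    """Decide what to do with a scored message.

    Only mail the filter considers spam triggers a confirmation request;
    everything else, and mail from senders who already confirmed, is delivered.
    """
    if score < threshold or sender.lower() in confirmed:
        return "deliver"
    return "hold_and_challenge"
```

For example, `triage(9.0, "stranger@example.com", set())` holds the message and sends a confirmation request, while the same score from a sender in `confirmed` goes straight to the inbox.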
This notion of accepting mail only from a list of your known contacts is a pain in the arse and is most times met with extreme hostility. This is especially true if you are attempting to contact someone privately from an email list, or from a solicitation from their website.
I have to warn you that if you use the code as described on my website you will probably break your server in the first day. I've rewritten it to scale much better (1,000 spams every 10 minutes). But I haven't had the chance to post the new code. But conceptually it rocks!
I've processed something like 20,000 emails without taking a single false positive, unless the original sender vegged... but then he didn't really want to talk to me anyways now did he?
The point is, it places the responsibility of delivering spammy mail to the sender. I do not have to receive it. However it allows the non-spammer to go about the internet unhindered.
-
32k Window...
The fact is, that unless your SPAM corpus and HAM corpus are both under 32k, this won't work. Gzip is fast because it only has a 32k sliding window, meaning that it only searches for like strings in a 32k window around what you're currently compressing. Hate to break it to you, but 32k is not enough for a corpus. I think Bzip2 uses something larger (900k?), but I forget what it is.
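The window limit is easy to see with Python's zlib module, which implements the same DEFLATE compression gzip uses. The sketch below is only a demonstration of the 32k window, not of any spam classifier; the block sizes and the function name are arbitrary choices. It compresses a block of random bytes followed by an exact copy of itself, once with the copy inside the window and once outside it.

```python
import random
import zlib

def doubled_size(block_len: int, seed: int = 0) -> int:
    """Compressed size of a random block followed by an exact copy of itself."""
    rng = random.Random(seed)
    block = bytes(rng.getrandbits(8) for _ in range(block_len))
    return len(zlib.compress(block + block, 9))

# Copy starts 10,000 bytes back: within the 32k window, so it is found and
# the second half costs almost nothing.
near = doubled_size(10_000)

# Copy starts 40,000 bytes back: beyond the window, so the compressor cannot
# see it and the second half is stored at full cost.
far = doubled_size(40_000)

print(near, far)
```

In the first case the output is roughly the size of one block; in the second it is roughly the size of both, which is the effect the post is describing for a corpus larger than the window.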
I'll be happy with spam assassin until I get CRM114 (and mailfilter) trained and working. -
"ignored" - hardly
ignored at least 750,000 requests by consumers to be taken off their lists.
I'm sure they didn't ignore them - they use those responses to determine that they now have a confirmed live e-mail address which is worth more than a bunch of e-mail addresses that nobody checks.
so I'm sure they don't just ignore them - they likely instead do just the opposite and have much interest in those 750,000 responses and gave them a little extra attention... like logging them in their database as "live" or something like that.
All I have to say about this is 1) I wish I had thought of it all in 1995 - could have made a bundle and 2) SpamAssassin rules! -
Re:Treating the symptoms, not the problem...
And SpamAssassin will evolve to take that into account... again, without me having to lift a finger. Here are the various tests SpamAssassin currently performs on a spam email. A bit more than you imply.
Also, note : if you avoid the garbage #'s (or random text) in the subject line, then you've just made it so that you can't email Hotmail, Yahoo, or AOL users because all of those systems will block an email with the same subject if they detect it going to too many users during a certain time. Many other ISPs probably do this as well. The whole reason spammers started adding the random text/numbers was to avoid that.
And remember - if tools (like SpamAssassin and whatever else) start to get so good that spammers have to spend a large amount of time crafting spam to try and get past them, it ceases to be profitable for them, thus they will have to raise their rates, thus the service seems less appealing to those buying it, thus the amount of spam will decrease. -
Just 81 spam today?
What's the matter, Taco, never heard of SpamAssassin?
-
Not bloody likely...
All it takes is 1 single admin with a clue to install SpamAssassin to get rid of the bulk of it.
;-) -
In one word
-
This is why I like spamassassin...
The article points out that there are problems with RBLs, and that is true. On the other hand, they're very useful in blocking spam.
This is why I like spamassassin. It lets you look up DNSBLs, and include those in a mail's score. It combines these and distributed spam reporting services like razor (which could be abused, too, but only on a per-message basis, not whole sites or netblocks) with its own content-based checks and an automated whitelist facility.
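To make the "include those in a mail's score" idea concrete, here is a small Python sketch. It is not SpamAssassin's code: the zone name, the test names, and the weights are made up for illustration. The one standard piece is the query name format, where a DNSBL is asked about an IPv4 address by reversing its octets and appending the list's zone.

```python
import ipaddress

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def total_score(hits: dict, weights: dict) -> float:
    """Add up the weights of every test that fired, DNSBL or otherwise."""
    return sum(weights[name] for name, hit in hits.items() if hit)

# Hypothetical weights and results for one message.
weights = {"DNSBL_EXAMPLE": 3.0, "RAZOR_HIT": 2.0, "ALL_CAPS_SUBJECT": 1.5}
hits = {"DNSBL_EXAMPLE": True, "RAZOR_HIT": False, "ALL_CAPS_SUBJECT": True}
score = total_score(hits, weights)
```

A blacklist hit here is just one contribution to the total, so a single wrongly listed network raises the score without rejecting the mail outright.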
-
Oh cry me a river
It's funny. First it was the spammer networks complaining about getting blocked. Now it's the customers on those networks complaining.
Here's an immediate answer to the problem. Change to an ISP that can control their network better. There are more ISPs out there than you can shake a stick at. Find one that actually cares. Now every ISP will have a spammer on it, but all it takes is a staff who cares to get the problem solved.
Good article, however. I personally don't agree with bouncing email - tagging it, as SpamAssassin does, is far better.
RBLs however are a necessary evil since some networks are willing to allow spamming (or aren't capable of fixing the problem). There has to be some way to identify those networks who aren't playing nice.
-
Rmail in emacs howto sort w/ spamassassin headers
a.
For rmail in emacs, once you've got http://spamassassin.org on the system, like the system at the university here at this end, how do you sort the hundreds of spam commercial messages now with spamassassin headers?...
b. What functions for rmail in emacs are there?...
c. what other features of spamassassin are there?...
that neophytes might try to get to their preferred correspondents messages more easily?... -
My spam solution
I use SpamAssassin, combined with some scripts available here. Since I implemented this system last month, I have gotten exactly one piece of spam, and it got through because the body contained nothing except a URL.
-
Re:security?
Yeah, but then you've got the whole "security thru obscurity" thing working. It's no good to come up with a spam-fighting technology that doesn't work if spammers know about it. That's why we have tools like SpamAssassin, where it doesn't matter if they're aware you're using it.