DSPAM v3.6 Released

Comparison to other tools by Puramoca · 2005-10-17 00:07 · Score: 2, Insightful

It would be interesting to compare this version to other spam filters and see how it measures up.

Re:Comparison to other tools by gvc · 2005-10-17 00:52 · Score: 2, Informative

TREC's Spam Track will evaluate several spam filters. There's also a toolkit for do-it-yourself comparison.
Although DSPAM is not an official participant at TREC, three configurations will be evaluated for comparison - with tum, toe, and teft training modes. Zdziarski reported some of the preliminary results in his interview, but complete and comparative results won't be available until TREC in November.
Re:Comparison to other tools by pushf+popf · 2005-10-17 03:09 · Score: 2, Informative

While it's great that it learns and makes decisions about the "spamminess" of various incoming items, the most reliable method I've found so far is Greylisting.

The moment I installed and started GLD (gasmi.net), the spam simply stopped. It was like flipping the "nospam" switch on. No false positives, no missed spam, nothing.

Every now and then I get unwanted email, but at least now it's from an actual, identifiable SMTP server, not a spam-bot.

It's an amazing improvement from implementing a really elegant concept.
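For readers who haven't met the concept: greylisting temporarily rejects the first delivery attempt from an unknown (client IP, sender, recipient) triplet and accepts the retry that a real SMTP server will make later. A minimal sketch of that idea follows. It is not how GLD itself is implemented; the class name, the 300-second delay, and the reply strings are assumptions made for illustration.

```python
import time


class Greylist:
    """Toy greylist keyed on (client_ip, sender, recipient). Hypothetical names."""

    def __init__(self, min_delay=300, now=time.time):
        self.min_delay = min_delay  # seconds a new triplet must wait (assumed value)
        self.now = now              # injectable clock, handy for testing
        self.first_seen = {}        # triplet -> time of first delivery attempt

    def check(self, client_ip, sender, recipient):
        """Return an SMTP-style reply line for one delivery attempt."""
        triplet = (client_ip, sender.lower(), recipient.lower())
        t = self.now()
        # Remember when this triplet was first seen; keep the old time on retries.
        first = self.first_seen.setdefault(triplet, t)
        if t - first < self.min_delay:
            # Temporary failure: a real mail server queues the message and retries.
            return "450 4.7.1 Greylisted, please try again later"
        return "250 OK"
```

A spam-bot that fires once and never retries only ever sees the 450 reply, which is why the approach needs no training and produces no quarantine folder.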

Get SCUBA Diving Water conditions the coastal US waters at: bupkis.org

finally by antivoid · 2005-10-17 00:14 · Score: 1

Finally, a decent anti-spamming utility. There's been a lot of hype around this product, and it is not out of place. I like the way it's (at least partially) integrated with clam(win?). I still feel it won't be long before spammers find ways around this tool... but for now, great, I'm definitely using it.

Windows and Exchange. by Jaruzel · 2005-10-17 00:15 · Score: 4, Interesting

I know I'm going to get mauled over this question... but has anyone compiled it on Windows 2003 Server?

For practical reasons I don't have Linux in my test lab, and I'd like to have DSPAM on my web server, which is running IIS6 and Windows 2003 Server.

I can see I need to run it in SMTP mode with a relay to my Exchange box, but I don't want to waste my time trying to compile it (using Visual Studio) if someone already knows it won't work.

-Jar.

--
Together, We Can Make Slashdot Better. I Do NOT Mod ACs. - Check Me Out

Re:Windows and Exchange. by myspys · 2005-10-17 00:40 · Score: 4, Informative

from the FAQ (http://dspam.nuclearelephant.com/faq.shtml#1.15)

Q. Does it work with Windows?
A. v3.2 is the first to include a Windows build supplement, which includes the necessary Visual C++ project files and portage to compile the agent and tools under Windows. Check out the win32/ directory in the source tree for more information. Win32 support is still unofficial, but seems to work well. Of course getting it compiled is one thing, getting it integrated is another. It's probably best to build it under Cygwin using the general distribution.
Re:Windows and Exchange. by wwwillem · 2005-10-17 02:09 · Score: 1

A. v3.2 is the first to include a Windows build supplement
I downloaded version 3.6.0, but there seems to be nada :) support for Visual C. No win32 directory to be found. However on the download page, in the unsupported section, there was also DSPAM v. 3.2.8, which indeed does contain the Windows stuff.

--
Browsers shouldn't have a back button!! It's all about going forward...
Re:Windows and Exchange. by Nuclear+Elephant · 2005-10-17 02:11 · Score: 1

Version 3.4 has win32 support, but nobody wanted to maintain the build kit. It stopped working with 3.6 and was removed. You can build 3.4 natively in Windows, or you can build 3.6 under Cygwin.
Re:Windows and Exchange. by Jaruzel · 2005-10-17 04:18 · Score: 1

Cheers for that, Mr Elephant. :)

I owe you One (1) Beer.

-Jar.

--
Together, We Can Make Slashdot Better. I Do NOT Mod ACs. - Check Me Out

SPAM (TM) by Uukrul · 2005-10-17 00:21 · Score: 1

Aren't there any trademark problems with DSPAM?
SPAM is a registered trademark of Hormel Foods Corporation, and DSPAM isn't Monty Python.

--
My city: Barcelona.

Most likely need cygwin. by khasim · 2005-10-17 00:26 · Score: 1

That was how earlier versions worked. I don't know of anyone who actually got them to work natively under Windows.

Still getting on Hormel's nerves, I suppose by Anonymous Coward · 2005-10-17 00:30 · Score: 1, Informative

DSPAM is also noted for their trademark spat with Hormel, who tend to be nice about "spam" as a term until it's spelled in all-caps. (Previous Slashdot coverage.)

Too late by mordors9 · 2005-10-17 00:30 · Score: 2, Funny

But the great news is this product is no longer needed. After all, the FBI has put a stop to all of that: http://www.detnews.com/2005/technology/0510/16/B01-349738.htm (For those who are easily confused, the comment was tongue in cheek.)

hiding your address by Douglas+Simmons · 2005-10-17 00:40 · Score: 0, Flamebait

Other than annoying whitelists, there is no anti spam warez that is bulletproofly reliable. The best defense against spam is never to type your personal address anywhere on the internet. Once your address is spotted by a bot, you're screwed and it will only get worse over time.

If you want to give your site's visitors a simple way to contact you without losing your email address to the spam harvesting "folks" (not to mention without forcing your visitors to fire up a client they probably don't even have configured with a mailto link), just set up a simple form and use simple php to make it convenient for them to reach you while keeping your email address safely tucked away.

Though this is only possibly with PHP, ideally running on a Debian system, it's the most important language to learn in the universe. For a starter's guide, check out this site.

Re:hiding your address by Bogtha · 2005-10-17 00:51 · Score: 4, Insightful

Though this is only possibly with PHP, ideally running on a Debian system, it's the most important language to learn in the universe.

What kind of fuckwittery is this? No, plenty of languages can code a simple contact form handler, the platform you run it on is pretty irrelevant, and PHP is by no means "the most important language to learn in the universe". It's a pretty typical scripting language, not the magic you make it out to be.

--
Bogtha Bogtha Bogtha
Re:hiding your address by Anonymous Coward · 2005-10-17 00:51 · Score: 0

Not just PHP: you could use Python, or Perl, or Ruby, or any one of a bazillion other languages with the appropriate hooks into Apache.
Re:hiding your address by kimba · 2005-10-17 00:52 · Score: 1

The best defense against spam is never to type your personal address anywhere on the internet.

You have to do more than that. You also have to not email anyone, and also not have an easy to guess username.

The problem is, you can avoid ever publishing your email address anywhere, and someone else will gladly do it for you. All it takes is one person you have emailed to come down with an email virus, which then propagates your address all over the net.

Email address synthesis also guarantees that, unless you have the most obscure email address, it will end up getting spam too.
Re:hiding your address by BigJim.fr · 2005-10-17 00:53 · Score: 3, Interesting

> The best defense against spam is never to type your
> personal address anywhere on the internet.

Hiding your address does not work because some viruses collect addresses from your correspondents' address books. Your address will percolate to spam lists; it is only a matter of time. If, like me, you have kept your address for many years, you absolutely need some form of spam defense.
Re:hiding your address by MichaelSmith · 2005-10-17 00:53 · Score: 1

The best defense against spam is never to type your personal address anywhere on the internet.
You still have to communicate with people, and many of them will have Windows boxes which will get rooted at one time or another. It is made worse by people who innocently spam whole lists of people with documents or joke emails. Your address can get spread around that way.

--
http://michaelsmith.id.au
Re:hiding your address by Damer+Face · 2005-10-17 01:00 · Score: 1

And make damn sure that your code isn't vulnerable to "e-mail injection" exploits; these will result in spammers using your simple form to spam others AND in you getting your hosting revoked.

See, e.g., here: http://www.nyphp.org/phundamentals/email_header_injection.php
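To make the warning concrete, here is a minimal sketch of the usual defence: refuse any form field that ends up in a mail header if it contains a carriage return or line feed, since that is how an attacker smuggles in extra headers such as Bcc. The sketch is in Python rather than PHP, and the function name and the example.invalid addresses are hypothetical.

```python
import re
from email.message import EmailMessage

_LINE_BREAK = re.compile(r"[\r\n]")


def build_contact_message(visitor_name, visitor_email, subject, body,
                          site_owner="owner@example.invalid"):
    """Build a contact-form email, rejecting header-injection attempts."""
    header_fields = (("name", visitor_name),
                     ("email", visitor_email),
                     ("subject", subject))
    for label, value in header_fields:
        # A CR or LF in a header value would let the visitor add headers.
        if _LINE_BREAK.search(value):
            raise ValueError(f"line break in {label} field")

    msg = EmailMessage()
    msg["To"] = site_owner                    # fixed, never taken from the form
    msg["From"] = "webform@example.invalid"   # fixed, never taken from the form
    msg["Reply-To"] = visitor_email
    msg["Subject"] = f"[contact form] {subject}"
    # The free-text comment goes in the body only, where line breaks are harmless.
    msg.set_content(f"Name: {visitor_name}\nEmail: {visitor_email}\n\n{body}")
    return msg
```

The recipient is hard-coded on purpose: a form that lets the visitor choose the To address is a spam gateway no matter how carefully the other fields are checked.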
Re:hiding your address by ozbird · 2005-10-17 01:01 · Score: 1

The best defense against spam is never to type your personal address anywhere on the internet.

It's at least ten years too late for that for me, and I'll be damned if I'm going to give up my email address now just because of a few pesky spammers. Besides, the worst of the spam flood seems to be over. A year ago, I was getting hundreds of spam messages a day; now I might get ten, occasionally twenty a day. SpamAssassin + ClamAV identify the vast majority of those.
Re:hiding your address by CmdrGravy · 2005-10-17 01:09 · Score: 0, Offtopic

Fuckwittery, an excellent term. I have been a fan of the term "fuckwit" for a long time, largely thanks to its flexibility, and you have added another string to my bow with this excellent post. Thanks.
Re:hiding your address by edesio · 2005-10-17 01:21 · Score: 1

You can also use a "short-term" e-mail like the ones provided at SpamGourmet.com.
Re:hiding your address by HermanAB · 2005-10-17 01:21 · Score: 1

Never heard of dictionary attacks on domains, have you?

--
Oh well, what the hell...
Re:hiding your address by Damer+Face · 2005-10-17 01:26 · Score: 1

> or you can always do stuff like: foo AT gmail DOT com
> or you can always use the html encoding for the characters in the email

These are no protection against a number of more advanced bots, and that number will increase over time.

Also, in many situations, like signing up for stuff online, an encoded email address won't be seen as valid input and will be rejected out of hand.

> or you can always just put the words inside an image.

This might work on your personal website, but is useless in most situations.

> or you can always use a real email for friends, and a spam email for
> everything else.

By far the best method in my opinion, coupled with educating your friends so that they don't fall prey to malware.
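As a rough illustration of why the first two tricks offer so little protection, the sketch below undoes both with the standard library and two substitutions. It is a toy, not a description of any real harvester; the function name and the particular spellings it recognises are assumptions.

```python
import html
import re


def harvest(text):
    """Return email addresses found in text after undoing simple obfuscation."""
    # Undo HTML character references such as &#64; for "@".
    text = html.unescape(text)
    # Undo the common "AT" / "(at)" and "DOT" / "(dot)" spellings.
    text = re.sub(r"\s*(?:\(at\)|\[at\]|\bAT\b)\s*", "@", text)
    text = re.sub(r"\s*(?:\(dot\)|\[dot\]|\bDOT\b)\s*", ".", text)
    return re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)
```

A few lines like these are all a bot needs, which is why keeping a separate throwaway address tends to hold up better than any encoding trick.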
Re:hiding your address by shaitand · 2005-10-17 01:42 · Score: 1

Aye. It is pretty obvious the gp is something of a fuckwit. However, for its intended purpose PHP is practically magic. Personally I have always been something of a Perl addict, and then one day I was pondering some web work and decided to dive into PHP by recoding a couple of Perl scripts in PHP. I was simply amazed at how much more simply one can do web CGIs in PHP.

For just about everything else there is still Perl (which is definitely superior to PHP in every NON-web task), and when Perl fails there is C (or C++ for those who believe that it's ok to make programs eat more CPU than they have to simply because CPUs are faster than they used to be).
Re:hiding your address by Antique+Geekmeister · 2005-10-17 01:47 · Score: 1

Also, spammers steal addressbooks or buy them from unethical employees, others make partnership contracts where you've submitted a contact address and use those contacts to get spam addresses, some spammers use alphabetical or name-guess spam, and any unethical sysadmin with a clue can use the mail logs of his servers to generate a list of valid email addresses from other sites for sale.
Re:hiding your address by dvaldenaire · 2005-10-17 01:49 · Score: 1

I think this is the kind of thing known as, you know, "humour".

As you know, comments on PHP vs. Other Scripting Languages are totally useless... because PHP is the best. (The same joke with Debian and Other Distribs is left as an exercise to the reader...)

--
What does it mean, "appended to the end of comments you post"
Re:hiding your address by imroy · 2005-10-17 02:02 · Score: 1

Have you been living under a rock for the last ten years? Of course web programming in PHP is easier than CGI! Just about anything is easier than CGI, no matter what language the CGI script is programmed in. If you want a similar (but more powerful) PHP-like environment for Perl, I highly recommend HTML::Mason. Two other interesting mod_perl environments are AxKit (centred around XML and XSLT) and Catalyst (a tight MVC framework). But they both are rougher to develop on, requiring restarts of Apache to load new code. At least Catalyst provides its own mini server for testing/development purposes.
Re:hiding your address by horza · 2005-10-17 02:09 · Score: 1

just set up a simple form and use simple php to make it convenient for them to reach you while keeping your email address safely tucked away

All you've done is swapped vigilance in maintaining anti-spam on your inbox for vigilance in protecting your contact form against spammers abusing it as a spam gateway. My contact form page gets an attempted hit every couple of days (usually a combination of MIME attachments in the comments field and injecting a BCC field to forward to the recipient), and this is a low volume site. Anyway, your email only has to leak once for it to propagate, and it may not necessarily be you that does it. You'll find the spam blocker built into Thunderbird does a good job if you don't want to bother installing SpamAssassin/DSPAM on your mail server (at the expense of extra bandwidth and download times).

Phillip.

--
Property for sale in Nice, France
Re:hiding your address by MasTRE · 2005-10-17 02:38 · Score: 2, Funny

> Other than annoying whitelists, there is no anti spam warez that is bulletproofly reliable.

Yeah yo, no bulletproofly reliable warez yo!

> ...just set up a simple form and use simple php to make it convenient for them to...

Make it convenient to root your server, yo! Yeah, yo! Bulletproofly warez, yo!

> Though this is only possibly with PHP...

Yeeeeaaaah, buddy! Warez, yo!

NOT!

Whatever TF this guy is smoking, you lemmings shouldn't mod it +4/Informative. It's a crap post.

--
Must-not-watch TV!
Re:hiding your address by samkass · 2005-10-17 03:31 · Score: 1

"A year ago, I was getting hundreds of spam messages a day; now I might get ten, occasionally twenty a day. SpamAssassin + ClamAV identify the vast majority of those."

For me, most of the spam (unwanted email not intended for me personally) I receive is either bounces or "confirmation" emails from other people's spam filters. Since spammers never send FROM their own address, they usually just pick a random address off their list and send from it (i.e., mine). So bounces go to me.

These days, I've started clicking the "confirmation" URL on all of those "Please confirm you are a real person" emails just to make those people stop using their broken, idiotic anti-spam systems that just make life worse for the rest of us.

--
E pluribus unum
Re:hiding your address by Monkier · 2005-10-17 03:38 · Score: 1

also if your email is a combination of firstname and/or surname - chances are the spammers will guess it..
Re:hiding your address by shaitand · 2005-10-17 04:02 · Score: 1

shhh don't tell anyone but when you program in PHP you ARE still programming to the CGI ;) In fact everything you mentioned above still interacts with the client via CGI and html/xhtml just like it has for the last 10 years.
Re:hiding your address by hobbit · 2005-10-17 06:32 · Score: 1

1990 called, they want their webserver back.

Why not use Apache + mod_perl/mod_php, like the vast majority of souls in the known universe?

--
"Wise men talk because they have something to say; fools, because they have to say something" - Plato
Re:hiding your address by cloudmaster · 2005-10-17 07:23 · Score: 1

Since you may have been serious - CGI stands for "Common Gateway Interface". In other words, CGI defines the "common" "interface" between the browser and the webserver (aka "gateway"). Many early CGI programs were written with perl, and several still are. I've written several CGI programs in C, PHP, perl, and bash - among others (Cold Fusion is something I'd like to forget - what a POS!). Using mod_blah generally just moves the interpreter (or parts of it) into the web server so you save the launch time and can use nifty persistence stuff. Nonetheless, you're still technically using CGI if you at any time submit data to a webserver using GET or POST. Yes, it's still cgi even if it's not handled by a perl script with a .cgi extension.

CGI's really a badly understood, oft misused term - and I've not explained it all that well - but hopefully the general idea's a little more clear. :)
Re:hiding your address by Nethead · 2005-10-17 07:26 · Score: 1

And if you're running a mail system for 10,000 Real Estate agents..... 4x Barracuda 400 Spam Firewalls.

--
-- I have a private email server in my basement.
Re:hiding your address by imroy · 2005-10-17 09:42 · Score: 1

Huh? My understanding of CGI was that it defines the interface between the web server and the program/script. It defines how the URL, headers, and POST variables are passed to it, and how the program/script returns the page and status code. The Apache modules like mod_perl, mod_php, and mod_python put the interpreter into the web server, eschewing the overhead of launching a program (and parsing the perl/python) for each request. Thus the interface is an internal Apache API instead of the CGI. Now, mod_perl allows you to emulate the CGI environment and reuse your Perl CGI scripts with a speed/efficiency increase. But it's not the only (or the best) use of mod_perl.
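The server-to-program contract described here can be sketched in a few lines. Under CGI the web server starts the program once per request, passes request metadata in environment variables, passes any POST body on standard input, and reads the response headers and page from standard output. The handler name and the "name" parameter below are made up for illustration; a real CGI script would be called with os.environ, sys.stdin and sys.stdout.

```python
import io
from urllib.parse import parse_qs


def handle_cgi(environ, stdin, stdout):
    """Answer one request the way a CGI program would."""
    method = environ.get("REQUEST_METHOD", "GET")
    if method == "POST":
        # The server says how much body to read (text streams, for simplicity).
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        # For GET, the form data arrives in an environment variable.
        raw = environ.get("QUERY_STRING", "")
    params = parse_qs(raw)
    name = params.get("name", ["world"])[0]

    # Response: header lines, a blank line, then the page itself.
    stdout.write("Status: 200 OK\r\n")
    stdout.write("Content-Type: text/plain\r\n")
    stdout.write("\r\n")
    stdout.write(f"Hello, {name}\n")


# Demo with in-memory streams standing in for the web server.
demo_out = io.StringIO()
handle_cgi({"REQUEST_METHOD": "GET", "QUERY_STRING": "name=Slashdot"},
           io.StringIO(""), demo_out)
print(demo_out.getvalue())
```

With mod_perl, mod_php or mod_python the interpreter lives inside the server process, so the per-request program launch shown here goes away even though the request data is much the same.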
Re:hiding your address by cloudmaster · 2005-10-17 10:33 · Score: 1

You're right - it's the interface between the app and the server. Doh. :) Though, isn't the mod_* API more of a superset of CGI rather than a replacement? Trying to save a little face here... ;)
Re:hiding your address by Alioth · 2005-10-17 21:38 · Score: 1

Or any injections at all. I host a modest number of people's domains (a dozen people). One user had PHPBB. When I told him what trouble his buggy, old version of PHPBB had caused, he swore he'd deleted it - all he'd actually done is removed the links to the board, but the code was still there.

A Romanian phishing gang found it, and tried to send over 2 million phishing emails by uploading a PHP script via the exploit. Fortunately, the way I have the email relay configured (the firewall blocks port 25 egress from the web server, so the system has no choice but to relay it through my relay), it shut down after only a handful of phishing emails went out and I could contain it (and had all the evidence to find out who did it). It's prompted me to make the egress filtering tighter though - I had allowed port 80 outbound because it was convenient, now I've told all the users they have to tell me what addresses they need because the rule is now default deny.

--
Oolite: Elite-like game. For Mac, Linux and Windows
Re:hiding your address by shaitand · 2005-10-18 00:47 · Score: 1

Don't worry CGI is NOT just an interface between the webserver and the application. CGI also defines much of the information the browser is required to exchange with the webserver.

Try DSPAM by ajs · 2005-10-17 00:41 · Score: 3, Informative

I'm a long-time proponent of and rare contributor to SpamAssassin, and I'll continue to be, but fighting spam is much like fighting disease: you have to diversify your defenses. DSPAM is a nice package, and is very well designed. I've spoken to the author in the past, and he has an excellent understanding of the complexities of the issue (as opposed to the legions of people who seem to think that spam filtering should be easy, given the right algorithm).

As far as I'm concerned there are two tools for spam filtering: DSPAM and SpamAssassin. Try them both. See what fits your needs. My impression is that SpamAssassin provides more knobs and buttons and is more easily extended by the casual user, but DSPAM can be lighter weight. Both are highly accurate, with very low false positive rates.

Re:Try DSPAM by gvc · 2005-10-17 01:05 · Score: 1

There are lots of alternatives. Bogofilter, spamprobe, spambayes, popfile, and dbacl are all quite effective.
Re:Try DSPAM by jaseuk · 2005-10-17 01:30 · Score: 1

The problem with SPAMD, SpamAssassin etc. is they rely too much on training and user interaction. If a user has to go into the SPAM box and double check that no mistakes have been made then the system is worse than not having any SPAM checking at all, as most users will not check the SPAM box. This is especially true for larger deployments, where it is much harder to train users and which usually cannot afford for these sorts of mistakes to be made.

I've found greylisting to be the best solution so far, primarily because the user doesn't have to do anything: there is no quarantine or training, and it's 99.93% effective out of the box. The only real problem is that occasionally some mail servers are not compatible with greylisting. In this case the sender would get a bounce, which IMO is better than filing false positives in a SPAM quarantine folder that may never be checked.

Greylisting is almost completely maintenance free. I've been using the same greylisting daemon (postgrey) for 18 months, and aside from whitelisting a handful of servers there has been no other work to keep the system running effectively. My users don't even know it's there.

Jason
Re:Try DSPAM by imroy · 2005-10-17 02:11 · Score: 1

From what I know of those projects, they're all Bayesian filters and little more. Maybe a white/black list. That's what the GP post was referring to when he wrote "as opposed to the legions of people who seem to think that spam filtering should be easy, given the right algorithm". I don't know much about this DSPAM, but SpamAssassin covers a whole bunch of tests. It started off as a list of common-sense patterns looking for the usual penis/breast enlargement etc spam in the email body and suspicious info in the headers. That's why the tests each have a weight associated with them. So it was fairly easy to add in later network tests (DNS block lists, Razor/Pyzor, etc) and a bayesian test. Each simply adds a certain amount to the email's final score. SpamAssassin won't be outdone for a long time, it will just keep adding tests as new techniques are developed. It might even be possible to add a DSPAM test into SpamAssassin.
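The weighted-test design described here is easy to sketch: every test that fires adds its weight to a running score, and the message is flagged once the total crosses a threshold. The rule names, weights, blocklisted address and threshold below are invented for illustration and are not SpamAssassin's real rules or scores.

```python
import re

# Hypothetical blocklist standing in for a DNS block list lookup.
BLOCKLISTED_IPS = {"192.0.2.1"}

# (rule name, weight, test function) - all values are illustrative assumptions.
RULES = [
    ("BODY_ENLARGEMENT", 2.5,
     lambda m: re.search(r"enlargement", m["body"], re.I) is not None),
    ("SUBJ_ALL_CAPS", 1.0,
     lambda m: m["subject"].isupper()),
    ("ON_DNS_BLOCKLIST", 3.0,
     lambda m: m.get("relay_ip") in BLOCKLISTED_IPS),
    ("BAYES_HIGH", 3.5,
     lambda m: m.get("bayes_probability", 0.0) > 0.99),
]


def score_message(msg, threshold=5.0):
    """Return (total score, is_spam, names of the rules that fired)."""
    hits = [(name, weight) for name, weight, test in RULES if test(msg)]
    total = sum(weight for _, weight in hits)
    return total, total >= threshold, [name for name, _ in hits]
```

Adding a new technique, whether a network test or a Bayesian classifier, is then just one more entry in the rule list, which is the extensibility the comment is pointing at.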
Re:Try DSPAM by gvc · 2005-10-17 02:29 · Score: 2, Informative

I use Spamassassin with a special user configuration file and I train it systematically. In this configuration it works pretty well (much, much better than out of the box). But Bogofilter and Popfile work about as well. As does just the Bayesian component of Spamassassin, ignoring all the other cruft. DSPAM, on the other hand, doesn't work at all well for me.
Re:Try DSPAM by gvc · 2005-10-17 02:41 · Score: 2, Insightful

If a user has to go into the SPAM box and double check that no mistakes have been made then the system is worse than not having any SPAM checking at all.
Not true. First, if the user's mailbox is cluttered with spam, the user is more likely to overlook good mail; more likely, in fact, than a good spam filter is. Second, it is way easier to scan a list of predominantly spam for occasional good mails (and vice versa) than to have everything jumbled together. Third, spam filters are good enough that one does not normally need to look through the quarantine list. Instead it can be searched if and when email goes missing. Almost all good mail that is misclassified by a filter is weird in some way - cold call, internet transaction, advertising. Generally one of two mitigating circumstances holds: (1) there is a secondary social mechanism whereby the missing mail will be noticed and retrieved [e.g. nobody assumes that a cold call is delivered, and a reply to an internet transaction would be expected]; (2) the user doesn't really care about the email [e.g. advertising from their frequent flyer plan].
I've found greylisting to be the best solution so far
Greylisting "works" only because spammers aren't on to it yet. And it is intrusive - adding delay and risk of non-delivery. Greater risk, I posit, than the risk of using a spam filter.
Re:Try DSPAM by ToyKeeper · 2005-10-17 06:31 · Score: 1

This is just one admin's viewpoint... it may not reflect anyone else's experiences. It's just what I've found over the years, using both systems.

Accuracy... SpamAssassin generally offers higher accuracy with less effort, at first, but the accuracy degrades over time. DSPAM takes more effort initially, but offers higher, sustained accuracy over the long term. I see an average of about 99.5% long-term accuracy with dspam. I can't tell what the accuracy was with spamassassin, since it doesn't include a way to measure accuracy, but it certainly felt lower than that... even at the beginning, when its accuracy was at its peak.

Speed... SpamAssassin takes a lot more CPU time than DSPAM for each message. On a system with a few dozen users, the server load skyrocketed while using SA. The load would be 20 or higher most of the day. DSPAM, on the same system, kept the load low -- around 0.1 to 0.3. And that was using one of the slowest possible dspam configurations -- "train everything" mode. However, DSPAM can have issues if it uses an improperly-configured MySQL. It really needs row-level locking.

Flexibility... both systems are extremely flexible and configurable. I'd go so far as to call them spam toolkits rather than spam filters. They each allow you to essentially build your own solution. With SA, I got the best results by turning off the heuristics after a few weeks, using just the bayesian portions instead. It helped to classify spam at first, but later it just seemed to get in the way. DSPAM has an equivalent approach -- giving users a common training base from which they can grow and specialize. The difference is that SA comes with that initial training already created for you, and DSPAM requires you to build your own training base.

Overall, I found SpamAssassin to be easier to set up and configure, but it needed more attention and upgrades in order to stay useful. DSPAM seemed overall more effective and scalable, but more complex to set up.

Neither system is a complete spam solution. You'll need to configure your mail server properly, at the very least, and probably also use a greylisting system and virus filter. Most spam can be blocked before it even reaches the filters. A honeypot address isn't a bad idea either -- it can reduce the amount of manual retraining necessary.
Re:Try DSPAM by cloudmaster · 2005-10-17 07:33 · Score: 1

Were you using procmail and individual spamassassins, or using spamd/spamc for mail checking? I wonder if that's the reason people see such super-high CPU loads with SA. I was delivering around 10K-15K messages/day (roughly 50 users), with SA identifying around 85% as spam. The backup MX ran spamd with user prefs and bayesian keys stored in MySQL, and the primary MX delivered through procmail using spamc. The backup MX/spamd machine was a P3/800 with 512M RAM and the primary MX was an Athlon 1000 with 1G. The load on both was almost never above 50%, and usually hovered around 10%.
Re:Try DSPAM by ToyKeeper · 2005-10-17 09:48 · Score: 1

I saw the SA load problem happen both with and without using the daemon setup. However, the systems were slower than what you described, and did a lot more than just handle email. They were dual-500MHz boxes, but couldn't keep up with the incoming mail. Mail arrived faster than SA could process it, even though it was just a few dozen accounts. It would tend to catch up at night, but email during the day was pretty lagged.

I haven't tried dspam as a daemon yet, but intend to try it soon to see how it works. I may add it to a much larger mail system soon, depending on how my tests go. I know it's possible to support hundreds of thousands of users with it, but I don't yet know the right settings and architecture to use. :)
Re:Try DSPAM by cloudmaster · 2005-10-17 10:30 · Score: 1

That backup MX was also the primary DNS and syslog server, but that's not much of a load. The primary MX was also the pop/imap/web server, for what it's worth. My home setup is about 5 users with around 5-7K messages/day, and I run spamd and MySQL on the same box - which is a dual Celeron 400 machine. Messages come in on an AMD 5x86-133 gateway which does the DNS lookups and then forwards to a PPro233 which calls spamc (that one's also the web server). All three machines combined have less computing power than something you could buy for $500 now and have single hard drives (well, the ppro has an array of SCSI disks, but it doesn't count) - the two mail servers sit with triple 0.00, and the spamd box is at 0.09/0.02/0.01 right now. It still makes me wonder what the heck you're doing. :) I have the razor/dcc tests disabled, but otherwise it's pretty typical for a setup where all users access the same bayesian filter. If you put all of the config stuff into MySQL and run a modern MySQL with query caching, you can probably speed things up some, and you open up the potential of adding a second dedicated spamd box in a round-robin arrangement (or use any of the common connection balancing programs). I'm fairly aggressive with filtering at the Postfix level too, though, so a lot of obvious junk gets filtered before it gets to SA - maybe that's part of it. I use header-checks and a couple of body-checks, but much of that really slows the 5x86 down...

That said, I'm gonna try dspam out tonight because it looks like it'll be easier to keep up-to-date than running sa-learn periodically, which I'm notoriously bad about. ;)
Re:Try DSPAM by ajs · 2005-10-17 14:15 · Score: 1

For the most part you seem to be:
- Shutting off auto-learn (mistake, see below)
- Upping BAYES scores (good plan, I do too)
- Enabling a few knobs that are generally useful (though I've had too many false positives with RCVD_IN_DSBL).

The only thing I would criticize is shutting off auto-learn. If you want to be conservative, just lower the ham threshold and raise the spam threshold a bit. I tried to manually train for a while, and what I found was that I was actually lying to SA. Auto-learn means that a view of your mail gets sampled that has not been tainted by your perceptions. This generally vastly improves the quality of the data.

Beyond that, I would just suggest that you re-consider firing up a new copy of SA for every message. spamc/spamd work quite well, and vastly reduce the overhead of spam checking.
Re:Try DSPAM by gvc · 2005-10-17 14:25 · Score: 1

Auto-learn in spamassassin is broken. In fact my mail script automatically calls sa-learn for every message, with ham or spam depending on what Spamassassin claims. Then if I want to correct it I call sa-learn over again with the correct classification. That's why the user-prefs file has it turned off.

I should make this more clear in my notes. Thanks for pointing it out.
Re:Try DSPAM by ajs · 2005-10-17 23:33 · Score: 1

Explain "broken". Works great for me.....

Training on everything is probably a mistake. Catching all of the edge conditions where that fails is going to be a very laborious task. Do all of your users do the same, or do you force their auto-learning off and have them use your bayes tokens? That has its own problems (you're not training on their mail), but at least would not leave an inattentive user in the horrible situation where they are constantly training incorrectly. That quickly leads to a broken classifier.
Re:Try DSPAM by gvc · 2005-10-17 23:54 · Score: 1

Explain "broken". Works great for me.....
Some explanation appears here.
In summary, auto-learn re-evaluates the message using only the static rules - not the bayes rules. Then, if the static rules give an extreme score that differs from the bayes score, and a couple of extra ad hoc conditions hold (number of "hits" exceeds some threshold) the bayes filter is trained.
You can adjust the "extremeness" of the score under which Bayes is trained but training will not be on what Spamassassin reports; only on what the static rules report. It is perfectly possible for Spamassassin to report "Spam" yet train as "Ham" or vice versa. This behaviour is unacceptable in a supervised training setup. I've had it correctly classify a message, only to misclassify the next instances of nearly the same message, because of this behaviour. Auto-whitelists have a similar problem.
There is no Spamassassin user parameter to alter this behaviour. I have hacked Spamassassin but it is obviously not reasonable to post a solution that requires a source change. The only way I know to make Spamassassin train properly - on its own judgments - is to force feed it externally.
The reason that Spamassassin's auto-learn is set up this way is to support unsupervised learning - in a server where it is seldom, if ever, corrected. In this setup, the built-in rules work marginally better than simple self-training. But in the supervised setup they are a disaster.
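To make the mismatch concrete, here is a minimal sketch of the decision as summarized above. It is not Spamassassin's actual code: the function names are invented, and the threshold and minimum-points values are illustrative assumptions.

```python
from typing import Optional


def reported_label(static_score: float, bayes_points: float,
                   required: float = 5.0) -> str:
    """What the user sees: static rules plus the Bayes contribution."""
    return "spam" if static_score + bayes_points >= required else "ham"


def autolearn_label(static_score: float,
                    head_points: float,
                    body_points: float,
                    ham_threshold: float = 0.1,    # assumed value
                    spam_threshold: float = 12.0,  # assumed value
                    min_points: float = 3.0        # assumed value
                    ) -> Optional[str]:
    """What Bayes is trained on: static rules only, extreme scores only."""
    if static_score < ham_threshold:
        return "ham"
    if static_score >= spam_threshold:
        # Extra conditions: both header and body rules must contribute.
        if head_points < min_points or body_points < min_points:
            return None
        return "spam"
    return None  # not extreme enough, so no training at all


# Static rules see nothing wrong, Bayes alone pushes the message over 5:
shown = reported_label(static_score=0.0, bayes_points=5.4)
trained = autolearn_label(static_score=0.0, head_points=0.0, body_points=0.0)
```

With these numbers the message is reported as spam while the Bayes database is trained on it as ham, which is the case complained about above.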
Re:Try DSPAM by ajs · 2005-10-18 02:53 · Score: 1

"In summary, auto-learn re-evaluates the message using only the static rules - not the bayes rules. Then, if the static rules give an extreme score that differs from the bayes score, and a couple of extra ad hoc conditions hold (number of "hits" exceeds some threshold) the bayes filter is trained."

Hrm... well, no.

First off "number of hits" is not an "extra ad hoc condition". Number of "hits" is exactly "score". There's no difference, just two pieces of terminology for the same thing. "Level" is another thing, but I won't go into that, as it's only there for the benefit of programs like procmail, and is not used internally.

Now, on to score with and without Bayes. I understand your initial concern, but I ask you to re-visit it. There has been substantial research into Bayes auto-learning under various systems, and what is shown time and time again is that a set of well-balanced static rules (such as a set of tropisms or, in the case of SA, the static rule base) is far superior to any feedback-loop. This is why Bayes is discounted when computing auto-learning.

What you're doing is looking at outliers and saying, "see, this 'spam' was trained on as 'ham', and that means SA is broken." In fact, such errors will exist in both directions, but as long as the vast majority of spam trains as spam and the vast majority of ham trains as ham, the Bayes tokens will be correctly scored.

All that said, you seem uncomfortable with static rules of any kind, so if you don't buy into what I've said above, then I suggest that you stop using SA. Static rules are a giant advantage, but if you are going to defeat most of their value, then you might as well not suffer their overhead.

For further reading, I suggest: http://plg.uwaterloo.ca/~gvcormac/spamcormack.html
Re:Try DSPAM by gvc · 2005-10-18 04:20 · Score: 1

Hrm... well, no.
All that said, you seem uncomfortable with static rules of any kind, so if you don't buy into what I've said above, then I suggest that you stop using SA. Static rules are a giant advantage, but if you are going to defeat most of their value, then you might as well not suffer their overhead.
For further reading, I suggest: http://plg.uwaterloo.ca/~gvcormac/spamcormack.html

I wrote that paper, and the configuration I posted here is what was used in the best-scoring run.
For your convenience, here's a link to the Spamassassin code that makes the auto-learn decision. Note that sub learn calls _get_autolearn_points() which uses "score set 2" which does not include the Bayes result. Also notice the string of ad hoc tests (which cannot be disabled) based on head_only_points and body_only_points. The main negative effects are: (1) that Spamassassin fails to train on your ordinary good mail, resulting in more false positives; (2) although the Bayes filter flags a large number of spams that the ruleset would not otherwise catch, it is not reinforced on these (worse, if the ruleset says these were ham, the Bayes filter is incorrectly trained to believe this is good mail).

Thanks, but... by Kagura · 2005-10-17 00:43 · Score: 1

I use Gmail. :)

Re:Thanks, but... by Slashcrap · 2005-10-17 01:22 · Score: 1, Funny

I use Gmail. :)

"So I let Google spam me in a targeted and personal manner via HTML rather than random people spamming me through SMTP."

I can understand why you're so proud.
Re:Thanks, but... by Talinth · 2005-10-17 01:36 · Score: 1

I configured my gMail account to Moz Thunderbird. No targeted ads, and the benefit of the greatness that is the gMail spam filter. I would say that it is quite possible the GP poster does as well.

--
71.3% of all statistics are made up on the spot.
Re:Thanks, but... by Slashcrap · 2005-10-17 01:43 · Score: 1

I configured my gMail account to Moz Thunderbird. No targeted ads, and the benefit of the greatness that is the gMail spam filter. I would say that it is quite possible the GP poster does as well.

Yeah, I bet at least 99% of gMail users know how to do that.
Re:Thanks, but... by shaitand · 2005-10-17 01:46 · Score: 1

If that was meant to be sarcastic it should not be. Gmail is invite only and the first invitations went to an all tech-savvy crowd. Although gmail has spread far and wide I think the audience is still primarily tech oriented.

A 'chicken-and-egg' random thought by TVmisGuided · 2005-10-17 00:43 · Score: 1

This is one of those things that makes me wonder...which "side" is pushing the technological envelope further and faster, the {spammers | malware slimers | virus breeders} or those who develop to defeat them?

Since it's generally agreed that history is written by the winners of a given conflict, I guess we won't have an answer to that until the war's over.

This comment generously brought to you by a severe lack of caffeine.

--
All the world's an analog stage, and digital circuits play only bit parts.

Re:A 'chicken-and-egg' random thought by EpsCylonB · 2005-10-17 01:15 · Score: 1

This isn't really a chicken and egg situation. What's the answer to 99 out of 100 questions? Money.

Spammers used email to sell things whilst at the same time pissing everybody off. Eventually people hate spam so much that they are willing to pay for services that try to eliminate spam.

It may not always be so but spammers have always been one step ahead, they have more incentive.

Linux Router by Stavr0 · 2005-10-17 00:46 · Score: 3, Interesting

I know I'm going to get mauled over this question... but has anyone compiled it on Windows 2003 server? (Release the hounds!)

How about getting it compiled into a Linksys WRT54G router firmware, e.g. the Sveasoft firmware?

Re:Linux Router by op00to · 2005-10-17 00:57 · Score: 3, Informative

DSPAM, as it's running in my cluster, is using way more ram than the WRT54G physically has. Probably not a good idea to run it on that little box.
Re:Linux Router by maggard · 2005-10-17 04:05 · Score: 1

My understanding is this sort of filtering isn't practical on any of the consumer routers due to their limited memory. The applications load the email messages to scan them, and between the OS code, the scanning package, and the email being scanned there simply isn't enough memory to hold it all, even on the larger WRT54GS units. My own hope is that Cisco's Linksys subsidiary eventually 'gets smart' and releases a combination WRT54GS / NSLU2 / PAP2 appliance, with more RAM, that is Linux-based and hackable. That, or some Lucky Factory No. 5 starts churning out the equivalent white boxes using just the chip manufacturer's reference implementations, gets it FCC'd, and sets 'em loose on the market.

--
I don't read ACs: If a post isn't worth so much as a nom de plume to its author then I wont bother either.

curious about MD by jkind · 2005-10-17 01:09 · Score: 1

How well does "Markovian discrimination" work in practice? It sounds fascinating, but what is the false-positive rate that can be expected on average??
Geez from dealing with spammers to working with the crap DiamondTouch, Yerazunis is a real glutton for punishment :)

--
~jennifer.k~

Re:curious about MD by junics · 2005-10-17 01:30 · Score: 1

The CRM114 classifier/filter has used markovian and derivatives thereof for quite some time and claims 99.984% accuracy.
A downside is that markovian is quite a lot more resource intensive than simple bayesian.

I used bogofilter (a fast bayesian filter) before CRM114. Even though CRM114 was harder to set up than bogofilter and used more resources, it was totally worth it.
Re:curious about MD by Antique+Geekmeister · 2005-10-17 01:30 · Score: 1

You apparently missed Iglassware, Bill's contribution to measured drinking, and his role in the JunkYard Wars, at http://www.tms.org/pubs/journals/JOM/0310/Byko/Byko-0310.html
Re:curious about MD by jkind · 2005-10-17 01:41 · Score: 1

Iglassware.. now there is a fun Masters thesis :)

--
~jennifer.k~
Re:curious about MD by Anonymous Coward · 2005-10-17 01:47 · Score: 0

It sounds fascinating

Why? Because of the kewl sounding name???

Man, marketing people just LOVE people like you. So easy to manipulate!
Re:curious about MD by Nuclear+Elephant · 2005-10-17 02:05 · Score: 2, Interesting

Below are some tests I ran with a pre-release version of DSPAM on a test corpus. As you can see, Markovian discrimination is significantly more efficient than any Bayesian methods and Chi-Square. Markovian showed slightly more (4 more than the top contender) false positives, but it also caught 100 more spam... some additional tuning, tweaking, and most importantly, training, can easily get this down to a very low error rate.

Bayesian (burton)
TP: 785 TN: 1003 FN: 218 FP: 4 SC: 4 IC: 0
SR: 78.27% IR: 99.60% OR: 88.96%

Chi-Square (multiword)
TP: 801 TN: 1005 FN: 202 FP: 2 SC: 0 IC: 0
SR: 79.86% IR: 99.80% OR: 89.85%

Chi-Square (single Word)
TP: 794 TN: 1003 FN: 209 FP: 4 SC: 2 IC: 0
SR: 79.16% IR: 99.60% OR: 89.40%

Bayesian (graham)
TP: 833 TN: 1002 FN: 171 FP: 4 SC: 4 IC: 0
SR: 82.97% IR: 99.60% OR: 91.29%

Bayesian (graham-burton)
TP: 838 TN: 1000 FN: 166 FP: 6 SC: 4 IC: 0
SR: 83.47% IR: 99.40% OR: 91.44%

Markovian discrimination (burton)
TP: 950 TN: 996 FN: 54 FP: 10 SC: 0 IC: 0
SR: 94.62% IR: 99.01% OR: 96.82%
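
The SR, IR and OR columns are not defined in the post, but the figures are consistent with spam recall, innocent (ham) recall and overall accuracy. A small sketch under that assumption, which recomputes the percentages from the raw counts (the function name is made up):

```python
from typing import Tuple


def rates(tp: int, tn: int, fn: int, fp: int) -> Tuple[float, float, float]:
    """Return (SR, IR, OR) as percentages rounded to two decimals.

    SR: share of spam that was caught         = TP / (TP + FN)
    IR: share of ham that was delivered       = TN / (TN + FP)
    OR: share of all messages classified right
    """
    sr = tp / (tp + fn)
    ir = tn / (tn + fp)
    overall = (tp + tn) / (tp + tn + fn + fp)
    return (round(100 * sr, 2), round(100 * ir, 2), round(100 * overall, 2))


bayesian_burton = rates(tp=785, tn=1003, fn=218, fp=4)   # (78.27, 99.6, 88.96)
markovian = rates(tp=950, tn=996, fn=54, fp=10)          # (94.62, 99.01, 96.82)
```

Both results match the rows above, so the inferred definitions at least reproduce the posted numbers.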
Re:curious about MD by jshaped · 2005-10-17 03:07 · Score: 1

so in your opinion, is 4 more false positives worth the increase in true positives?

this is one thing i'm struggling with, is how to compare the results of 2 filters on the same corpus.
we know FP's are substantially worse than any spam that gets through, but how much worse?
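
One hedged way to frame that question is to charge each false positive a weight w, measured in missed spams, and compare total cost. The weight is a judgment call, not something the test results provide. The sketch below uses the graham-burton and Markovian counts from the results above; the function names are invented.

```python
def weighted_cost(fn: int, fp: int, fp_weight: float) -> float:
    """Total cost when one false positive counts as fp_weight missed spams."""
    return fn + fp_weight * fp


def break_even_weight(fn_a: int, fp_a: int, fn_b: int, fp_b: int) -> float:
    """Weight at which filters A and B cost the same.

    Solves fn_a + w * fp_a == fn_b + w * fp_b for w.
    """
    if fp_a == fp_b:
        raise ValueError("equal FP counts: compare the FN counts directly")
    return (fn_a - fn_b) / (fp_b - fp_a)


# A = Bayesian (graham-burton): FN 166, FP 6
# B = Markovian (burton):       FN 54,  FP 10
w_star = break_even_weight(166, 6, 54, 10)   # 28.0
```

Under this framing the Markovian run comes out ahead as long as one lost good message is judged less costly than 28 spams getting through.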
Re:curious about MD by markhb · 2005-10-17 03:15 · Score: 1

Yerazunis is a real glutton for punishment

He leavened it with appearances on Junkyard Wars.

--
Save Maine's economy: write stuff down. All comments are exclusively my own, not my employer.
Re:curious about MD by Nuclear+Elephant · 2005-10-17 04:10 · Score: 2, Insightful

4FPs for 100-something more TPs? Heck yeah. At least for me.. But keep in mind these are just preliminary training numbers with 1000 messages in each corpus. After real-world training, any of these approaches will be much more accurate.

Re:not "bulletproofly" reliable by NutMan · 2005-10-17 01:14 · Score: 1

This isn't "bulletproofly" reliable either. My brothers and I run a small local ISP. Years ago I created an address for my youngest daughter. She never used it, it was never posted anywhere, and it wasn't an easy to guess address since it was a combination of her name and her nickname. However spammers are constantly trying to discover email addresses on our domain, we get about 2,000 invalid recipient attempts every hour of the day. So eventually they discovered her address and she now gets a small amount of spam. (6 to 12 a day) If you want something 100% effective, then cancel all of your email accounts. A more reasonable course of action is to use an excellent solution like DSPAM.

OpenBSD port by chrysalis · 2005-10-17 01:43 · Score: 2, Informative

The OpenBSD port can be downloaded from ftp://ftp.00f.net/misc/port-dspam-3.6.0.tar.gz

--
{{.sig}}

Best defense is to say "No!" to PHP by jistanidiot · 2005-10-17 02:24 · Score: 0

Though this is only possible with PHP, ideally running on a Debian system, it's the most important language to learn in the universe. For a starter's guide, check out this site.

PHP is nothing but an email harvesting/phishing scam. I was going to convert our website from Windows IIS ASP to Linux/Apache/PHP. I had read all the hype and thought it would be a good move. I subscribed to a list on php.net to help me install. I tried posting a message and was told I had to visit three different websites in order to be able to submit it. I tried contacting the list admin. He seemed helpful at first, but then became belligerent. It was then I realized I was trapped on a spam list.

But all was not lost. The PHP spam scum were too stupid to prevent me from emailing their list. A number of months later I managed to get a message posted which detailed my saga with the list. That's when some seemingly nice list person contacted me. They offered to help me get removed from the list. What they suggested, I had done before but nothing happened. Then to my shock and horror, the person wanted my password to my gmail account. This PHP stuff isn't just an email harvesting scam, but a phishing scam too!

So now my address was trapped on the spam list and my new gmail account was full of spam (even with gmail's filtering). They've tried to get me to give them my password, but failed. Their phishing scam has been revealed. I luckily managed to post a 2nd message to their list about the phishing scam. After all of this they finally realized I wouldn't fall for the scam and removed me from their list.

All I can say is that I'm lucky I didn't convert our production server yet, no telling what could be written in the PHP code. It is just amazing this spam harvesting scheme has come this far. Stop the PHP spammers and phishers! Just say no to PHP. Don't even visit a website with PHP in the URL.

Enhancement? by hazzey · 2005-10-17 02:48 · Score: 1

...significant enhancements include trusted sender whitelisting...

I thought that whitelisting had been a feature of every email reader/server since spam filtering began.

Re:Enhancement? by Nuclear+Elephant · 2005-10-17 06:37 · Score: 1

I thought that whitelisting had been a feature of every email reader/server since spam filtering began.

DSPAM's trusted sender whitelisting is automatic, based on who you converse with. It's not quite social networking, but is very useful, and requires no effort on the end user's part.

Mod parent into oblivion! by Thalagyrt · 2005-10-17 03:41 · Score: 1

Nice troll. PHP has nothing to do with spam, if anything it was your blatant stupidity that got you on a spam list.

--
Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo!

Solution by lorcha · 2005-10-17 04:20 · Score: 1

If a user has to go into the SPAM box and double check that no mistakes have been made then the system is worse than not having any SPAM checking at all as most users will not check the SPAM box

I use a three-outcome approach with SpamAssassin. Messages scored below 5 are delivered to the user's INBOX. Messages scored 5 or higher, but less than 10 go into the spam box. Messages scored 10 or higher are rejected during the SMTP session, with instructions on how to proceed.
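
The routing rule itself is tiny. A sketch in Python, assuming the score has already been computed; the thresholds are the ones quoted above and the outcome names are placeholders, not anything SpamAssassin emits:

```python
def route(score: float,
          spam_threshold: float = 5.0,
          reject_threshold: float = 10.0) -> str:
    """Map a spam score to one of three outcomes."""
    if score >= reject_threshold:
        return "reject"    # refuse during the SMTP session
    if score >= spam_threshold:
        return "spambox"   # deliver, but to the spam folder
    return "inbox"
```

Rejecting at SMTP time rather than discarding means the sender of a wrongly rejected message is told about it.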

I did this because, in practice, my system never had a message scored 10 or higher that a user considered to be HAM, and indeed, no one has ever called and said, "WTF? My email didn't get through!" Also, in practice, the number of spams and hams that score between 5 and 10 is very low, so users do check their spamboxes. 99% of the messages delivered to the spam folder are flagged as spam. Every so often, a ham slips through, but never has a ham been rejected to my knowledge.

I like this solution because it keeps all obvious spam away from the users, keeps most non-obvious spam away from the users, yet never drops anything to /dev/null.

--
"Avoid employing unlucky people - throw half of the pile of CVs in the bin without reading them." -- David Brent

Not an advertisement... by pabl0 · 2005-10-17 05:34 · Score: 3, Interesting

... but it'll sound like one: I recently converted from a rather involved anti-spam defense utilizing SpamAssassin with Razor, Pyzor, and several RBL checks. I spent a fair amount of time selecting RBLs that worked the best and tweaking SA test scores whenever I got false positive/negative messages. I even had all sorts of validity checks turned on in the MTA to block out badly formed messages and the like.

I replaced all those defenses with: DSPAM. And I'm seeing better results out of the box than I ever did with a multi-layered SA-based solution, even after a lot of time tweaking.

A quick anecdote: When I converted, I opened up a bunch of previously blocked spamtrap addresses, just to get some good training material for the filter. I've long since passed my initial training threshold but haven't even bothered to block the spamtraps again because I never see the spam. At the risk of sounding like I'm bragging, I literally don't have a spam problem anymore, and DSPAM is entirely responsible for that.

Now, I'm not necessarily advocating that you give up all your custom defenses and switch to DSPAM. (I've turned off all my other filters, but I haven't removed them completely.) There's always a chance that an ingenious spammer will find a weakness in DSPAM setups, but I can testify to the fact that DSPAM is "scary good" as of right now. Training the filter is a simple matter of dropping misclassified messages (and there aren't many) into an IMAP folder.

If what you have is working for you, stick with it. But if you're looking for a low-maintenance, high accuracy filter, you should definitely give DSPAM a shot.

Re:php problems -- too specialized by ToyKeeper · 2005-10-17 05:43 · Score: 1

I must agree about PHP being un-magical. It's great for one or two specific purposes, but is pretty lacking for anything else. Want a simple web email form? It'd be hard to find an easier way to do it than PHP. But if you want a large web application, it's worth trying other languages. What's magical and amazing is that people have built incredible things with it despite its shortcomings -- projects like Drupal and Mediawiki are sheer wizardry.

I've been keeping a list of problems with PHP, if anyone wants details. I won't say it's not biased, but it's not terribly religious either. It just attempts to list some of the more important issues.

Re:social effects by ToyKeeper · 2005-10-17 05:56 · Score: 1

I've found that nearly all of my users actually prefer an interactive system like dspam over a fully-automatic system. Both systems make mistakes, but the interactive system gives the user a feeling of empowerment to fix mistakes and improve their accuracy over time.

It's better for the admin, too... When a non-interactive system makes a mistake, I find that the users complain -- either to the admin or to each other. But with dspam, they reclassify the missed message and continue working, happy to know they're part of the solution. A simple "mark as spam" button eliminated most of my email support requests.

I do get occasional users who still aren't happy... they expect 100% accuracy with 0 effort. But the only way to please those users is to hire them a personal spam secretary. And guess how often that happens?

Re:social effects by gvc · 2005-10-17 06:04 · Score: 1

Absolutely. It is cathartic to punish spam by reporting it to your spam filter. And, of course, fully automatic systems aren't nearly as good as claimed. (Neither are learning filters - 99.9...% accuracy? pshaw! - but they're better than non-learning ones.)

So do I... and it could so easily be improved! by hobbit · 2005-10-17 06:27 · Score: 1

I get an incredible amount of spam bounces in my GMail account -- from somebody sending lots of spam using my GMail address as the From: or the Return-to: address.

I really, really want an option for GMail to record the message-id of all messages I ever send through their server, and bounce any which are returned to me but which they haven't got on record as being sent by me.

I requested this ages ago, and it should be relatively straightforward. Does anyone else have this problem?
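
A minimal sketch of the idea, not anything GMail offers: keep a record of the Message-ID of everything sent, and treat a bounce as genuine only if it quotes one of them. The class and method names are hypothetical, and the sketch assumes the bounce includes the original headers unindented.

```python
import re
from typing import Set

# Matches a Message-ID header at the start of any line in the bounce text.
MESSAGE_ID_RE = re.compile(r"^Message-ID:\s*(<[^>\s]+>)",
                           re.IGNORECASE | re.MULTILINE)


class SentLog:
    """Remembers the Message-IDs of outgoing mail (hypothetical helper)."""

    def __init__(self) -> None:
        self._sent: Set[str] = set()

    def record(self, message_id: str) -> None:
        """Call once for every message actually sent."""
        self._sent.add(message_id.strip())

    def is_genuine_bounce(self, bounce_text: str) -> bool:
        """True if the bounce quotes a Message-ID that was really sent."""
        quoted_ids = MESSAGE_ID_RE.findall(bounce_text)
        return any(mid in self._sent for mid in quoted_ids)
```

Bounces that do not return the original headers at all would fail this check, so a real deployment would need a fallback for them.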

--
"Wise men talk because they have something to say; fools, because they have to say something" - Plato

What is wrong with DSPAM? by frn123 · 2005-10-17 06:29 · Score: 1

Why is it not included in Debian?
Spamassassin is.
Bogofilter is.
Popfile is.

I thought it was the license, but seems that DSPAM is GPL.
So, can anyone comment? I'm not installing it
for my server if i can not apt-get it and have debian
security support for it.

Re:What is wrong with DSPAM? by Nuclear+Elephant · 2005-10-17 07:01 · Score: 1

Why is it not included in Debian?

There's been a lot of interest in this area but nobody's felt like taking it upon themselves to make a Debian distro AFAIK. Part of it may have had to do with the storage driver backend, which supports several different approaches, but required a recompile to switch from say Postgres to MySQL. In 3.6, the storage backend can be built dynamically making packaging much easier. Perhaps someone will pick 3.6 up now.

Re:social effects by jaseuk · 2005-10-17 10:47 · Score: 1

I used SpamAssassin quite happily for many years but found the effectiveness started dropping. There are some messages that just can't be caught; usually these are the worst kinds of messages (ie. a face full of spunk), almost always received by the people most likely to be offended (ie. 55 year old female administrative staff).

False positives seem to be more of a problem for mail written in languages other than English. Pretty much all of the Welsh-language e-mail we receive through AOL has been tagged by AOL as SPAM. You might say AOL losers etc., but SpamAssassin & Messagelabs also incorrectly tag e-mails, and training these systems doesn't really help, which pretty much ruled those options out. On top of that, if we don't respond to Freedom of Information requests within 20 days we can be fined, so that is another good reason not to rely on any SPAM system that can be manipulated by the user: better to not receive than to misfile and forget.

I have measured our greylisting performance: I manually filtered over 8000 messages and found only 4 items (Nigerian / lottery frauds) that were undetected SPAM. That gives us 99.95%, and our users have had to take no action whatsoever to achieve this. Aside from the short initial delay (usually less than 5 minutes) and the very occasional non-delivery (3 instances in 18 months) due to a broken downstream mailserver (easily rectified with a phone number & guaranteed-to-work contact e-mail in the bounce), it's very low maintenance.
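
For anyone unfamiliar with the mechanism, a minimal sketch of the general greylisting technique follows. It is not the code of any particular greylisting daemon; the class name and the 300-second delay are assumptions for illustration.

```python
from typing import Dict, Tuple

# (client IP, envelope sender, envelope recipient)
Triplet = Tuple[str, str, str]


class Greylist:
    """Temp-fail unknown triplets; accept them once the sender retries later."""

    def __init__(self, min_delay: float = 300.0) -> None:
        self.min_delay = min_delay            # seconds a new triplet must wait
        self.first_seen: Dict[Triplet, float] = {}

    def check(self, ip: str, sender: str, recipient: str, now: float) -> str:
        """Return 'tempfail' (SMTP 4xx, try again later) or 'accept'."""
        key = (ip, sender.lower(), recipient.lower())
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "accept"
        return "tempfail"


greylist = Greylist(min_delay=300.0)
first_try = greylist.check("192.0.2.1", "a@example.org", "b@example.net", now=0.0)
early_retry = greylist.check("192.0.2.1", "a@example.org", "b@example.net", now=60.0)
late_retry = greylist.check("192.0.2.1", "a@example.org", "b@example.net", now=400.0)
```

A real mail server queues the message and retries, so it gets through after the delay. Software that sends once and moves on never does, which is also why a sending server that fails to retry causes the occasional non-delivery.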

Another great feature of greylisting is that it's a highly effective first line of defense against viruses. Prior to enabling greylisting I was getting around 10-20 messages a minute intercepted by our virus scanners; with greylisting the number is more like 8 a DAY, and all of those are thanks to either transparent SMTP proxying from some brain-dead ISPs or messages passed on through forwarding.

SPAM is not really a security risk as such, but the fact that greylisting has such strong anti-virus capabilities should, when balanced against its few potential shortcomings, make it very easy to justify switching on as a good e-mail security measure.

Oh and I really get a laugh when people using SpamAssassin helpfully mark their own non-SPAM e-mail as SPAM. That's always a good one and a sure sign that there is something seriously wrong with the SpamAssassin approach.

Jason.

Another ClamAv+mail by Anonymous Coward · 2005-10-17 12:01 · Score: 0

Cool. So we have yet another spam filter. What we really need is an alternative to ClamAV_Redirector.py , which is bunk.

100 comments