DSPAM v3.2 Released
Nuclear Elephant writes "After four months of development DSPAM v3.2 has been released, bringing many new enhancements and filtering technologies. These include distributed computing support, implementation of Bill Yerazunis' Sparse Binary Polynomial Hashing algorithm (from CRM114), and v1.2 of Bayesian Noise Reduction. Other enhancements include SQLite support and many significant performance enhancements for PostgreSQL. DSPAM's official release is next week, but you can download the preview release now. Users of the project have also contributed towards creating a new logo for this release."
Are most people using a bayesian DSPAM, CRM114, or SpamBayes along with SpamAssassin (rule based)? Or do you just use the bayesian filter?
I see that most of these bayesian filtering programs mention that they can be used with SpamAssassin. Is it usually best to run both for DoublePlusGood(TM) spam catching?
I am using DSPAM on a qmail/vpopmail server and I find that it's great in terms of accuracy. Most of my users have never had a false positive and many haven't seen a spam after a couple of weeks of training.
The problem that I have with DSPAM is the integration side. I'm not sure how it goes with other mail systems, but integrating it with vpopmail was a major pain. It seems easy, you just put the command in the dotfiles, but in practice getting it to work was quite a trial. Even now it doesn't integrate properly with the web administration, etc., despite some scripting and minor code changes.
Because of this I've been thinking of switching to SpamAssassin simply because of its integration with qmail-scanner. Has anyone else had similar problems or been in a similar situation and found a good solution?
...any better than CSPAN?
"Like fire and fusion, government is a dangerous servant and a terrible master."~RAH
Iron Port, fuck yeah.
it seems to me that we're on 3.2 preview release 1. not 3.2 release which is scheduled for the 20th to the 22nd. is this post a bit early?
dave
Here's what it shows.
ONLY the 3.2 Preview Release 1 is currently out!
I'm sick of spam filters bragging about their overall error rate. All of them do OK at getting rid of the bulk of spam and saving the bulk of the time.
The really important differentiating factor is how many legitimate messages they mistakenly accuse of being spam.
The consequences of a spam message getting through are minimal: under a second of time, on average, to skip it.
The consequences of a non-spam getting blocked can be huge: loss of a customer, or a mom not knowing her kid is in trouble.
I wish the spam filters focused entirely on reporting how few false positives they produce.
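To make the distinction above concrete, here is a minimal sketch of how the overall error rate and the false positive rate are computed from the same counts. The function name and all the numbers are made up for illustration; they are not measurements of any real filter.

```python
def filter_rates(true_pos, false_pos, true_neg, false_neg):
    """Return (overall_error_rate, false_positive_rate) from raw counts.

    true_pos  = spam correctly caught
    false_pos = legitimate mail wrongly flagged as spam
    true_neg  = legitimate mail correctly delivered
    false_neg = spam that slipped through
    """
    total = true_pos + false_pos + true_neg + false_neg
    legit = false_pos + true_neg
    overall_error = (false_pos + false_neg) / total
    false_positive_rate = false_pos / legit
    return overall_error, false_positive_rate


# Two hypothetical filters with the same overall error rate (1%),
# but very different behaviour on legitimate mail.
filter_a = filter_rates(true_pos=4990, false_pos=90, true_neg=4910, false_neg=10)
filter_b = filter_rates(true_pos=4910, false_pos=10, true_neg=4990, false_neg=90)

print("A: overall error %.3f, false positive rate %.3f" % filter_a)
print("B: overall error %.3f, false positive rate %.3f" % filter_b)
```

Both invented filters misclassify 100 of 10,000 messages, yet A loses nine times as much legitimate mail as B, which is exactly what a single "accuracy" figure hides.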
I have been using DSPAM for my network for quite some time now (~a month or two) and have since not received a complaint from any users. Seems to me it works better than CRM114.
as SPAM released?
Me've always found that the best filter still is the humble (and the not so humble) human :p
... have ya gone through several delete keys yet?
a few months ago those features were available, too. while dspam is great at filtering mail, I faced two crucial problems, which forced me back to spamassassin. I haven't heard that they fixed any of those:
- the database did grow huge. when my single user server with 128 mb had to use a 512 mb spam token database, performance was terrible. even with the tools included I could not do anything to fix the issue.
- dspam knows only yes or no, there is no usable value that gives you some grey information. as a result, I had to check all those spam postings for false positives. Spamassassin on the other hand has that spam result 0..10, so I can check 0..4 where 0 is ok (few false negatives) and 1..4 spam (few false positives), and I can directly delete thousands of mails in 5..10 without looking at them.
I won't go back to dspam unless someone can offer specific help for those issues. I believe everyone will face them sooner or later.
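The score bands this commenter describes (0 treated as OK, 1..4 checked by hand, 5..10 deleted unread) can be sketched as a small triage function. This is only an illustration of that workflow: the function name is hypothetical, and the 0..10 range and the cut-offs are taken from the comment as-is rather than from any filter's documentation.

```python
def triage(score):
    """Map a 0..10 spam score to an action, using the bands from the comment."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score < 1:
        return "deliver"   # band 0: treated as legitimate mail
    if score < 5:
        return "review"    # bands 1..4: flagged, but checked by a human
    return "delete"        # bands 5..10: discarded without being read


for s in (0, 3, 5, 10):
    print(s, triage(s))
```

A filter that only answers yes or no forces everything flagged into the "review" band, which is the complaint being made.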
Does DSPAM inform the sender that his/her e-mail has been filtered out?
Asking slashdot: .mac email service any good? I have a Mac and sure could make use of some of the other features they offer ...
Which provider do you think does the best effort to filter/fight spam and uses the most state of the art techniques for that? The german freemailer GMX I use now is good, but I wonder if others do better.
And I wouldn't mind paying for never receiving spam again. Is Apple
The DSPAM site mentioned that it can be compiled on Mac OSX, but what about Winblows? I only have one box (go ahead and laugh) and it is an older Pentium III Winblows machine. I'd like to have a separate box to act as a mail server but it just isn't currently feasible (translation: I'm broke.) Is there any way they can compile DSPAM for Win9X?
this is one heck of a product, and I think it would be used more if there were a very verbose install of the current version on various platforms (similar to obsd version on site).
think- spamassassin, clam, spammassassin howto or something similar but it has to be VERY verbose to bring in the crowds (newbies).
my 2c
AC
9/11 Eyewitnesses to Explosive WTC Demolition 1 of 2
Wow, yet another anti-spam solution out there. I wonder how this stacks up to the other ones out there, looks good so far. SPAM SPAM it lasts for years, either meat, mail, or canned.
Didn't your mother tell you that if you haven't anything nice to say, then don't say it at all!
MOD PARENT UP!!! for a more friendly, sensible Slashdot.
DSPAM's Focus
The DSPAM project attempts to set itself apart from "generic Bayesian filters" by focusing on the following areas:
* DSPAM has a strong drive for research. Many new algorithms and approaches to fighting spam have come out of the DSPAM project. Some of the approaches deployed in DSPAM include Chained Tokens, Neural Networking, Message Inoculation, advanced de-obfuscation techniques, and a new noise reduction algorithm called Bayesian Noise Reduction. We're always looking for new approaches to improving the accuracy of DSPAM.
* A strong focus on large-scale implementation support. The largest implementation of DSPAM we've heard about to-date involves 350,000 users, with the next largest being around 125,000, then 100,000. DSPAM has been designed to run with a very short execution time (between 0.01s - 0.03s real time for classification and between 0.03s - 0.10s real time for training, on average hardware), and has been equipped with a storage driver API allowing several different storage mechanisms to be used. Depending on disk space constraints, accuracy can be traded off for additional disk space or vice-versa.
* Usability. DSPAM was designed with "grandma" in mind. Users need only forward the spam they receive to an email address to train their filter. End-users don't need to know any commandline utilities or other complexities plaguing some other such tools. Functions such as whitelisting and keyword inventory are automatic (based on statistical functions) and therefore require no user intervention.
Not to change the topic, but I have a different method for curbing spam. I just change email addresses every 6-9 months. Works like a charm. When the ratio of spam versus real email starts shifting, I know it's time for a change. That and just don't post your email address all over the internet. Works for me.
Free Desk
If you look in the SpamAssassin distribution you'll find a tool to fine-tune rule scores based on spam and non-spam mail. Read "A Plan for Spam": it's tokens, not words, and SA rules can be tokens.
... to CPAN!!
Your head a splode
RBLs deal with 90% of the shit I see (no, not SPEWS and the nasty 'let's damage the world' ones). I use relays.ordb.org, sbl-xbl.spamhaus.org, list.dsbl.org, opm.blitzed.org, dul.dnsbl.sorbs.net, cbl.abuseat.org, dynablock.njabl.org, dnsbl.njabl.org.
Why mess with only DSPAM and stuff, which is after the fact, and massively over-engineered? Sure, use spamassassin or something, kept up to date, to clean up the other 10%, but don't rely on it.
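For readers who haven't seen how an RBL/DNSBL check works, here is a rough sketch: the connecting IPv4 address is reversed octet by octet, the list's zone is appended, and the result is looked up in DNS. The function names are my own, the IP is a documentation address, and the zone is just one of those named above; some of those lists may no longer be operating, so check before pointing a real server at them.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up: IPv4 octets reversed, then the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the zone answers with an address for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No such name (or lookup failure): treat as not listed.
        return False


# Building the name needs no network access:
print(dnsbl_query_name("192.0.2.99", "sbl-xbl.spamhaus.org"))
```

Because the lookup happens when the client connects, a listed sender can be turned away before the message body is ever transferred, which is the "before the fact" advantage being argued for.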
Here's another spam solution:
If we had a respected national leader who could often talk to millions of people, that person could change the culture. The leader could tell everyone never to buy anything or even respond to unsolicited email advertising.
It might take years, but eventually it would not be economic for spammers to operate, particularly since spam filters would continue to improve.
The only person who could do this in the U.S. now would be Oprah Winfrey. She has an enormous following, and has a reputation for positive thinking (and, unfortunately, sometimes being ignorantly anti-male). She could tell her women viewers, and ask them to tell everyone in their family.
If we had a positively-minded president, he or she would be in an excellent position to change the email culture. A president could change the culture in a few months, possibly. It would simply become socially unacceptable to respond to unsolicited email.
Unfortunately, we don't have such a president. For example, see this article: Unprecedented Corruption: A guide to conflict of interest in the U.S. government.
If the spam culture change worked, the next thing I would like to see is an open source reference browser that set standards for how browsers should work. Unfortunately, Bill Gates is not a positive leader, either. I would like to see Mozilla become the U.S. national government standard. Anyone could continue to use any browser they wanted, but the government's power could be put behind web page rendering standards and browser quality.
--
Government data shows Democrat and Republican spending patterns.
I didn't find anything negative about the pages you referenced.
but somewhat besides the point.
I have to disagree with you on whether it's spam, however. Just making up statistics here, but I'd guesstimate that the sender address of >99.99% (probably even more) of all virus emails is forged and probably points at an innocent third party. That means that the message from the virus scanner is completely and utterly worthless to the recipient (i.e. the "sender" of the virus email). That makes it "junk" or "spam" in my book.
You're right that there isn't much you can do, but I usually check to see if the mailer-daemon/postmaster address in the message looks legit and send off a boilerplate message saying something to the effect of "what you're doing is stupid and counterproductive, please stop".
Hopefully SPF can stop some of this sender spoofing.
HAND.
Why does DSPAM get front page treatment when the latest POPFile release (which now handles POP3, IMAP, SMTP and NNTP filtering, has an XML-RPC external interface, supports different databases, etc. etc.) gets rejected as a story?
/. has recently turned into some combination of Freshmeat and PC Magazine? Yes.
Perhaps it's because I don't tend to make super-wild claims about POPFile's accuracy? Or come up with cool marketing names for the internal technology?
POPFile's the only Bayesian filter that can:
1. Do more than spam vs. anti-spam and
2. Filter POP3, IMAP, SMTP and NNTP (that's right Usenet news)
Do I have an axe to grind with Jonathan and DSPAM? No, it's a cool project. Does it annoy me that
John.
A lot of porn email now comes with legitimate text. Some are excerpts from books or words from the dictionary.
This confuses the hell out of Bayesian filters.
How do you fix that?
... I want a spam filter that automatically forwards all spam to the abuse@ mailbox for the domain from the spammer.
Once the admins start getting hundreds of thousands of spam complaints in their abuse boxes PER DAY. Then maybe they'll start to think of ways to fix this problem.
I got nothing against content-filtering measures, as long as one is aware that this should be just the last layer of defense against spam. Think about it, if your SMTP has already swallowed the spammer's email content, you have already lost precious bandwidth.
Especially if you host your own SMTP, you should put up a layered system of defenses: RBL lists, maybe tarpitting, white/graylisting, and then content filtering.
Even though you don't SEE the spam, you STILL HAVE TO PAY FOR THE RESOURCES THE SPAMMERS ARE STEALING FROM YOU!!!!
Unless ALL the spammy networks are PUNISHED FOR HARBORING SPAMMERS, spammers will always find connectivity.
We are at war, at war against spammers and their spammy-network accomplices, and until the spammy networks are thoroughly eradicated, there will always be resource-stealing spammers.
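The layered setup suggested above (RBLs, then white/graylisting, then content filtering) boils down to running the cheapest checks first and stopping at the first verdict. The sketch below shows only that ordering; every check in it is a hypothetical stand-in with hard-coded data, not a real RBL, greylisting, or content filter implementation.

```python
def run_layers(message, layers):
    """Run checks in order; stop at the first layer that returns a verdict."""
    for name, check in layers:
        verdict = check(message)
        if verdict is not None:
            return name, verdict
    return "default", "accept"


# Hypothetical stand-in checks, cheapest first.
def rbl_check(msg):
    # Pretend blocklist containing one documentation address.
    return "reject" if msg["client_ip"] in {"192.0.2.99"} else None


def greylist_check(msg):
    # Defer senders we have never seen; real greylisting tracks retries.
    return "defer" if not msg.get("seen_before") else None


def content_check(msg):
    # Toy keyword test standing in for a statistical filter.
    return "reject" if "viagra" in msg["body"].lower() else None


LAYERS = [("rbl", rbl_check), ("greylist", greylist_check), ("content", content_check)]

print(run_layers({"client_ip": "192.0.2.99", "body": "hello"}, LAYERS))
```

The first two layers need only connection-level information, so a message rejected there never costs the bandwidth of its body; only mail that survives them reaches the content filter.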
Netblock blacklisting is a really poor solution.
It is the only solution when the ISP will do nothing to stop the spammer on their network.
In some cases a single spammer causes a /24 and then a /16 to be blocked.
That is rather difficult without the ISP's assistance (or them repeatedly ignoring the complaints).
Btw, do you understand that changing ISP may not be an option?
Sometimes that is true. In which case, you should get on the phone and make sure that your ISP understands that they have customers who will be upset if the ISP doesn't handle its spammer problem.
Those lists, by themselves, do not block any email at all. Those lists are used by people who are fed up with trying to get ISP's to deal with their spammers.
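To put numbers on the escalation from a /24 to a /16 discussed in this exchange, here is a small sketch using Python's standard `ipaddress` module. The two network prefixes are arbitrary examples chosen for illustration, not blocks that were ever listed.

```python
import ipaddress


def addresses_in(cidr):
    """Return how many IPv4 addresses a CIDR block covers."""
    return ipaddress.ip_network(cidr).num_addresses


small = addresses_in("192.0.2.0/24")   # example /24
large = addresses_in("10.1.0.0/16")    # example /16

print(small, large, large // small)
```

A /24 covers 256 addresses and a /16 covers 65,536, so widening a listing from one to the other affects 256 times as many addresses, most of which belong to customers other than the spammer.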
What if we all began responding to every spam we
could, go to every website and fill in nonsense, etc.
It seems that very quickly spam would become
useless. They send these out to millions and
millions of accounts, they're generally low
budget operations, they can't afford to sort
out the wheat from the chaff.
There are some types of spam this won't work
for (e.g., stock pump+dump), but maybe it'd
put enough of them out of business that all
of it would go away. But why make the best the
enemy of the good?
Has anyone used DSPAM with xmail?
I trained my spam filter on bounces as well as regular messages. It got a little confused at first but soon got the hang of distinguishing real bounces from spam/virus bounces.
From the DSPAM FAQ: "SpamAssassin's primary detection facility has been designed to use a static set of rules to service all users of the system." That's not true at all. Each of my users maintains their own Bayesian DBs and custom rules if they choose. It's in $USER/.spamassassin.
I read through the white paper describing the 'Bayesian Noise Reduction' and I just can not see how it is in any way Bayesian. It is a bunch of heuristics, which sound pretty reasonable and probably work great in practice. But why call it Bayesian? It is great to see that Bayesian techniques such as Naive Bayes Classifiers get applied with great success in the spam setting. But it is somewhat annoying if people use the word 'Bayesian' as just meaning 'sophisticated' or 'awesome'. It does actually have a meaning. http://en.wikipedia.org/wiki/Bayesian_inference
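For reference, this is what the word means in the simplest spam setting: Bayes' rule turns "how often does this token appear in spam and in ham" into "how likely is a message containing it to be spam". The sketch below handles a single token only; the function name and counts are invented, and it is not a description of how DSPAM or its noise reduction works.

```python
def p_spam_given_token(spam_with_token, spam_total,
                       ham_with_token, ham_total, prior_spam=0.5):
    """Bayes' rule for one token: P(spam | token).

    Raises ZeroDivisionError if the token was never seen in either corpus.
    """
    p_token_given_spam = spam_with_token / spam_total
    p_token_given_ham = ham_with_token / ham_total
    numerator = p_token_given_spam * prior_spam
    evidence = numerator + p_token_given_ham * (1 - prior_spam)
    return numerator / evidence


# Invented counts: the token appears in 80 of 100 spams and 5 of 100 hams.
posterior = p_spam_given_token(80, 100, 5, 100)
print(round(posterior, 4))
```

A naive Bayes classifier combines many such per-token probabilities under an independence assumption; a heuristic that merely decides which tokens to ignore is a preprocessing step, which is the distinction the comment is drawing.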
True, it may be difficult or even impossible for some users to switch ISP. Does it mean everybody else should be forgiving and tolerate more junk mail from their ISP? That's like telling the ISP up front: Lock your customers in and we will avoid blacklisting you for the spammers you host.
To the inhabitants of certain countries, emigration isn't an option either. Does their lack of choice mean we shouldn't boycott products from such a country due to the misdeeds of a very tiny minority (the people in power) of said country? I'd rather not give specific examples of countries; I'm sure you can think of a few.
When someone begs "please don't attack those who keep me hostage", listening to that plea would be a disservice to everybody.
The paper uses the term "GPLware". I haven't seen that before. I might use it. Of course, we remember "freeware", "shareware", etc.
Problem is, if you don't want to trust the advice of others in any way, be it in the form of blacklists, filtering software, or even pressure against the spammer's ISP, you are effectively left with fighting your spam flood yourself. You would have to look at each and every message you get, or write a filter all by yourself. It will take you a lot of time, maybe even more time than it takes to hit delete, and you may still accidentally delete something you wanted without noticing it.
The whole idea of the Internet, on the other hand, is based on cooperation with others, to the point that you even trust complete strangers to write software used by you for business-critical activities. For a start, your e-mail correspondence with your customer depends not only on your ISP, but also on your customer's ISP, to get through intact and remain confidential. If you are willing to yield this much power over your business to your customer's ISP, then how is trusting someone else's advice (on what is and what isn't spam) fundamentally any different?
I agree with you however, that trusting someone else's list of "spam keywords" is in general a bad idea, and I avoid doing so myself, but not because I don't trust others. I prefer blacklists of IP addresses and domains instead, not because I maintain all of those lists myself (I don't), but because I can more easily predict the consequences of using them, given the listing criteria and the trustworthiness of the list maintainers. What are the "listing criteria" for a list of spammy keywords?
To prevent someone from doing something illegally while the spammers continue to do whatever they want?
Shouldn't they pay for the costs when they are caught?
http://saveie6.com/
Yes, the above counts as "humor" too. :) Have a nice day.
What you should do, however, is reject the message in the SMTP session. My mail server issues a 554 during SMTP if you send me a spam or a virus. That way, legitimate senders will still get a notification of the delivery failure (generated by their own MTA, not mine!), and I am not sending misdirected bounces all over the place.
Of course, the 554 says why the email was rejected: "554 mail server permanently rejected message: message contained VIRUS (#5.3.0)" for a virus, similar message for spam. That way the sender knows what's up.
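The reject-during-SMTP approach described above can be sketched as a mapping from the scanner's verdict to the reply line the server sends while the connection is still open. The function name and verdict strings are hypothetical; the virus text is the one quoted above, and the spam wording is my assumption based on "similar message for spam".

```python
def smtp_reply(verdict):
    """Return the SMTP reply line to send for a scanned message."""
    prefix = "554 mail server permanently rejected message: "
    if verdict == "virus":
        return prefix + "message contained VIRUS (#5.3.0)"
    if verdict == "spam":
        # Assumed wording; the comment only says it is similar.
        return prefix + "message contained SPAM (#5.3.0)"
    return "250 OK"


for v in ("virus", "spam", "clean"):
    print(smtp_reply(v))
```

Because a 5xx reply is given in-session, the sending MTA is the one that generates any failure notice, so a forged sender address never receives a bounce from the rejecting server.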
"Avoid employing unlucky people - throw half of the pile of CVs in the bin without reading them." -- David Brent
My approach only uses 8 simple rules to score spam--the others use more complicated and computer-intensive methods.
My approach is fast, simple, and effective.
I use it to check my own email where it has filtered out my spam without fail.
The only 'spam' it won't detect currently is 'subject line' spam with email bodies with absolutely no content, but I can easily fix that....
Maybe my approach is 'too good to be true' or 'not serious' to merit 'airtime' on Slashdot. You decide.