Comparison of Bayesian POP3 Spam Filters

Re:great by Eric+Ass+Raymond · 2003-08-10 20:05 · Score: 3, Insightful

Spam is not advertising.

It's harassment.

Nitpick... by 1029 · 2003-08-10 20:07 · Score: 2, Insightful

I just sure as hell hope he meant "latest, best hope", because anyone who thinks bayesian is the LAST best hope doesn't understand CS technology at all. And such a person sure as all hell shouldn't be given an audience on /.

--
- I love animals. I try to eat at least one a day.

Re:Nitpick... by RatFink100 · 2003-08-10 20:40 · Score: 2, Informative

It's a Babylon 5 reference.
Re:Nitpick... by spongman · 2003-08-10 22:22 · Score: 4, Interesting

Actually, SpamBayes isn't Bayesian at all. It uses a chi^2-based algorithm, which the SpamBayes team's extensive tests showed to be superior to regular Bayesian filtering.
Re:Nitpick... by AndroidCat · 2003-08-10 22:40 · Score: 3, Informative

The "last, best hope" was used by Lincoln in the American civil war, "We shall nobly save, or meanly lose, the last best hope of earth."
It's quite possible that it goes back further to a version of the Bible or Shakespeare. (Always the two to bet on when finding the source of a phrase in one fell swoop.)

--
One line blog. I hear that they're called Twitters now.
Re:Nitpick... by spongman · 2003-08-10 23:01 · Score: 4, Informative

Here's a bit from the excellent SpamBayes background page:
A remarkable property of chi-combining is that people have generally been sympathetic to its "Unsure" ratings: people usually agree that messages classed Unsure really are hard to categorize. For example, commercial HTML email from a company you do business with is quite likely to score as Unsure the first time the system sees such a message from a particular company. Spam and commercial email both use the language and devices of advertising heavily, so it's hard to tell them apart. Training quickly teaches the system all sorts of things about the commercial email you want, though, ranging from which company sent it and how they addressed you, to the kinds of products and services it's offering.
Re:Nitpick... by tim_one · 2003-08-11 12:37 · Score: 2, Informative

The way spambayes estimates the probability that a msg is spam given that it contains a specific word is thoroughly Bayesian, as described on Gary Robinson's web page, and in his March "Linux Journal" article.

The way spambayes combines probabilities ("chi-squared combining") is indeed not Bayesian at all. The probability combining scheme Paul Graham suggested isn't correctly Bayesian either, unless you assume the universe consists of equal numbers of ham and spam messages (so that the prior probability of spam is 0.5).
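
To make the difference between the two combining schemes concrete, here is a minimal sketch. It follows the publicly described formulas rather than either project's actual source; the function names are made up for illustration, and every per-word probability is assumed to lie strictly between 0 and 1.

```python
import math

def graham_combine(probs):
    """Graham-style combining: naive Bayes with an implicit 0.5 spam prior."""
    spam = math.prod(probs)
    ham = math.prod(1.0 - p for p in probs)
    return spam / (spam + ham)

def chi2q(x2, dof):
    """Upper tail of the chi-squared distribution (closed form, even dof only)."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def chi_combine(probs):
    """Chi-squared (Fisher-style) combining of per-word spam probabilities."""
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    # s is large when the words look spammy, h when they look hammy.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0
```

With conflicting clues such as [0.99, 0.99, 0.01], the Graham-style product still comes out near 1, while the chi-squared score stays much closer to the middle, which is the kind of result that would be reported as "Unsure".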

Re:great by mirko · 2003-08-10 20:09 · Score: 3, Insightful

I think spam is overhyped: it is inconvenient to get some, but with properly adjusted filters very few of these will land anywhere other than your trash can.

Personally, I get around 100 of these a day, but only 3 get into my inbox instead of one of my specific mail directories. This is not *that* disturbing.

I just wish these spams were better targeted: getting yet another penis-enlargement, ultra-fast-diet, university-diploma or cheap-herbal-alternative-to-viagra offer is repetitive and boring.

--
Trolling using another account since 2005.

Bayesian filters are useful, but... by fr0z · 2003-08-10 20:10 · Score: 5, Funny

I still believe that we should have a hunting season for spammers, just like we do for ducks...

--
Never underestimate the predictability of human stupidity...

Re:Bayesian filters are useful, but... by frovingslosh · 2003-08-10 20:16 · Score: 4, Insightful

I still believe that we should have a hunting season for spammers, just like we do for ducks...
No, it should be longer, if not all year long.

--
I'm an American. I love this country and the freedoms that we used to have.
Re:Bayesian filters are useful, but... by dtfinch · 2003-08-10 20:28 · Score: 5, Insightful

You know, computer crimes are considered terrorism under the USA PATRIOT Act. Until that silly law gets repealed, let's hunt down those terrorists for their, umm, denial of service attacks against innocent email users, bandwidth theft, failure to provide real opt-out links, sending email advertisements with fake return addresses, presenting obscene material to minors, etc...
Re:Bayesian filters are useful, but... by ctr2sprt · 2003-08-10 20:29 · Score: 4, Funny

Spammer: Duck season!
You: Spammer season!
Spammer: Duck season!
You: Duck season!
Spammer: Spammer season! Fire!
*bang*
Re:Bayesian filters are useful, but... by frovingslosh · 2003-08-10 20:51 · Score: 4, Funny

I like your way of thinking. It's much like my approach of defending myself with deadly force when I'm attacked with the deadly weapon of second hand smoke.

--
I'm an American. I love this country and the freedoms that we used to have.
Re:Bayesian filters are useful, but... by AndroidCat · 2003-08-10 22:53 · Score: 2, Interesting

Spammers love to use open proxies to hide, and are now engaged not only in scans to find them, but also in campaigns to create them, using Trojans and worms like SoBig. While each offense is small, it's on a scale large enough to have them behind bars for quite a while.

--
One line blog. I hear that they're called Twitters now.
Re:Bayesian filters are useful, but... by PhxBlue · 2003-08-11 00:13 · Score: 2, Insightful

You know, computer crimes are considered terrorism under the USA PATRIOT Act. Until that silly law gets repealed, let's hunt down those terrorists for their, umm, denial of service ...

An immoral law is no less immoral just because you can find a practical use for it. If you don't like the PATRIOT Act, don't support it, period.

--
!#@%*)anks for hanging up the phone, dear.

You just don't get it by frovingslosh · 2003-08-10 20:12 · Score: 5, Insightful

None of these spam filters will have any effect on spam at all if they are just installed on the systems of people who hate spam and would never buy from a spammer anyway. Hell, they might even have the opposite effect; I will never buy something if I get spam for it. But if I personally filter my spam and don't even see subject lines, I might end up buying the product without knowing they also marketed it by spam.

Spam is effective because it reaches millions of people who are not installing these filters on their systems. Until ISPs start applying these filters to all spam by default, the spam filters will have no effect at all: exactly the same number of marks will be reached and respond, whether or not the people who know better than to respond to spam go ahead and filter their e-mail!

--
I'm an American. I love this country and the freedoms that we used to have.

Re:You just don't get it by Plug · 2003-08-10 20:20 · Score: 5, Insightful

Realistically, I don't give a damn how much spam _you_ get, I care that _I_ don't get any.

You cannot automatically filter spam. Bayesian filtering works because it works on your own personal items only, and you have a method of manually removing false positives. There is nothing worse than the possibility that an ISP will filter out a real email in their spam system. That simple fact makes server side spam filtering impossible for most situations. You can filter spam into /dev/null (unacceptable), you can filter into a spam box (How many POP users would that rule out, who only have one POP box?), or you can keep it bundled in email with a flag, and expect people to update their clients, in which case you have the exact scenario you have now - the client has to do something themselves.

Until Hotmail et al. start offering Bayesian filtering with a separate 'spam' mailbox, consider server side filtering worthless.

I am smart and don't get any spam. A lot of people I see in my line of work, aren't. These people are going to get something like Outclass (an Outlook plugin for POPfile), and then they are going to see the problem go away, and they're not going to lose any email in the process.

I'd rather use SpamBayes, but the Outlook plugin has an annoying bug that renders autocompleting addresses in Outlook useless.
Re:You just don't get it by hankwang · 2003-08-10 20:28 · Score: 2, Insightful

>None of these spam filters will have any effect on spam at all if they are just installed on the systems of people who hate spam and would never buy from a spammer anyway.
Still, there are plenty of people who hate spam but don't know how to handle it. At our department, quite a few people receive over 30 spams per day and hate it, but no one has installed a spam filter better than the subject/sender filter built into their (Windows) mail clients. One has stopped reading e-mail from his university account and asks us to send mail to a private address, because he isn't allowed to change his email address.
I mentioned Bayesian filters, but it turns out that not every computer user enjoys downloading and trying out five different programs to see whether they filter effectively and work together with their existing mail software. On top of that is the fear of false positives. (I am one of the few Linux users and on top of that I don't receive so much spam that I should worry, so I can't advise them.)

--
Avantslash: low-bandwidth mobile slashdot.
Re:You just don't get it by Anonynmous+Cow · 2003-08-10 20:46 · Score: 4, Interesting

Speaking of filtering for others... I don't - but I do run my own little mail server.

Even after implementing all the postfix uce rules and adding in the RBLs - and using spamassassin... I still saw some spam slipping in...

So I hacked together a tiny little perl script that monitors my mail log... after any IP address gets more than 3 "554" messages (generated by the RBLs) the source IP gets a lovely little teergrube.

I waste their resources and prevent them from trying to deliver any other shit that might get through spamassassin...

The script can be found here, but it is only good for postfix/linux/iptables people.
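
The counting half of that idea can be sketched in a few lines. This is not the poster's perl script: it is a Python illustration that assumes Postfix-style reject lines of the form "reject: RCPT from host[1.2.3.4]: 554 ...", and the name repeat_offenders is made up. What to do with the offending addresses (the teergrube itself) is deliberately left out.

```python
import re
from collections import Counter

# Assumed log shape: "... reject: RCPT from unknown[192.0.2.7]: 554 Service unavailable ..."
REJECT = re.compile(r"reject: \w+ from \S*\[(\d{1,3}(?:\.\d{1,3}){3})\]: 554 ")

def repeat_offenders(log_lines, threshold=3):
    """Return the IPs that drew more than `threshold` 554 rejections."""
    counts = Counter()
    for line in log_lines:
        match = REJECT.search(line)
        if match:
            counts[match.group(1)] += 1
    return {ip for ip, n in counts.items() if n > threshold}
```

Feeding it the lines of a mail log yields the set of addresses that a separate firewall step could then slow down.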

--
e3 :: blogging the wireless freenet
Re:You just don't get it by Plug · 2003-08-10 21:12 · Score: 5, Funny

How many people do you know that email you 12 gifs/jpegs in one message with LARGE red text????

Lots of them. They're called 'girls' and Slashdot should encourage communication with them wherever possible.
Re:You just don't get it by Anonymous Coward · 2003-08-10 21:38 · Score: 3, Funny

Well, I actually got the Viagra and the penis enlargement pills... they work perfectly.

The problem is there were no instructions on how to find a partner.
Re:You just don't get it by MuParadigm · 2003-08-10 22:31 · Score: 3, Funny

I do not know how many times I have to tell people this.

They do not work. They just make your hand smaller.

Other filters by dtfinch · 2003-08-10 20:13 · Score: 4, Informative

I would have liked to see how my favorite Bayesian spam filter, K9, would have fared in your comparison, but it failed to meet your first requirement of being cross platform. It's freeware written in C, is about a 60 KB to 100 KB download depending on whether you get it with the self installer, is easy to use, and has a very small memory footprint. Before today it had sorted my email with over 99.8% accuracy, excluding the first couple days of training, and after only a couple weeks of use, though now it's down to 99.7%.

I have used PopFile in the past on both Windows and Linux, but found K9 to be better suited for environments where Windows is an option. It's very easy to use, having a windowed interface, and it seemed to learn much faster than PopFile did.

I haven't used SpamBayes. I'll have to give it a shot.

Spamprobe by 1029 · 2003-08-10 20:14 · Score: 5, Informative

The article didn't mention SpamProbe. It is what I use, and it has worked quite well for the past month or so that I've been using it. Perhaps this is just because the author didn't test this spam filter yet, but I like it quite a lot with my current mutt/procmail setup. Take this for what it's worth.

--
- I love animals. I try to eat at least one a day.

Re:Spamprobe by opk · 2003-08-10 20:29 · Score: 3, Interesting

I'll second this. Have been using spamprobe since December. It took longer than a month before it was fully trained. These days it's very good. And the best thing (except once when someone quoted the full body of a spam when complaining about spams on a mailing list): It has never given me a false positive.
Re:Spamprobe by HermanAB · 2003-08-11 02:42 · Score: 3, Insightful

Yes, SpamProbe is the best one I tested and I tested most of them. The reason being that it not only counts single words, but also word pairs. It is about 99.5% accurate for me and never gives false positives. My wife uses it in her law office, where I run it on the server - one database for everybody. It works like a charm and doesn't get tripped up with matrimonial fighting mail, which can resemble sleaze mail in many respects...
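
The word-pair idea is easy to illustrate. The following is only a sketch of the general technique, not SpamProbe's actual tokenizer; the function name and the very simple word pattern are assumptions.

```python
import re

def tokens(text):
    """Return single words followed by adjacent word pairs."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs
```

Each pair becomes a token of its own, so a phrase can count as evidence even when the individual words are unremarkable.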

--
Oh well, what the hell...

Only useful to a point by KU_Fletch · 2003-08-10 20:16 · Score: 4, Interesting

I love spam protection programs. I've been using them for years, but have to switch every couple of months because of the friggen spammers. The people that make the spamming software don't just sit around cackling about how evil they are. They reverse engineer every anti-spam protection out there in an attempt to get around it. While this seems like a good idea (and I will be playing around with these two programs for a while), it's unfortunately only good up to the point when spammers figure a way around it.

I wish the government would somehow make the practice illegal, but I doubt they'll ever get anything to stick. The far better option at this point is to have a class action suit of server owners (who provide mail accounts) against developers of spamming software and spammers. I've gotten enough warnings from my university to know that bandwidth costs money. By sending millions of spams a year into any one e-mail server, that can account for a serious chunk of bandwidth used at significant cost to the provider. It won't stop spam altogether, but it will bankrupt anybody that has been doing it.

--
It's not stupid. It's advanced.

Re:Only useful to a point by spongman · 2003-08-10 22:38 · Score: 3, Informative

I've been using SpamBayes for about 9 months now and I've never had any problem with this 'new kind of spam' you mention. I just don't see it. I don't have to do anything, write any rules, configure anything, it just gets junked. I've never once had any false positives either. I get about 30 spams/day, and out of the 8,200+ spams I have in my spambox, fewer than 100 are categorized as having less than 90% probability of being spam.
Re:Only useful to a point by Steve+B · 2003-08-10 23:10 · Score: 2, Interesting

They reverse engineer every anti-spam protection out there in an attempt to get around it.
This is why a real anti-spam legal reform would clearly equate circumvention of an anti-spam filter with circumvention of a password prompt. Both are attempts to crack into someone else's computer without permission -- indeed, against an express prohibition -- and the former ought to carry the same penalties as the latter.

--
/. If the government wants us to respect the law, it should set a better example.

Filtering by rf0 · 2003-08-10 20:16 · Score: 3, Interesting

Given that I get 100+ spams a day, I've found that it's a good idea to at least use tagging. For example, when posting on Usenet I use usenet@domain.com, with something in my sig saying my actual email is me at domain dot com. Anything sent to usenet is automatically deleted. Doesn't stop the flow by any means, but at least I can track where the spam came from.

If you are feeling clever you can even use addresses that expire after a week. So something like epochseconds@domain.com
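
A minimal sketch of such expiring addresses, assuming the local part is nothing but the Unix timestamp at which the address was handed out. The helper names are made up, and domain.com is just the placeholder used above.

```python
WEEK = 7 * 24 * 3600  # lifetime in seconds

def make_address(now, domain="domain.com"):
    """Build an address whose local part is the issue time in epoch seconds."""
    return f"{int(now)}@{domain}"

def is_live(address, now, lifetime=WEEK):
    """Accept mail only while the address is younger than `lifetime`."""
    local, _, _ = address.partition("@")
    if not local.isdigit():
        return False  # not one of the timestamped addresses
    issued = int(local)
    return issued <= now < issued + lifetime
```

A bare timestamp is guessable, so an actual setup would probably want something less predictable in the local part; that is left out here.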

Just my 0.02p

Rus

--
Cheap UK and US VPS

Re:Filtering by gfody · 2003-08-10 21:21 · Score: 2, Informative

You might find this site particularly useful. It will let you set up a temporary address based on a naming convention that forwards to your real address but expires after a few emails. You can set up something like rusxxxxx@asdf.com, where xxxxx is whatever you want, and it will forward to your real address, so if the bad guys get your email it's no big deal: the temp addy will just stop working.

--

bite my glorious golden ass.

Missing the point? by aquishix · 2003-08-10 20:17 · Score: 5, Insightful

As someone who acquired a B.S. in mathematics several days ago, I understand how these filters work. They are an excellent way to fight spam compared with the older methods.

However, I think that ultimately this sort of thing misses the point. Spam needs to be fought in the courts, not in the battlefield. I'm afraid that the success of these filters will cause spam NOT to become illegal, and thus lead to a world where we have a constant trickle of spam, albeit in small amounts.

I think we all agree that we want spam to be gone entirely, as is evidenced by the first post being labeled as "troll" ;)

--
- I am a viral sig. Please copy me and help me spread. [strain #2] Thank you

Re:Missing the point? by Ingolfke · 2003-08-10 22:00 · Score: 2, Interesting

Bulk emailing, like any business is a numbers game. By significantly decreasing the # of successful responses to a set of SPAM (through filters) the business costs remain the same w/ the returns dropping. Eventually the business is no longer feasible.

[INCREASE TONE]
SPAM absolutely does not need to be fought in the courts when the markets can work this out on their own (as we see w/ these filters). In the end we'll have better technology for sorting and filtering emails which can be applied to other applications and the spammers will be gone or significantly reduced.

[BREATHE... BREATHE...]
Legislation would only be valid in the country in which the legislation was enacted so spammers could simply move their operations to a SPAM friendly country.

[GRADUALLY INCREASE TONE]
Also, what constitutes spam? What if I only send 10,000 emails out? What if I change the email each time I send it so it's unique to you? What if I'm not selling anything? What if someone compromised my system and sent all the emails from my PC? Why shouldn't ISPs be liable too... yeah, why are they letting people send those SPAMs... let's sue them too... somebody get a rope!!

[BEGIN ALL OUT RANT!]
So the moral of the story is... everyone remain calm... keep working on your filters and other new technologies... and soon we'll have fewer spammers and better tech and some intelligent hacker out there will have a whole heap load of cash for coming up w/ the solution.

Of course w/ all of the existing hideous legislation we have today... SCO may announce that they are diversifying into bulk emailing and that they have a patent on any spam filtering algorithms and therefore if you ever remove any of their emails you must send them a $699 licensing fee for the use of their IP.
Re:Missing the point? by schon · 2003-08-11 02:51 · Score: 2, Insightful

SPAM absolutely does not need to be fought in the courts when the markets can work this out on their own (as we see w/ these filters)

Yes, it absolutely does - just like any other sociopathic behaviour. We need clearly defined rules of what is and is not acceptable. Perhaps you haven't noticed, but "the market" is not working anything out - spam is getting worse, not better, and things such as filters make it worse, by hiding the problem (hint: even though your filters hide your spam from you, you're still paying for it.)

In the end we'll have better technology for sorting and filtering emails

This is the fundamental flaw in your reasoning - you can't solve a social problem with technology.

Legislation would only be valid in the country in which the legislation was enacted so spammers could simply move their operations to a SPAM friendly country.

This argument is fundamentally flawed. "Moving operations" won't do anything - they could still be prosecuted if they stay in the country... and so the question becomes: how many spammers would physically move to another country - permanently - just so they could spam? No, it's more likely they'd just go back to whatever scam they had before they began spamming.

Also, what constitutes spam?

The definition of spam is "Unsolicited bulk email". That's pretty simple.

What if I only send 10,000 emails out?

Then it's bulk. If it's unsolicited, then it's spam.

What if I change the email each time I send it so it's unique to you?

Is it unsolicited bulk email? If so, then it's spam.

What if I'm not selling anything?

So? IF IT'S BULK, UNSOLICITED EMAIL THEN IT'S SPAM

What if someone compromised my system and sent all the emails from my PC?

Then you're not the one spamming, are you?

Why shouldn't ISPs be liable too...

If the ISPs are condoning the spam, then they probably should be liable. If that's the case, then there will be a paper trail.

why are they letting people send those SPAMs... let's sue them too... somebody get a rope!!

If you feel you can't win an argument except by inciting a (hysterical) straw man, then you've already lost.

Spam is a social problem - it doesn't matter what technologies you come up with, spammers will find a way around them. We need to start social remedies to the spam problem.

Re:great by Goldberg's+Pants · 2003-08-10 20:19 · Score: 3, Insightful

But that's still 3 pieces of shit you have to deal with. Sure, it's a simple click to delete, but the fact is WE SHOULD NOT FUCKING HAVE TO.

Some wanker spammer got my email address and within two days my spam volume went from zero (seriously) to 30+ a day. All for the same fucking thing. These shits should be legal to hunt and kill.

In response to the original troll, it's a bogus analogy. We PAY for our internet access. We get bombarded with ads on damn near every site... The revenue generated from these scumbags does NOT go towards funding your internet access, or the production of new content. It goes to their wallets. Ergo, you're an idiot.

Side note: "Last, best hope"... I can't be alone in expecting "for peace" to come after that.

Filters do not stop spam... by Tehrasha · 2003-08-10 20:23 · Score: 5, Insightful

...they only prevent you from seeing it.

Your server and its hard drives still end up being a storage bin for it, and the spammers will continue to send as long as your machine allows it to be received. Always remember that spam differs from postal junk mail, in that the -receiver- pays for it. Unsolicited postage due mail.

Spam must be -blocked- and the ISPs that allow/encourage its continued spread must be re-educated, or be put out of business. Only when spam becomes costly to send will it diminish.

The currently proposed laws on the subject focus on content rather than consent. They don't mind if you get spammed with hundreds of ads, provided what is being advertised isn't fraudulent. They overlook the fact that the claim of you having 'opted in' for the spam is in itself the lie and fraud.

--Teh

Re:Filters do not stop spam... by Tehrasha · 2003-08-10 20:34 · Score: 2, Insightful

If you think that your ISP does not incur cost by having to deal with the traffic load and disk storage caused by spam, you are the one in need of a reality check. And if you think that your DSL/Cable traffic is free, then gimmie some of the stuff you're smoking, it must be good.

I changed my mind. Simpler is better. by Peter+Cooper · 2003-08-10 20:24 · Score: 5, Interesting

I have long been an advocate of Bayesian or keyword based spam filters, but have recently been forced to change my outlook, and to argue that MULTIPLE SIMULTANEOUS solutions are the answer.

I encountered a very simple but unique spam system which works entirely on the sender's address. Simply, you create a small database with the domains/addresses you want to whitelist. Then, a program screens your mail, and if the sender is not in your whitelist, it sends an e-mail BACK to the sender with a simple URL (or even an actual link for HTML e-mail clients) that they visit to confirm that they REALLY want to send the e-mail to its destination. When this is done, they are added to the whitelist. Therefore, mails from forged remote addresses are no longer a problem, and neither are mails from trusted sources. And, better than SPEWS or similar blacklists, the sender gets a SECOND CHANCE to send their mail to you.

There's a commercial solution using this system right now, although the URL escapes me.

Of course, one could encounter problems when ordering online, say. Droids at Amazon will not be clicking your links to make sure your order receipt got through. One could argue that you'd put things like Amazon.com in the whitelist, but what if someone used amazon.com as a spoofed e-mail domain/address? Ay, there's the rub. But if this system were tied in with a Bayesian system, it'd be pretty unbeatable. What's more the Bayesian system would have extra data for negative matches, in the form of e-mails that were never 'approved', and positive data in the form of those that were.

So, I'd be more interested in producing a homebrew system that used MULTIPLE weaker systems, than one supposed 'sure fire' method.. as I feel no one method is perfect, whereas multiple systems can approach this nirvana.

Re:I changed my mind. Simpler is better. by ctr2sprt · 2003-08-10 20:38 · Score: 4, Interesting

Any approach that triggers an automatic action on your behalf is bad, because it can be turned against you. It's not likely that email would make a terribly good DDoS service, but a system like the one you describe would certainly be vulnerable to it. And I think it would only last a week, at most, before spammers figured out a way around it. They can already handle "NOSPAM" being inserted in email addresses, and recently added the ability to reverse and combine email addresses until they get something plausible.
I do agree with you that we need multiple layers of safeguards in order to solve spam - or at least to hide it away so nobody has to look at it - but I don't think your specific example is very good.
Re:I changed my mind. Simpler is better. by The+Grassy+Knoll · 2003-08-10 22:38 · Score: 2, Informative

> There's a commercial solution using this system right now, although the URL escapes me

Spam Arrest?

--
They will never know the simple pleasure of a monkey knife fight
Re:I changed my mind. Simpler is better. by scj · 2003-08-10 22:38 · Score: 5, Interesting
I had thought of something similar for fighting spam. Here's how I'd handle each email:
1. If the email is from someone in my whitelist, allow the mail to go through and feed it as 'ham' to the Bayesian filter.
2. If the email is not in my whitelist, run it through spam filtering software (Spamassassin works well) to determine if it is likely to be spam.
3. If it seems like spam, then use a challenge-response system (like TMDA) to find out if a human sent the email.
4. If the mail doesn't seem like spam, just deliver it. If I get 3 non-spammy messages from the same person (separated by a day or more) then add them to my whitelist automatically.
5. If someone responds to the TMDA challenge, put them in the whitelist and deliver the original email.
6. If no one responds to the TMDA challenge after a week, feed the mail as 'spam' to the Bayesian filter.
In addition, I'd use a system like Sneakemail to generate random email addresses to give out to businesses I want to do business with and use to sign up to mailing lists. These email addresses would be added to my whitelist so they could send me mail without going through the challenge-response system. If they start spamming me, I put the random email I gave them on my blacklist.

This system has the following benefits:
- Business mail I want (like receipts and newsletters from companies I do business with) always gets through, since the Sneakemail-type address is whitelisted. This solves the problem of businesses not responding to TMDA challenges.
- My real email address is protected from businesses who are likely to sell it and from people farming addresses from mailing lists.
- Personal email that the spam filter sees as non-spam gets delivered without bothering the sender with a challenge-response system.
- Personal email that does seem spammy to the filter still has a second chance to make it through via the challenge-response system. This should reduce false positives to include only spammy emails from people who don't respond to the challenge.
- The Bayesian filter is automatically trained based on mails from people in my whitelist and mails from people who never respond to the challenge-response.
You would still get spam with this system (personal email that your filter thinks is non-spam), but hopefully your false-positive rate would be zero. Also, you don't annoy other people much by only sending challenge-response messages to spam-like emails. Finally, this would be easy for end users to use. They don't have to train the spam filter, since it should train itself. The only complicated part would be generating and using the random emails that you give to businesses and mailing lists.
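
The decision flow in steps 1 to 6 can be sketched as below. This is only an illustration of the routing logic: the class and function names are made up, the spam verdict is passed in as a boolean instead of coming from an actual filter, "separated by a day or more" is approximated by counting distinct day numbers, and the actions are returned as plain strings.

```python
from dataclasses import dataclass, field

@dataclass
class MailState:
    whitelist: set = field(default_factory=set)
    clean_days: dict = field(default_factory=dict)  # sender -> days with non-spammy mail

def handle(state, sender, looks_spammy, day):
    """Decide what to do with one incoming message."""
    if sender in state.whitelist:
        return "deliver+train_ham"      # step 1
    if looks_spammy:
        return "challenge"              # steps 2-3
    days = state.clean_days.setdefault(sender, set())
    days.add(day)
    if len(days) >= 3:                  # step 4: three clean messages on different days
        state.whitelist.add(sender)
    return "deliver"

def challenge_result(state, sender, answered):
    """Resolve a challenge once it is answered or has expired."""
    if answered:
        state.whitelist.add(sender)     # step 5
        return "deliver"
    return "train_spam"                 # step 6
```

Keeping the verdict as an input means the same flow would work with whatever filter sits in front of it.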
Re:I changed my mind. Simpler is better. by PhilHibbs · 2003-08-11 00:33 · Score: 2, Insightful

...it sends an e-mail BACK to the sender with a simple URL...
And, not being on their whitelist, their email filter sends you an email back with a simple URL...

Re:great by Tirel · 2003-08-10 20:27 · Score: 2, Interesting

Ideally, I think the client should take care of the filtering. Pour your resources into improving context-based filtering and let the individual clients do the dumping. Widespread usage of this kind of filtering could make spam even less profitable. Since spam is entirely business related, that would likely reduce the amount of it passing through the network.

From a sysadmin's POV, this doesn't halt the issue of spam eating bandwidth or disk space. I'll address that next.

Disk space depends on what kind of e-mail your organization uses. For POP3, most people delete e-mail on the server after it's downloaded, so while the disk space may be consumed with spam, it would be temporary. That is unless you have a lot of dead or rarely used accounts. In that case, you should have policies in place for when to wipe users' accounts out after a set period of time. Or set up some kind of forwarding policy. If you're using something like IMAP, then using a server-wide content filtering system as mentioned above would be effective.

For bandwidth, the only way to halt spam from consuming your bandwidth is by blocking packets at the router. If you use SPEWS to dump the e-mail at your e-mail server, it has still consumed your bandwidth. So you'd have to block the packets directly. I think this is draconian and should be avoided, for the net's sake. Unfortunately there really is no good solution to this, for as long as spam flows, it flows and consumes bandwidth. The only way to halt it is to halt the initial spamming to begin with. As mentioned above, when your spammer's audience never exists as a result of good content filtering, the spam will be unprofitable and lessen somewhat.

Attacking users and their ISPs won't do much good, aside from causing spammers to jump from ISP to ISP, something they're readily willing to do. Attacking regular users just makes you a big jerk.

Re:What about features other than text? by Gaza · 2003-08-10 20:28 · Score: 3, Interesting

Yes it does. The developers have created a test suite and a very extensive tokenizer. Any additional pseudowords, or new ideas to tokenize a message, are tested very thoroughly before they are added (as most tend to actually lower accuracy instead of raising it). There have even been tests using SpamBayes on just headers and just message bodies, and both have worked very well.

Re:great by mirko · 2003-08-10 20:30 · Score: 3, Insightful

I have more than enough things to worry about, including my shopping list, my housekeeping tasks, my garden... to waste time and nerves over that bit of junk: when I get an unexpected commercial in my snail-mailbox, this *is* annoying as, here in Switzerland, we pay for each garbage bag we throw away.
So, spam is junk, indeed, but I dispose of it almost instantaneously.

I won't make spamfighting my Holy War...
I have more interesting and valuable things to deal with IRL and I am naturally optimistic.

Let the spammers waste their time sending their hectobytes of off-topic (mostly American-centric) mail to my ever-improving filter.

--
Trolling using another account since 2005.

Re:great by devnulljapan · 2003-08-10 20:31 · Score: 5, Insightful

Just remember though, we would never have television without commercials. Sometimes advertising is necessary.

NEVER?....Try the BBC?
No ads, quality programming, small fee.

Spam is not the same as commercial by Eric+Ass+Raymond · 2003-08-10 20:35 · Score: 4, Insightful

Please, go right on ahead and point out why spam is not the same as a commercial.

I'd be happy to.

I don't know about you but for me e-mail is an important part of my work - not something comparable to watching cable TV.

Spam clogs my mailbox and I have lost several important e-mails from clients when deleting the spam which, by the way, is often disguised as legitimate non-commercial mail and comes with forged headers. In addition to pushing fraudulent products, these facts make spam a completely different beast from the cable TV and its legitimate, controlled ads which eat up only my free time - not my emails or work efficiency.

--
BOO! TERRO

"Bayesian" by RDPIII · 2003-08-10 20:40 · Score: 4, Insightful

I don't mean to troll, but I hope it's not too late to put an end to the unfortunate term "Bayesian spam filtering". This is perhaps the worst abuse of the adjective "Bayesian" I've seen, because nothing crucially depends on the application of Bayes' Theorem and/or on the use of Bayesian methods (informative priors, model selection, etc.). Why not simply call it "data driven spam classification" (as opposed to "rule based") or "empirical spam filtering"?

If the spam disaster had struck fifteen years ago, we'd all be talking about "neural spam filtering" (using artificial neural networks, ANNs) and basking in the warm fuzzy feeling imparted by the term "neural". But ANNs and Bayesian classifiers have the same interface: both are trained on labeled data and can be used to classify unlabeled data. The implementation details are not of primary importance, and if you think they are, I'd encourage you to look into large margin classifiers instead of Naive Bayes or ANNs.

--
Marklar: marklar

Re:"Bayesian" by file-exists-p · 2003-08-10 21:58 · Score: 5, Informative

As far as I know, many of those filters are based on a decision rule of the form

P(mail is spam | words X, Y, Z, ... are in it) > 1-epsilon

The computation is then done using Bayes' rule (P(A|B)=P(B|A)*P(A)/P(B)) under certain independence assumptions, which make it tractable.

So this is actually bayesian filtering ...
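
To make the decision rule above concrete, here is a minimal sketch in Python. It assumes the usual "naive" independence of words given the class; the function names, the per-word probability tables and the default prior are all made up for illustration and are not taken from any particular filter.

```python
import math

def spam_probability(words, p_word_given_spam, p_word_given_ham, p_spam=0.5):
    """P(spam | words) via Bayes' rule, assuming words are independent given the class."""
    # Sum logs instead of multiplying, so many small factors do not underflow.
    log_spam = math.log(p_spam)
    log_ham = math.log(1.0 - p_spam)
    for w in words:
        # Words missing from either table carry no evidence and are skipped.
        if w in p_word_given_spam and w in p_word_given_ham:
            log_spam += math.log(p_word_given_spam[w])
            log_ham += math.log(p_word_given_ham[w])
    # P(words) is the sum of both joint terms; subtract the max for stability.
    m = max(log_spam, log_ham)
    spam = math.exp(log_spam - m)
    ham = math.exp(log_ham - m)
    return spam / (spam + ham)

def is_spam(words, p_word_given_spam, p_word_given_ham, epsilon=0.01):
    """Decision rule: flag the mail when P(spam | words) > 1 - epsilon."""
    return spam_probability(words, p_word_given_spam, p_word_given_ham) > 1.0 - epsilon
```

The per-word tables would come from counting words in mail you have already labelled; how to smooth those counts is a separate design choice that this sketch leaves out.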

My favorite filter is spamoracle

A new poll is required by mirko · 2003-08-10 20:42 · Score: 4, Interesting

How should spammers be dealt with ?

Ban their original networks
Throw them in jail
Kill them
Fine them 0.01$/email and improve third world infrastructures with the money.
Filter/Ignore them.

I'd personally go for the last option... Maybe the next-to-last if their suit takes place in a really democratic place (there are 278 million American citizens and 2.2 million of them are in jail, this is a *lot*).

--
Trolling using another account since 2005.

Re:A new poll is required by anubi · 2003-08-10 21:48 · Score: 2, Interesting

I like your last option best, too. I hate to suppress anyone's right to say whatever they want to, but then I want to reserve my right to what I choose to pay attention to.
Under the existing technology, a spammer is like the royal pest on a city bus who takes advantage of the captive audience. The analogy here is that we have to download our POP box; we have no way of arranging our affairs so that the signals exist, but we deliberately choose not to tap into them.
I believe the technology must change. I am loath to try to settle what I consider a technological issue by passing some sort of law... doing this just makes immense profits for litigators, but does little to solve the underlying problems.
If the technology could change so that ISPs could provide individual bayesian-type filters at the server level, screening out messages that fit the criteria each individual sets, this could let the ISP off the hook for dropping messages, as well as for having to supply any long-term storage for them... Somehow I get the idea that spammed messages are going to be very similar and should show a very marked correlation to the same spam sent to other accounts at that ISP. Once a significant number of accounts' filters have flagged a particular mailing as spam, the ISP would have the opportunity to store only ONE copy of the spam, while putting only pointers to it in the subscribers' mailboxes.
So, what I would think would solve this is if the internet became more like radio transmissions. I support the idea that anybody can transmit whatever they want to the public, and if anyone wants to listen in, fine. But, like RF, it has to make it through the filters before it gets to the listener. The damn-near infinite advantage of the net-based paradigm is that we have almost infinite bandwidth, in the sense that anyone can set up his transmitter and not step on someone else's signal (i.e., there are only so many "channels" in the AM, FM, or TV broadcast bands, whereas the internet does not have this limitation).
Anyway, that's my two cents' worth.
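
The "store ONE copy, hand out pointers" idea can be sketched as a small content-addressed store. This is only an illustration in Python: the class name and its methods are invented here, and it assumes every recipient gets a byte-identical body, which real spam runs often avoid by varying each copy.

```python
import hashlib

class SharedSpamStore:
    """Keeps one copy of each distinct body; mailboxes hold only pointers (digests)."""

    def __init__(self):
        self.bodies = {}     # digest -> message body, stored once
        self.mailboxes = {}  # user -> list of digests, in delivery order

    def deliver(self, user, body):
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        # setdefault stores the body only the first time this digest is seen.
        self.bodies.setdefault(digest, body)
        self.mailboxes.setdefault(user, []).append(digest)
        return digest

    def fetch(self, user):
        # Resolve each pointer back to the single stored copy.
        return [self.bodies[d] for d in self.mailboxes.get(user, [])]
```

Catching near-duplicates rather than exact copies would need some fuzzier fingerprint than a plain hash, which is outside what this sketch tries to show.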

--
"Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]
Re:A new poll is required by Cato · 2003-08-11 00:32 · Score: 3, Informative

See http://death2spam.net - this is a commercial mailbox service that appears to have really good bayesian-style spam filtering (referenced by Paul Graham in a recent article) - they even fetch URLs in some messages to filter based on website content. They don't require individuals to train on their own messages, which may be controversial but also makes it feasible to deploy this at large scale in ISPs.

Without major ISP deployments, the response rates to spam will not go down, since the clued-up individuals who deploy filtering themselves would never have responded to spam anyway.

Your RF analogy is interesting but it breaks down for people with wireless mobile phone links, dialup when travelling, and so on. The best thing is to make spam unprofitable so it goes away.

You really just don't get it by frovingslosh · 2003-08-10 20:42 · Score: 5, Insightful

Realistically, I don't give a damn how much spam _you_ get, I care that _I_ don't get any.

But you still do get spam. Exactly as much, if not more, because you use Bayesian filtering. You still waste bandwidth downloading that spam before it can be filtered. Spam still eats into any inbox size limits your ISP might impose. Spam cuts into any quota a forwarding service might now or in the future impose on your account, or it could take you to a higher charge level if you pay for a forwarding service. It costs your ISP money, costs that one way or another are eventually paid by you. Even the processing power for that Bayesian filtering costs you CPU cycles, while having no negative effect on the spammers whatsoever.

While you might not think you care how much spam I get, you might care if dozens, hundreds or thousands of other users at your work also get tons of spam, particularly when all of that spam significantly cuts into your bandwidth. And you will care when overload from spam on your mail server is so bad that it causes failures, effectively causing a D.O.S. situation.

And as long as geeks happily play with their little Bayesian filters, they stop seeing spam and so stop complaining to the providers that are letting spam get through. They stop doing other things that might make spammers' lives difficult. Heck, I fully expect some spam haters with an attitude like yours to say within earshot of a congressman or senator something like "Oh, I never get any spam. Spam can be filtered easily and nothing should be done about it". The spammers should love Bayesian filtering, it takes the pressure off them while allowing them to reach exactly the same number of marks with a mailing.

--
I'm an American. I love this country and the freedoms that we used to have.

Re:You really just don't get it by Plug · 2003-08-10 20:52 · Score: 4, Informative

I don't disagree. I think that eventually we should move to a better email model - something like TMDA perhaps, where there is no guarantee that spammers can reach mailboxes. Or better legislation to make spamming punishable, controls on mail routers for million-message mailouts, etc. Or djb's Internet Mail 2000, which moves the onus onto the sender's network to store all 1m messages at a time, until people pick them up.

The other thing you can do is impose a microcost for mailing - at 1c/mail, spamming isn't economical any more. But then that is going to penalise the people who have legitimate reasons to send a million emails at a time - you'd have to have a very good micropayment system working on the Internet to do this.

However, those things need widespread change, and they need people in positions of power. Joe User at home can push for it, but they still get spam and they still want a short term solution. I suggest that even if they're filtering, the action of having to check their spam filter will make them irate enough. I see it as being like IPv6 - everyone would really have to change at once for the system to be most effective. (I use Freenet6, do you?)

Now that viruses are public, caught quickly, and Microsoft are being a lot less lax with security (I am in no way commending their effort, but they at least mostly fixed the Outlooks), you don't see people writing them nearly as often. I feel spam will get the same.
Re:You really just don't get it by schon · 2003-08-11 02:25 · Score: 5, Interesting

spammers should love Bayesian filtering, it takes the pressure off them while allowing them to reach exactly the same number of marks with a mailing.

I'm afraid you've made the cardinal mistake of thinking that spammers follow logic.

First question: Why do people install filters on their mailboxes?

Answer: To stop spam.

Now, take a look at any interview with any spammer.. you'll note that when they're asked, the spammer will say "I don't send it to people who don't want it."

They'll also say "we're always coming up with ways to bypass filters."

Now, you'd think that with the two statements, that one of them is false - however (besides the fact that spammers lie), any sociologist will tell you that the spammer actually believes he's telling the truth in each of these statements..

How he justifies it in his mind is that he believes that even though someone has installed a spam filter, that this person only wants to filter spam from other spammers - that his spam is somehow "special".

Spammers are sociopaths, and like all sociopaths, they believe the rules do not apply to them.

If spammers weren't sociopaths, and were capable of applied logic, then they'd realize that any filter (not just Bayesian) would benefit them.. but then, if they weren't sociopaths, they wouldn't be spammers in the first place.

wtf by timerider · 2003-08-10 20:43 · Score: 2, Insightful

When will 'the net community' finally get it?
filtering is no solution as long as there's no way to stop the spammers!

Or would you say that ignoring the corpses in the gutters would be a solution to the problem of violence on the streets?

bye
[L]

Re:wtf by Chokma · 2003-08-11 00:21 · Score: 3, Funny
filtering is no solution as long as there's no way to stop the spammers!
Or would you say that ignoring the corpses in the gutters would be a solution to the problem of violence on the streets?

Your analogy is slightly flawed. In the case of spam, it would be correct if:
- I would have to examine every corpse closely to determine if it is still alive
- I would have to manually remove the corpse from the street
On my system, SpamAssassin kills 99% of the Spam, carries it outside, buries the remains in the spam folder and cleans away the bloodstains on the floor. The less I get in touch with spam, the better.
In the perfect world, there would be a "nuke obnoxious netizen" button on my keyboard. But alas, we have to settle for slightly less efficient methods.

Re:great by impluvian · 2003-08-10 20:47 · Score: 4, Insightful

I think there's a very simple distinction that can be made between spam and television advertising, and it has to do with the amount of control that your service provider exercises over the advertising content.

When you watch cable TV, you know that for an hour of content, you are going to see up to 12 minutes of advertising. The advertising is controlled by the cable company, and no-one can advertise on the channel without going through that 'filter'.

Spam, on the other hand, is not restricted. If I receive 100 e-mails a day, anywhere from 0 to 100 of them could be spam. None of those spams are sanctioned (or controlled) by my service-provider, and they were not part of the package I signed up for.

Re:And the winner is... by Gaza · 2003-08-10 20:52 · Score: 3, Informative

SpamBayes has a very well done POP3 proxy that will work with ANY POP3 mail client, including Eudora. There is also an IMAP filter for those that like IMAP, and for the procmail fans it also has an app called hammiefilter, which is a command line version of the SpamBayes tools.

SpamBayes also has a very well done and integrated Outlook plugin, which leads to the common misconception that SpamBayes will only work with Outlook.

Also note the review mentioned that both SpamBayes and POPFile work on multiple platforms and he is reviewing the POP3 proxy on both of them, not their counterpart Outlook plugins.

Re:great by advocate_one · 2003-08-10 20:55 · Score: 2, Informative

No ads, quality programming, small fee.

No ads??? No, it's stuffed to the brim with promos for their own stuff though... (Gardening magazine, History magazine, Nature magazine, Radio Times, Teletubbies toys, Fimbles stuff, trailers for upcoming programmes and series)

Quality programming??? it's gone really downmarket in the last few years..

Small fee??? That fee is your license for receiving _all_ television programs, even cable and satellite... not just the BBC. Although that license money goes to the BBC, really a goodly share of it should go to the other service providers as well.

--
Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.

Re:YFI list by Oddly_Drac · 2003-08-10 20:57 · Score: 2, Informative

"Address doesn't match reverse lookup"

You'd be surprised how many DNS servers are completely misconfigured for this, but I think that a simple ping to the address given could actually show if it _existed_.

Personally I've found that I can reduce my spam by a huge amount by never viewing HTML...which brings a thought about tracking and tracing the webbugs in any given piece of HTML email...

--
Oddly Draconis
Too cynical to live, too stubborn to die.

Authentication of senders by flakac · 2003-08-10 20:57 · Score: 2, Insightful

Sorry, but filters are not the final answer. Even when the filters can "learn", the user still has to expend a certain amount of effort to "teach" the software. And quite frankly, spammers (or the people who write automated spamming software) just need to study the filters and learn to get around them. And worse, you can never be sure that the filter is not deleting email that you actually want, unless you set it to never delete suspect mail, allowing you to examine and delete it manually. But at this point, you've gained absolutely nothing -- simply setting your email client to put all email that's from addresses not in your address book, or that doesn't contain your exact address in the "To:" line will achieve exactly the same effect.

The only thing that can truly save email is to switch to a service that requires authentication of senders.
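
For reference, the "address book or exact To: line" rule mentioned above is only a few lines of code. A rough sketch in Python using the standard email module; the function name and arguments are invented for the example, and it ignores Cc: and mailing-list headers entirely.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr

def is_suspect(raw_message, address_book, my_address):
    """True if the sender is unknown OR my exact address is not in the To: line."""
    msg = message_from_string(raw_message)
    # parseaddr returns (display name, address); only the address matters here.
    sender = parseaddr(msg.get("From", ""))[1].lower()
    recipients = {addr.lower() for _name, addr in getaddresses(msg.get_all("To", []))}
    known_sender = sender in {a.lower() for a in address_book}
    addressed_to_me = my_address.lower() in recipients
    return not known_sender or not addressed_to_me
```

Anything this flags would go to a "suspect" folder for manual review rather than being deleted, which is exactly the manual step the comment is complaining about.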

Re:Authentication of senders by frovingslosh · 2003-08-10 21:33 · Score: 3, Interesting

The only thing that can truly save email is to switch to a service that requires authentication of senders.
I agree with everything that you said about filters being ineffective. But I strongly disagree with your "only thing" statement. Particularly if you mean it as any of the systems I've ever heard about, such as "If it's not in the address book, the sender must acknowledge a challenge message" type of approaches. The problem with such systems is that many of us get quite a bit of e-mail each day from people who are not in our regular address books, some of it quite important to us. We do not want that mail lost because the system at the other end was not in our address book and did not waste their time responding to a challenge and response type system. For example, say I purchased something on-line from a vendor I had never dealt with before. Their e-mail system may automatically kick out an e-mail that informs me the product was shipped and gives me an important FedEx or UPS tracking number. I'm glad they do such things with their shipping systems, and I don't expect them to manually respond to every challenge they get back; realistically they will send any such challenges to the bit bucket and people who want e-mail that is important to them will end up never getting it.
So I do not believe that authentication of senders, at least in any of the traditionally suggested ways, is the correct approach. Much of the spam problem we have is due to what I consider flaws in SMTP. I would very much like to see a replacement for SMTP that considered the spam problems (as well as other problems inherent in SMTP). As an example, another post here mentioned a system where the mail is held, not on your ISP or upstream provider's system until you download it, but rather is held on the sender's or sender's ISP's system. The recipient would presumably receive only a very short indicator of where they have mail waiting, and would fetch it themselves when they are ready to receive it. This puts the burden of storage on the sender or the service provider for the sender, and avoids considerable bandwidth wasted by senders who supposedly send out e-mail with addresses generated to match all combinations of up to x characters (the excuse Mindspring gave to me when addresses that I created but never gave out or used started getting spam, not that I believe them). In addition to putting this burden on the sender, it would ensure that there was a good address in the e-mail to fetch the mail from, so spammers would have a much harder time injecting their spam into the system and would be much more traceable. And while I'm not foolish enough to think that laws could completely stop spam, we've seen how laws did drastically curtail fax spam, and some fax spammers have recently been made to pay serious fines. I do think laws would have a big effect on spammers; there are a lot of spammers who just don't want to have to move out of the country to keep up spamming, and those of us who hate spam will track the spam back to US sources if we have a law with teeth in it to impose fines (or worse) on them when we do.
Of course, any change to or replacement of SMTP must be phased in over time. It's not a short term solution to spam. But I expect SMTP would quickly go the way of gopher or archie or the rest if a viable new protocol was presented that addressed these problems effectively, and this is where I think our greatest chances for success are.

--
I'm an American. I love this country and the freedoms that we used to have.
Re:Authentication of senders by pongo000 · 2003-08-11 02:26 · Score: 2, Interesting

say I purchased something on-line from a vendor I had never dealt with before. Their e-mail system may automatically kick out an e-mail

Using TMDA, you would generate a "keyword" address: a unique address, identified by a keyword embedded in the address, which would allow your vendor to bypass the C/R system. If the keyword address starts being abused then (1) you can easily disable it, and (2) you know not to do business with that vendor again.
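
A rough idea of how such a keyword address can work, sketched in Python. To be clear, this is NOT TMDA's actual address format or code: the `user-keyword.tag@domain` layout, the function names and the tag length are assumptions made up for illustration. The point is only that a keyed hash lets the mail server recognise addresses you issued without keeping a list of them.

```python
import hashlib
import hmac

TAG_LEN = 6  # hypothetical: hex digits of the HMAC kept in the address

def keyword_address(user, domain, keyword, secret):
    """Build user-keyword.tag@domain, where tag is a truncated HMAC of the keyword."""
    tag = hmac.new(secret, keyword.encode("utf-8"), hashlib.sha1).hexdigest()[:TAG_LEN]
    return f"{user}-{keyword}.{tag}@{domain}"

def is_valid_keyword_address(address, user, domain, secret, disabled=()):
    """Accept only addresses we could have issued and that are not disabled."""
    local, _, dom = address.partition("@")
    prefix = user + "-"
    if dom != domain or not local.startswith(prefix):
        return False
    keyword, _, _tag = local[len(prefix):].rpartition(".")
    if not keyword or keyword in disabled:
        return False
    expected = keyword_address(user, domain, keyword, secret)
    # Constant-time comparison of the whole address, tag included.
    return hmac.compare_digest(expected.encode("utf-8"), address.encode("utf-8"))
```

Disabling an abused address is then just a matter of adding its keyword to the `disabled` collection.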

As an example, another post here mentioned a system where the mail is held, not on your ISP or upstream provider's system until you download it, but rather is held on the sender's or sender's ISP's system.

This system quickly breaks down, though, as delays are introduced by having to wait to fetch each piece of mail. People bothered by such delays will write/obtain software that automatically fetches the mail at a predetermined time, which would then shift the bandwidth problem (part of it, anyways) back to the recipient.

The other problem with sender authentication is who, exactly, determines whether a sender is authenticated? I run my own e-mail server. Will I have to pay out bucks for an "authority" to confirm that my sending address is valid? Right now, some ISPs (notably Time-Warner offshoots) are denying access to their SMTP servers under the guise of reducing spam. If your IP happens to fall within a certain range, they simply don't allow you access. We will end up in the same morass RBL has put us in: Who plays God in determining whether a sender is truly "authentic" or "worthy"?

Why not stop the sellers? by Anonymous Coward · 2003-08-10 21:03 · Score: 5, Insightful

I know this is slightly off topic, but can someone answer me a reasonably simple question that's been bugging me for a while?

Why, instead of hunting down the spammers, do we not hunt down the people who are selling and advertising their junk via the spammers?

The spammers purposely make themselves difficult to find, but it must be easier to track down a company that is collecting money and sending out products? Why not make the use of spammers' services illegal and fine and punish those doing so?

I think I'm correct in saying, and please tell me if I'm wrong, that here in the UK a similar situation is people "fly-posting". In these cases, if advertising posters are put somewhere illegal or unwanted, it is not the person who put the poster up that is fined, but the club, record label, or whoever is being advertised that takes the rap.

Just my 0.02p

Re:great by Goldberg's+Pants · 2003-08-10 21:05 · Score: 4, Insightful

You probably ARE a scumbag spammer.

For people who have to pay for their online time (England for example), these scumbags are essentially stealing money from people. Filtering only works once you've downloaded the mail. You still have to download their worthless drivel. Sure, it may be pennies a week in costs for a user, but you tally that up over a year or two of dealing with these idiots, and you've got a sizeable chunk of change. Certainly enough for a nice pizza.

Let's not forget the TIME these shits waste as well. All this work invested in stopping spam. Who knows what cool stuff may have come from the minds who instead are working on ways of dealing with the email cancer.

As I said, these scumbags should be legal to hunt and kill.

Mozilla - filters on client not server by Zog+The+Undeniable · 2003-08-10 21:15 · Score: 3, Interesting

Moz's Bayesian filtering works well, but its Achilles heel is that it doesn't work on the POP3 server, so you still have to download everything. As POP3 allows the header and the first part of the message body to be read without downloading it, surely there could be an option - once Moz has been trained and you're fairly sure the false positive rate is negligible - for filters to operate on the server and delete spam from there?
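
Something along these lines is already possible from the client side with the POP3 TOP command, which returns the headers plus the first N body lines (TOP is an optional command, so not every server offers it). A minimal sketch using Python's standard poplib; the function name and the `looks_like_spam` callback are placeholders for whatever trained classifier you trust.

```python
import poplib

def delete_spam_on_server(conn, looks_like_spam, body_lines=20):
    """Preview each message with TOP and delete the ones flagged as spam.

    conn is a logged-in poplib.POP3 object (or anything with the same
    stat/top/dele methods); looks_like_spam takes the preview text.
    """
    count, _size = conn.stat()
    deleted = []
    for num in range(1, count + 1):
        # TOP fetches the headers plus the first body_lines lines of the body.
        _resp, lines, _octets = conn.top(num, body_lines)
        preview = b"\n".join(lines).decode("utf-8", errors="replace")
        if looks_like_spam(preview):
            conn.dele(num)  # only marked here; removed when the session ends with QUIT
            deleted.append(num)
    return deleted
```

In real use you would open the connection with `poplib.POP3(host)`, log in with `user()` and `pass_()`, call the function, and finish with `quit()` so the deletions take effect. The classifier only sees a preview, so accuracy may be worse than on full messages.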

--
When I am king, you will be first against the wall.

Re:Mozilla - filters on client not server by pe1chl · 2003-08-10 21:46 · Score: 3, Informative

It would be nice if there was filtering done on the server. Then you would not need the packages that are reviewed here.

However, that means a change to the server, and a change to the POP3 protocol. The ISP would have to install a filtering plugin or a modified version of the server, and the client would subscribe to this service and train it (every client would have his own dictionary). With the first few messages there would be some special POP3 report back to the server indicating that you consider it spam, and from then on the server would filter on its own.

However, that would be difficult/impractical to roll out, so you will have to live with clientside filtering like in Mozilla.
Re:Mozilla - filters on client not server by letxa2000 · 2003-08-11 02:11 · Score: 2, Interesting

You have pretty much described PrismEmail. It, among other things, does Bayesian filtering. It's server-based so you don't have to download the spam. It's user-specific so you have your own Bayesian corpus that applies only to you, not server-wide. You can inspect blocked email on the server at any time or wait for a single spam report each night to see a list of all email blocked--a quick click will then release any message that was misclassified. And you can just click on a link in the headers of a message if it was spam and it got through.
Really, all the people that think that server-side Bayesian filtering is impossible are confused. No, you can't have a single corpus that applies to everyone on the server--that defeats the purpose of Bayesian. But you definitely can do the user-specific filtering on the server. Let the server do the work, you only download the good stuff, and there's nothing to install locally.
Re:Mozilla - filters on client not server by HermanAB · 2003-08-11 02:56 · Score: 2, Informative

I run SpamProbe on the server. For any given business, everybody will receive pretty much the same sort of mail. So a single database works like a charm, with typically 99.5% accuracy and zero false positives. This works because SpamProbe also counts word pairs, something that no other word counting filter does. To compensate for the enormous increase in computational load, it uses Berkeley DB as a backend. For corrections, I create a user called spam. Corrections can then be forwarded to this user, to reverse the database entry for that message.
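
To show what counting word pairs means in practice, here is a toy tokenizer in Python. It is not SpamProbe's actual tokenizer; the function name and the regular expression are assumptions for illustration only.

```python
import re

def tokens_with_pairs(text):
    """Return single words followed by adjacent word pairs, all lowercased."""
    words = re.findall(r"[a-z0-9$'-]+", text.lower())
    # Each pair joins a word with the one right after it.
    pairs = [a + " " + b for a, b in zip(words, words[1:])]
    return words + pairs
```

A message of n words yields up to n-1 extra pair tokens, which is where the extra database load comes from.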

--
Oh well, what the hell...

Re:great by ntmuffin · 2003-08-10 21:20 · Score: 2, Insightful

Yup, I was also thinking "for peace" ;) Long live B5

But on the other side ... I've had the same problem you had - going from 0 to 25-30 a day, sometimes even more. I don't think we'll ever be able to stop the spammers, but I think that some of the blame has to be put on those people offering free mail services like Hotmail, Yahoo.com (and .ca) and AOL. 95% of my spam originates from accounts on their domains, and when I try to send bounce messages with Mailwasher, the accounts used to spam me don't exist anymore ... so if these mail services had made a system that couldn't be used to create accounts automatically with a script, we might see a little less spam out on the net, as I doubt that the spammers would bother spending lots of time creating accounts themselves ...

I like the thought of an all-year hunting license for spammers though ;)

In related news by heli0 · 2003-08-10 21:44 · Score: 3, Informative

If you have ever signed up with the Direct Marketing Association's Mail Preference Service (list of people not to send junk mail to), but continue to receive stacks of crap every day, here is what you can do about it: Prohibitory Order

Links to the PDFs you need to print and mail in are included.

"A little-known Federal law allows individuals to send a Prohibitory Order against companies that are sending unsolicited sexually provocative or erotically arousing mail. The Supreme Court went one step further, allowing individuals to decide what constitutes "erotically arousing" mail. The law makes it illegal for a company to send mail to an individual within thirty days of receiving the Order."

"Postmasters may not refuse to accept a Form 1500 because the advertisment in question does not appear to be sexually oriented. Only the addressee may make that determination."

--
Whenever the offence inspires less horror than the punishment, the rigour of penal law is obliged to give way...

Everyone? by Jon+Peterson · 2003-08-10 21:46 · Score: 2, Insightful

"Support both Windows and Linux " ...
"The first requirement is because I wanted the results to be applicable to everyone"

My how the definition of everyone has changed. So it's bad luck Mac, Solaris, *BSD, HP-UX, VMS users...

--
----- .sig: file not found

Re:YFI list by aduxorth · 2003-08-10 21:46 · Score: 2, Interesting

Another good one is if the domain from the envelope sender doesn't have an MX record. Bam, guaranteed spam. The other one is to verify the sender, not just the domain. This kills all those spams from lkiqprejbn@yahoo.com which are obviously bulldust.

That alone kills off about 70% (IMO) of the spam that comes through servers that I administer, and as far as I know, only 2 emails (over the last 4 years or so) that weren't meant to be rejected were rejected because they had invalid sender envelopes.

HTH
cya
Andrew

Something he misses about popfile. by CGP314 · 2003-08-10 21:49 · Score: 4, Interesting

One of the things I love about popfile is it is not a spam filter. It is a general mail filter. I have about ten categories of mail that it sorts out for me. This also helps cut out false positives. 'Work', 'Personal', and 'Friends' are all much more similar to each other than to 'Spam'.

Eh... by hendrix69 · 2003-08-10 22:02 · Score: 2, Interesting

POPfile really got shortchanged by this review. It serves as much more than a spam filter. I thought I'd give SpamBayes a try anyway but the Outlook plugin won't install on my XP machine. Some problem with an unresolved dependency in shlwapi.dll... boring. The point is, the SpamBayes site doesn't have a tech support forum where I can ask for help with this kind of problem.

--
The power of Christ compiles you!

Why filtering isn't the solution by nuwayser · 2003-08-10 22:16 · Score: 4, Insightful

An analysis of filtering methods against spam is kind of like a comparison of bullet-proof vests in that there's no incentive to stop someone from pointing a gun at you and firing it. In the past, spammers have been grossly affected by more sweeping changes, and I'm afraid filtering methods are only creating the mindset of, "Give up, use this software, it will do the deleting for you." It takes the attitude of, "just delete the stuff" and makes it automatic; sure it's convenient for a time, but in a year you're still going to get spam and your ISP will likely have fewer resources to deal with the complaints.

I'm saying, why not focus instead on technology which puts a bigger dent in spammers' ability to operate, like how to secure against proxy hijacking.

--
"The cup... the drop... it's a YES!"

Re:hmm, if you really are so clever by Anonymous Coward · 2003-08-10 22:20 · Score: 5, Interesting

Very good.

Speaking from experience, I know for a fact that many of the harvesting programs (written in perl, running on linux, written by geeks) are very robust at deciphering most email obfuscation methods. You all sit and shake your fists, and the spamware writers are laughing their asses off.

You have the easy answer: don't obfuscate your email, don't even bother putting it on your posts.

Re:great by Zog+The+Undeniable · 2003-08-10 22:28 · Score: 2, Informative

Yahoo uses captchas to prevent scripted sign-ups, so if you get anything from a Yahoo mail account, there was once a human (OK, a subhuman) at the other end.

--
When I am king, you will be first against the wall.

POPFile is more than just a spam tool by rediguana · 2003-08-10 23:20 · Score: 4, Interesting

POPFile's utility does not lie just in managing the spam menace. To me, the real utility in POPFile is the ability to create x number of buckets and train it to sort your mail. SpamBayes looks great for spam but has no further utility. I like having POPFile sort my work from personal emails, and file all my mailing lists in another, and even jokes. Of course there is the spam folder that I check every now and then. I look forward to it being able to support IMAP servers as well.
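
For anyone wondering what "x number of buckets" means in practice, here is a minimal sketch of a multi-bucket naive Bayes sorter. It is not POPFile's actual code; the class name, the whitespace tokenizer, and the add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class BucketClassifier:
    """Toy multi-bucket naive Bayes sorter (hypothetical, not POPFile's code)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.message_counts = Counter()          # bucket -> messages trained

    def train(self, bucket, text):
        # Assumption: lowercase whitespace tokens stand in for a real tokenizer.
        self.word_counts[bucket].update(text.lower().split())
        self.message_counts[bucket] += 1

    def classify(self, text):
        words = text.lower().split()
        vocab = {w for counts in self.word_counts.values() for w in counts}
        total_messages = sum(self.message_counts.values())
        best_bucket, best_score = None, -math.inf
        for bucket, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Log prior: share of training messages filed in this bucket.
            score = math.log(self.message_counts[bucket] / total_messages)
            for w in words:
                # Add-one smoothing so unseen words do not zero out a bucket.
                score += math.log((counts[w] + 1) / (total_words + len(vocab)))
            if score > best_score:
                best_bucket, best_score = bucket, score
        return best_bucket


sorter = BucketClassifier()
sorter.train("work", "meeting agenda project deadline budget")
sorter.train("personal", "dinner weekend family birthday party")
sorter.train("spam", "free viagra offer click now")
print(sorter.classify("project budget meeting"))
```

The point of the sketch is that spam is just one bucket among several: the same scoring loop picks between work, personal, and spam, so adding a "mailing lists" or "jokes" bucket only takes more training calls.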

Re:POPFile is more than just a spam tool by BradleyUffner · 2003-08-11 01:16 · Score: 2, Informative

I agree, I just discovered POPFile last week when it was shown on BBSpot. I use an Exchange plugin called Outcast that allows POPFile to work over Exchange also. I have several buckets set up to help sort incoming email into the correct folder for different projects and it works fantastically. I've only been training it for about 3 days and it already sorts with almost perfect accuracy.

POPFile and Outcast rock.

Re:POPFile is more than just a spam tool by topham · 2003-08-11 01:20 · Score: 2, Informative

I installed POPFile on my parents' computers. I was worried because I thought the web interface would be confusing to them, since you can't do everything within the email client itself.

Works great. My father, who gets far more spam than the average person (why, I don't know), has a virtually 100% success rate.

Re:great by Afty0r · 2003-08-10 23:55 · Score: 2, Insightful

Actually, the rapid growth of endorsements, product placements, "documentaries" about products etc. means that you're really seeing far more than just 12 minutes of advertising, the only restriction is that you're limited to 12 minutes of OBVIOUS advertising.

SpamPal by UpnAtom · 2003-08-11 01:04 · Score: 3, Informative

I did my own investigation of spam filters about a week ago. I didn't test the actual algorithms, just the features.
SpamPal with the add-on Bayesian filter (search Google for it) came out top. It works as a proxy and also provides blacklist/whitelist/known Spammer list checking.

Re:great by lone_marauder · 2003-08-11 01:11 · Score: 2, Insightful

OK, I'll bite on this troll just because it's still at zero, and the moderators need a reason to finish it off, placing it firmly in -1 hell where it belongs.

In the days before user-paid television service, it is true that advertising was the business impetus to put up huge powerful TV transmitters and undertake the other investments necessary to support land-based TV broadcasting. You are correct, therefore, in pointing out that TV content from 1977 derives from the business need to advertise.

But to suggest that the meager investments in bandwidth and hardware the average spammer makes are somehow otherwise useful to the world is absurd. When one considers that most of the infrastructure costs of spam are borne by the recipient rather than the sender, the idea of spammers contributing to the public good is asinine.

--
who are those slashdot people? they swept over like Mongol-Tartars.

It's virtually impossible to not get spam? by setien · 2003-08-11 01:24 · Score: 5, Informative

No it's not.
I get spam at the rate of 1 spam mail per 6 months or so. Or maybe even less. I can't remember getting a single spam email on my actual email address for about a year.

If you have an account on a crapless domain (i.e. not hotmail.com, msn.com, aol.com and the like),
it all comes down to this very simple rule:
Do not, under any circumstance, have your email address posted publicly accessible ANYWHERE on the web.
It WILL get trawled. And then it will be spammed relentlessly.

If you have an existing address you don't want to give up, or an address at hotmail.com or a similar place, dump it.
Then exercise a bit of common sense about where you use your actual address.

I have a domain which catches email to unknown addresses and puts it in my regular mailbox.
Whenever I have to give an email address to some place on the web, I use *domain-i-am-currently-visiting*@mydomain.com. So if I am visiting foobar.com, I would put in foobar.com@mydomain.com.
I have been doing this for years. It enables me to see what was the source of the leak when I get spam on one of the addresses.
It has taught me one thing: I have never, ever, ever, in all my years of online shopping, forum posting etc, come across a single website that has ignored its own privacy statement. Ever. Even the slightly sketchy sites (like divx subtitle sites) don't leak addresses.
I was surprised to realize this.
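
The per-site address trick can be sketched in a few lines. This is only an illustration of the scheme described above: the function names are made up, and mydomain.com is the placeholder from the post, not a real setup.

```python
def address_for_site(site, my_domain="mydomain.com"):
    """Build the throwaway address to hand to a given site."""
    return f"{site.lower()}@{my_domain}"


def leak_source(recipient, my_domain="mydomain.com"):
    """Given the address a spam arrived on, return the site it was given to.

    Returns None when the address is not on the catch-all domain.
    """
    local, _, domain = recipient.rpartition("@")
    if domain.lower() != my_domain:
        return None
    return local.lower()


print(address_for_site("foobar.com"))
print(leak_source("foobar.com@mydomain.com"))
```

Because the catch-all accepts any local part, nothing has to be registered in advance; the local part of the address is itself the record of who was given it.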

The only addresses I ever get spam on are the ones I know to be publicly displayed on the web.

So it's that easy to avoid spam.

--
Give me liberty or give me kill -s 9

Re:It's virtually impossible to not get spam? by Aidtopia · 2003-08-11 03:31 · Score: 2, Insightful

There's one more ingredient to your recipe: get lucky.

It doesn't help when the spammers use a dictionary attack against your domain (aaron@domain.com, abigail@domain.com, adam@domain.com, ...). I guess your domain has never caught the attention of such spammers. Lucky you. They troll my domain on a regular basis.

Some of the published experiments that try to track the harvesters have found that short names near the beginning of the alphabet (like mine) are far more likely to get tons of spam. Another problem is needing to support addresses like "webmaster".

Blame the idiots that respond to SPAM. by momus_radar · 2003-08-11 02:13 · Score: 3, Insightful

This method of combating SPAM is amazing to me. Admittedly I'm a little behind the geek times, so my interest in this method was piqued when Apple released Mail.app. But I still use Mac OS 9 and am in no rush to run X yet, so I'm glad to see there are alternatives that I can use.

I think the only reasonable way to rid the world of SPAM is to get the foolish folk who respond to it to stop. The reason there is so much of it now is that it seems to work; there are people who actually respond to it. If these people stopped responding to it the use of SPAM would most likely diminish.

Sending SPAM costs money. No sense spending that money if no profit is made.

The real reason SpamBayes wins... by Moryath · 2003-08-11 02:56 · Score: 4, Interesting

The "unsure" feature directly combats the latest Spammer technique -- filter poisoning.

You've all seen it work; the Spammers don't just send you the same spam once, they send you it 5 to 20 times, and they include a clipping from the headlines or something under their pitch.

They're not doing it to get that one mail past you. They're actually HOPING that you classify all 20 mails as spam.

Why?

Because every time you classify that mail as spam, EVERY SINGLE WORD of that news clipping is "poisoned" inside the filter, and becomes an indicator of a spam. Then you turn around, and get an email from someone legitimate using those common words... and it gets wrongly classified too.

Enough false positives, and the spammers win, because they'll get you to turn the filter back off.
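
A rough sketch of the poisoning mechanism described above, using a deliberately naive per-word estimate. Real filters smooth and combine word scores differently, so the helper names, the made-up word lists, and the formula here are all assumptions chosen only to show the direction of the effect.

```python
from collections import Counter


def train(counts, words):
    """Record one message: each distinct word counts once per message."""
    counts.update(set(words))


def spam_probability(word, spam_counts, ham_counts, n_spam, n_ham):
    """Naive estimate of how spammy a word looks (no smoothing)."""
    spam_rate = spam_counts[word] / n_spam if n_spam else 0.0
    ham_rate = ham_counts[word] / n_ham if n_ham else 0.0
    if spam_rate + ham_rate == 0:
        return 0.5  # never-seen word: treat as neutral
    return spam_rate / (spam_rate + ham_rate)


def poisoning_demo(clipping_word="senate", copies=20):
    """Return the word's score before and after training on poisoned spam."""
    spam_counts, ham_counts = Counter(), Counter()
    n_spam = n_ham = 0
    # Baseline: the headline word shows up only in legitimate mail.
    for _ in range(20):
        train(ham_counts, ["senate", "vote", "report"])
        n_ham += 1
        train(spam_counts, ["viagra", "free"])
        n_spam += 1
    before = spam_probability(clipping_word, spam_counts, ham_counts, n_spam, n_ham)
    # The same pitch arrives `copies` times with a news clipping attached,
    # and every copy gets classified as spam.
    for _ in range(copies):
        train(spam_counts, ["viagra", "free", clipping_word])
        n_spam += 1
    after = spam_probability(clipping_word, spam_counts, ham_counts, n_spam, n_ham)
    return before, after


before, after = poisoning_demo()
print(before, after)
```

In this toy setup the headline word starts out looking purely legitimate and drifts toward spam after the duplicates are trained, which is the drift the post is warning about.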

Enough is enough -- time to establish open hunting season on Spammers.

SpamBayes Testimonial by Cytotoxic · 2003-08-11 03:37 · Score: 4, Interesting

As a network/web/computer manager, my email has been provided to dozens of companies and trade shows. I still remember the day (August, 3 years ago) when someone first sold my address to a spam list. I went from 2-3 spams per day to 15-20. This spring brought another explosion, this time into the 100+ range. I am currently receiving over 6,000 spam messages every month! Obviously my main email address was useless and needed to be burned on a pyre to purge the evil.
After a week or two of this, I installed SpamBayes in the form of its Outlook plugin. I showed it my email archive as my "good" messages, and a bunch of spam gleaned from my deleted folder as "bad". My mailbox is now perfectly clean. I have received at least 15,000 spam messages since installing SpamBayes, and I have probably had to hit the "Delete As Spam" button about 10 times for ones that it missed, most of those being variations on the Nigerian scheme. It has never grabbed a real message, and the "Unsure" feature localizes everything that I really need to look at in one place.
If you have a spam problem, get SpamBayes. It is that simple. There is no need to speculate about that better method that you thought up, or how it really won't work because of XYZ theory... it works almost perfectly, and it lets you know about anything that it is not sure about with the "Unsure" folder, so it never throws the baby out with the bathwater. In short, this is almost the perfect Spam filter. It even caught the emails that were using GIFs to avoid being filtered on content, placing them in unsure until I said "this is spam", after which I never saw another one. Pretty darned cool!
It is actually kind of fun to watch this thing work. I came in this morning to find 568 new messages in my spam folder, 3 in unsure, all of which were spam. No spam anywhere to be found in my inbox, just 15 unread messages that were correctly left alone by SpamBayes. Just imagine having to flip through 600 emails to find 15 real messages! Now I just hit "CTRL-A DEL" in my spam folder and it is all gone! 5 seconds a day to deal with spam, I can live with that....

Re:A new *law* is required by felis_panthera · 2003-08-11 03:41 · Score: 4, Insightful

Out of that 2.2 million people, somewhere near 700,000 are in jail for possession, use or distribution of marijuana. A law that was originally used to control migrant Mexican workers has bogged down the American legal system to the breaking point. Imagine, 700,000 new cells open for child molesters, rapists, spammers, and SCO executives.

Wouldn't it be grand?

PS: Sorry about the OT, but things like this need to be said whenever the opportunity presents itself.

--

The chains are broken
Loki is free
Ragnarok is at hand...

MIMEDefang + SpamAssassin + Razor by wytcld · 2003-08-11 03:58 · Score: 3, Informative

SpamAssassin has Bayesian learning, which I have running but not for long enough to test. I recently set up MIMEDefang as a Sendmail milter calling SpamAssassin (which calls Razor). This setup allows Sendmail to reject e-mail beyond an arbitrary SpamAssassin score. The remote mail daemon is informed the mail cannot be delivered.

Setting that score at 8 has resulted in no false positives over a week (I log From and Subject information - it's all obvious spam). Then stuff that scores between 5 and 8 I divert to a separate mail box, which I comb through every day or two. There have been two false positives that ended up in that over the week. This is with hundreds of e-mails for a half-dozen users coming in a day. I also end up, with this setup, with 2-4 spams making it through to my own mailbox (the busiest on the system). These are, because of the filtering, the least obnoxious, and easy enough to report to Razor to spare others. Meanwhile, I like to keep a window open to the mail server running "tail -f mail.info | grep REJECT" and watch a dozen or so attempted spams an hour refused acceptance with a message like "554 5.7.1 SpamAssassin score of 15, rejected" back to the origin, which is enough that if it wasn't spam any good mail daemon will inform the sender, and they can find another way to get through.
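
The three-way split amounts to the routing below. This is a sketch of the policy only, not MIMEDefang's actual filter code (which is Perl); the function and constant names are invented, and treating a score of exactly 5 as "divert" is an assumption.

```python
REJECT_AT = 8.0      # at or above this, refuse the mail during the SMTP session
QUARANTINE_AT = 5.0  # assumption: exactly 5 counts as "divert", not "deliver"


def route(score):
    """Return (action, smtp_reply) for a SpamAssassin-style score."""
    if score >= REJECT_AT:
        reply = f"554 5.7.1 SpamAssassin score of {score:g}, rejected"
        return "reject", reply
    if score >= QUARANTINE_AT:
        return "quarantine", None  # goes to the separate mailbox for review
    return "deliver", None


for s in (2.1, 6.4, 15):
    print(s, route(s))
```

Rejecting during the SMTP session, rather than accepting and silently deleting, is what lets a legitimate sender's own mail daemon tell them the message did not get through.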

Even if this gives spammers a clue about ducking SpamAssassin, the spams that can get by it are by far the least obnoxious. I look forward to seeing if the Bayesian feature helps (it feeds itself anything it scores at over 15 by default). But it's a pretty good system short of that. If it became standard for ISPs to reject all mail with a SpamAssassin score of 8 or higher, the loss of legitimate communications would be exceedingly rare, and politeness standards would be encouraged.

--
"with their freedom lost all virtue lose" - Milton

Re:hmm, if you really are so clever by Wilk4 · 2003-08-11 04:14 · Score: 2, Interesting

According to Why Am I Getting All This Spam? Unsolicited Commercial E-mail Research Six Month Report, most harvesters really *aren't* that smart, so even simple email address obfuscation and removal from websites can have a dramatic impact on how much spam you get.

The other good news from that study is that they show that spam does decrease after you remove your email address from websites... in other words, they don't keep the addresses as much as we generally believe. You aren't on every spammer's list forever just because they get your address once.

Mail.app, remark on graphics by dr2chase · 2003-08-11 04:34 · Score: 2, Interesting

I was more than a little disappointed to see that Apple's Mail.app was not included in the comparison. It wouldn't surprise me in the least if it were already the most widely used Bayesian spam filter. Unsurprisingly, it is also very easy to use.

Mail.app also combines Bayesian filtering with the Address book -- any mail from a known correspondent won't be tagged as Junk. This reduces the risk of false positives. This is an integration cheat not available to stand-alone spam filters, because Apple supplies the Address book app and provides other integration between the two applications. But, (as a self-centered end-user) I don't care that it is a cheat, I am merely happy that it all works well. (And I cross my fingers and hope that somehow, Apple's C/C++/Objective-C programmers are less prone to leaving buffer overflow holes than Microsoft's programmers clearly are.)
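
The address-book shortcut boils down to checking the sender before the statistical score gets a say. A minimal sketch, not Apple's implementation: the function name, the 0.9 threshold, and the lowercase-set address book are all assumptions.

```python
def is_junk(sender, spam_score, address_book, threshold=0.9):
    """Decide whether to tag a message as junk.

    address_book: set of lowercase addresses of known correspondents.
    spam_score:   the Bayesian filter's estimate, from 0.0 to 1.0.
    """
    if sender.strip().lower() in address_book:
        return False  # known correspondent: never junk, whatever the score
    return spam_score >= threshold


known = {"mom@example.com"}
print(is_junk("Mom@Example.com", 0.97, known))
print(is_junk("stranger@example.net", 0.97, known))
```

The design trade-off is that false positives from known senders disappear entirely, at the cost of trusting the From address.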

The author needs to read Edward Tufte's books on presenting information (e.g., The Visual Display of Quantitative Information).

Bayesian 5 third season opening credits by Dhraakellian · 2003-08-11 10:56 · Score: 2, Funny

The Bayesian Project was our last, best hope for peace.

It failed...

But in the year of the Spammer War, it became something greater: Our last, best hope for spam-free inboxes.

The year is 2003, the place: Bayesian 5.

--
I've read Grocklaw. BoycottNovell, you're no Grocklaw
