More on Bayesian Spam Filtering

Posted by michael on Tuesday September 17, 2002 @08:26AM from the snake-eyes dept.

michaeld writes "The "Bayesian" techniques for spam filtering recently publicized in Paul Graham's essay A Plan for Spam don't actually seem to have anything Bayesian about them, according to Gary Robinson (an expert on collaborative filtering). They are based on a non-Bayesian probabilistic approach. It works well enough, because it is frequently the case that technology doesn't have to be 100% perfect in order to do something that really needs to be done. The problem interested Robinson, and he posted his thoughts about trying to fix the problems in the Graham approach, including adding an actual Bayesian element to the calculations."
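
For readers who want the arithmetic, here is a minimal Python sketch of the two calculations being contrasted. It is based on the linked essays as commonly described, not on anything stated in the summary itself. `graham_combine` is the rule for combining per-word spam probabilities from A Plan for Spam; `robinson_adjust` is the Bayesian-style adjustment Robinson proposes, which pulls a word's raw estimate toward a prior when the word has been seen only a few times. The function names and the defaults `s=1.0` and `x=0.5` are illustrative assumptions.

```python
from math import prod


def graham_combine(probs):
    """Combine per-word spam probabilities into one score for the message."""
    spam = prod(probs)                   # product of p(w)
    ham = prod(1.0 - p for p in probs)   # product of 1 - p(w)
    return spam / (spam + ham)


def robinson_adjust(p_w, n, s=1.0, x=0.5):
    """Shrink a word's raw estimate p_w toward the prior x.

    n is how many messages the word has appeared in, s is the weight
    given to the prior, and x is the assumed spam probability of a
    word that has never been seen. Both defaults are assumptions.
    """
    return (s * x + n * p_w) / (s + n)
```

With these definitions a word seen once in spam and never in legitimate mail scores `robinson_adjust(1.0, 1)`, which is 0.75 rather than a flat 1.0, so a single sighting no longer dominates the combined score.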

1 of 251 comments

filtering not the answer - maybe this is by frovingslosh · 2002-09-17 08:50 · Score: 5, Insightful

Sadly, unless you are an ISP or other mail service provider, filtering does nothing. The spammers work in volume. They count on hitting everyone to reach that .1% that will respond. That response is what they are after and what they get paid for. You likely know better than to ever deal with anyone who spams you or to ever respond to their spam. Filtering your own e-mail has absolutely no effect on the spammer; you were not going to respond anyway. By the time you filter, they have already wasted your bandwidth, and perhaps mailbox capacity and even forwarding limits from a forwarding service. Your filtering is useless, puny human!
Here is a suggestion for something that might make an impact on spammers: If I open my firewall, I see several attempts a day from people trying to get into my mail server. Of course, I don't have a mail server, but spammers are always looking for open relay points they can spam from. My suggestion: give them a nice open relay server they can send mail to. Of course, you don't want to piss off your service provider by sending spam, and your upstream speed might limit you to sending less than you can receive, so rather than run a full mail server, let's modify some mail server code to just accept mail and send it to the bit bucket. Maybe we can even misconfigure existing code to do this with no programming changes.
No valid user will be affected, assuming you don't otherwise run a mail server. All that bandwidth you pay for can be used to receive e-mail from spammers before it ever goes out. Eventually their customers will see the response go from .1% to 0% and their business will dry up. This will impact spammers; blocking your own spam after it's been delivered will not.
This need not even impact your own bandwidth. You can run the server when you are done using your system (might make a nice screen saver - a black screen that just shows how many spammed addresses were prevented from getting spammed). Or you can impose limits on bandwidth at a firewall or router, or even restrict hours of access.
If we set up enough different false open relay servers I think we could have a real impact on the spammers.
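
The bit-bucket relay described above could be sketched roughly like this in Python, using only the standard library. This is a minimal sketch and not a hardened server: it speaks just enough SMTP to look like it accepts anything, counts the recipients it was handed, and discards every message body. The class names, the `serve` helper, the greeting hostname `relay.example`, and port 2525 are all assumptions made for illustration.

```python
import socketserver


class SinkHandler(socketserver.StreamRequestHandler):
    """Speaks just enough SMTP to accept a message, then throws it away."""

    def reply(self, text):
        self.wfile.write(text.encode("ascii") + b"\r\n")

    def handle(self):
        self.reply("220 relay.example ESMTP")  # hypothetical hostname
        in_data = False
        while True:
            line = self.rfile.readline()
            if not line:  # client hung up
                return
            if in_data:
                # Message body: drop every line until the lone "." terminator.
                if line.rstrip(b"\r\n") == b".":
                    in_data = False
                    self.server.messages += 1
                    self.reply("250 OK queued")
                continue
            verb = line.strip().upper()
            if verb.startswith(b"RCPT"):
                self.server.recipients += 1  # one address that will not get spammed
                self.reply("250 OK")
            elif verb.startswith(b"DATA"):
                in_data = True
                self.reply("354 End data with <CR><LF>.<CR><LF>")
            elif verb.startswith(b"QUIT"):
                self.reply("221 Bye")
                return
            else:
                # HELO, EHLO, MAIL FROM, RSET, NOOP and anything else: say yes.
                self.reply("250 OK")


class SinkServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address):
        super().__init__(address, SinkHandler)
        # Plain counters without a lock; treat them as approximate under load.
        self.recipients = 0
        self.messages = 0


def serve(host="0.0.0.0", port=2525):
    """Run the sink until interrupted. Port 2525 is an arbitrary unprivileged choice."""
    with SinkServer((host, port)) as server:
        server.serve_forever()
```

Calling `serve()` starts the sink; the `recipients` counter is the number the screen saver idea would display. Nothing is ever forwarded, so the upstream link is untouched, and inbound bandwidth can still be capped at the firewall or router as described.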

--
I'm an American. I love this country and the freedoms that we used to have.