MIT Spam Conference Conclusions
RT Alec writes "The 2003 Spam Conference has concluded, reports InfoWorld. (related read: abstracts of the conference discussions). I was unable to attend the conference, but it appears all that was discussed was filters (client and server). I think the key problem is ISPs that do not block egress traffic on port 25. If you need to send mail through a different SMTP server than the one provided by your ISP, the admin of that server ought to provide you with a means of using it with authentication on a port other than 25 (you do have permission to use that SMTP server, don't you?). It is not too tough to set up an SMTP server to require authentication, or at a minimum to run on a different port. I am surprised that this is never mentioned as a cure for spam. If just AOL blocked port 25, this could reduce spam by 50% (I base this figure on close examination of the headers of the spam I receive). I was pleased to see that Barry Shein, president of The World (a Boston-based ISP), was included in the talks. I can't tell from the posted abstract (see link above) whether he mentioned blocking port 25. In a recent interview he did not mention it."
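For anyone wondering what "authentication on a port other than 25" looks like from the client side, here is a rough sketch using Python's standard smtplib. Port 587 is the usual mail submission port; the host name, user name, and password in the commented-out call are placeholders, and the helper names build_message and send_authenticated are made up for this example. It assumes the server supports STARTTLS and SMTP AUTH.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Build a plain-text message (hypothetical helper for this sketch)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_authenticated(msg, host, user, password, port=587):
    """Submit msg over an authenticated session on a port other than 25.

    Assumes the server offers STARTTLS and SMTP AUTH on the given port.
    """
    with smtplib.SMTP(host, port, timeout=30) as server:
        server.starttls()             # upgrade to TLS before sending credentials
        server.login(user, password)  # the server rejects us without valid credentials
        server.send_message(msg)


msg = build_message("alice@example.com", "bob@example.org", "hello", "body")
# Placeholder host and credentials; uncomment with real values to actually send:
# send_authenticated(msg, "mail.example.com", "alice", "secret")
```

With egress on port 25 blocked at the ISP, a legitimate user can still reach a remote server this way, while a compromised or spamming machine can no longer talk directly to arbitrary mail servers.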
"We conclude that spam sucks."
;-D
Tax money well-spent
Reply or e-mail; don't vaguely moderate. Ex-O'Reilly/MIT employee, now a full-time Google employee.
It's now common knowledge in most academic circles that you can customize your email client to block spam with a standard Bayesian filter, which keeps a corpus of messages that the recipient has marked as spam. Any further emails received are then run through the filter and marked as spam if they score as such.
As Paul Graham writes, "A few simple rules will take a big bite out of your incoming spam. Merely looking for the word "click" will catch 79.7% of the emails in my spam corpus, with only 1.2% false positives.
One idea that I haven't tried yet is to filter based on word pairs, or even triples, rather than individual words. This should yield a much sharper estimate of the probability. For example, in my current database, the word "offers" has a probability of .96. If you based the probabilities on word pairs, you'd end up with "special offers" and "valuable offers" having probabilities of .99 and, say, "approach offers" (as in "this approach offers") having a probability of .1 or less."
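To make the single-word versus word-pair idea concrete, here is a toy sketch in Python. It is a simplified naive Bayes scorer, not Graham's exact algorithm: the four-message corpus, the clamping to [0.01, 0.99], the 0.4 default for unseen tokens, and the function names are all assumptions made for illustration.

```python
import re
from collections import Counter
from math import prod


def tokens(text, n=1):
    """Lowercase the text and return its n-grams (n=2 gives word pairs)."""
    words = re.findall(r"[a-z0-9'$-]+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def train(spam_msgs, ham_msgs, n=1):
    """Return {token: estimated P(spam | token)}, clamped to [0.01, 0.99]."""
    # Count each token at most once per message.
    spam = Counter(t for m in spam_msgs for t in set(tokens(m, n)))
    ham = Counter(t for m in ham_msgs for t in set(tokens(m, n)))
    probs = {}
    for tok in set(spam) | set(ham):
        s = spam[tok] / len(spam_msgs)  # fraction of spam containing the token
        h = ham[tok] / len(ham_msgs)    # fraction of non-spam containing it
        probs[tok] = min(0.99, max(0.01, s / (s + h)))
    return probs


def spam_score(text, probs, n=1, unknown=0.4):
    """Combine per-token probabilities with the naive Bayes product rule."""
    ps = [probs.get(t, unknown) for t in set(tokens(text, n))]
    if not ps:
        return unknown
    spam_like = prod(ps)
    ham_like = prod(1 - p for p in ps)
    return spam_like / (spam_like + ham_like)


# Toy corpus, invented for this example.
spam_msgs = ["special offers click here", "valuable offers click now"]
ham_msgs = ["this approach offers a cleaner design", "lunch on friday"]

single = train(spam_msgs, ham_msgs, n=1)
pairs = train(spam_msgs, ham_msgs, n=2)
```

On this corpus the single word "offers" gets a middling estimate because it shows up in both piles, while the pair tables separate "special offers" (clamped high) from "approach offers" (clamped low), which is the sharpening effect the quote describes. A real filter would need a much larger corpus and some care about tokens seen only once or twice.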
... is that you can get a message from anywhere without any real challenges or permissions involved. I honestly think that work needs to be done to replace email on both the client side and the delivery/protocol side. I'd go into detail about how that'd work, but I really wouldn't be suggesting anything new. I just want email to be more like instant messaging. "You want to message me? Well, first I have to authorize you..."
Fortunately, it's not a burning issue with me. The people I really want to hear from are all on IM. Anybody outside of that has filters that expressly let them through.