Spamassassin Beats CRM-114 In Anti-Spam Shootout

← Back to Stories (view on slashdot.org)

Spamassassin Beats CRM-114 In Anti-Spam Shootout

Posted by timothy on Tuesday June 22, 2004 @03:24PM from the hawaii-alaska-and-utah dept.

Simon Lyall writes "A new study of antispam software shows that Spamassassin performed well in various configurations along with Spamprobe , Bogofilter and Spambayes also came out good while CRM-114 failed to live up to its previous claims . The study shows: 'The best-performing filters reduced the volume of incoming spam from about 150 messages per day to about 2 messages per day.'"

2 of 330 comments (clear)

Min score:

Reason:

Sort:

Correct link to CRM-114 by athakur999 · 2004-06-22 15:27 · Score: 5, Informative

CRM-114

The link in the article points to SpamBayes again.

--
"People that quote themselves in their signatures bother me" - athakur999
No HTML, Just ps or pdf, conclusions inside by randyest · 2004-06-22 15:34 · Score: 5, Informative

And a long document it is (funny placeholder images though.) Here's the conclusions for the impatient but interested in a little more than the summary:

Supervised spam filters are effective tools for attenuating spam. The best-performing filters reduced the volume of incoming spam from about 150 messages per day to about 2 messages per day. The corresponding risk of mail loss, while minimal, is difficult to quantify. The best-performing filters misclassified a handful of spam messages early in the test suite; none within the second half (25,000 messages). A larger study will be necessary to distinguish the asymptotic probability of ham misclassification from zero.

Most misclassified ham messages are advertising, news digests, mailing list messages, or the results of electronic transactions. From this observation, and the fact that such messages represent a small fraction of incoming mail, we may conclude that the filters find them more difficult to classify. On the other hand, the small number of misclassifications suggests that the filter rapidly learns the characteristics of each advertiser, news service, mailing list, or on-line service from which the recipient wishes to receive messages. We might also conjecture that these misclassifications are more likely to occur soon after subscribing to the particular service (or soon after starting to use the filter), a time at which the user would be more likely to notice, should the message go astray, and retrieve it from the spam file. In contrast, the best filters misclassified no personal messages, and no delivery error messages, which comprise the largest and most critical fraction of ham.

A supervised filter contributes significantly to the effectiveness of Spamassassin's static component, as measured by both ham and spam misclassification probabilities. Two unsupervised configurations also improved the static component, but by a smaller margin. The supervised filter alone performed better than than the static rules alone, but not as well as the combination of the two.

The choice of threshold parameters dominates the observed differences in performance among the four filters implementing methods derived from Graham's and Robinson's proposals. Each shows a different tradeoff between ham accuracy and spam accuracy. ROC analysis shows that the differences not accountable to threshold setting, if any, are small and observable only when the ham misclassification probability is low (i.e. hm
CRM-114 and DSPAM exhibit substantially inferior performance to the other filters, regardless of threshold setting. Both exhibit substantial learning throughout the email stream, leading us to conjecture that their performance might asymptotically approach that of the other filters. From a practical standpoint, this learning rate would be too slow for personal email filtering as it would take several years at the observed rate to achieve the same misclassification rates as the other systems. Both these systems were designed to be used in a train on error configuration, and do not self-train. This configuration could account for a slow learning rate as each system avails itself of the information in only about 1,000 of the 50,000 test messages. In an effort to ensure that we had not misinterpreted the installation instructions, we ran CRM-114 in a train-on-everything configuration and, as predicted by the author, the result was substantially worse.

Spam filter designers should incorporate interfaces making them amenable for testing and deployment in the supervised configuration (figure 4). We propose the three interface functions used in algorithm 1 - filterinit, filtereval, and filtertrain - as a standardized interface. Systems that self-train should provide an option to self-train on everything (subject to correction via filtertrain) as in algorithm 2.

Ham and spam misclassification proportions should be reported separately. Accuracy, weighted accuracy, and precision should be avoided as primary evaluation measures as th

--
everything in moderation