Bayesian Filter Testing?
pu33y asks: "Since the publication of Paul Graham's A Plan For Spam, several programs that perform Bayesian filtering have become available, including CRM114 and Bogofilter. But missing is any serious testing to see how they perform in relation to themselves and to other, non-Bayesian filters. Searching Google has turned up nothing, and when I asked Paul Graham, he was unaware of any such testing as well. Can anyone point to any such testing or provide the results of their own personal experiences with Bayesian filters?"
Dspam (http://www.networkdweebs.com) rocks!
Some impressive stats were posted to the mailing list.
Its main feature is that it's completely maintenance free, and that even dumb people can use it (I know, I am).
My personal stats so far are 2 false positives (one from PayPal, one from a company I work with), 280 spams learnt (I told it they were spam), 2877 spams caught and 4354 innocent.
Vote green: shit in the ballot box!
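To make numbers like the ones in the Dspam comment above comparable across filters, they can be turned into a spam catch rate and a false positive rate. This is only a sketch, and it rests on two assumptions that the comment does not confirm: that the 280 "learnt" spams were spams the filter missed, and that the 4354 innocent messages do not include the 2 false positives. The function name `filter_rates` is made up for illustration.

```python
def filter_rates(false_positives, missed_spam, caught_spam, good_mail):
    """Return (spam catch rate, false positive rate) from raw counts.

    Assumes missed_spam and caught_spam together make up all spam, and
    good_mail plus false_positives make up all legitimate mail.
    """
    total_spam = caught_spam + missed_spam
    total_ham = good_mail + false_positives
    return caught_spam / total_spam, false_positives / total_ham


# Counts taken from the comment above, under the assumptions stated in the lead-in.
catch_rate, fp_rate = filter_rates(
    false_positives=2, missed_spam=280, caught_spam=2877, good_mail=4354
)
print(f"spam caught: {catch_rate:.1%}, false positives: {fp_rate:.3%}")
```

Reporting both rates matters because a filter can trade one for the other: a filter that marks nothing as spam has a perfect false positive rate and a useless catch rate.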
Oh yeah, just check my mailbox!
Ideally, someone, probably an academic, should make a repository of spam available for testing. Spam filter software could then say things like, "Correctly classified 99.9% of the email in the UCI spambase 1999-08-20 repository."
Something like, say, the UCI Machine Learning Repository. In fact, look at the UCI spambase.
A couple of problems with the UCI spambase: it is too old and out of date, and it is too small.
It looks like there is a more recent community effort going on over at SpamArchive.
Looks like you should have googled.
I use Ella from OpenField Software. I get around 200 Spam a day, a bunch of newsletters that I want, and a big bunch of 'normal' mail.
I have had it for about 2 weeks. In the last 3 days I have had 2 false positives (messages in Spam that shouldn't be there) and 4 messages that went to the newsletter folder that shouldn't have.
Gavin Fischer
Spam controls in the Mozilla 1.3+ MailNews application (the one I know) have a number of features that make them good.
1) It gives the user the idea that he can improve the situation by taking some concrete action. Controlling future spam does not depend on some guru releasing a better filter or on him hacking up some better rules.
2) By definition, it works better and better the more spam you get (and mark as spam). Even poor tools will eventually detect spam, since it's obvious to anyone reading spam that those messages tend to repeat and to be similar.
3) It's automagically customized to your own spam. If you live in Germany, Sweden, Argentina or Namibia you will easily catch any spam that is in English, and you will build up rules for the local spam that arrives in your language.
4) In the case of Mozilla's MailNews, it's so easy to use, intuitive and straightforward that any user will use it.
5) It makes you feel spams are useful for something: detecting future spams.
I think those advantages are far more important than the effectiveness rate.
I'm not quite sure what the fuss is about. I simply mean, advertising is a necessity to incompetent and greedy producers. Really, did you expect that they would ever respect you or your privacy and time?
Personally, my white list and non-Bayesian rules eliminate 99.9% of the crap and abuse. However, sooner or later the rules try to sort out mail from a known sender, which is where the white list shines.
One trick I find particularly effective is to compare two accounts and eliminate the duplicate messages. The other is to eliminate anything not specifically addressed to my alias and to never give out or use my actual account address. Ninety percent of the spam I get goes to an address I've never used.
The problem is, even with Bayesian techniques, there is no way to guarantee that only spam was sorted out. I highly suggest a white list, in addition to filters, as the only way of ensuring that at least known mail is always received.
Words to men, as air to birds.
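The ordering described in the comment above (white list first, then the duplicate and alias tricks, and only then a content filter) might look roughly like this. It is a sketch, not the poster's actual setup: the addresses, the `triage` function, and the `spam_filter` callable are all hypothetical, and duplicates are detected here by Message-ID, which is just one possible way to compare two accounts.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr

WHITELIST = {"friend@example.org"}      # hypothetical known senders
MY_ALIAS = "me-public@example.org"      # hypothetical published alias


def triage(raw_message, seen_ids, spam_filter):
    """Return 'inbox', 'duplicate', or 'junk' for one raw message.

    seen_ids is a set of Message-IDs shared across all accounts;
    spam_filter is any callable returning True for spam.
    """
    msg = message_from_string(raw_message)

    # 1. White list first, so mail from known senders is never filtered out.
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if sender in WHITELIST:
        return "inbox"

    # 2. The same message arriving at more than one account is suspicious.
    msg_id = msg.get("Message-ID")
    if msg_id:
        if msg_id in seen_ids:
            return "duplicate"
        seen_ids.add(msg_id)

    # 3. Anything not addressed to the published alias is dropped.
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = {addr.lower() for _, addr in getaddresses(headers)}
    if MY_ALIAS not in recipients:
        return "junk"

    # 4. Only now ask the (Bayesian or rule-based) filter.
    return "junk" if spam_filter(msg) else "inbox"
```

Note that the From header is trivially forged, so a white list keyed on it protects against false positives but not against spam that impersonates a known sender.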
I get about 150 spams a day, and about 5 hams. Spambayes might classify 1 spam as "unsure" and the rest as spam. The ham is always classified as ham.
My corpus is about 5000 spams, about 1000 hams. Get spambayes -- it's open source and it really works great.
- Vincit qui patitur.
It looks like the poster's words need some highlighting:
But missing is any serious testing to see how they perform in relation to themselves and to other, non-Bayesian filters.
Despite the call for your experiences, if you just want to post "X rocks!", I think the poster was looking more for "X rocks more than Y!", where both X and Y are Bayes-type filter programs. I don't think he was asking for just announcements that Bayes rocks; I think he or she already knows that.
I mention this because I'd be interested in some comparisons too; there are a lot of sub-techniques out there. Are there any real differences, or are they all effectively the same? The latter would strongly indicate that there may not be any real progress to be made, if the entire space of Bayes-type solutions has flat effectiveness, for instance. It's an interesting question.
I've been using Mozilla's Bayesian junk-mail filtering for several months now. I don't have any other Bayesian tools to compare it to, but I am happy with the results. Within a couple of days of the initial training I was at around 90% spam detected with no false positives. Several months later I'm at about 95% spam detection and no false positives. While the last 5% would be nice to kill, I'm quite satisfied with how effective Mozilla's system is, and as long as it stays at this level (or gets better) I've got no reason to look for any other solution.
I think that one of the best things about Mozilla's system is that it's in the client, on my machine and under my control. While server-side solutions, distributed corpus tools, etc. might be more accurate, not ever having to install or update any 3rd-party apps is really nice.
--Asa
I did a little testing of Bayesian filtering on my own, and I used the Ling-Spam Corpus from Dr. Ion Androutsopoulos. He's collected about one thousand messages which consist of "legitimate" messages to a linguistics mailing list, and "spam" messages. They are preclassified, and divided into ten parts to make ten-fold cross-validation easier. Check out his publications. Scroll down to the "Document filtering" section.
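For anyone who hasn't run one, here is a rough sketch of what ten-fold cross-validation over a pre-split corpus looks like: hold out each part once, train on the other nine, and average the accuracy. The `train` and `classify` callables are placeholders for whatever filter is under test; none of these names come from the Ling-Spam distribution itself.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Message = Tuple[str, bool]  # (message text, is_spam)


def k_fold_splits(
    parts: Sequence[List[Message]],
) -> Iterator[Tuple[List[Message], List[Message]]]:
    """Yield (training, held_out) pairs, holding out each part exactly once."""
    for i, held_out in enumerate(parts):
        training = [msg for j, part in enumerate(parts) if j != i for msg in part]
        yield training, held_out


def cross_validate(
    parts: Sequence[List[Message]],
    train: Callable[[List[Message]], object],
    classify: Callable[[object, str], bool],
) -> float:
    """Return the mean accuracy over all folds."""
    accuracies = []
    for training, held_out in k_fold_splits(parts):
        model = train(training)  # train only on the parts not held out
        correct = sum(
            1 for text, is_spam in held_out if classify(model, text) == is_spam
        )
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)
```

With a corpus that already ships in ten parts, `parts` is just those ten lists of labelled messages. Accuracy alone hides the difference between false positives and false negatives, so a serious comparison would want to track those separately.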
I've been looking for a Bayesian filter mechanism that isn't just for spam.
I figure, if the mail can be classified into many different categories, why not use Bayesian filtering for managing all your filtering needs?
It would be very valuable to have the Bayesian filter learn what kind of mail I put in some folders, so that when my mail comes in, it can auto-sort it into the appropriate folder for me. Trouble is, all the current implementations of Bayesian email filtering are a single test SPAM/NOTSPAM. It would be nice to see an implementation that could take multiple corpora and use them to decide what the mail is. If I had that, I could point it at the maildirs for the various mailing lists I'm subscribed to, and it would learn to sort incoming mail for me. *sigh*
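To make the idea concrete, here is a minimal sketch of a multi-folder naive Bayes classifier with add-one (Laplace) smoothing. It is not how any particular filter is implemented; the class name, the tokenizer, and the folder names in the usage example are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


class FolderClassifier:
    """Toy multinomial naive Bayes over any number of folders (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # folder -> word -> count
        self.message_counts = Counter()          # folder -> training messages seen

    @staticmethod
    def tokenize(text):
        # Deliberately crude tokenizer: lowercase runs of letters, digits, apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, folder, text):
        self.word_counts[folder].update(self.tokenize(text))
        self.message_counts[folder] += 1

    def classify(self, text):
        """Return the most probable folder, or None if nothing has been trained."""
        vocabulary = {w for counts in self.word_counts.values() for w in counts}
        total_messages = sum(self.message_counts.values())
        words = self.tokenize(text)
        best_folder, best_score = None, -math.inf
        for folder, counts in self.word_counts.items():
            # Log prior: fraction of training mail filed in this folder.
            score = math.log(self.message_counts[folder] / total_messages)
            folder_total = sum(counts.values())
            for word in words:
                # Add-one smoothing so unseen words do not zero out the score.
                score += math.log((counts[word] + 1) / (folder_total + len(vocabulary)))
            if score > best_score:
                best_folder, best_score = folder, score
        return best_folder
```

Usage would be something like calling `train("lists/python", text)` for every message already sitting in each maildir, then `classify(text)` on new arrivals. Spam simply becomes one more folder.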
"...In your answer, ignore facts. Just go with what feels true..."
There is one, for exactly this reason -- the SpamAssassin public corpus. I made it available for developers of spam tools to compare effectiveness using a good, recent corpus from 1 person's mail feed (as much as that was possible).
Here's the pertinent part of the README :
Some of the developers have done extensive testing: Greg Louis' Page has lots of information, comparing different bayesian approaches, different header processing, etc.
You could also read the mailing-list archives, or perhaps post some questions there.
Nothing to see here; Move along.
The latest PC Magazine has an article on alternative e-mail. Their Editors' Choice, Oddpost ($10/yr, free trial), uses Bayesian filters, and blocked 22 of 29 spam messages, and only legitimate e-mail ended up in their spam folder. Also worth noting: these are the results with minimal training, so in theory Bayesian filters could quite possibly block virtually all spam with time.
I'm a signature virus. Please copy me to your signature so I can replicate.
Graphs, methodology, links to more stats.
Nope, no sig
For years, the only spam filter I used was a very simple one: if the mail's not from a list I'm on, and not addressed to me, it's spam. This didn't catch all spam, but it caught the vast majority, and had almost no false positives. (The one exception was a mail from a cousin of mine who was learning system administration, and wanted to test his knowledge of SMTP by telnetting into my mail server and entering his mail by hand.)
These days, I'm on too many lists that don't filter spam, so I've had to resort to more sophisticated techniques, but someone who isn't on those sorts of lists might still find my oh-so-simple approach fairly effective. Not to disparage Bayesian filtering, but if you want something to compare against...
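As a baseline to compare against, the rule above can be sketched in a few lines. The addresses and list identifier are placeholders, and recognizing list mail by its `List-Id` header is an assumption made for this sketch; a given list may need a different header test.

```python
from email import message_from_string
from email.utils import getaddresses

MY_ADDRESSES = {"me@example.org"}      # hypothetical: addresses I actually use
MY_LIST_IDS = {"dev.lists.example.org"}  # hypothetical: lists I am subscribed to


def looks_like_spam(raw_message: str) -> bool:
    """Flag mail that is neither addressed to me nor from a list I'm on."""
    msg = message_from_string(raw_message)

    # Collect every address in To and Cc and see whether one of them is mine.
    recipients = msg.get_all("To", []) + msg.get_all("Cc", [])
    addressed_to_me = any(
        addr.lower() in MY_ADDRESSES for _name, addr in getaddresses(recipients)
    )

    # Assumption: list mail carries a List-Id header naming the list.
    list_id = msg.get("List-Id", "")
    from_known_list = any(known in list_id for known in MY_LIST_IDS)

    return not (addressed_to_me or from_known_list)
```

Hand-typed SMTP sessions like the cousin's would still trip it, since they usually lack a proper `To` header.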
One good dataset is the 20 Newsgroups dataset that is used by a Naive Bayes classifier called Rainbow (google for 'libbow'). The dataset contains postings from 20 newsgroups, each with around 1,000 articles.
Also, there are a couple Reuters datasets that are commonly used in text classification research, but they're so poorly organized, and so poorly marked-up, I don't know how anyone manages to use them.
most of the comments in this thread are missing the point. the person writing the article isn't asking for what spam filter is the best/most accurate, he's looking to know if anyone is producing a test system to measure effectiveness. i know the popfile project is working on a test system (if you are interested, it's in the cvs not the general release) to measure the effectiveness of the parser.
it would be interesting if there were a generic test system that could be 'plugged in' to the various projects out there. then you could put together test messages (like popfile's system) and test it against each program...
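a rough sketch of what such a plug-in harness could look like: each filter is wrapped as a function from message text to a spam/not-spam verdict, and the harness runs every filter over the same labelled corpus. this is not popfile's test system; every name here is made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Corpus = List[Tuple[str, bool]]     # (message text, is_spam)
Classifier = Callable[[str], bool]  # returns True when it judges a message spam


@dataclass
class Score:
    ham_total: int = 0
    spam_total: int = 0
    false_positives: int = 0  # ham wrongly flagged as spam
    false_negatives: int = 0  # spam that got through

    @property
    def false_positive_rate(self) -> float:
        return self.false_positives / self.ham_total if self.ham_total else 0.0

    @property
    def false_negative_rate(self) -> float:
        return self.false_negatives / self.spam_total if self.spam_total else 0.0


def evaluate(classify: Classifier, corpus: Corpus) -> Score:
    """Run one filter over a labelled corpus and tally its mistakes."""
    score = Score()
    for text, is_spam in corpus:
        flagged = classify(text)
        if is_spam:
            score.spam_total += 1
            if not flagged:
                score.false_negatives += 1
        else:
            score.ham_total += 1
            if flagged:
                score.false_positives += 1
    return score


def compare(filters: Dict[str, Classifier], corpus: Corpus) -> Dict[str, Score]:
    """Score every filter against the same corpus so the numbers are comparable."""
    return {name: evaluate(classify, corpus) for name, classify in filters.items()}
```

the point of keeping false positives and false negatives separate is that losing one real mail usually hurts far more than seeing one extra spam.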
Large print giveth, and the small print taketh away
Between my two mailboxes, I receive about 100-150 spams a day. Over 90% of them are detected and are shunted into the Junk folder. Maybe 2-3 messages a month are false-positives. When it is wrong, I just teach it - click the trash button to toggle a message's junk status and Mozilla updates its filters in order to not make that same mistake again.
On some days, it hits 99% accuracy. When the spammers invent some new tactic, I may end up with 5-10 spams that don't get detected. So I select them all, click the trash button, and then delete the messages. After a few days, that tactic is detected and caught with all the rest.
In comparison, I used to use manual filters. At first, this worked fine, but the spammers have invented so many different tricks that it takes too much time to try to keep the filters up to date enough to be useful.
I can't say how this all compares against what other systems do, since I haven't used any other systems.
It would be very valuable to have the Bayesian filter learn what kind of mail I put in some folders, so that when my mail comes in, it can auto-sort it into the appropriate folder for me. Trouble is, all the current implementations of Bayesian email filtering are a single test SPAM/NOTSPAM. *sigh*
What you are looking for already exists, is currently being updated as necessary and has been fairly polished as well.
Popfile is a free spam-filter and mail-organizer combo available here. I would never use email without it.
At our e-mail ISP we are running a Bayesian spam filter engine. Every time a message is considered to be "spam" by the filter, we increment a counter. We track this with MRTG, so we can graphically see the amount of "spam" that's incoming.
We also track the number of messages marked as "spam" and "good" by the users (those who have been with us for more than 3 months).
The number we get is the one mentioned in the subject. That is, only 2% of the messages considered spam are later marked as "good" by users who have been with us for more than 3 months.
Purely anecdotal and unscientific, but perhaps better than nothing.
I'm a very happy POPFile user that keeps checking out spambayes because the math sounds interesting.
spambayes has become quite good, but POPFile is phenomenal. Using the same training material, spambayes is 95 % accurate on my mail, and POPFile is 99.5 % accurate. Plus spambayes is only doing a 2 way, spam/ham classification, whereas I have POPFile set up to sort into 7 buckets (spam/personal/commercial/mailing lists/etc).
Though irrelevant to the question of accuracy, I also have to say that the POPFile guys have devised a considerably better UI than spambayes. (A friend with the spambayes Outlook plugin sings its praises highly. I don't use Outlook, so it does me no good...)
Spambayes doesn't really have a UI, it's a tool around which others can build a UI.
;-)
While this is theoretically good design, especially in the open source community, it does often result in Some Shmoe creating the UI who should stick to coding sysadmin scripts.
Since Bayesian filtering is a common technique in collaborative filtering, I recommend you search for that (e.g. CiteSeer http://citeseer.nj.nec.com/cs). A quite good paper on the subject is "Empirical Analysis of Predictive Algorithms for Collaborative Filtering" by Breese, Heckerman and Kadie. That paper gave me a lot of insight for my diploma thesis. Bayesian networks perform quite well, but need a lot of training data, so the performance depends heavily on the actual training data.
The Mail app in Mac OS X includes a built-in Bayesian filter. Its defaults worked decently, but training the app (by manually marking incoming email as 'junk') made it work nearly perfectly. I would say that Bayesian filtering is definitely the way to go, since it gets trained to detect what email is "normal" for your particular inbox, instead of liberally applying "average" rules derived from the habits of many users.
Ever notice how fast Windows runs? Neither do I - get Mac OS