How Apple's Mail.app Junk Filter Works
fmorgan writes "O'Reilly has now posted the second part on an article about Mac OS X Mail.app spam filtering with more details on what this technology is (and isn't): 'Many myths have emerged about Mail's junk mail filter. No, it's not an extremely complex set of rules, no it doesn't look for keywords, and no, it doesn't use white magic ... Interestingly enough, the technology that underlies the Junk Mail filter began its life as an information retrieval system.'"
it's simple. it uses it's extremely uninsipired app name to scare away spam.
The "Insert Quote Here" line is almost as predictable as inserting an actual quote.
The article mentions...
"In mathematical terms, we would say that every document is a vector of n numbers or a point in a space with n dimensions."
Funny. When I took linear algebra I was wondering if there was a practical approach to this, and I guess there is... to elliminate penis enlargement advertisments.
Yes! I listen to NYC Speedcore and do math at 3AM. I suggest you try it too.
Why wouldn't a similar algorithm work to provide automated moderation? It seems to me that you could certainly identify clusters of words that indicate low-value posts?
Each document is in turn represented by a long string of numbers, one for each word in the corpus. In mathematical terms, we would say that every document is a vector of n numbers or a point in a space with n dimensions. This coordinate is then mapped onto a unique position in the goatse.cx photograph. If it lands in an objectionable region, the message is discarded as spam.
It's an interesting method, but not having Mail.app myself, what I'm wondering is how well it works on the border regions; that is, when it is just barely objectionable. Say, on his leg.
Wow. If your grandma is suggesting you viagra I think your problems go way deeper than Bayesian misfirings..
Dude, you seriously need to seek help for your mail-archiving condition :)
Or if nothing else move some of the mail to a backup directory so the poor little imap server doesn't have to deal with YOUR pack-rat habits!