How Apple's Mail.app Junk Filter Works
fmorgan writes "O'Reilly has now posted the second part of an article about Mac OS X Mail.app spam filtering with more details on what this technology is (and isn't): 'Many myths have emerged about Mail's junk mail filter. No, it's not an extremely complex set of rules, no it doesn't look for keywords, and no, it doesn't use white magic ... Interestingly enough, the technology that underlies the Junk Mail filter began its life as an information retrieval system.'"
and no, it doesn't use white magic...
Black, then?
Or is that reserved exclusively for Microsoft?
The coolest voice ever.
It's simple. It uses its extremely uninspired app name to scare away spam.
The "Insert Quote Here" line is almost as predictable as inserting an actual quote.
The article mentions...
"In mathematical terms, we would say that every document is a vector of n numbers or a point in a space with n dimensions."
Funny. When I took linear algebra I was wondering if there was a practical application for this, and I guess there is... to eliminate penis enlargement advertisements.
Yes! I listen to NYC Speedcore and do math at 3AM. I suggest you try it too.
Why wouldn't a similar algorithm work to provide automated moderation? It seems to me that you could certainly identify clusters of words that indicate low-value posts.
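For anyone wondering what "every document is a vector of n numbers" looks like in practice, here is a rough Python sketch. It is not Mail.app's actual algorithm, which the article only describes in general terms. It just counts words per document and compares documents by the cosine of the angle between their vectors. The function names and the two-message corpus are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    """Return a sorted list of every distinct word in the corpus."""
    return sorted({word for doc in documents for word in tokenize(doc)})


def to_vector(text, vocabulary):
    """Map a document to a list of word counts, one entry per vocabulary word."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus: one junk-like message, one ordinary message.
corpus = [
    "cheap pills buy cheap pills now",
    "meeting notes for the linear algebra class",
]
vocabulary = build_vocabulary(corpus)
spam_vec = to_vector(corpus[0], vocabulary)
ham_vec = to_vector(corpus[1], vocabulary)

# A new message is placed in the same n-dimensional space and compared to both.
new_vec = to_vector("buy pills now", vocabulary)
closer_to_spam = cosine_similarity(new_vec, spam_vec) > cosine_similarity(new_vec, ham_vec)
print(closer_to_spam)
```

The same comparison could in principle be pointed at forum posts instead of mail, which is more or less the automated moderation idea above, though raw word counts alone are a crude signal.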
Each document is in turn represented by a long string of numbers, one for each word in the corpus. In mathematical terms, we would say that every document is a vector of n numbers or a point in a space with n dimensions. This coordinate is then mapped onto a unique position in the goatse.cx photograph. If it lands in an objectionable region, the message is discarded as spam.
It's an interesting method, but not having Mail.app myself, what I'm wondering is how well it works on the border regions; that is, when it is just barely objectionable. Say, on his leg.
I'd be willing to bet that it's just another Bayesian e-mail filter with maybe a few extra bells and whistles.
Umm, how much would you want to bet? I'll take that action!
-jcr
The only title of honor that a tyrant can grant is "Enemy of the State."
Wow. If your grandma is suggesting Viagra to you, I think your problems go way deeper than Bayesian misfirings.
Reading that has clearly shown me for the first time why my friends and family complain when I talk technical about chemistry to them.
And I thought I spoke English!
Input:
Wow, the article just turned me on to the Summary Service. And I just used it to read a short and sweet summary of the article.
If you haven't played with it select a bunch of text (in a Cocoa app) and select Summary from the Services menu.
Very cool...
Output:
Wow, the article just turned me on to the Summary Service. And I just used it to read a short and sweet summary of the article.
If you haven't played with it select a bunch of text (in a Cocoa app) and select Summary from the Services menu.
Wow, look at that! Impressive!
(I actually love Summary Service, but I couldn't resist that joke.)
Mikey-San
Karma: +Eleventy billion (mostly affected by watching Celebrity Jeopardy)
But now imagine two Apple users using Mail Filter...
I know you are psychotic, but please make an effort.
Dude, you seriously need to seek help for your mail-archiving condition :)
Or, if nothing else, move some of the mail to a backup directory so the poor little IMAP server doesn't have to deal with YOUR pack-rat habits!
It's Apple. Gotta be good.
;-)
You know Apple INVENTED spam filtering, don't you?
This is Information Retrieval not Information Dispersal...Information Transit got the wrong man. I got the right man. The wrong one was delivered to me as the right man, I accepted him on good faith as the right man. Was I wrong?
My name's Lowry. Sam Lowry. I've been told to report to Mr. Warrenn.
Thirtieth floor, sir. You're expected.
Um... don't you want to search me?
No sir.
Do you want to see my ID?
No need, sir.
But I could be anybody.
No you couldn't sir. This is Information Retrieval.
There you are, your own number on your very own door. And behind that door, your very own office! Welcome to the team, D7-105! Welcome to Information Retrieval
"Music is everybody's possession. It's only publishers who think that people own it." - John Lennon.
I tried that, but my boss got angry when I refused to give him my business address.
Search 2010 Gen Con events
dumbass, pointless markup.
Some people really like using HTML, and everybody should respect that.
Those who read this horseshit from the command line can just suck it up and deal with it.
Information wants to be anthropomorphized.