Mozilla Adding Spam Filters
ksheka writes "Mozilla Mail now has spam filters, using the Bayesian filtering method, no less. This is a very good thing, because it learns from the spam you receive and constantly adapts to new spammer techniques!"
But the spammers will develop Bayesian filters of their own to find the best content that will sneak by your filters.
http://research.microsoft.com/~horvitz/junkfilter.htm
-- Outside of a dog, a book is man's best friend. Inside a dog, it's too dark to read
The Bayesian technique is very good for the sort of abstract classification task that spam represents. It would be an interesting hack to try to train a network to categorize based solely on the message body. I do, however, hope that their team has opted for practicality over hack value and that the network will also use such extremely relevant data as header information and a comparison of the sender's address against the address book (an email from someone not in your address book is not necessarily spam, but it is more likely to be).
lysergically yours
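As a rough illustration of the idea in that comment, here is a minimal multinomial naive Bayes filter in Python. It is a textbook sketch, not Mozilla's actual implementation; the class name, the tokenizer regex, and the `__sender_known__`/`__sender_unknown__` pseudo-tokens (which fold the address-book check into the same statistics as the body words) are all illustrative assumptions.

```python
import math
import re
from collections import Counter


class NaiveBayesSpamFilter:
    """Textbook multinomial naive Bayes with add-one smoothing (sketch only)."""

    def __init__(self, address_book=()):
        self.address_book = {addr.lower() for addr in address_book}
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def tokens(self, sender, body):
        # Lowercased body words, plus one pseudo-token that records whether
        # the sender is in the address book (hypothetical header feature).
        words = re.findall(r"[a-z0-9$']+", body.lower())
        if sender.lower() in self.address_book:
            words.append("__sender_known__")
        else:
            words.append("__sender_unknown__")
        return words

    def train(self, sender, body, label):
        """label is "spam" or "ham"."""
        self.token_counts[label].update(self.tokens(sender, body))
        self.message_counts[label] += 1

    def spam_probability(self, sender, body):
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        total_messages = sum(self.message_counts.values())
        tokens = self.tokens(sender, body)
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed class prior and per-token likelihoods, summed in log space.
            prior = (self.message_counts[label] + 1) / (total_messages + 2)
            denom = sum(self.token_counts[label].values()) + len(vocab) + 1
            score = math.log(prior)
            for tok in tokens:
                score += math.log((self.token_counts[label][tok] + 1) / denom)
            log_score[label] = score
        # P(spam | message) = 1 / (1 + exp(log P(ham) - log P(spam))),
        # clamped so math.exp cannot overflow.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

Because every message is retrained with `train()` as the user marks it, the counts keep shifting as spammers change their wording, which is the "learns from the spam you receive" property described in the story.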
I wonder if a similar technique could be used in the browser. Automatically block images or popups based on previous ones you have blocked.
Now that would be very nifty!
I just switched to Mozilla. Happy to be free of Microsoft for email. It's skinnable, and there are some cool skins, like one that sort of emulates Evolution. I noticed an annoying 'feature', though, which is still there from the Netscrap days: if you send an email without a subject, a dialog pops up and goes blah blah blah. I asked on the Mozilla newsgroup whether there was a way around this, but all I got was the sort of adolescent yammering that keeps me out of unmoderated newsgroups. Nice to see it has a spam filter now. The only major improvement remaining is to add a spell-checker (the Netscrap one was licensed from a third party and can't be freely distributed).
I assume the filtering statistics live on the client side. What about IMAP? If I open Mozilla on a new machine, are all my spam statistics lost, presumably rendering the junk mail training I've accumulated useless on the new machine?
It would be neat if, for IMAP accounts, Mozilla simply stored the statistics in a file on the IMAP server instead of on the client.
It's 10 PM. Do you know if you're un-American?
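One way the server-side storage idea could look, sketched in Python with the standard `imaplib` and `email` modules: the token counts are serialized as JSON into an ordinary message and appended to a dedicated folder, so any client that can read the folder can pick the statistics back up. The folder name `Junk-Stats` and the marker subject are hypothetical, the folder is assumed to already exist, and this is not how Mozilla actually stores its training data.

```python
import imaplib
import json
from email import message_from_bytes, policy
from email.message import EmailMessage

STATS_SUBJECT = "Junk filter training data"  # hypothetical marker subject


def stats_to_message(token_counts):
    """Wrap the token-count dictionary in a plain-text message."""
    msg = EmailMessage()
    msg["Subject"] = STATS_SUBJECT
    msg["From"] = "junk-filter@localhost"
    msg.set_content(json.dumps(token_counts, sort_keys=True))
    return msg


def message_to_stats(raw_bytes):
    """Decode the JSON body back into a dict (undoes any transfer encoding)."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    return json.loads(msg.get_content())


def upload_stats(conn: imaplib.IMAP4, token_counts, mailbox="Junk-Stats"):
    # Append a fresh snapshot; older snapshots could be expunged separately.
    conn.append(mailbox, None, None, stats_to_message(token_counts).as_bytes())


def download_stats(conn: imaplib.IMAP4, mailbox="Junk-Stats"):
    conn.select(mailbox, readonly=True)
    _, data = conn.search(None, "ALL")
    ids = data[0].split()
    if not ids:
        return {}
    # The highest sequence number is the most recently appended snapshot.
    _, msg_data = conn.fetch(ids[-1], "(RFC822)")
    return message_to_stats(msg_data[0][1])
```

Storing the data as a normal message keeps it inside what any IMAP server already supports, at the cost of re-uploading the whole snapshot after each training session.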
Well, most of my spam is already sent to /dev/null by the SpamAssassin ninja.
But, for those that make it past the email shadow warrior, I guess Bayesian filters are a double whammy they'll never survive... Mwahahahaha!
Kudos to the Mozilla programmers!
The right to offend is far more important than the right not to be offended. (Rowan Atkinson)
What happens when Microsoft attempts to enforce this patent?
In Outlook Express, I can set up 100 different email accounts without getting a giant list of mail folders.
In Mozilla (last I checked), every account you set up gets its own set of folders.
Since I've got a catch-all account, I'd like to tie multiple email addresses to one set.
Anybody out there on the Mozilla team listening?
The man who trades freedom for security does not deserve nor will he ever receive either. - Benjamin Franklin
What if, in addition to this, someone put together a company that the Mozilla mail client could report back to with what was labelled as spam, the filters it created, and the headers of each message (or even the entire spam), and that could hand out filters from others who have received spam you have yet to receive? It would be like a big distributed-computing anti-spam project. Then, if we could make the filters usable by sendmail, we could block spam at the server...
I'm almost thinking a distributed, automated anti-spam system like that could completely crush the spam problem within 12 months.
Or I may be completely out of my mind.
Do not look at laser with remaining good eye.
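The reporting network described in that comment resembles hash-based collaborative filtering: clients report a fingerprint of each spam, and a message reported by enough people is treated as spam everywhere. A minimal sketch in Python, where `SharedSpamRegistry` is a hypothetical in-memory stand-in for the central service and the normalization rules (lowercasing, collapsing whitespace) are assumptions:

```python
import hashlib
import re
from collections import Counter


def spam_signature(body):
    # Normalize so trivial variations (case, whitespace) hash the same.
    normalized = re.sub(r"\s+", " ", body.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SharedSpamRegistry:
    """In-memory stand-in for a hypothetical central reporting service."""

    def __init__(self, threshold=3):
        self.threshold = threshold   # reports needed before a hash is trusted
        self.reports = Counter()

    def report(self, body):
        self.reports[spam_signature(body)] += 1

    def is_known_spam(self, body):
        # Counter returns 0 for unseen hashes without storing them.
        return self.reports[spam_signature(body)] >= self.threshold
```

The threshold keeps a single malicious or mistaken reporter from blacklisting legitimate mail. Exact hashes are easy to defeat by adding random text to each copy, so a real system would need fuzzier fingerprints.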
Well, OK, I am impressed that Mozilla is implementing spam filtering in its MUA. I am NOT impressed with Bayesian spam filters AT ALL. I've been using Mac OS X's Mail.app since I switched to OS X. It's not my primary MUA, but I let it POP a copy of all my mail and "learn" from it. It does a pretty good job of catching maybe 80% of the spam I get. However, it has a BAD false-positive rate. I mean, hell, it's been flagging CERT advisories as spam. That kind of crap is really annoying. It has flagged co-workers' mail as spam numerous times (and even though I happen to agree... :) ). The biggest problem I have with Bayesian filtering as a mail admin is that I am constantly dealing with spam. Users forward it to me. I receive a number of spam bounces. I work in spam all the damned time. That's the problem. I need an MUA with Bayesian filters smart enough that I can tell them to ignore all mail from certain domains or sent to certain accounts. None of the Bayesian filters built into the MUAs I've worked with so far can do that. It's really annoying given the position I'm in.
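The rule this admin asks for amounts to a pre-filter that runs before any Bayesian scoring. A sketch in Python; the function name and the idea of passing the ignore lists as plain sets are assumptions, not a feature of any of the MUAs mentioned:

```python
def bypass_bayesian(sender, recipients, ignore_domains, ignore_accounts):
    """Return True if a message should skip Bayesian scoring entirely."""
    domains = {d.lower() for d in ignore_domains}
    accounts = {a.lower() for a in ignore_accounts}
    sender_domain = sender.rsplit("@", 1)[-1].lower()
    # Match the domain itself or any subdomain (lists.cert.org for cert.org).
    if any(sender_domain == d or sender_domain.endswith("." + d) for d in domains):
        return True
    # Mail delivered to a listed account (say, an abuse@ mailbox) is skipped too.
    return any(r.lower() in accounts for r in recipients)
```

Messages for which this returns True would go straight to the inbox unscored, and ideally would not be used for training either, so forwarded spam in an abuse mailbox cannot skew the statistics.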
This is something Emacs has in the Gnus client: you score emails up and down, and it starts adding filtering rules. Using Lisp, you could extend this to do some pretty funky moderating.
Every problem is reducible to a previously solved problem or by definition is unsolvable - Church Turing Thesis.
An Eye for an Eye will make the whole world blind - Gandhi
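A rough Python analogue of that scoring approach, for readers who don't use Emacs. This is not Gnus's Lisp score-file format; the `ScoreRule` type, the point values, and the rule added on marking are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ScoreRule:
    header: str   # e.g. "From" or "Subject"
    match: str    # case-insensitive substring to look for
    delta: int    # points added when the rule matches


def score_message(headers, rules):
    """Sum the deltas of every rule whose substring appears in its header."""
    total = 0
    for rule in rules:
        value = headers.get(rule.header, "")
        if rule.match.lower() in value.lower():
            total += rule.delta
    return total


def learn_from_mark(headers, rules, is_spam, step=100):
    # Adaptive scoring: marking a message adds a rule keyed on its sender.
    sender = headers.get("From", "")
    if sender:
        rules.append(ScoreRule("From", sender, -step if is_spam else step))
```

Messages scoring below some cutoff would then be hidden or filed away; the rules stay human-readable, which makes them easy to extend by hand.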
I personally don't really care about all the junk email I get. I don't get that much, and I can pretty much tell what it is without looking at it. It goes straight to /dev/null.
The /var partition is only 200 MB, with 50 MB free, and the maillog is growing at about 10 MB a day. So now I'm babysitting this server every day until the spam attempts stop. I don't think there's any way around it unless I get sendmail to check for open proxies. But I don't know how to do that, and I don't think they trust me enough to make such changes to sendmail.
Spam is such a horrible thing, though. I work at a web hosting company. I'm the one who has to track down the site with the old formmail.pl, removing 'aol.com' and 'yahoo.com' from the hosts to relay for and trying to find out who the hell added them so I can murder them. I'm the one clearing out the mail queue with 100,000 messages. I'm the one clearing the mail queues of people who thought it was a good idea to check the 'open relay' option in Plesk. I'm the one who has to deal with people bitching about how their mail isn't working or didn't get through.
Just the other day, I had a RaQ2 where someone had apparently put yahoo.com and excite.com in the hosts to relay for. Yay! That's what attracted the spammers. Now I get a request every second to send mail to 50 people at once. Now that I've removed them, none of those messages get through. But it's a RaQ2, 133 MHz. It has to go through all 50 addresses, say 'relaying denied', and log it. It can't keep up! syslogd is taking up all the CPU and logging things from hours ago because it's behind. Before long, sendmail quits listening on port 25 (but the spam attempts keep coming somehow).
So I get the idea to block their IPs, since they seem to be using the same ones. But oh, guess what: they're using open proxies and have about 400 IPs. Well, I did this for about 5 hours, writing scripts to grab the repeated IPs out of the maillog and adding them all to my sendmail access list. Now every time they try to send mail, sendmail blocks them instead of saying 'relaying denied' 50 times for each request. But a minute later, I get a few new IPs and it starts all over again. I have an access list about 6 pages long. It's doing OK, blocking about 90% of them, but every once in a while they get a new IP and sendmail grinds to a halt.
Oh yeah, and my
So, oh well, mail is getting lost every day on this server, and it has been rendered horribly slow for its users, just because some moron noticed it would send some email for him and started up his scripts.
Spam causes so many problems at the server level. It's what is making mail an unreliable service. I couldn't care less about spam filters on my mail client. These are the things that make spam evil!
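The IP-harvesting step described in that comment might look like the following Python sketch. It assumes sendmail's rejection lines contain `Relaying denied` and a `relay=... [a.b.c.d]` field, and it emits entries in the classic `/etc/mail/access` format (`address<TAB>REJECT`); check both against your own logs and sendmail version before relying on it. After appending the output to the access file, the database is normally rebuilt with `makemap hash /etc/mail/access < /etc/mail/access`.

```python
import re
import sys
from collections import Counter

# Assumed log shape: "... relay=host [203.0.113.5], reject=550 5.7.1 <x@y>... Relaying denied"
RELAY_IP = re.compile(r"relay=[^\[]*\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def denied_ips(log_lines, min_hits=5):
    """Return IPs that triggered at least min_hits relay denials, sorted."""
    counts = Counter()
    for line in log_lines:
        if "Relaying denied" in line:
            match = RELAY_IP.search(line)
            if match:
                counts[match.group(1)] += 1
    return sorted(ip for ip, n in counts.items() if n >= min_hits)


def access_entries(ips):
    # One "IP<TAB>REJECT" line per address, ready to append to the access file.
    return [f"{ip}\tREJECT" for ip in ips]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as log:
        print("\n".join(access_entries(denied_ips(log))))
```

The `min_hits` threshold avoids blocking a host over a single typo'd recipient, but, as the comment notes, open proxies rotate addresses faster than any list like this can grow.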
.. should start at the server preventing the offending mail from ever coming into the network in the first place.
Not that local spam filters are a bad thing (they aren't!), but refusing connections from known spammer IPs and properly using blacklists would cut down a lot of email traffic. Once the spam is in your inbox, it's just an annoyance to you. The cost to the net has already been incurred.
Trolling is a art,
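Blacklist checks of the kind mentioned in that comment usually work over DNS: the client's IPv4 octets are reversed, the blacklist zone is appended, and any A record in the answer means the address is listed. A minimal Python sketch; `dnsbl.example.org` is a placeholder zone, not a real blacklist. (Sendmail can do this check itself at SMTP time through its `dnsbl` feature.)

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    # Reverse the IPv4 octets and append the zone: 192.0.2.99 -> 99.2.0.192.<zone>
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone="dnsbl.example.org"):
    """True if the zone returns an A record (conventionally 127.0.0.x)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # NXDOMAIN, or the lookup failed: treat the address as not listed.
        return False
```

Doing this at connection time lets the server refuse the message before the body is transferred, which is where the bandwidth and queue costs described earlier in the thread are saved.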
Popup killing and tabbed browsing are the two killer features that have let me spread Mozilla widely through my office. People see me surfing and ask what the tabs are or where the popups have gone. I tell them about Mozilla and show them how easy it is to stop popups. Yes, I know about CrazyBrowser, which does both of these, but it does popup killing badly (it's all or nothing, rather than blocking just unsolicited popups).
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Software that only does mail filtering encourages spammers. The technically knowledgeable people don't get spam, so they stop worrying about it.
All mail filters should also use a service like SpamCop, so that the spammers lose their internet service accounts as the spam is filtered.
I send SpamCop all my spam. SpamCop analyzes it automatically and sends a complaint to the Internet service provider. I use the free reporting-only service.
Well, I certainly have a large volume of SPAM that I plan to use for training purposes. I'm not a big user of personal email, but somehow about 70% of all my incoming personal mail is SPAM. My Dad is much worse off.
I'm glad to see that the software industry is taking the SPAM problem seriously. And it's great to hear that more and more states, like Massachusetts, are enacting laws to curb the abuse of email systems.
I've been dependent on some static rules to curb SPAM (about 90% effective), but I think now it's time to implement more serious anti-spam measures.
Based on the last /. article on Bayesian filtering, I installed SpamProbe. I gave it a folder of about 70 spam emails and a few hundred good emails from various folders. In the past few weeks, it's had one false negative and a few false positives, which were 'semi-spam' mailing list emails from Dell, Red Hat, and Amazon. When I moved those emails into the 'recheck as good' folders, it learned its lesson.
It may be naive, but I was very surprised at how well it worked. It's better than SpamAssassin IMO, especially at foreign-language spam.
Essentially, it throws the parsing problem right back in the spammers' faces: they must answer a fuzzy logic question in order to get into your inbox once and for all. It is similar to challenge/response routines in network connection code that prevent spoofing. The most interesting part from the intro:
Bayesian filters, to me, seem to work if you are a dull person without many changes in your life. For example, if you constantly get spam with the word Madam in it and you later get a sex change, you will need to recalibrate your filters. (Probably not the most pressing thing on your mind, so you'd lose a few legitimate mails.)
Just some thoughts.
I'd argue that the time wasted on filtering spam is more valuable than the bandwidth wasted delivering it. This is why I am glad that Apple was able to bring good client-side spam filtering to the people with Mail and that Mozilla will soon provide this feature as well.
I have a website. It's about Macs.