Mozilla Adding Spam Filters

Arms Race by Camel+Pilot · 2002-11-14 05:52 · Score: 3, Interesting

But the spammers will develop Bayesian filters of their own to find the best content that will sneak by your filters.

Re:Arms Race by Lionel+Hutts · 2002-11-14 07:40 · Score: 3, Interesting

That's an arms race the spammers can't win. Sending spam is an ultra-low-margin business: with response rates of a fraction of a basis point, and probably only a fraction of them actually spending any money, the cost and effort per message sent must be very, very, very low for the spammers to make any money at all. Most spam recipients would gladly put in, say, $20 worth of effort to spamproof their addresses; there is no way even a spammer with huge scale could invest even $5 worth of effort for one more address. We will all have different Bayesian rules, remember. Combine that with the fact that I have perfect information about what spam and nonspam I get, and the sender has little or no information about what gets through, and it's clear that even hours of effort by senders wouldn't do much.

And, even if they could afford to keep it up for a while, my spam filter will get better faster than their spam. This is the "Ambassador's criterion" from SDI (briefly: Star Wars won't lead to an arms race if it gets to the point where shooting down the marginal missile is cheaper than building the marginal missile).

I think we may just win the Spam Wars yet.

--
I Can't Believe It's A Law Firm, LLP does not necessarily endorse the contents of this message.

Re:102 Features IE doesn't have by crossseyed · 2002-11-14 05:55 · Score: 4, Interesting

It doesn't mean they're not thinking about it, though...

http://research.microsoft.com/~horvitz/junkfilter.htm

--
-- Outside of a dog, a book is man's best friend. Inside a dog, it's too dark to read

Filtering by Transient0 · 2002-11-14 05:56 · Score: 5, Interesting

Bayesian technique is very good for the sort of abstract classification task that spam represents. It would be an interesting hack to try to train a network to categorize based solely on message body... I do, however, hope that their team has opted for practicality over just hack value, and that the network will also use such extremely relevant data as header information and whether the sender's address is in your address book (an e-mail from someone not in your address book is not necessarily spam... but it is more likely to be).
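
For what it's worth, here is a minimal sketch of what "body plus headers plus address book" could look like in a naive Bayes filter. Nothing here is Mozilla's actual code: the message layout (a dict with "from", "subject" and "body"), the token prefixes and the class name are all made up for illustration. The point is only that address-book membership can be fed in as one more token and weighed like any word.

```python
import math
from collections import Counter

def tokens(msg, address_book):
    """Turn a message dict into a set of feature tokens (layout is hypothetical)."""
    feats = {"body:" + w for w in msg["body"].lower().split()}
    feats |= {"subj:" + w for w in msg["subject"].lower().split()}
    # Address-book membership becomes just another token the filter can weigh.
    known = msg["from"].lower() in address_book
    feats.add("sender:known" if known else "sender:unknown")
    return feats

class NaiveBayes:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def train(self, feats, label):
        # Each token is counted once per message it appears in.
        self.counts[label].update(feats)
        self.totals[label] += 1

    def spam_probability(self, feats):
        # Work in log-odds; add-one smoothing keeps unseen tokens from zeroing things out.
        log_odds = math.log((self.totals["spam"] + 1) / (self.totals["ham"] + 1))
        for f in feats:
            p_spam = (self.counts["spam"][f] + 1) / (self.totals["spam"] + 2)
            p_ham = (self.counts["ham"][f] + 1) / (self.totals["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so exp() cannot overflow on very long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1 / (1 + math.exp(-log_odds))
```

After training on a handful of labelled messages, spam_probability(tokens(msg, address_book)) gives a number between 0 and 1; where to put the cutoff is a separate decision.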

--
lysergically yours

Re:Filtering by Shamashmuddamiq · 2002-11-14 10:46 · Score: 3, Interesting

I don't believe it was "invented" by Paul Graham. The idea of separating spam from real email based on the statistical properties of its content has occurred to me, and to many other people, over the last few years. Just because Paul's page was the first one that you've seen explain it in detail doesn't mean he invented it.
BTW, there are ways of getting around Bayesian filtering. For instance, if you take random words from a large dictionary of long, normal, conversational but not-often-used-in-spam words and splatter them throughout your spam, it's easy to convince the Bayesian filter that it's not spam. Not only will this increase your false negatives (spam that slips through), it can also increase your false positives. This is because the new spam will be training your Bayesian filter, putting lots of non-spam-like words into its spam vocabulary. If the spammers keep up with their dictionaries as well as the filters keep up with theirs (and I must assume this will happen), we've still got a big problem on our hands.
Don't get me wrong. I have bogofilter installed on my mail server at home, and it works great for now. But don't expect it to work forever.
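
A toy illustration of the word-padding trick, in case it isn't obvious why it works. This is a sketch, not bogofilter's algorithm: the per-word probabilities are invented, the 0.4 score for unknown words and the "top 5 most interesting words" cutoff are assumptions, and the combining rule is the simple product form.

```python
from math import prod

# Invented per-word spam probabilities, standing in for a trained filter's table.
WORD_SPAM_PROB = {
    "viagra": 0.99, "free": 0.95, "click": 0.93, "offer": 0.90,
    "seminar": 0.03, "thesis": 0.02, "compiler": 0.01, "kernel": 0.02,
    "meeting": 0.05, "patch": 0.03,
}
UNKNOWN = 0.4  # assumed score for words the filter has never seen

def score(text, top_n=5):
    """Combine the top_n most 'interesting' words (those farthest from 0.5)."""
    probs = [WORD_SPAM_PROB.get(w, UNKNOWN) for w in set(text.lower().split())]
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    probs = probs[:top_n]
    spam = prod(probs)
    ham = prod(1 - p for p in probs)
    return spam / (spam + ham)

plain = "free offer click here for viagra"
padded = plain + " seminar thesis compiler kernel meeting patch"

print(f"plain:  {score(plain):.5f}")
print(f"padded: {score(padded):.5f}")
```

With these made-up numbers the padded message scores far lower than the plain one, because the innocent words crowd the spammy ones out of the "most interesting" slots. The sketch does not model the second half of the problem, where marking the padded message as junk drags those innocent words toward the spam side.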

--
...just my 2 gil.

Mozilla mail / browser by FrostedWheat · 2002-11-14 05:58 · Score: 4, Interesting

I wonder if a similar technique could be used in the browser. Automatically block images or popups based on previous ones you have blocked.

Now that would be very nifty!

zilla by sstory · 2002-11-14 05:58 · Score: 3, Interesting

I just switched to Mozilla. Happy to be free of Microsoft for email. It's skinnable, and there are some cool skins--like one which sort of emulates Evolution. I noticed an annoying 'feature' though, which is still there from Netscrap days--if you send an email without a subject, a dialog pops up and goes blah blah blah. I asked the Mozilla newsgroup if there was a way around this, but all I got was the sort of adolescent yammerings that keep me out of unmoderated newsgroups. Nice to see it has a spamfilter now. The only major improvement remaining is to add a spell-check (the Netscrap one was licensed from a 3rd party, and can't be freely distributed).

One question... by Hard_Code · 2002-11-14 06:01 · Score: 5, Interesting

I assume the filtering statistics live on the client side. What about IMAP? If I open up Mozilla on a new machine, are all my spam statistics lost (presumably rendering the junk mail filtering statistics I've accumulated useless on the new machine)?

It would be neat if, with IMAP accounts, Mozilla just stored the statistics in a file on the IMAP server instead of on the client.

--

It's 10 PM. Do you know if you're un-American?

Re:One question... by BroadbandBradley · 2002-11-14 08:36 · Score: 3, Interesting

Someday you'll be able to back up and restore your Mozilla Profile, and when that day comes, I hope you'll remember that Mozilla has a House online at ZillaVilla.com

--
"The Most Fun Possible on 4 wheels" is at SunBuggy in Las Vegas

SpamAssassin + Mozilla = Schweet! by Noryungi · 2002-11-14 06:04 · Score: 5, Interesting

Well, most of my spam is already sent to /dev/null by the SpamAssassin ninja.

But, for those that make it past the email shadow warrior, I guess Bayesian filters are a double whammy they'll never survive... Mwahahahaha!

Kudos to the Mozilla programmers!

--
The right to offend is far more important than the right not to be offended. (Rowan Atkinson)

Microsoft's Patent by woboz · 2002-11-14 06:05 · Score: 5, Interesting

What happens when Microsoft attempts to enforce this patent?

My only complaint... by Mustang+Matt · 2002-11-14 06:09 · Score: 3, Interesting

In Outlook Express, I can set up 100 different email accounts and not have a giant list of mail folders.

In Mozilla (last I checked), for every account you set up, it creates a new set of folders.

Since I've got a catchall account, I'd like to tie multiple email addresses to one set.

Anybody out there on the Mozilla team listening?

--
The man who trades freedom for security does not deserve nor will he ever receive either. - Benjamin Franklin

interesting idea... by Lumpy · 2002-11-14 06:28 · Score: 5, Interesting

What if, in addition to this, someone put together a company that the Mozilla email client could report back to about what is labelled as spam and the filters it created, along with the headers of the message (or even the entire spam), and grab filters from others who have received spam that you have yet to receive? It would be like a big distributed computing anti-spam project... then if we were able to make the filters usable by sendmail to block at the server...

I'm almost thinking a distributed and automated anti-spam system like that could completely crush the spam problem within a 12 month period.

or I may be completely out of my mind.

--
Do not look at laser with remaining good eye.

Not impressed by macdaddy · 2002-11-14 06:35 · Score: 4, Interesting

Well, OK, I am impressed that Mozilla is implementing spam filtering abilities in their MUA. I AM NOT impressed with Bayesian spam filters AT ALL. I've been using Mac OS X's Mail.app since I switched to OS X. It's not my primary MUA, but I am letting it POP out a copy of all my mail and "learn" from it. It does a pretty good job of finding maybe 80% of the spam I get. However, it has a BAD false-positive rate. I mean, hell, it's been flagging CERT advisories as spam. That kind of crap is really annoying. It's flagged co-workers' mail as spam numerous times (even though I happen to agree... :) ). The biggest problem I have with Bayesian as a mail admin is that I am constantly dealing with spam. Users forward it to me. I receive a number of spam bounces. I work in spam all the damned time. That's the problem. I need an MUA with Bayesian filters that are smart enough for me to tell them to ignore all mail from certain domains or mail that went to certain accounts. None of the Bayesian filters built into the MUAs I've worked with so far can do things like that. It's really annoying given the position that I'm in.
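
To make the request concrete, something like the check below, run before the Bayesian step, is all I'm asking for. It is only a sketch: the exempt lists, the example addresses and the function name are hypothetical, and no MUA I know of exposes a hook like this.

```python
from email.utils import parseaddr

# Hypothetical exemption lists a mail admin might maintain.
EXEMPT_SENDER_DOMAINS = {"cert.org"}
EXEMPT_RECIPIENTS = {"abuse@example.net", "postmaster@example.net"}

def bypasses_filter(from_header, to_header):
    """Return True if the message should skip Bayesian scoring and training."""
    sender = parseaddr(from_header)[1].lower()
    recipient = parseaddr(to_header)[1].lower()
    domain = sender.rpartition("@")[2]
    # Match the listed domain itself or any subdomain of it.
    domain_exempt = any(
        domain == d or domain.endswith("." + d) for d in EXEMPT_SENDER_DOMAINS
    )
    return domain_exempt or recipient in EXEMPT_RECIPIENTS
```

Mail that returns True would be delivered untouched and, just as important, never used as training data, so forwarded spam sent to an abuse mailbox would not skew the statistics.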

Re:Not impressed by tbmaddux · 2002-11-14 08:50 · Score: 4, Interesting

However it has a BAD false-positive rate. I mean hell it's been flagging CERT advisories as spam. That kind of crap is really annoying. It's flagged co-workers' mail as spam numerous times...
I had this problem early on as well. I fixed it by marking the false positives as "Not Junk." You can do this even when it's in "Automatic" mode as opposed to "Training." All "Automatic" does is enable the filter that sends the marked messages to the "Junk" folder.
But it still learns in either mode! Early on, my shipping notices from Amazon.com (and even Apple.com, ha ha) were being flagged as Junk, but not anymore. I think it's great and will only improve with time, though I note others' caveats about client-side spam checking being flawed.

--
Can't you see that everyone is buying station wagons?

Emacs! by MosesJones · 2002-11-14 06:36 · Score: 4, Interesting

This is something that Emacs has in the Gnus client: you score emails up and down and it starts adding filtering rules. Using Lisp, you could extend this to do some pretty funky moderating.

Every problem is reducible to a previously solved problem or by definition is unsolvable - Church-Turing Thesis.

--
An Eye for an Eye will make the whole world blind - Gandhi

Hmm, my spam experiences by krappie · 2002-11-14 06:47 · Score: 5, Interesting

I personally don't really care about all the junk emails I get. I don't get that many, and I can pretty much tell without looking at them. They go straight to /dev/null.

Spam is such a horrible thing though. I work at a webhosting company. I'm the one who has to track down the site with the old formmail.pl, remove 'aol.com' and 'yahoo.com' from the hosts to relay for, and try to find out who the hell added them so I can murder them. I'm the one clearing out the mail queue with 100,000 mails. I'm the one clearing the mail queues of people who thought it was a good idea to check the 'open relay' option in Plesk. I'm the one who has to deal with people bitching about how their mail isn't working or didn't get through.

Just the other day, I had a RaQ 2 where someone had apparently put yahoo.com and excite.com in the hosts to relay for. Yay! That's what attracted the spammers. Now I get a request every second to send mail to 50 people at once. Now that I've removed them, none of them are getting through. But it's a RaQ 2, 133 MHz. It has to go through all 50 addresses and say 'relaying denied' and log it. It can't keep up! syslogd is taking up all the CPU and logging things from hours ago because it's behind. Quickly, sendmail quits listening on port 25 (but the spam attempts keep coming somehow).

So I get the idea to block their IPs, since they seem to be using the same IPs. But oh, guess what, they're using open proxies and have about 400 IPs. Well, I did this for about 5 hours, writing scripts to grab the repeated IPs out of the maillog and adding them all to my sendmail access lists. Now every time they try to send mail, it blocks them instead of saying relaying denied 50 times for each request. But a minute later, I get a few new IPs and it starts all over again. I have an access list about 6 pages long. It's doing OK, blocking about 90% of them, but every once in a while, they get a new IP and sendmail is brought to a stop.
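
The scripts were roughly along the lines of the sketch below (rewritten here in Python for illustration, not the originals). It assumes the rejected attempts show up in the maillog as lines containing "Relaying denied" with the client address in square brackets after "relay="; the threshold of 3 and the function names are arbitrary, and a log format that differs would need a different pattern.

```python
import re
from collections import Counter

# Assumed log shape: "... relay=some.host [1.2.3.4], reject=550 ... Relaying denied"
RELAY_IP = re.compile(r"relay=[^,\[]*\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def repeat_offenders(log_lines, threshold=3):
    """Return IPs that were refused relaying at least `threshold` times."""
    hits = Counter()
    for line in log_lines:
        if "Relaying denied" not in line:
            continue
        match = RELAY_IP.search(line)
        if match:
            hits[match.group(1)] += 1
    return sorted(ip for ip, count in hits.items() if count >= threshold)

def access_entries(ips):
    """Format IPs as tab-separated 'address REJECT' lines for an access list."""
    return [f"{ip}\tREJECT" for ip in ips]
```

In practice you would feed it the open maillog file, append the output of access_entries() to the access list, and rebuild the map (with makemap on a typical sendmail setup) before the new entries take effect.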

Oh yeah, and my /var/ partition is only 200 MB, with 50 MB free. And the maillog is growing at about 10 MB a day. So now I'm babysitting this server every day until the spam attempts stop. I don't think there's any way around it unless I get sendmail to check for open proxies. But I don't know how to do that, and I don't think they trust me enough to make such changes to sendmail.

So oh well, mail is getting lost every day on this server and it's been rendered horribly slow for its users... just because some moron noticed it would send some emails for him and started up his scripts.

Spam causes so many problems at the server level. It's what is making mail an unreliable service. I couldn't care less about spam filters on my mail client. These are the things that make spam evil!

Real spam control.. by grub · 2002-11-14 07:00 · Score: 3, Interesting

.. should start at the server preventing the offending mail from ever coming into the network in the first place.

Not that localized spam filters are a bad thing (they aren't!) but refusing connections from known spammer IPs and the proper use of blacklists would cut down on a lot of the email traffic. Once the spam is in your inbox, it's just an annoyance to you. The cost to the net has already been incurred.

--
Trolling is a art,

Re:102 Features IE doesn't have by afidel · 2002-11-14 07:03 · Score: 3, Interesting

Popup killing and tabbed browsing are the two killer features that have allowed me to spread Mozilla widely through my office. People see me surfing and ask what the tabs are or ask where the popups have gone. I tell them about Mozilla and show them how easy it is to stop popups. Yes, I know about crazybrowser, which does both of these, but it does popup killing badly (it's an all-or-nothing thing, not just unsolicited popups).

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.

Spam filters should bust the spammers, also. by Futurepower(R) · 2002-11-14 07:04 · Score: 5, Interesting

Software that only does mail filtering encourages spammers. The technically knowledgeable people don't get spam, so they stop worrying about it.

All mail filters should also use a service like SpamCop, so that the spammers lose their internet service accounts as the spam is filtered.

I send SpamCop all my spam. SpamCop analyzes it automatically and sends a message to the Internet Service Provider. I use the free reporting-only service.

Re:"Bayesian filtering" aka "Naive Bayes" by standards · 2002-11-14 07:13 · Score: 3, Interesting

Well, I certainly have a large volume of SPAM that I plan to use for training purposes. I'm not a big user of personal email, but somehow about 70% of all my incoming personal mail is SPAM. My Dad is much worse off.

I'm glad to see that the software industry is taking the SPAM problem seriously. And it's great to hear that more and more states, like Massachusetts, are enacting laws to curb the abuse of email systems.

I've been dependent on some static rules to curb SPAM (about 90% effective), but I think now it's time to implement more serious anti-spam measures.

Re:"Bayesian filtering" aka "Naive Bayes" by ceswiedler · 2002-11-14 07:21 · Score: 3, Interesting

Based on the last /. article on Bayesian filtering, I installed SpamProbe. I gave it a folder of about 70 spam emails, and a few hundred good emails I had in various folders. In the past few weeks, it's had one false negative, and a few false positives which were 'semi-spam' mailing list emails from Dell, RedHat, and Amazon. When I moved those emails into the 'recheck as good' folders, it learned its lesson.

It may be naive, but I was very surprised at how well it worked. It's better than SpamAssassin IMO, especially at foreign-language spam.

tmda.net? by Sludge · 2002-11-14 07:27 · Score: 3, Interesting

Has anyone tried Tagged Message Delivery Agent out? I would be curious to hear the mileage of others who have tried this.

Essentially, it throws the parsing problem right back in the spammers' faces: they must answer a fuzzy logic question in order to get into your inbox once and for all. It is similar to challenge/response routines in network connection code to prevent spoofing. The most interesting part from the intro:

The way TMDA thwarts incoming junk-mail is simple yet extremely effective. You maintain a "whitelist" of trusted contacts which are allowed directly into your mailbox. Messages from unknown senders are held in a pending queue until they respond to a confirmation request sent by TMDA. Once they respond to the confirmation, their original message is deemed legitimate and is delivered to you.
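
As I read that description, the flow is roughly the sketch below. This is not TMDA's code or API: the class, method and attribute names are invented, and adding a sender to the whitelist after a successful confirmation is my assumption about how "once and for all" would work.

```python
import secrets

class ChallengeResponseGate:
    """Toy model of a whitelist plus pending queue (all names hypothetical)."""

    def __init__(self, whitelist):
        self.whitelist = {addr.lower() for addr in whitelist}
        self.pending = {}  # token -> (sender, message) awaiting confirmation
        self.inbox = []    # delivered (sender, message) pairs
        self.outbox = []   # (sender, token) confirmation requests to send

    def receive(self, sender, message):
        sender = sender.lower()
        if sender in self.whitelist:
            self.inbox.append((sender, message))
            return None
        # Unknown sender: hold the message and issue a confirmation request.
        token = secrets.token_hex(8)
        self.pending[token] = (sender, message)
        self.outbox.append((sender, token))
        return token

    def confirm(self, token):
        entry = self.pending.pop(token, None)
        if entry is None:
            return False  # unknown or already-used token
        sender, message = entry
        self.inbox.append((sender, message))
        self.whitelist.add(sender)  # assumption: confirmed senders stay trusted
        return True
```

A spammer with a forged or unattended return address never answers the confirmation, so the held message simply stays in the pending queue.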

Bayesian filters, to me, seem to work if you are a dull person without many changes in your life. For example, if you constantly get spams with the word Madam in them and you later on get a sex change, you will need to recalibrate your filters. (Probably not the most pressing thing on your mind, so you'd lose a few authentic mails.)

Just some thoughts.

Re:Good example of MS's monopoly abuse by Refrag · 2002-11-14 16:21 · Score: 3, Interesting

The real problem with spam is that it steals bandwidth - blocking spam after it's already sitting in your mailbox is like closing the barn door after the horses have eaten your children - the bandwidth has already been used, so you don't gain anything... having your email client "block" spam isn't really blocking it, it's just an automatic "delete key".. which is what the spammers want (how many of them say spam isn't a problem because you can "just hit delete")

I'd argue that the time wasted on filtering spam is more valuable than the bandwidth wasted delivering it. This is why I am glad that Apple was able to bring good client-side spam filtering to the people with Mail and that Mozilla will soon provide this feature as well.

--
I have a website. It's about Macs.
