Mozilla Adding Spam Filters

Re:Arms Race by TamMan2000 · 2002-11-14 05:55 · Score: 5, Insightful

Interesting thought, but they would have to have a large sample of YOUR valid email to train on...

--
"I'll have a Guinness, no wait, make that a Coors Light" -Grad student I work with, who shall remain anonymous...

Re:Arms Race by ichimunki · 2002-11-14 06:12 · Score: 5, Insightful

Nonsense. It's impossible. First of all, they don't have access to much of the mail I want to let through-- although my mailing list traffic certainly qualifies, so let's assume that's the only mail I get and that they know I am receiving it.

There will still need to be header information and actual spam content in the spams themselves for those mails simply to not be repeats or dada-esque cutups of posts to the mailing list. That is, there must be content unique to the spam that no normal sender on the list will include.

Because of this, and the fact that so-called Bayesian spam filtering works by scoring all the words in an email and then evaluating the email based only on the extremes, there is little likelihood-- since the spam must still contain spam words to have any point at all-- of those words not being on the extreme word list. After all, if the same words are appearing in both spam and not-spam mails, they will be given a spam-probability that is not extreme. So all those words in common will be ignored and only the spam words will be looked at-- and the spam will still be filtered.
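To make the "only the extremes count" point concrete, here is a minimal sketch of that kind of scoring. It is not Mozilla's code or any particular filter's implementation: the word probabilities, the function names, the cutoff of n extreme words, and the 0.4 default for unseen words are all assumptions chosen for illustration.

```python
from math import prod

# Hypothetical per-word spam probabilities learned from one user's mail.
WORD_PROBS = {
    "kernel": 0.02, "patch": 0.03, "compile": 0.05,      # mostly seen in list mail
    "the": 0.50, "list": 0.48, "mail": 0.52,             # seen in both kinds of mail
    "viagra": 0.99, "unsubscribe": 0.97, "free": 0.90,   # mostly seen in spam
}

def most_extreme(word_probs, tokens, n=15, unknown=0.4):
    """Return the n distinct tokens whose probability is farthest from a neutral 0.5."""
    scored = {t: word_probs.get(t, unknown) for t in set(tokens)}
    ranked = sorted(scored.items(), key=lambda kv: abs(kv[1] - 0.5), reverse=True)
    return ranked[:n]

def spam_probability(word_probs, tokens, n=15):
    """Combine only the extreme tokens' probabilities with the naive Bayes product rule."""
    probs = [p for _, p in most_extreme(word_probs, tokens, n)]
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)

# A spam padded with words that show up in both kinds of mail.
padded_spam = ["the", "list", "mail", "viagra", "free", "unsubscribe"]
# An ordinary mailing-list post.
list_post = ["the", "list", "kernel", "patch", "compile"]

spam_score = spam_probability(WORD_PROBS, padded_spam, n=3)
post_score = spam_probability(WORD_PROBS, list_post, n=3)
```

With n=3, the words near 0.5 ("the", "list", "mail") never make the cut, so the padded spam is judged on "viagra", "unsubscribe" and "free" alone and still scores close to 1, while the list post scores close to 0.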

--
I do not have a signature

Good example of MS's monopoly abuse by SethJohnson · 2002-11-14 06:18 · Score: 5, Insightful

Sorry if this comes off as a MS-bashing rant. It's not intended as such.

The fact that MS doesn't seem hard at work implementing spam filters in Outlook or popup blockers in IE is a good example of consumers suffering due to Microsoft's monopoly. It also demonstrates how Microsoft is able to leverage its monopoly in one area (mail and web clients) to build profit in another market.

This other market is its aspiring ISP services. The app and mail client development teams aren't implementing these features because the Microsoft ISP wants to be able to tout the ability to filter spam and block popups. If the browsers and email clients used by 90%+ of internet users had these features, then it wouldn't be a selling point for their ISP. This is a clear example of the company withholding features in the free products so it can profit from the antidote.

It also demonstrates the lack of competitive pressures in the market that normally drive a company to implement features at a rapid pace. Consumers are stagnating with a product that the developer has no competitive pressure to improve. Hence that list of 102 things Mozilla can do that IE can't do.

--
$5 / month hosted VPS on linux = awesome!

Re:Good example of MS's monopoly abuse by schon · 2002-11-14 06:41 · Score: 5, Insightful

Sorry if this comes off as a MS-bashing rant.

No need to apologize - I love a good MS-bashing rant as much as the next /.'er.. :o)

I do, however, feel that it's not as big a problem as you do..

The app and mail client development teams aren't implementing these features because the Microsoft ISP wants to be able to tout the ability to filter spam and block popups.

This may (or may not - although I'm inclined to agree with your views) be true, but the important thing to understand is that the MTA (ISP)-level is where spam blocking belongs.

The real problem with spam is that it steals bandwidth - blocking spam after it's already sitting in your mailbox is like closing the barn door after the horses have eaten your children - the bandwidth has already been used, so you don't gain anything... having your email client "block" spam isn't really blocking it, it's just an automatic "delete key".. which is what the spammers want (how many of them say spam isn't a problem because you can "just hit delete")

MS's intentions aside, the solution they have is the correct one, even if their motives are suspect.

Client-Side Filtering is Wasteful by divide+overflow · 2002-11-14 06:27 · Score: 5, Insightful

Since you must first download the content for client-side filtering to work, you waste bandwidth. If you are truly bombarded by spam you still lose...your mail spool still gets filled up with stuff you don't want, your data transfers compete for bandwidth with the spam, and storage hardware works harder storing data that will only be deleted. It raises everyone's costs, including yours.

We need to block undesired mail at the host, not filter it at the client. That way the spam never gets sent, the spammer gets the message that their attempt was futile, and bandwidth is conserved. Many ISPs already provide this service...we need to improve on it. And we need better tools for identifying and dealing with spammers. The current mail standards are woefully inadequate to this task.

The ultimate filters by TigerTime · 2002-11-14 06:36 · Score: 5, Insightful

There needs to be a tiered structure with filters. The main one would be at the ISP level. It would only filter out obvious spam (like spam going to 2000 users at that ISP). The second tier would be at the client side and would have a certain level of intelligence in identifying spam. One feature that I'd like (it might already be available) is if it could automatically send an email back to the sender saying the email address doesn't exist. This should be done at the server level and/or client level. This could possibly help in removing your email from such lists. As far as what to do with the spam at the client level, I think that it should be sent to your main inbox but just marked as spam (maybe greyed out or something). Like new mail is always bold and once you read it it goes to a regular font. Well, spam could be just greyed out. That way you would never miss something that the spam filter had a false hit on.

Sort by Spam Probability by Krellan · 2002-11-14 07:38 · Score: 5, Insightful

It seems too many people distrust spam filters because of the chance of accidentally blocking an important legitimate message as if it were spam.

Many spam filters are strictly binary: a message is either spam, or not spam. This is not ideal, because "gray area" messages - between these two extremes - will likely not be sorted correctly.

I propose adding a new sort option to email clients.

Sort by Spam Probability

This would be an additional field that can be displayed in a message list, similar to "To", "From", "Subject", and the like. As in the article, probabilities would range from 99% (almost certain spam) to 1% (most likely an innocent message). Notice that 100% accuracy either way is not claimed.

This way, the user can see up front the messages that are most likely not spam. The spam messages will be relegated to the bottom of the list, possibly colored to indicate their likelihood of being spam. If there is a message in the "gray area", it will most likely appear in the list between the legitimate messages and the spam, so the user will have a chance to see the message and make a decision, without the message being lost in the shuffle.
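A rough sketch of what such a sort could look like, assuming each message already carries a probability from some filter. The Message class, its field names, and the sample messages are made up for illustration and do not correspond to any real mail client's API.

```python
from dataclasses import dataclass

@dataclass
class Message:
    subject: str
    spam_probability: float  # as reported by some filter, nominally 0.0 to 1.0

def clamp(p, low=0.01, high=0.99):
    """Keep probabilities in the 1%..99% range, since certainty is never claimed."""
    return max(low, min(high, p))

def sort_by_spam_probability(messages):
    """Least spam-like first, so likely spam sinks to the bottom of the list."""
    return sorted(messages, key=lambda m: clamp(m.spam_probability))

# Hypothetical inbox contents.
inbox = [
    Message("MAKE MONEY FAST", 0.999),
    Message("Re: patch review", 0.02),
    Message("Newsletter: special offer inside", 0.55),
]

ordered = [m.subject for m in sort_by_spam_probability(inbox)]
```

The "gray area" newsletter lands between the legitimate reply and the obvious spam, which is where the user would get a chance to look at it.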

This would be a great feature. I hope this gets into Mozilla's mail client.

(BTW, another feature that would be great to see in mail clients would be datestamping of the actual time the message was downloaded. Many spammers, and innocent people with misconfigured clocks, send emails with wild dates that are not to be trusted. You can see this in yearly archives of GNU "mailman" mailing lists! Datestamping emails as they are downloaded will also keep mailboxes in order when sorted by date, as newly arrived messages will always be at the bottom, instead of being scattered throughout the inbox. But sorting by spam probability will probably become more popular than sorting by date....)

--

Dr. Demento On The 'Net!

Bayes filters can't adapt to text in images by DuSTman31 · 2002-11-14 07:50 · Score: 4, Insightful

As a POPFile user, I'm quite impressed with the catch rate possible with Bayes' theorem spam filters; however, I suspect their effectiveness will decrease over the long term.

Spammers are likely to respond to filters like this by encoding text in ways the filters can't read but humans can (e.g. having a .gif file of the text, loaded by an HTML statement in the message).

Statistical filters would need to have some kind of built-in OCR routine before they could be effective against that trick, and some respectable mailing lists are using images as well, so you can't just filter all mails with images attached.

In the long term, therefore, I suspect that filters that use a network database of spam will be more successful.

Re:Filtering by swdunlop · 2002-11-14 09:14 · Score: 5, Insightful

1) How much time do you spend training your paperclip in Office?

How much time are you going to spend on training your spam filter? If you are unwilling to invest a little time and effort in developing a solid set of values that fit your personal pattern of behavior, then Bayesian filters are indeed a poor match for you.

2) What harm is a false positive?

If you are automatically deleting anything that is marked as a positive for spam, then you are playing roulette with your email. I would generally recommend diverting email that your filter classifies as spam to a folder, especially if the filter is relatively new and has had very little experience with your patterns of use. Set an expiry on your spam folder, and check it from time to time to see if something fell through the cracks. Mozilla has a handy feature that allows you to simply conceal spam from view, which works adequately, although I dislike the potential performance hit in a large folder.
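As a rough illustration of "divert, then expire" rather than "delete on sight", something like the following would do. The 0.9 cutoff, the 30-day expiry, the folder names, and both function names are assumptions for the sketch, not settings from Mozilla or any other client.

```python
from datetime import datetime, timedelta

SPAM_THRESHOLD = 0.9                 # assumed cutoff for diverting a message
SPAM_MAX_AGE = timedelta(days=30)    # assumed expiry for the spam folder

def choose_folder(spam_probability, threshold=SPAM_THRESHOLD):
    """Divert likely spam to a folder instead of deleting it outright."""
    return "Spam" if spam_probability >= threshold else "Inbox"

def purge_expired(spam_folder, now, max_age=SPAM_MAX_AGE):
    """Keep only (received, subject) entries that are not older than max_age."""
    return [(received, subject) for received, subject in spam_folder
            if now - received <= max_age]

# Hypothetical spam folder contents.
now = datetime(2002, 11, 14)
spam_folder = [
    (datetime(2002, 11, 1), "recent catch, still reviewable"),
    (datetime(2002, 9, 1), "old catch, past the expiry"),
]
remaining = purge_expired(spam_folder, now)
```

Nothing is lost until the expiry passes, so a false positive can still be rescued during the occasional check of the folder.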

Considering how important your email is to you, you should certainly consider applying a little diligence to how you manage it.

--
Weapons of Mass Analysis
