Plan for Spam, Version 2
bugbear writes "I just posted a new version of the Plan for Spam Bayesian filtering algorithm. The big change is to mark tokens by context. The new version decreases spams missed by 50%, to 2.5 per 1000, even though spam has gotten harder to filter since the summer. I also talk about how spam will evolve, and what to do about it."
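For readers wondering what "marking tokens by context" could look like in practice, here is a minimal Python sketch. The `Header*token` prefix format, the choice of headers, and the token regex are all assumptions made for illustration; the summary above does not spell out the exact scheme.

```python
import re
from email import message_from_string

# Assumed token pattern: runs of letters, digits, and a few punctuation marks.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'!-]+")

# Assumed set of headers whose tokens get a context prefix.
MARKED_HEADERS = ("Subject", "From", "To")


def tokenize_with_context(raw_message):
    """Return tokens, prefixing header tokens with the header they came from."""
    msg = message_from_string(raw_message)
    tokens = []
    for header in MARKED_HEADERS:
        value = msg.get(header, "")
        # "free" in the Subject line becomes "Subject*free", so the filter
        # can learn a separate spam probability for it.
        tokens.extend(f"{header}*{tok}" for tok in TOKEN_RE.findall(value))
    body = msg.get_payload()
    if isinstance(body, str):  # skip multipart bodies in this sketch
        tokens.extend(TOKEN_RE.findall(body))
    return tokens
```

The point of the prefix is that the same word can carry different evidence depending on where it appears, so the filter keeps separate statistics for each context.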
Overblown? The fact that you would need more than one email account to keep from having your time wasted by spam proves otherwise.
Spam filters are great, but it seems that only the Net-savvy are using them. Savvy users aren't the people spammers are making all their money from--they are making money off the naive and inexperienced users. These users aren't going to go out and install the latest Bayesian filters on their system, and the major email readers won't (and probably shouldn't) come with them automatically activated.
To make spam cost-ineffective for the spammers, we've got to stop it (or flag it) before it gets to the end-user. It would obviously be a mistake to allow ISPs to automatically delete all email that fails their spam filters, but I think it would be appropriate for them to include something in the headers flagging such email as probable spam. Then future email readers could detect this header and handle it gracefully, like moving it to a "spam" folder on the user's machine. Once this happens and Grandpa no longer gets email asking him to test the latest Viagra alternative, spam may become a thing of the past.
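A rough Python sketch of that flag-then-file idea follows. The header name `X-Spam-Flag`, its `YES`/`NO` values, and both function names are assumptions for illustration, not an agreed standard.

```python
from email import message_from_string

SPAM_HEADER = "X-Spam-Flag"  # assumed header name, for illustration only


def flag_message(raw_message, is_probable_spam):
    """ISP side: tag the message instead of deleting it."""
    msg = message_from_string(raw_message)
    # Drop any copy of the header the sender may have forged.
    del msg[SPAM_HEADER]
    msg[SPAM_HEADER] = "YES" if is_probable_spam else "NO"
    return msg.as_string()


def choose_folder(raw_message):
    """Mail reader side: file flagged mail in a spam folder."""
    msg = message_from_string(raw_message)
    flagged = msg.get(SPAM_HEADER, "").strip().upper() == "YES"
    return "spam" if flagged else "inbox"
```

Nothing is deleted on the ISP side, so a false positive costs the user a look in the spam folder rather than a lost message.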
I think I speak for everyone when I say false positives are the only real hindrance to the filtering of spam. I get roughly 20 emails a day, 75% of which are spam. If one of them slips past the filter and I see it, it doesn't bother me so much. Spam is no longer a problem. What is an absolute necessity, though (and probably less so for me than other people), is that none of my legitimate email is filtered as spam. I'd rather have 100 spams filtered improperly than one legit email.
Whale
Yeah, 2.5 per 1000 getting through is a proof that his ideas are obviously flawed. Having a working system is the best proof that an idea works :)
Travis
Everyone but the folks at SpamAssassin have been focusing on the idea that any one technique for identifying spam is doomed to diminishing returns.
Over at SpamAssassin, they've been busily creating a system that collects "good enough" tests by the dozens and uses them to collectively score a message and determine its general "spamishness". The system relies on a complex scoring system that is determined, not by the whim of human programmers, but by the results of a genetic training system that pits one set of scores against another until equilibrium is reached for a given set of example spam and non-spam.
See my other post here for how Bayesian filtering will be used to allow this system to feed back on itself and improve as it sees more of your spam and non-spam....
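To make the "many good-enough tests, one combined score" idea concrete, here is a toy Python sketch. The rule names, weights, and threshold are invented for illustration and are not SpamAssassin's actual rules or API; the weights here are hand-picked, whereas the system described above would tune them by training against example spam and non-spam.

```python
# Each rule is (name, test function, weight). All values are hypothetical.
RULES = [
    ("subject_all_caps", lambda m: m["subject"].isupper(), 1.5),
    ("mentions_viagra", lambda m: "viagra" in m["body"].lower(), 2.5),
    ("many_exclamations", lambda m: m["body"].count("!") >= 3, 1.0),
]

THRESHOLD = 3.0  # assumed cutoff; a message at or above this is called spam


def score_message(message):
    """Return (total score, names of the rules that fired)."""
    hits = [(name, weight) for name, test, weight in RULES if test(message)]
    total = sum(weight for _, weight in hits)
    return total, [name for name, _ in hits]


def is_spam(message):
    total, _ = score_message(message)
    return total >= THRESHOLD
```

No single rule is enough to cross the threshold on its own, which is the point: a weak test only matters in combination with others.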
In certain ways, the government does and should do precisely that. If I repeatedly call you at 4 AM to ask if your refrigerator is running or deliberately send you virus-laden e-mail, then you have every right to call upon the long arm of the law to slap down the harassment.
Spamming, being a violation of the recipient's property rights, falls into that category.
/. If the government wants us to respect the law, it should set a better example.
"Does anyone think AOL or Hotmail could start using such a system as the one outlined in the article?"
No. My problem's with the senders, not the messages. What Hotmail should do is send back an email saying "Your message has been rejected because you have not been authorized by this user. If you'd like to request authorization, click here and follow the instructions."
When they properly fill out the form, you get a message saying "so'n'so wants to send you a message. Interested?" and you can say yes/no. If you say yes, they get added to your address book and they can email you until you remove them from it.
With this approach, it requires a valid return address before the message can possibly get to you. That means you're able to tell the person to remove you, unlike today's 'send anything to anybody' system.
If Hotmail did that, I'd actually consider paying for their service.
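The authorization flow described above could be sketched roughly like this in Python. The `Mailbox` class and its method names are hypothetical, and a real service would also need to verify the return address and send the actual rejection notice.

```python
class Mailbox:
    """Toy model of sender authorization; all names here are hypothetical."""

    def __init__(self):
        self.authorized = set()   # the user's address book
        self.requests = set()     # senders who filled out the request form
        self.inbox = []

    def receive(self, sender, message):
        """Deliver mail only from senders the user has authorized."""
        if sender in self.authorized:
            self.inbox.append((sender, message))
            return "delivered"
        # Unknown sender: bounce with instructions to request authorization.
        return "rejected"

    def request_authorization(self, sender):
        """Sender filled out the form; the user will be asked yes/no."""
        self.requests.add(sender)

    def decide(self, sender, accept):
        """User answers the request; 'yes' adds the sender to the address book."""
        if sender in self.requests:
            self.requests.discard(sender)
            if accept:
                self.authorized.add(sender)

    def remove(self, sender):
        """User revokes a sender; their mail is rejected again."""
        self.authorized.discard(sender)
```

Because a request only goes through if the sender can receive the rejection notice and respond to it, a forged return address never makes it into the address book.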
Actually, SpamAssassin has a nice built-in reporting tool. And if you set it up to work with Vipul's Razor, it's all automagically updated.
Stupid Cheap Guitars
Correction.. spam will never stop... ever.
You say that it will stop if it's fully against the law and people bring legal action to stop it.
Last time I checked, murder was illegal, punishable by death in many states, yet it still occurs.
People spam because it is rational to do so (or at least spammers make them think so). Very low costs, the possibility of a good return, and nothing to lose since there are virtually no spam laws.
A better comparison than murder is the practice of child labor. While it was legal it was a rational practice to engage in, because the return was high and the risk was low -- if a kid gets eaten by a machine you just find another kid. Now that it is illegal the practice is almost completely extinct because it is no longer rational -- the police would come knocking at the door, which impedes the goal of running a profitable business.