New Method of Spam Filtering


Posted by CmdrTaco on Thursday February 19, 2004 @06:32AM from the something-to-read dept.

Alephcat writes "A simple and easily implemented scheme for combating e-mail spam has been devised by two researchers in the United States. P. Oscar Boykin and Vwani Roychowdhury of the University of California, Los Angeles use their method to exploit the structure of social networks to quickly determine whether a given message comes from a friend or a spammer. The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category. The article was published on Nature magazine's website earlier today."

Easily spoofed? by Sam+Ruby · 2004-02-19 06:35 · Score: 5, Insightful

What's to stop the From:, To:, and Cc: fields from being spoofed (like a lot of viruses do)?

--
- Sam Ruby
Re:Sounds interesting... by rjelks · 2004-02-19 06:41 · Score: 4, Insightful

I would agree with that in terms of personal email accounts, but for a business, new contacts are pretty important. Most companies would hope a lot of real email was from new sources.

--

Tech News, Reviews and Tutorials
This method will ruin a cool part of the net by The+Wing+Lover · 2004-02-19 06:49 · Score: 5, Insightful

Used to be that one of the cool things about the net was that you would get email from total strangers... "Hi, I'm from {some far away place}. I saw your {Usenet post|web page|profile on some bulletin board site} and really liked your ideas about {something}. I've also been experimenting with {something} and I have some ideas about {whatever}..."

Now, if we only have emails from our (already existing) friends or friends of friends, then how will we ever meet anybody new?

--
- In Capitalist America, law violates YOU!
Mailing lists / newsletters by blorg · 2004-02-19 07:15 · Score: 4, Insightful

A mailing list would have multiple folks in the To: line, which would be easy to spot automatically.
Not necessarily; indeed, most professional ones avoid this. Many spams do contain multiple people in the To: field, but many don't. One way or the other, I don't think this is relevant if we are trying to compare the graph of a mailing list to that of a spammer. To take an example, user slashdot-headlines@newsletters.osdn.com sends thousands of emails to people *who don't know each other*. User enlargeyourdong@hotmail.com has exactly the same pattern. How do you tell these apart?
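
To make the point above concrete, here is a minimal sketch in Python. It assumes a simplified graph in which an edge joins a sender to each recipient, and it uses the standard local clustering coefficient (the fraction of a node's neighbour pairs that are themselves linked). The function names and the example.org addresses are hypothetical, and this is not necessarily how the researchers build or score their graph.

```python
from itertools import combinations


def build_graph(messages):
    """Build an undirected graph from (sender, recipients) pairs.

    Assumption: an edge joins the sender to each recipient only.
    """
    graph = {}
    for sender, recipients in messages:
        for rcpt in recipients:
            graph.setdefault(sender, set()).add(rcpt)
            graph.setdefault(rcpt, set()).add(sender)
    return graph


def clustering_coefficient(graph, node):
    """Fraction of a node's neighbour pairs that are linked to each other."""
    neighbours = graph.get(node, set())
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return linked / (k * (k - 1) / 2)


newsletter = "slashdot-headlines@newsletters.osdn.com"
spammer = "enlargeyourdong@hotmail.com"

# Both senders mail people who never mail each other (hypothetical addresses).
messages = [(newsletter, [f"reader{i}@example.org"]) for i in range(5)]
messages += [(spammer, [f"victim{i}@example.org"]) for i in range(5)]

# A hypothetical group of three friends who all mail one another.
messages += [
    ("alice@example.org", ["bob@example.org", "carol@example.org"]),
    ("bob@example.org", ["carol@example.org"]),
]

graph = build_graph(messages)
for address in (newsletter, spammer, "alice@example.org"):
    print(address, clustering_coefficient(graph, address))
```

Under these assumptions the newsletter and the spammer both sit at the centre of a star and score identically, while the group of friends forms a triangle and scores higher, so this measure alone does not separate a one-way newsletter from a spammer.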
Some of us rely on e-mail from strangers by beagle72 · 2004-02-19 07:25 · Score: 5, Insightful

The proposed anti-spam clustering technique is of course a variation on whitelisting. While clever, it fails to address a problem I have not often seen addressed. Many people defend themselves from spam by obscuring their e-mail addresses in public places, and perhaps by using whitelists to prefer known senders. This may be effective for many people.

However, some of us can't avoid having a publicly available e-mail address. For example, writers such as myself rely on feedback from readers who are, in nearly all cases, strangers (and sometimes strange, but that's another story...) Avoiding false positives from strangers is very important to me. I want their messages. But, since my e-mail address is published frequently (hence no reason to hide it here), I obviously receive a ton of spam.

For the past few months I have experimented with a plug-in called BayesIt! for the Windows email reader The Bat!. As the name implies, it's a Bayesian filter. The nice thing about BayesIt! is that I could point it to my already-stuffed spam folder and train it on thousands of messages in one go. So far it has worked out rather well. No false positives, and only about 10-20 false negatives per day (out of approx. 400 spams).
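
For readers unfamiliar with the idea, the following is a minimal sketch of a naive Bayes text classifier in Python, trained in bulk on already-sorted mail. It is an illustration only: the class name, the whitespace tokenizer, the add-one smoothing, and the sample messages are all assumptions, and it does not describe how BayesIt! itself works.

```python
import math
from collections import Counter


class TinyBayesFilter:
    """Toy naive Bayes spam filter (hypothetical, for illustration)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Add one message's words to the counts for 'spam' or 'ham'."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Return the estimated probability that the text is spam."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_msgs = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Class prior, smoothed so an untrained class is never zero.
            log_p = math.log((self.message_counts[label] + 1) / (total_msgs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                # Add-one smoothing keeps unseen words from zeroing the score.
                log_p += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = log_p
        # Convert the two log scores into a normalised probability.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


bayes = TinyBayesFilter()

# Bulk training on an existing spam folder and inbox (made-up messages).
for body in ["cheap pills buy now", "buy cheap watches now"]:
    bayes.train(body, "spam")
for body in ["loved your article on compilers", "question about your latest column"]:
    bayes.train(body, "ham")

print(bayes.spam_probability("buy cheap pills"))
print(bayes.spam_probability("your article question"))
```

Because the filter scores message content rather than the sender, mail from a stranger is judged the same way as mail from a known contact.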

Still, in the long run I support proposals that shift the economics of e-mail in ways that have minimal impact on human beings while making spam unprofitable. Changing the economic model of spam is the only sure solution; relying solely on technology will simply keep us locked in an ongoing arms race.

-Aaron
Most newsletters are one-way by blorg · 2004-02-19 07:25 · Score: 4, Insightful

Easy - those thousands of people who don't know each other also send email *back* to the mailing list. Only a few dummies send email back to the spammers.
Most mailing lists and newsletters are one-way - I'm not talking about discussion lists or listservs, but rather about the bot that sends me Slashdot headlines, Jakob Nielsen's Alertbox, Fred Langa's newsletter, and even commercial speech that I am signed up to and want to hear, such as Komplett's weekly offers or Ryanair's cheap flights.
Re:How it works - clustering coefficients by orthogonal · 2004-02-19 07:50 · Score: 4, Insightful

The system would be ideal for implementation at a fairly high level (e.g. the ISP level), where systems can aggregate email headers across many different users in order to come up with meaningful graphs. The advantage it claims of no false positives means that it would be feasible at this level.

Yeah, but I'd consider a high-level analysis of my email headers (either sent or received) to be a violation of my privacy. Whether or not I'm mailing to kinky@alterate.life.styles.com, fringe.politcal.groups.require@free.speech.too.org , unpopular.opinions@free.thinkers.net, or falun.gong@is.banned.by.my.dictator.org, it should be nobody's business but my own.

Someone will undoubtedly argue that since headers are sent in the clear anyway, it shouldn't matter, but keeping a database of who mails what to whom only makes abuse -- by freelance busybodies or government spies and censors -- that much the easier.

This is a case, I think, where the threat inherent in the cure is worse than the disease.

--
Opinions on the Twiddler2 hand-held keyboard?