New Method of Spam Filtering


Posted by CmdrTaco on Thursday February 19, 2004 @06:32AM from the something-to-read dept.

Alephcat writes "A simple and easily implemented scheme for combating e-mail spam has been devised by two researchers in the United States. P. Oscar Boykin and Vwani Roychowdhury of the University of California, Los Angeles use their method to exploit the structure of social networks to quickly determine whether a given message comes from a friend or a spammer. The method works for only about half of all e-mails received, but in all of those cases it sorts the mail into the right category. The article was published on Nature magazine's website earlier today."


So it's just a very good rule, how is that bad? by Smack · 2004-02-19 06:46 · Score: 5, Informative

According to the article, it can make a decision on 53% of the total e-mail, and divide it up into Spam or non-Spam with complete accuracy. The key is that it makes no judgement on the rest of the e-mail.

So you could throw this as a rule into SpamAssassin with a 100 weight on Spam results and a -100 weight on non-Spam results. That could only help your filtering. With zero false-positives.
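
To make the weighting idea concrete, here is a minimal sketch in plain Python (not SpamAssassin rule syntax). It assumes the network test returns one of three answers: spam, not spam, or no decision. The names `combined_score` and `is_spam`, and the threshold value, are made up for illustration; only the +100/-100 weights come from the comment above.

```python
from typing import Optional

# Weights suggested in the comment: +100 for a spam verdict, -100 for non-spam.
NETWORK_WEIGHT = 100.0
# Assumed cut-off for this sketch; tune to whatever your filter uses.
SPAM_THRESHOLD = 5.0


def combined_score(other_score: float, network_verdict: Optional[bool]) -> float:
    """Add the network rule to the score from all other rules.

    network_verdict: True = spam, False = not spam, None = no decision.
    """
    if network_verdict is None:
        # The rule abstains, so the other rules decide on their own.
        return other_score
    if network_verdict:
        return other_score + NETWORK_WEIGHT
    return other_score - NETWORK_WEIGHT


def is_spam(other_score: float, network_verdict: Optional[bool] = None) -> bool:
    return combined_score(other_score, network_verdict) >= SPAM_THRESHOLD
```

Because the weight dwarfs any realistic score from the other rules, a verdict from the network test overrides them, and an abstention leaves the result exactly as it was.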
Re:Easily spoofed? by FauxPasIII · 2004-02-19 06:47 · Score: 5, Informative

There are two 'sender' fields that one is concerned with: The envelope-sender and the From: header. The latter can be spoofed as much as you like. The former cannot be spoofed in most cases, at least the host/domain part (the username can be spoofed if the server uses unauthenticated SMTP, which almost all do).

A typical message would look like this:
From spammer@baddomain.com
From: Your friend <yourfriend@gooddomain.org>
Subject: Re: your mail

Buy our crap! Click below to be removed. Blah blah.

The first From field is the 'envelope sender' and comes entirely from the servers that have touched the mail. The rest of the fields are just a freeform part of the message, which by convention most (all?) MUAs treat in a special way to add convenient features like having the 'real name' next to your mail address in the visible From: field.
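
As a rough illustration of the two fields, the sketch below pulls both addresses out of a message laid out like the example above. It assumes the envelope sender has been recorded as a leading "From " line (no colon); the function name `split_senders` and the sample text are hypothetical. It uses only Python's standard `email` package.

```python
from email import message_from_string
from email.utils import parseaddr

RAW = (
    "From spammer@baddomain.com\n"
    "From: Your friend <yourfriend@gooddomain.org>\n"
    "Subject: Re: your mail\n"
    "\n"
    "Buy our crap! Click below to be removed.\n"
)


def split_senders(raw: str):
    """Return (envelope_sender, display_name, header_address)."""
    first_line, _, rest = raw.partition("\n")
    parts = first_line.split()
    envelope = None
    if len(parts) >= 2 and parts[0] == "From":
        # Leading "From " line (no colon): the recorded envelope sender.
        envelope = parts[1]
    else:
        # No envelope line present, so the whole text is headers plus body.
        rest = raw
    msg = message_from_string(rest)
    # The From: header is ordinary message content chosen by the sender.
    display_name, header_addr = parseaddr(msg["From"] or "")
    return envelope, display_name, header_addr


envelope, name, header_addr = split_senders(RAW)
print(envelope, "|", name, "|", header_addr)
```

A mail client would typically show only the display name and the From: header address, which is why the two can disagree without the reader noticing.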

--
25% Funny, 25% Insightful, 25% Informative, 25% Troll
How it works - clustering coefficients by blorg · 2004-02-19 06:57 · Score: 5, Informative

You can read an abstract, and download the full (i.e. original) article here in a variety of formats.
From what I can make out, this system graphs correspondent pairs into correspondence maps. It notes that normal people all email each other and thus have dispersed graphs (high clustering coefficient), while spammers have a distinct pattern, e.g. one person emailing a few million others (low clustering coefficient). There are figures in the article that make this point well.
The system would be ideal for implementation at a fairly high level (e.g. the ISP level), where systems can aggregate email headers across many different users in order to come up with meaningful graphs. Its claimed advantage of no false positives means that it would be feasible at this level.
I'm impressed; it looks like a very clever idea. My only question concerns how this would deal with mailing lists, which must appear to it like spam?
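
For anyone unfamiliar with the term, here is a small sketch of the standard local clustering coefficient on an undirected graph of correspondent pairs: the fraction of a node's neighbours that are also connected to each other. This only illustrates the measure mentioned above and is not the authors' full algorithm; the function names and the toy addresses are made up.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(pairs):
    """Build an undirected adjacency map from (sender, recipient) pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def local_clustering(graph, node):
    """Fraction of possible links among node's neighbours that exist."""
    neighbours = graph.get(node, set())
    k = len(neighbours)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbours; treated as 0 here
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return links / (k * (k - 1) / 2)


# Three friends who all email each other, plus one sender mailing strangers.
friends = [("alice", "bob"), ("bob", "carol"), ("alice", "carol")]
spam = [("spammer", f"victim{i}") for i in range(5)]
graph = build_graph(friends + spam)

print(local_clustering(graph, "alice"))    # 1.0
print(local_clustering(graph, "spammer"))  # 0.0
```

In this toy graph, a mailing list address would look just like the spammer's node (many recipients who never write to each other), which is the concern raised above.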
Re:Easily spoofed? by mlefevre · 2004-02-19 07:05 · Score: 5, Informative

The envelope-sender can be just as easily spoofed as the From: header. If you're sending email out through your ISP or corporate email relay, that may well check that the host (or the whole address) is correct.

If you do as most spammers do and connect directly to the receiving server, then you can feed it whatever you like in the envelope sender, and it has no way of checking whether it's genuine or not. This is what stuff like SPF can help with, but as things are currently implemented just about everywhere, the envelope-sender addresses on spam and viruses are generally forged.
Erm, not by Vainglorious+Coward · 2004-02-19 07:06 · Score: 5, Informative

The [envelope-sender] cannot be spoofed in most cases
Simply: untrue. It's as easy to fake the envelope sender as it is the From: header. I think you're getting confused with "Received" headers, where each mail system inserts its own bit of tracking information. The envelope-sender is completely under the control of the sender, and (usually) propagates unmodified as an email is handed between systems (indeed, one of the criticisms of SPF is that by modifying the envelope sender you break forwarding).

--
My next sig will be ready soon, but subscribers can beat the rush