New Method of Spam Filtering

Posted by CmdrTaco on Thursday February 19, 2004 @06:32AM from the something-to-read dept.

Alephcat writes "A simple and easily implemented scheme for combating e-mail spam has been devised by two researchers in the United States. P. Oscar Boykin and Vwani Roychowdhury of the University of California, Los Angeles, exploit the structure of social networks to quickly determine whether a given message comes from a friend or a spammer. The method works for only about half of all e-mails received, but in all of those cases it sorts the mail into the right category. The article was published on Nature magazine's website earlier today."

19 of 326 comments

Interesting by jchawk · 2004-02-19 06:34 · Score: 5, Interesting

It would be interesting if Google could find a way for this idea to work with Orkut.com, since users of this service are typically connected to many other people who are not spammers. :-)
Volume by enderanjin · 2004-02-19 06:35 · Score: 4, Interesting

If the filter is effective against only half of the emails, what prevents spammers from doubling their volume so that the same amount of spam reaches your inbox as does now?

--
Anything in parenthesis may (not) be ignored.
huh? by wankledot · 2004-02-19 06:36 · Score: 4, Interesting

It only works for half... but it works great on that half!!! How is that a good filter at all?
Of course, one huge downside to this "friend of friends" approach is all the virus spam I get that's sent using someone's address book (thanks, Outlook!). Guess what... all those addresses are probably whitelisted because the mail came from someone I "know."

--
My sig is blank, I typed this by hand.
1. Re:huh? by CeleronXL · 2004-02-19 06:38 · Score: 5, Interesting
  
  Well, you can run mail through a system like that first, pulling out the mail that is definitely not spam and shuffling it away to the Inbox. Then run the rest through a different kind of spam filter, such as SpamBayes, and you cut it down even more.
  
  On its own it doesn't sound like it works well, but you can couple it with already-existing systems to boost accuracy.
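The two-pass arrangement suggested in this reply might be sketched as below. This is only an illustration under stated assumptions: `network_verdict` stands in for the social-network filter and `content_is_spam` for a content classifier such as SpamBayes; both are hypothetical callables, not the API of any real package, and the message shape is invented for the example.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]  # hypothetical shape: {"from": ..., "body": ...}


def two_stage_filter(
    messages: List[Message],
    network_verdict: Callable[[Message], str],
    content_is_spam: Callable[[Message], bool],
) -> Tuple[List[Message], List[Message], int]:
    """Return (inbox, junk, number of messages the second stage had to score)."""
    inbox: List[Message] = []
    junk: List[Message] = []
    second_stage_calls = 0
    for msg in messages:
        verdict = network_verdict(msg)  # assumed to be "friend", "spammer" or "unknown"
        if verdict == "friend":
            inbox.append(msg)
        elif verdict == "spammer":
            junk.append(msg)
        else:
            # Only mail the first pass cannot decide reaches the second filter.
            second_stage_calls += 1
            (junk if content_is_spam(msg) else inbox).append(msg)
    return inbox, junk, second_stage_calls


# Toy stand-ins for the two filters (assumptions for illustration only).
FRIENDS = {"bob@example.org"}
SPAMMERS = {"bulk@example.net"}


def toy_network_verdict(msg: Message) -> str:
    if msg["from"] in FRIENDS:
        return "friend"
    if msg["from"] in SPAMMERS:
        return "spammer"
    return "unknown"


def toy_content_is_spam(msg: Message) -> bool:
    return "viagra" in msg["body"].lower()


mail = [
    {"from": "bob@example.org", "body": "Lunch?"},
    {"from": "bulk@example.net", "body": "Hello"},
    {"from": "stranger1@example.com", "body": "Cheap VIAGRA"},
    {"from": "stranger2@example.com", "body": "Question about your post"},
]
inbox, junk, scored = two_stage_filter(mail, toy_network_verdict, toy_content_is_spam)
```

In this toy run only the two messages from unknown senders are handed to the second filter, which is the saving the reply is pointing at.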
Viruses? by AntiOrganic · 2004-02-19 06:36 · Score: 3, Interesting

Won't this just inspire more spammers to pursue virus, trojan and spyware-oriented methods of spamming? Granted, this is significantly more difficult than just harvesting email addresses off of Usenet and web pages, but it seems like we're only one step ahead at any given time with our methods of spam prevention.
Bugger Off! by ackthpt · 2004-02-19 06:37 · Score: 5, Interesting

You take food away from a spammer and his children. Don't block spam, or else you hate children. You don't hate children... do you?
You know darn well that this will only increase employment in the Spam Technology sector and is a good thing.
Seriously, spammers are often a step ahead, and lately a lot of the spam I'm getting is masked to look like Amazon orders or closed eBay auctions. I haven't ordered anything from Amazon (USA) in ages, but I still have to peek to see if someone has cracked my account and ordered something. Just expect that the harder they are pressed, the harder spammers will press back by sinking to new lows.

--

A feeling of having made the same mistake before: Deja Foobar
Good idea by Schezar · 2004-02-19 06:38 · Score: 5, Interesting

After reading this, I realized that a good 90% of the email I receive is either from someone I've had previous contact with, or else someone 1 or at most 2 degrees of separation from one of those people. I never get mail worth reading from total strangers. Anything important is always linked back to me in some way.

It should be interesting to see how this method plays out. (Now, I don't know why I even bothered with that last sentence. Everyone says that about every new spam-filtery thing. ((Don't know why I bothered with that last sentence either. Work is slow today I suppose.)) )

--
GeekNights!
Late Night Radio for Geeks!
this doesn't address spoofed email by alpha1125 · 2004-02-19 06:38 · Score: 3, Interesting

What about spoofed messages from people on my list?

Worms, from infected email systems?

The researchers didn't address this.

--
Money cannot buy happiness, but can buy something soo darn close, that you can't really tell the difference
A two tier system? by erick99 · 2004-02-19 06:38 · Score: 4, Interesting

I suppose you could use this as a first pass and let those go directly to the "recycle bin" or whatever deletes mail (if you really can be confident that they are all spam). Then, the balance of your email could go through whatever antispam system you use. Right now I get over 100 spam emails a day. These go into a folder and are sorted by sender so that I can quickly scan through for any "friendly" emails. It would be nice to cut the amount that has to be manually scanned by half. Either way, this sounds like it's going in the right direction: towards a system that is close to 100% effective (if that is truly possible).
Happy Trails!
Erick

--
http://www.busyweather.com/
Heading the wrong way by Muddie · 2004-02-19 06:42 · Score: 5, Interesting

This sounds like the whole "Friends and Family" network from AT&T a few years ago, and now Verizon's "In" network thing, but with email and exclusive instead of "Free calls to friends on 'the list'".

Pretty soon, you will have to send an MD5 hash of your DNA from a static IP address that is reversible and supply 5 references all in a PGP encrypted letter, along with a copy of your passport and birth certificate.

When it's more work to block spam than stop it, you have to ask what is going wrong. Maybe if we somehow figured out wonderful technologies to *stop* spammers instead of blocking them, we'd be getting towards the ultimate goal. This is much like throwing money at a problem to bandage it, not fix it. The solution, however, also has to be easier for end users, who are doing nothing wrong. Why is every solution harder for end users, but just a 'bump in the road' for spammers? Am I missing something?
Spammers already defeat this (partially) by xleeko · 2004-02-19 06:46 · Score: 5, Interesting

Spammers already sort addresses by site in order to take advantage of this effect. They forge the from address as someone else from your site on the theory that you know them and would whitelist them.
In fact, this has provided me with a kind of "honeypot", since I now check for the addresses of several people who are long gone from my site. If I see their address, it's gotta be spam!
- Dave
Re:Easily spoofed? by DR+SoB · 2004-02-19 06:49 · Score: 3, Interesting

The issue is receiving. Yes, you can EASILY block outbound; it's inbound that's the issue.

"We prevent the worm's SMTP engine from working by having MX wildcard records to a logging box only for internal DNS -"

Say what? Why wouldn't you just block outbound port 25 from anyone except YOUR SMTP server's address? If a worm has its own SMTP engine (many do, yes), then what's to stop it from doing its own MX look-ups? It would take about 4 extra lines of code to accomplish this.

--
Mod +5 Drunk
I guess that pigs have wings. by Henry+Stern · 2004-02-19 07:04 · Score: 3, Interesting

I never thought that Slashdot would help me find papers relevant to my research!

I think that their idea is good from a technical point of view, but very bad from a privacy point of view. I am of the opinion that gathering social network information is extremely dangerous. A pertinent example: If your friend is branded a "terrorist," then "they" can exploit the information that you have voluntarily provided to then put you on a "terrorist" watch list.

Another example: Say that someone who knows someone that you know actually buys something from a spam. If the spammer can access the social network information, suddenly your little niche of the network is going to be aggressively spammed. After all, like minds congregate.

There is no doubt in my mind that the black hatters will infiltrate the social network communities and use that information to spy on potential viewers. See this bugzilla thread where the folks from Atriks Professional Email Deployment Service follow SpamAssassin's development and adapt their "ratware" tool accordingly.

The biggest problem with collecting social networks is that once the data has been gathered, it is very hard to control. Those of you using Orkut should think long and hard about it.

In conclusion, I think that this is technically a good idea but it opens a Pandora's box.
Reverse MX DNS querying by germinatoras · 2004-02-19 07:09 · Score: 3, Interesting
I've been thinking about this method for a while - basically, you configure your SMTP server to do this:
- MTA connects to you, gives you a MAIL FROM: xxxxx@somedomain.com
- Your server performs an MX query for somedomain.com and resolves the listed mail hosts to a list of IP addresses
- Your server compares the IP of the connecting MTA to the list of IPs in the MX records.
- No match? Connection gets aborted.
This idea is clearly too simple not to have been thought of before, but I have yet to find a good explanation as to why it won't work. Verizon.net uses this exact method: try sending an SMTP message from a host that isn't listed in your domain's MX records, and you get a "550 Sorry, you aren't allowed to mail for this domain" or something comparable. How come this method isn't more widely used? Going through my own SMTP server logs shows that the vast majority of SMTP servers sending legit mail are also listed in the domain's MX records. The only price is that you require a domain's sending and receiving hosts to be the same, hardly an unreasonable requirement.
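The steps listed in this comment could be sketched roughly as follows. To keep the example self-contained, the DNS work is passed in as `lookup_mx_ips`, a hypothetical callable that returns the IP addresses behind a domain's MX records; a real mail server would perform an actual MX query followed by address lookups. The function name and the fake DNS table are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List


def sender_matches_mx(
    mail_from: str,
    peer_ip: str,
    lookup_mx_ips: Callable[[str], Iterable[str]],
) -> bool:
    """True if the connecting host's IP is among the MX hosts of the MAIL FROM domain."""
    if "@" not in mail_from:
        return False  # no domain to check against
    domain = mail_from.rsplit("@", 1)[1].lower()
    return peer_ip in set(lookup_mx_ips(domain))


# Stand-in for real DNS, using documentation-only IP ranges.
FAKE_DNS: Dict[str, List[str]] = {"somedomain.com": ["192.0.2.10", "192.0.2.11"]}


def fake_lookup(domain: str) -> List[str]:
    return FAKE_DNS.get(domain, [])


accepted = sender_matches_mx("xxxxx@somedomain.com", "192.0.2.10", fake_lookup)
rejected = sender_matches_mx("xxxxx@somedomain.com", "203.0.113.5", fake_lookup)
```

A server using this check would abort the connection whenever the function returns False. As the comment itself notes, the cost is that a domain whose outgoing mail leaves from hosts other than its MX hosts would be rejected.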
I once had an evil idea by WormholeFiend · 2004-02-19 07:09 · Score: 3, Interesting

to deal with open relays in China...

I would've harvested the e-mail addresses of as many members of the ruling communist party as possible, and used those relays to spam them with anti-communist propaganda. I believe the consequences would've been swift and ruthless.

Unfortunately I can't read/write Chinese, and this idea wouldn't work in less repressive regimes...
Re:So it's just a very good rule, how is that bad? by GooberToo · 2004-02-19 07:17 · Score: 4, Interesting

Or simply not process the 53% with other spam detection software, which saves on CPU! In other words, make this the first anti-spam process, whereby, half of your email gets to skip spamassassin (or whatever). The other 50%, you process as usual.
Re:So it's just a very good rule, how is that bad? by GooberToo · 2004-02-19 07:24 · Score: 3, Interesting

Oh ya, in case it's not obvious, that means up to a 50% reduction in the small percentage of email that is false positives. That means, if you have a 5% false-positive rate, you *may* see that reduced to as little as 2.5%! Technically, the reduction may actually be greater than that. The reason being, it may be that 100% of the false positives fall into the 50% that this technique properly identifies. Needless to say, that's very exciting. It also creates the possibility of letting people lower their spam threshold without fear of creating a higher false-positive rate. That, in turn, means more spam identified with fewer false positives. Let's hope reality falls close to my rambling speculations here! ;)

Very interesting indeed!
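The arithmetic in this comment can be written out as a small sketch. It assumes, as the story summary claims, that the first pass never misfiles a legitimate message it decides on; `share_cleared` is a hypothetical parameter for the fraction of would-be false positives that the first pass takes out of the second filter's hands.

```python
def remaining_false_positive_rate(base_rate: float, share_cleared: float) -> float:
    """False-positive rate left once the first pass clears `share_cleared` of them."""
    if not 0.0 <= share_cleared <= 1.0:
        raise ValueError("share_cleared must be between 0 and 1")
    return base_rate * (1.0 - share_cleared)


# False positives spread evenly over all mail: half of them are cleared.
even_split = remaining_false_positive_rate(0.05, 0.5)

# Every would-be false positive falls in the half the first pass handles.
best_case = remaining_false_positive_rate(0.05, 1.0)
```

With a 5% starting rate, the even split leaves 2.5% and the best case leaves none, which are the two ends of the range the comment speculates about.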
HOW SPAMMERS CAN BEAT THIS FILTER by goombah99 · 2004-02-19 08:53 · Score: 4, Interesting

There are three ways one can beat the filter.

The first is trivial and certain to succeed but has a drawback to spammers: only send e-mail to single recipients. The drawback is this puts a much higher load on their servers, since every message is sent individually.

The second method is to always include dummy addresses in the mailing list that the recipients probably have in their address books. For example, add the following names to the to-field: notifications@paypal.com and list-notication@ebay.com.
Any recipient of the spam message who has also received e-mail from eBay or PayPal will trust the message.

One can do even better by planning ahead when harvesting e-mails. For example, if you harvest a set of e-mails from a particular bulletin board, you can make note of message cliques at the time of harvesting, and send messages in the same groupings. For good measure, you also include the addresses of the bulletin board admins.

Third, all the spammer really has to know is one address you have gotten messages from. Thus, either buy mailing lists from legitimate companies people actually do business with, or create your own loss-leader messages. For example, send out some political action alert or anything that has some value or use to most people, maybe a lottery drawing for a prize, or a discount subscription to Time magazine, so they will accept the message. The sender does not have to be the same as your spammer address. Now you know someone in the address book of the victim. Now you spam the crap out of them while including the trojan address in the to: field.

--
Some drink at the fountain of knowledge. Others just gargle.
1. Re:HOW SPAMMERS CAN BEAT THIS FILTER by kirkjobsluder · 2004-02-19 17:19 · Score: 3, Interesting
  
  The first is trivial and certain to succeed but has a drawback to spammers: only send e-mail to single recipients. The drawback is this puts a much higher load on their servers, since every message is sent individually.
  
  True, this method is strongest against dictionary spam and does not work against non-dictionary spam.
  
  The second method is to always include dummy addresses in the mailing list that the recipients probably have in their address books. For example, add the following names to the to-field: notifications@paypal.com and list-notication@ebay.com.
  Any recipient of the spam message who has also received e-mail from eBay or PayPal will trust the message.
  
  Um, did you RTFA? (And perhaps most importantly, did anybody modding this article RTFA.)
  
  The algorithm has nothing to do with address books. Instead, it looks at friend-of-a-friend networks as identified by mail headers.
  
  For example, I work on a project with Bob and Susan. A typical email message about the project will include my address and their addresses in the header. The algorithm assumes that three first-degree relationships exist:
  me-bob
  me-susan
  susan-bob
  
  There are also three second-degree (friend-of-a-friend) relationships:
  me-susan-bob
  me-bob-susan
  susan-me-bob
  
  The high ratio of second-degree/first-degree relationships gives susan and bob a higher score (3/3=1), and puts them on the whitelist.
  
  With paypal.com, there is only one first-degree relationship (paypal.com-me) and no second-degree relationships. The algorithm handles single-relationship networks as a special case, and defines them as ambiguous.
  
  With a typical dictionary attack, a spam comes with 50 email addresses in the header. However, because a dictionary attack relies on sequential or randomly generated usernames, the number of recipients who are part of my social network is low. So we have 50 first-degree relationships, and let's say the spammer gets lucky and nails Susan and Bob as well. It still gets a low score. (2/50=.04)
  
  One can do even better by planning ahead when harvesting e-mails. For example, if you harvest a set of e-mails from a particular bulletin board, you can make note of message cliques at the time of harvesting, and send messages in the same groupings. For good measure, you also include the addresses of the bulletin board admins.
  
  This is a slightly better strategy. However, this only works if you use email from a member of the clique, and limit the recipient list to members of the clique.
  
  But there is a serious problem with the strategy. The stated goal of the authors (did you RTFA?) is to increase the costs of spamming to the point where spamming is no longer economically profitable. Such a strategy would require research which is expensive.
  
  Or create your own loss-leader messages. For example, send out some political action alert or anything that has some value or use to most people, maybe a lottery drawing for a prize, or a discount subscription to Time magazine, so they will accept the message. The sender does not have to be the same as your spammer address. Now you spam the crap out of them while including the trojan address in the to: field.
  
  Once again, RTFA. The algorithm has nothing to do with address books. But you did raise one possible threat: spoofing. A spammer could not get integrated into my social network by offering a loss-leader (for the same reason that messages from ebay.com would not be whitelisted). A spammer could spoof a member of my social network (for example, by using Bob's address). However, the problem here is economics. Bob would probably only be auto-whitelisted by 50 people. Thus spoofing Bob would only get you access to a small population, which defeats the entire economic rationale for spamming.
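The scoring idea walked through in the reply above (the Bob and Susan example, the single PayPal contact, and the 50-address dictionary attack) can be made concrete with a small sketch. It rests on assumptions that are not confirmed by the thread: edges are drawn only between a sender and each recipient, and the score used is the local clustering coefficient, meaning the fraction of a node's contacts that are also in contact with each other. That is one reading of the ratio described in the reply, not necessarily the construction used in the paper, and the address names are placeholders.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(headers: Iterable[Tuple[str, List[str]]]) -> Graph:
    """Add an undirected edge between each sender and each of its recipients."""
    graph: Graph = defaultdict(set)
    for sender, recipients in headers:
        for rcpt in recipients:
            if rcpt != sender:
                graph[sender].add(rcpt)
                graph[rcpt].add(sender)
    return graph


def clustering(graph: Graph, node: str) -> Optional[float]:
    """Fraction of the node's contact pairs that are linked; None if ambiguous."""
    neighbours = graph.get(node, set())
    if len(neighbours) < 2:
        return None  # a single relationship carries no triangle information
    pairs = list(combinations(sorted(neighbours), 2))
    linked = sum(1 for a, b in pairs if b in graph[a])
    return linked / len(pairs)


headers = [
    ("me", ["bob", "susan"]),
    ("bob", ["me", "susan"]),
    ("susan", ["me", "bob"]),
    ("paypal", ["me"]),
    # Dictionary-style spam: 50 recipients, three of whom happen to know each other.
    ("spammer", ["me", "bob", "susan"] + [f"user{i}" for i in range(47)]),
]
graph = build_graph(headers)

bob_score = clustering(graph, "bob")
paypal_score = clustering(graph, "paypal")
spammer_score = clustering(graph, "spammer")
```

Under these assumptions Bob scores high because all of his contacts also talk to each other, the PayPal address comes back as ambiguous because it has a single relationship, and the spammer scores close to zero because almost none of its 50 recipients are linked.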