New Method of Spam Filtering
Alephcat writes "A simple and easily implemented scheme for combating e-mail spam has been devised by two researchers in the United States. P. Oscar Boykin and Vwani Roychowdhury of the University of California, Los Angeles use their method to exploit the structure of social networks to quickly determine whether a given message comes from a friend or a spammer. The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category. The article was published on Nature magazines website earlier today."
You take food away from a spammer and his children. Don't block spam, or else you hate childeren. You don't hate children... do you?
He was probably sick of people like me mistaking his name for a made up spam "from" line.
It would be interesting if Google could find away for this idea to work with Orkut.com, since users of this service are typically connected to many other people who are not spammers. :-)
What's to stop the From:, To:, and Cc: fields from being spoofed (like a lot of viruses do)?
- Sam Ruby
Spammers suck, right? And their children have obviously inherited the spamming gene. So, by starving the children to death, we're preventing the spam gene from spreading. It may sound wrong, but we're actually helping society.
You know darn well that this will only increase employment in the Spam Technology sector and is a good thing.
Seriously, Spammers are often a step ahead and lately a lot of spam I'm getting is masked to look like Amazon orders or closed ebay auctions. I haven't ordered anything from Amazon (USA) in ages, but I till have to peek to see if someone has cracked my account and ordered something. Just expect the harder they are pressed, the harder spammers will press back by sinking to new lows.
A feeling of having made the same mistake before: Deja Foobar
After reading this, I realized that a good 90% of the email I receive is either from someone I've had previous contact with, or else someone 1 or at most 2 degrees of separation from one of those people. I never get mail worth reading from total strangers. Anything important is always linked back to me in some way.
It should be interesting to see how this method plays out. (Now, I don't know why I even bothered with that last sentence. Everyone says that about every new spam-filtery thing. ((Don't know why I bothered with that last sentence either. Work is slow today I suppose.)) )
GeekNights!
Late Night Radio for Geeks!
Well you can run mail through a system like that first, pulling out the mail that is definitely not spam and shuffling it away to the Inbox. Then run it through a different kind of spam system, such as a system like SpamBayes, and you cut it down even more.
On its own it doesn't sound like it works well, but you can couple it with already-existing systems to boost accuracy.
If it doesn't use bullets, I don't want to hear about it.
Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
Can't stop the friend-of-a-friend idiot who hits "reply to all."
It might not be "spam" but I filter it now. I'll stick with my procmail filters.
This sounds like the whole "Friends and Family" network from AT&T a few years ago, and now Verizon's "In" network thing, but with email and exclusive instead of "Free calls to friends on 'the list'".
Pretty soon, you will have to send an MD5 hash of your DNA from a static IP address that is reversible and supply 5 refrences all in a PGP encrypted letter, along with a copy of your passport and birth certificate.
When it's more work to block spam than stop it, you have to ask what is going wrong. Maybe if we somehow figured out wonderful technologies to *stop* spammers instead of blocking them, we'd be getting towards the ultimate goal. This is much like throwing money at a problem to bandage it, not fix it. The solution, however, also has to be easier for end users, who are doing nothing wrong. Why is every solution harder for end users, but just a 'bump in the road' for spammers? Am I missing something?
In fact, this has provided me with a kind of "honeypot", since I now check for the addresses of several people who are long gone from my site. If I see their address its gotta be spam!
- Dave
According to the article, it can make a decision on 53% of the total e-mail, and divide it up into Spam or non-Spam with complete accuracy. The key is that it makes no judgement on the rest of the e-mail.
So you could throw this as a rule into SpamAssassin with a 100 weight on Spam results and a -100 weight on non-Spam results. That could only help your filtering. With zero false-positives.
Used to be that one of the cool things about the net was that you would get email from total strangers... "Hi, I'm from {some far away place}. I saw your {Usenet post|web page|profile on some bulletin board site} and really liked your ideas about {something}. I've also been experimenting with {something} and I have some ideas about {whatever}..."
Now, if we only have emails from our (already existing) friends or friends of friends, then how will we ever meet anybody new?
- In Capitalist America, law violates YOU!
From what I can make out, this system graphs correspondent pairs into correspondence maps, and notes that while normal people all email each other and thus have dispersed graphs, (high clustering coefficient) spammers have a distinct pattern, e.g. 1 person emailing a few million others (low clustering coefficient). There are figures in the article that make this point well.
The system would be ideal for implementation at a fairly high level, (e.g. the ISP level) where systems can aggregate email headers across many different users in order to come up with meaningful graphs. The advantage it claims of no false positives means that it would be feasible at this level.
I'm impressed; it looks like a very clever idea. My only question concerns how this would deal with mailing lists, which must appear to it like spam?
Simply : untrue. It's as easy to fake the envelope sender as it is the From: header. I think you're getting confused with "Received" headers, where each mail system inserts its own bit of tracking information. The envelope-sender is completely under the control of the sender, and (usually) propagates un-modified as an email is handed between systems (indeed, one of the criticisms of SPF is that by modifying the envelope sender you break forwarding).
My next sig will be ready soon, but subscribers can beat the rush
The proposed anti-spam clustering technique is of course a variation on whitelisting. While clever, it fails to address a problem I have not often seen addressed. Many people defend themselves from spam by obscuring their e-mail addresses in public places, and perhaps by using whitelists to prefer known senders. This may be effective for many people.
However, some of us can't avoid having a publically available e-mail address. For example, writers such as myself rely on feedback from readers who are, in nearly all cases, strangers (and sometimes strange, but that's another story...) Avoiding false positives from strangers is very important to me. I want their messages. But, since my e-mail address is published frequently (hence no reason to hide it here), I obviously receive a ton of spam.
For the past few months I have experimented with a plug-in called BayesIt! for the Windows email reader The Bat!. As the name implies, it's a bayesian filter. The nice thing about BayesIt is that I could point it to my already-stuffed spam folder and train it on thousands of messages in one go. So far it has worked out rather well. No false positives, and only about 10-20 false negatives per day (out of approx. 400 spams).
Still, in the long run I support proposals that shift the economics of e-mail in ways that have minimal impact on human beings while making spam unprofitable. Changing the economic model of spam is the only sure solution; relying solely on technology will simply keep us locked in an ongoing arms race.
-Aaron