New Method of Spam Filtering
Alephcat writes "A simple and easily implemented scheme for combating e-mail spam has been devised by two researchers in the United States. P. Oscar Boykin and Vwani Roychowdhury of the University of California, Los Angeles use their method to exploit the structure of social networks to quickly determine whether a given message comes from a friend or a spammer. The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category. The article was published on Nature magazine's website earlier today."
>/dev/null
Cretin - a powerful and flexible CD reencoder
You take food away from a spammer and his children. Don't block spam, or else you hate children. You don't hate children... do you?
He was probably sick of people like me mistaking his name for a made up spam "from" line.
It would be interesting if Google could find a way for this idea to work with Orkut.com, since users of this service are typically connected to many other people who are not spammers. :-)
What's to stop the From:, To:, and Cc: fields from being spoofed (like a lot of viruses do)?
- Sam Ruby
If the filters are effective against only half of the emails, what is preventing spammers from doubling their load in order to control the same amount of spam getting to your inbox as they do now?
Anything in parenthesis may (not) be ignored.
Whoopy, that is nice! I have an antispam solution that works for half of my emails at work, too - I only accept email from my company's domain! Brilliant! No external spammers can get me!!
Of course one huge downside to this "friend of friends" approach is all the virus spam I get that's sent using someone's address book (thanks Outlook!) Guess what... all those addresses are probably whitelisted because it came from someone I "know."
My sig is blank, I typed this by hand.
Isn't this somewhat similar to Thunderbird's feature of not marking mail from people in your mailing list as spam?
Doolittle :
Bomb no.20 : To explode of course.
Spammers suck, right? And their children have obviously inherited the spamming gene. So, by starving the children to death, we're preventing the spam gene from spreading. It may sound wrong, but we're actually helping society.
Won't this just inspire more spammers to pursue virus, trojan and spyware-oriented methods of spamming? Granted, this is significantly more difficult than just harvesting email addresses off of Usenet and web pages, but it seems like we're only one step ahead at any given time with our methods of spam prevention.
Hmm... Wouldn't a random number generator give you the same result?
But they're going to have to make it work better, say on 75% of all email received. Hell, most legitimate mail you receive is from people/orgs you've corresponded with previously, so why doesn't it work on more than 50% of emails?
Spam control without charging for E-mail? This can't be - no way!
The article is great and all, but it doesn't mention if this method is actually being implemented anywhere yet. So, yeah, great theory, but I want to see it in practice.
You know darn well that this will only increase employment in the Spam Technology sector and is a good thing.
Seriously, spammers are often a step ahead, and lately a lot of the spam I'm getting is masked to look like Amazon orders or closed eBay auctions. I haven't ordered anything from Amazon (USA) in ages, but I still have to peek to see if someone has cracked my account and ordered something. Just expect that the harder they are pressed, the harder spammers will press back by sinking to new lows.
A feeling of having made the same mistake before: Deja Foobar
After reading this, I realized that a good 90% of the email I receive is either from someone I've had previous contact with, or else someone 1 or at most 2 degrees of separation from one of those people. I never get mail worth reading from total strangers. Anything important is always linked back to me in some way.
It should be interesting to see how this method plays out. (Now, I don't know why I even bothered with that last sentence. Everyone says that about every new spam-filtery thing. ((Don't know why I bothered with that last sentence either. Work is slow today I suppose.)) )
GeekNights!
Late Night Radio for Geeks!
What about spoofed messages from people on my list?
Worms, from infected email systems?
The researchers didn't address this.
Money cannot buy happiness, but can buy something soo darn close, that you can't really tell the difference
Happy Trails!
Erick
http://www.busyweather.com/
If I understand the technique correctly, it relies on information specific to individual users. Unless there is a way for users to export their information, that means that the filtering can only be done after the email reaches its destination, not by the ISP or central mail server. So it may be helpful to individual users, but unlike some proposed techniques, it won't cut down on total email traffic.
For me as an ISP, I don't care if the email gets filtered between me and my customers. It hurts and costs me more for bandwidth to receive the emails, then store them, and then support the users that want me to clear their pop3 accounts when they are on dialup. Spam Filtering should take place at the Hub Cities on edge servers so it never gets to my mail server in the first place and I do not have the bandwidth charges. In exchange, I will filter all my outgoing mail on the mail server for spam outgoing. BTW, my mother likes spam. It is a good hobby of hers just to read through it. She gets very entertained by the content.
wow wrong thread.. note to self.. less beer...
Jeoin
If it doesn't use bullets, I don't want to hear about it.
Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
Can't stop the friend-of-a-friend idiot who hits "reply to all."
It might not be "spam" but I filter it now. I'll stick with my procmail filters.
This seems to be a good start, but it still requires software on the user side. And that software must work with their mail client...
I guess this is where the focus has shifted. While some spam can be blanketed and deleted, it's really up to the RECIPIENT to judge whether it's spam or not.
But then again, do we trust the user? Do we trust Joe and Jane (our loving SixPack couple) to make the right decision? Sure, it might be prudent in a company of 5-50, but what about 500-5000? Deploy and manage copies of these programs to see if it's going right or not?
I'm a sysadmin and I prefer the server-based solution. Blacklists, SpamAssassin, et al. Easier to fix one machine than 5000 desktops.
Comments?
When modding "Informative", please make sure it both has a source and IS actually informative.
Delete everything and wait for people to contact you by phone or snail-mail to see whassssup. This method has the side-effect of giving a well-deserved cold shoulder to people you want to exclude from your social circle.
50% is still better than Yahoo's filtering scheme. The problem w/ spam filters is that they are in reaction to spam, so spammers will always have the upper hand. Like CAN-SPAM, spammers found loop-holes before it even went into effect.
I'm glad
Ninnle Linux has had very effective spam-control in place for years! This isn't anything new at all. Wake up, Slashdot, and start noticing the developments that take place in this cutting edge distro!
The article talks about a 'blacklist of spammers'... but... we ALL know this won't work, of course, since spam rarely, if ever, has a legitimate 'from' address.
Also, this kind of solution will ONLY work if it's not widely used. Once it DOES become widely used, the spammers will simply update their huge network of zombie machines so that the spamming software on those machines sends spam from friends to friends, utilising the available address books and previous recipient list on the infected machine.
In other words, while the 'friends network' will turn the EXISTING spamming procedures against them, the spammers will then turn the anti-spam software against itself, by turning the 'friends network' into a 'spamming network'.
So... nice work, needs thought.
This sounds like the whole "Friends and Family" network from AT&T a few years ago, and now Verizon's "In" network thing, but with email and exclusive instead of "Free calls to friends on 'the list'".
Pretty soon, you will have to send an MD5 hash of your DNA from a static IP address that is reversible and supply 5 references all in a PGP encrypted letter, along with a copy of your passport and birth certificate.
When it's more work to block spam than stop it, you have to ask what is going wrong. Maybe if we somehow figured out wonderful technologies to *stop* spammers instead of blocking them, we'd be getting towards the ultimate goal. This is much like throwing money at a problem to bandage it, not fix it. The solution, however, also has to be easier for end users, who are doing nothing wrong. Why is every solution harder for end users, but just a 'bump in the road' for spammers? Am I missing something?
I would like to share in all humility my own method of spam filtering:
;-)
I use a super-extra-secret e-mail that I give only to my friends.
Have you Meta Meta Moderated lately?
Member of the Stop Fucking Saying 'M$' army
Right, from now on, it's "micros~1" for me.
These idiots have forgotten the basic rule of dealing with spammers (and other mail miscreants) which is:
They lie in the HELO, they lie in the MAIL FROM:, in the headers, etc. etc. etc. Any method that depends on this kind of data is doomed to a quick failure in the real world.
Does bayesian filtering do this to a certain degree, and more so? Ie. Checking for common tokens, including what's in the To and From fields?
As we're all too well aware, the spammers will find a way to counter this. Keeping in mind that they don't care one whit about how many messages they send, they'll probably just start sending out more spam: one copy for every address on their list, used as a from address. Sure, this filter will only let that one through, but the amount of spam e-mail will jump exponentially.
I think this will be questionably useful for a lot of people. For instance, I get "mass emails" from friends/relatives, but I usually don't know half the people on the list - they're 2nd cousins thrice removed from my friend, I may have seen them once at my friend's wedding, but I don't know them from Adam.
Also, as people have already mentioned, spoofed addresses (eg, from viruses reading address books and then sending them back to spammers) may render this useless quickly.
Bayesian filtering (also some static rules) seems to catch 90%+ of my spam currently (via SAproxy). Having the software "learn" the spam vs. the ham seems to work very well for me and others I've talked to.
Though I'm no fan of Microsoft or Bill Gates, the solution proposed by them - one where a complicated math calculation is required for every mail they send - is on the right track because at least, in theory, it becomes expensive to send mail and therefore spammers are at a disadvantage. If this is to be a really workable solution, only time will tell - and given the MS tradition of hype ... who knows.
Schemes that make it expensive for the handlers (networks, ISPs) or the recipients, are not the way to go. After reading the article, it seems that this is just another one of those.
I've been swashdotted -- Elmer Fudd
for (i = 0; i < inboxMessages.count; i++) { if (i % 2 == 0) { deleteEmail(i); } }
The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category
That has to be one of the most ridiculous statements I've heard in a while. That's like saying I've got a great new burglar alarm system. Now, it only works about half of the time, but when it does work it catches the crook with a 100% success rate!
Who's buying?
In fact, this has provided me with a kind of "honeypot", since I now check for the addresses of several people who are long gone from my site. If I see their address it's gotta be spam!
- Dave
This may be a reasonable solution to the drive-by spamming that occurs on livejournal. You can easily create a web-o-trust given the closed, friendly nature of the 'friends' networks.
According to the article, it can make a decision on 53% of the total e-mail, and divide it up into Spam or non-Spam with complete accuracy. The key is that it makes no judgement on the rest of the e-mail.
So you could throw this as a rule into SpamAssassin with a 100 weight on Spam results and a -100 weight on non-Spam results. That could only help your filtering. With zero false-positives.
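A rough sketch of how such a rule might combine with an existing content score, written in Python rather than as real SpamAssassin configuration. The function names, the verdict strings, and the 5.0 threshold are assumptions for illustration; only the +100/-100 weights come from the comment above.

```python
# Hypothetical sketch: fold a three-way network verdict into a content score.
NETWORK_WEIGHT = 100.0   # weight suggested in the comment above
SPAM_THRESHOLD = 5.0     # assumed cut-off; tune to your own setup


def combined_score(content_score, network_verdict):
    """network_verdict is 'spam', 'ham', or 'unknown' (no decision made)."""
    if network_verdict == "spam":
        return content_score + NETWORK_WEIGHT
    if network_verdict == "ham":
        return content_score - NETWORK_WEIGHT
    # Mail the network method leaves unclassified keeps its content score.
    return content_score


def is_spam(content_score, network_verdict):
    return combined_score(content_score, network_verdict) >= SPAM_THRESHOLD
```

With weights this large the network verdict simply overrides the content score whenever it exists, which is only safe if the "no false positives" claim actually holds for your mail.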
It only works on 50%, but it claims *no false positives* on that 50%. That means that that 50% can be deleted immediately; no-one has to check in case there is a false positive. By contrast, Bayesian filters *will* produce the occasional false positive, so you have to trawl through your spam folder occasionally to check against this. If I could reduce my spam folder checking from 200 mails a day to 100, I'd be very happy.
I'd like to see more about the mechanism for coming up with the score.
Also, others have mentioned the spoofing problem with To:, From: and CC:. It would be interesting to see how well it would work with the "social network" consisting of the mail servers sending the mail, or with a combination of IP address and To:-etc information.
People who disagree with you are not automatically evil, greedy, or stupid.
The Spam Gene is actually a regressive gene; it's not likely it appeared in the parents or offspring. Its effect is similar to fouling the nest or pissing on food before eating.
Many people need to receive email from people they've never met, like prospective customers.
How did this get in to Nature? There are far better anti-spam tools like spamassassin & popfile that are far more effective against spam than this technique.
Used to be that one of the cool things about the net was that you would get email from total strangers... "Hi, I'm from {some far away place}. I saw your {Usenet post|web page|profile on some bulletin board site} and really liked your ideas about {something}. I've also been experimenting with {something} and I have some ideas about {whatever}..."
Now, if we only have emails from our (already existing) friends or friends of friends, then how will we ever meet anybody new?
- In Capitalist America, law violates YOU!
The actual paper that describes this technique can be found here
Although interesting, the system would seem to need access to a centralized database of senders and recipients (visibility into a large segment of email traffic). If the system does not have enough records of each sender's other e-mails, it cannot construct an estimate of the social network.
The scheme might work for people inside a very big network, like AOL. The system would easily notice that one address (either inside or outside AOL) has sent inordinate numbers of emails to AOL addresses without prior traffic from those AOL addresses to that spammer/sender.
Bottom line - this is an ISP-level solution and will never be useful to individuals or small businesses (unless they sell subscriptions to the blacklist).
Two wrongs don't make a right, but three lefts do.
The remaining half of the e-mail then has to be filtered in a more sophisticated way. But by then the scale of the problem has been cut in half.
Solving "half" of the problem is pretty useless. Spammers -- assuming this technology is ever widely adopted -- wouldn't take long to find a way to get their messages into the unfiltered heap. The only ones to suffer damage will be the legit email senders.
Says the Cat, "Instead of counting all the stars in the sky, you could just count half of them and multiply the number by two. You just halved the problem there."
Email addresses aren't as strongly fixed as say, a mailing address (and even those change). Your friend may get a new address and neglect to inform everyone. Or, he may email you from his new address and it doesn't fall in the 50% because it is unknown.
Another possibility is that somebody new contacts you that doesn't know your friends, or somebody whom you haven't talked to in a long time. I have some friends that I am in/out of contact with for year periods. What if somebody pulled your email off a business card... you'd want them to be able to contact that email. This is where whitelists are a pain, and thankfully I have a website where if I ever do implement one, I can put a responder that says "your address is unknown, please send initial email using page X on my website" for initial confirmation.
The Bayesian rule is just a mechanism for combining multiple independent estimates into an overall estimate.
This is clearly an independent estimate, and a good mechanism to improve the overall detection probability.
What we need is a "meta-Bayesian" process that appropriately weights and combines other spam prediction estimates, not just word counts.
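A minimal sketch of what combining independent estimates could look like, assuming equal priors for spam and non-spam and assuming the estimates really are independent. The function name is made up for illustration, and this is one common combining formula, not anything taken from the paper.

```python
from math import prod


def combine_estimates(probabilities):
    """Combine independent spam probabilities (each between 0 and 1).

    Assumes equal prior odds of spam and non-spam. A value of 0.5 is
    neutral and leaves the other estimates unchanged.
    """
    spam = prod(probabilities)
    ham = prod(1.0 - p for p in probabilities)
    if spam + ham == 0.0:
        # One estimate said "certainly spam", another "certainly not".
        raise ValueError("estimates contradict each other with certainty")
    return spam / (spam + ham)
```

For example, a word-count estimate of 0.9 combined with a neutral 0.5 from a network filter that made no decision stays at 0.9, while a network verdict of 1.0 pushes the result to 1.0 regardless of the word counts.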
So it works 100% of the time in 50% of the cases? There is only a 25% chance that I would be interested in something like this.
What in the crap? Are you to tell me U of Cali actually thinks this is University level material? "Block spam by checking if you've received email from sender before"? And the award for MOST BLATANT TECHNOLOGY GOES TO:::!!!
Mod +5 Drunk
The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category.
We don't need a Band-Aid. We need a real solution. This may be an interesting solution, but honestly, it's not acceptable. I really believe buddy lists are probably the way to go (i.e. white lists). At least for email going directly into your inbox, they should be approved senders or friends of approved senders. When we get a solution that can block 99.9% of all spam and can catch up with new exploits as they come up, then I'll be impressed. Everything else is just mental masturbation.
Quack, quack.
From what I can make out, this system graphs correspondent pairs into correspondence maps, and notes that while normal people all email each other and thus have dispersed graphs, (high clustering coefficient) spammers have a distinct pattern, e.g. 1 person emailing a few million others (low clustering coefficient). There are figures in the article that make this point well.
The system would be ideal for implementation at a fairly high level, (e.g. the ISP level) where systems can aggregate email headers across many different users in order to come up with meaningful graphs. The advantage it claims of no false positives means that it would be feasible at this level.
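To make the clustering coefficient concrete, here is a small sketch that builds a correspondence graph from (sender, recipients) pairs and computes the coefficient for one address. This is only an illustration of the measure described above, not the researchers' algorithm; the function names and sample addresses are invented.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(messages):
    """messages: iterable of (sender, [recipients]) taken from mail headers."""
    graph = defaultdict(set)
    for sender, recipients in messages:
        for rcpt in recipients:
            if rcpt != sender:
                graph[sender].add(rcpt)
                graph[rcpt].add(sender)
    return graph


def clustering_coefficient(graph, node):
    """Fraction of a node's neighbour pairs that also correspond directly."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention: undefined cases count as zero
    links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


friends = build_graph([("alice", ["bob", "carol"]), ("bob", ["carol"])])
spam = build_graph([("spammer", ["v1", "v2", "v3"])])
```

Here `clustering_coefficient(friends, "alice")` comes out at 1.0 because Alice's correspondents also mail each other, while `clustering_coefficient(spam, "spammer")` is 0.0 because the victims have no links among themselves.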
I'm impressed; it looks like a very clever idea. My only question concerns how this would deal with mailing lists, which must appear to it like spam?
Their implementation sends all email to /dev/null. It works great on the half you don't want to see.
Try the link at the bottom of the page:
Sniffing stools speeds diarrhoea diagnosis
19 February 2004
http://www.nature.com/nsu/040216/040216-13.html
The average system:
1) Accept all email
2) Filter
3) Hope an important email isn't filtered improperly. If it is, go digging through the trash/junk folder looking for it.
4) Read
A better option:
1) Deny all except from the addresses and/or domains I have specifically chosen to accept.
2) Read
Wouldn't that make more sense?
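The "deny all except" option can be sketched in a few lines. The function and parameter names are assumptions for illustration, and the check trusts the sender address as given, which other comments in this thread point out is easy to forge.

```python
def accept(sender, allowed_addresses, allowed_domains):
    """Return True only for senders on an explicit allow list.

    allowed_addresses and allowed_domains are sets of lowercase strings.
    """
    sender = sender.strip().lower()
    domain = sender.rpartition("@")[2]
    return sender in allowed_addresses or domain in allowed_domains
```

Anything for which `accept(...)` returns False would be refused or dropped before step 2.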
Left 4 Dead Gaming Group - http://www.l4dgg.com
After reading much of the debates here and elsewhere on spam, I think it all comes down to ignorance and stupidity.
/rant
I mean, you can make something illegal and provide for harsh legal punishment for any activity, and some moron will still find a way to do it.
Just look at pyramid schemes. You'd have to have been living under a rock to not know these things 1) don't work and 2) are illegal. Yet...
I think that spam is here to stay, just based on the fact that it's impossible to eliminate ignorance and stupidity.
Hell, in some States, despite having the death penalty, murders still happen.
>The e-mail clusters can be mapped out by
>inspecting the 'from', 'to' and 'cc' fields in a
>user's inbox. An automated system can quickly
>build up a blacklist of spammers, as well as a
>'whitelist' of approved sources.
Hmm, but the sender can put whatever he wants in those fields...
In business-critical mail:
1. Blacklisting is never a good idea.
2. Whitelisting is only sometimes a good idea.
I am not offering any better solution here, but this is not the ultimate one we are waiting for.
class he-man extends man!
This idea could be helpful in a [relatively] closed social environment, but would be disastrous in a business environment where a fair percentage of incoming mail might actually be from strangers. We call these strangers "potential customers"...
I want to drag this out as long as possible. Bring me my protractor.
If your time is "worthless".
(And your bandwidth, and your storage, and your CPU...)
the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff
While this may work for teenagers, it has no use in the business world. In the last week, I've gotten two dozen vital emails from people I did not previously know (professors at various grad programs). In that period, I haven't gotten a single message from people I know (or who know someone I know), because I have conversations with friends face-to-face, over the phone, or through instant messages. This sort of filtering just removes the most important reason for the existence of email, which is replacing snail-mail, not replacing conversations.
G
I never thought that Slashdot would help me find papers relevant to my research!
I think that their idea is good from a technical point of view, but very bad from a privacy point of view. I am of the opinion that gathering social network information is extremely dangerous. A pertinent example: If your friend is branded a "terrorist," then "they" can exploit the information that you have voluntarily provided to then put you on a "terrorist" watch list.
Another example: Say that someone who knows someone that you know actually buys something from a spam. If the spammer can access the social network information, suddenly your little niche of the network is going to be aggressively spammed. After all, like minds congregate.
There is no doubt in my mind that the black hatters will infiltrate the social network communities and use that information to spy on potential viewers. See this bugzilla thread where the folks from Atriks Professional Email Deployment Service follow SpamAssassin's development and adapt their "ratware" tool accordingly.
The biggest problem with collecting social networks is that once the data has been gathered, it is very hard to control. Those of you using Orkut should think long and hard about it.
In conclusion, I think that this is technically a good idea but it opens a Pandora's box.
If you get a message from Bob that was also CC'ed to Alice, then it knows that you, Alice, and Bob form a cluster and are likely to be friends. Emails from Alice would be whitelisted because of this.
To work, this means that your friends have to know each other and send out group emails using the CC field. You can see why it would only be able to whitelist about 50% of your email.
It also appears that spammers could fool this by adding another of their addresses to the CC field, sending you spam, and then sending you spam from the other address. At that point, the other address would be whitelisted. Although it may work now, once this starts to be widely used, spammers will find ways to pollute it.
Simply : untrue. It's as easy to fake the envelope sender as it is the From: header. I think you're getting confused with "Received" headers, where each mail system inserts its own bit of tracking information. The envelope-sender is completely under the control of the sender, and (usually) propagates un-modified as an email is handed between systems (indeed, one of the criticisms of SPF is that by modifying the envelope sender you break forwarding).
My next sig will be ready soon, but subscribers can beat the rush
We provide an automated graph theoretic method for identifying individual users' trusted networks of friends in cyberspace. We routinely use our social networks to judge the trustworthiness of outsiders, i.e., to decide where to buy our next car, or to find a good mechanic for it. In this work, we show that an email user may similarly use his email network, constructed solely from sender and recipient information available in the email headers, to distinguish between unsolicited commercial emails, commonly called "spam", and emails associated with his circles of friends. We exploit the properties of social networks to construct an automated anti-spam tool which processes an individual user's personal email network to simultaneously identify the user's core trusted networks of friends, as well as subnetworks generated by spams. In our empirical studies of individual mail boxes, our algorithm classified approximately 53% of all emails as spam or non-spam, with 100% accuracy. Some of the emails are left unclassified by this network analysis tool. However, one can exploit two of the following useful features. First, it requires no user intervention or supervised training; second, it results in no false negatives i.e., spam being misclassified as non-spam, or vice versa. We demonstrate that these two features suggest that our algorithm may be used as a platform for a comprehensive solution to the spam problem when used in concert with more sophisticated, but more cumbersome, content-based filters.
All this work to stop spam, and ICQ's done it for years.
Frankly, a series of filters is probably the worst approach at stopping SPAM. It's a game of "make the filter, defeat the filter, and risk not getting important mail." Why bother? The solution lies in a different approach. Authorization. There needs to be authorization layers in order to defeat spam. We need buddy lists, we need blacklists, we need the ability to request authorization, etc.
I realize that fixing this problem isn't a simple one given the scale in which it's used. But man, I really wish somebody'd figure out how to do the transitory work. I'm almost completely reliant on ICQ and Private Messaging on forums in order to keep up with everybody.
"Derp de derp."
http://www.arxiv.org/abs/cond-mat/0402143
Geez, that article was written very poorly.
This was the link at the bottom of the article
giving much more technical detail.
I will admit, on my first read, I did not quite understand it. Hopefully reading some informative posts will clear it up.
And while we are discussing spam, I would like to mention that spamhaus.org is a very shoddy blacklist. Their policies are anti-business. I would recommend not using them anymore.
But there was spam on BBSs. Any number of times people would attempt to post/mail some kind of Make Money Fast scam on my BBS. The lameness filter would almost always catch it, but they kept trying.
One line blog. I hear that they're called Twitters now.
Not only is this well known and widely implemented - for example Apple mail's junk filter automatically accepts messages from people in your address book - but spammers are already countering it. They regularly forge the source address of messages based on the sending domain. But they can do better. The viruses they're already using to relay spam have access to the information about the "network of friends" in people's address books.
We can already do much much better than this. If you're prepared to lose a little mail from people who couldn't be bothered jumping through a minimal hoop, token-based spam blocking (such as challenge-response, signed messages, or the token RISKS posts require) can give you near 100% protection with minimal cost.
- In the category, classified as in the category (correct)
- In the category, classified as being not in the category (error)
- Not in the category, classified as being in the category (error)
- Not in the category, classified as being not in the category (correct)
So, there are two types of 'correct' situations, and there are two types of 'incorrect' situations. Depending on what e-mail is being used for, it may be acceptable to one person to lose an e-mail message, if it means that there are 100 spam messages they never see. For others, it may be 1:10. For others, there may be no acceptable level of lost mail that justifies spam filtering.
They claim that they are able to do that last item -- they can correctly identify spam, with absolutely no false positives. Of course, I have no idea how well this works on the 'remember me from high school?' type messages [I've only gotten 4 of them in 10 years].
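The four outcomes listed above can be tallied with a short sketch like this one, taking "in the category" to mean "is spam". The names are invented for illustration and the input is assumed to be hand-labelled mail.

```python
from collections import Counter


def confusion_counts(results):
    """results: iterable of (is_spam, flagged_as_spam) boolean pairs."""
    counts = Counter()
    for is_spam, flagged in results:
        if is_spam and flagged:
            counts["spam_caught"] += 1          # correct
        elif is_spam and not flagged:
            counts["spam_missed"] += 1          # error: false negative
        elif not is_spam and flagged:
            counts["good_mail_lost"] += 1       # error: false positive
        else:
            counts["good_mail_delivered"] += 1  # correct
    return counts


def lost_mail_ratio(counts):
    """Good messages lost per spam message caught (the 1:100 or 1:10 trade-off)."""
    if counts["spam_caught"] == 0:
        return 0.0
    return counts["good_mail_lost"] / counts["spam_caught"]
```

Someone who tolerates losing one message per 100 spams blocked would want `lost_mail_ratio` at or below 0.01; someone who tolerates no lost mail needs it to be exactly 0.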
As for the viruses -- viruses tend to vary much less than spam messages, and are much, much easier to block, and to prevent false positives on. Although you might get some virused messages at first, once the definition is updated, they do not trash good messages. [when they're done correctly... I think there was some bad virus definition a few years back that triggered on 'p' being in the body, or something stupid like that]
Spam is a much more subjective thing, which is why it hasn't yet been eliminated. [and yes, I suspect that all that will happen from this is that spammers will write viruses to mail your addressbook to them, so that they can write better spam. Or modify a virus, so that for each mail you send, you also send a spam to your recipient].
Build it, and they will come^Hplain.
I've been thinking about this method for a while - basically, you configure your SMTP server to do this:
This idea is clearly too simple to have not been thought of before - but I have yet to find a good explanation as to why it won't work. Verizon.net uses this exact method - try sending an SMTP message from a host that isn't listed in your domain's MX records, and you get a "550 Sorry, you aren't allowed to mail for this domain" or something comparable. How come this method isn't more widely used? Going through my own SMTP server logs shows that the vast majority of SMTP servers sending legit mail are also listed in the domain's MX records. The only price is that you require the sender and receiver to be the same within a domain - hardly an unreasonable requirement.
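A sketch of the check as described, with the DNS query replaced by a hypothetical `mx_lookup` callable so the example stays self-contained. The function name and the shape of the reply tuple are assumptions; a real server would have to query DNS and handle lookup failures.

```python
def check_sender(mail_from, connecting_host, mx_lookup):
    """Return an SMTP-style (code, text) reply for a MAIL FROM address.

    mx_lookup is a hypothetical callable that maps a domain name to the
    set of host names listed in that domain's MX records.
    """
    domain = mail_from.rpartition("@")[2].lower()
    if connecting_host.lower() in mx_lookup(domain):
        return 250, "OK"
    return 550, "Sorry, you aren't allowed to mail for this domain"
```

Note that this rejects any domain whose outgoing mail servers differ from its incoming ones, which is the price mentioned above.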
to deal with open relays in China...
I would've harvested the emails of as many members of the ruling communist party as possible, and used those relays to spam them with anti-communist propaganda. I believe the consequences would've been swift and ruthless.
Unfortunately I can't read/write Chinese, and this idea wouldn't work in less repressive regimes...
50% of spam stopped sounds good, but what if 50% is 350 Billion email messages? Spammers only have to double their messages to go around this 'filter' to produce the same volume tomorrow as they produce today.
What I would like to see is spam signature sharing: Spam Detection Servers (SDSs) would collect a hash of each spam email sent within a time period. An email would have to be held at any email server and checked against an SDS to confirm it is not spam before being sent on. How would these SDSs collect the signatures? Feedback from email users, blacklists, good filters etc. All email servers would have to register with SDSs, or they become blacklisted.
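A toy version of the signature idea, assuming SHA-256 over a whitespace- and case-normalized body. The class and function names are made up, and a real SDS would be a network service rather than an in-memory set.

```python
import hashlib


def signature(body):
    """Hash of the message body after collapsing whitespace and case."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SpamDetectionServer:
    """Hypothetical in-memory stand-in for the proposed SDS."""

    def __init__(self):
        self._known = set()

    def report(self, body):
        # Fed by user feedback, blacklists, or other filters.
        self._known.add(signature(body))

    def is_known_spam(self, body):
        return signature(body) in self._known
```

One reason it might not work as-is: an exact hash only matches identical text, so a spammer who appends a few random characters to each copy gets a different signature every time.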
But you probably can tell me why this is not going to work, can you?
You can't handle the truth.
web of trust + web of familiarity via correspondence?
members are seeing something, your seeing an ad
Wouldn't this have to make my address book public in order to work? The trend recently (among receivers of email, at least) has been to hide email addresses, not publish them and annotate them with personal information.
Not necessarily; indeed, most professional ones avoid this. Many spams do contain multiple people in the To: field, but many don't. One way or the other, I don't think this is relevant if we are trying to compare the graph of a mailing list to that of a spammer. To take an example, user slashdot-headlines@newsletters.osdn.com sends thousands of emails to people *who don't know each other*. User enlargeyourdong@hotmail.com has exactly the same pattern. How do you tell these apart?
These people don't seem to realize how SMTP works. The RCPT command doesn't distinguish between types of recipients, it's up to the sending process to "play nice" and put that information in properly created headers.
A spammer could manipulate the To and CC headers as necessary to fool filters that analyze them, without affecting the ACTUAL list of email addresses to which the email is sent.
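To make the envelope-versus-header point concrete, here is a small sketch from the filter's side. It compares the addresses named in the To:/Cc: headers with the list of envelope recipients (the RCPT addresses) that a receiving server would have seen. The function names are made up for this example, and it only uses Python's standard `email` package.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def header_recipients(msg: EmailMessage) -> set:
    """Addresses the message *claims* to be sent to, taken from To: and Cc:."""
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    return {addr.lower() for _name, addr in getaddresses(fields) if addr}


def undisclosed_recipients(msg: EmailMessage, envelope_rcpts: list) -> set:
    """Envelope (RCPT) addresses that appear nowhere in the headers."""
    return {rcpt.lower() for rcpt in envelope_rcpts} - header_recipients(msg)
```

Nothing in SMTP forces the two sets to agree, which is why a filter that only reads the headers is looking at whatever the sender chose to write there.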
I don't think spam can be stopped without replacing or overhauling SMTP, and then ceasing to support "old" SMTP. But that ain't gonna happen anytime soon. (sigh)
assert(birth_date<time-86400)
You all know what I mean. Your idiot (friend, parent, coworker, or spouse) who can't help but send you mail you don't want, but whom you are socially not in a position to refuse, will get through because they also send you legit mail you DO want/need to read. POPfile sorts this out for me very well. Somehow I doubt this system would do as well.
Bud do his daughters suck?!!
[nt]
----
In post-9/11 America, the CIA interrogates YOU!
Spam is so annoying at this point that I'm going to start doing something similar to the approach used in the article and start using my saved messages and contacts in Evolution as a whitelist and just trashing all other mail.
As a way to get on my whitelist, what I would love to have is a standard for sending unsolicited 'calling card' messages to me. For example, a calling card message could simply be a normal e-mail with an empty body and a restricted subject line (to, say, 64 characters). Assuming I could at least ping the address from which the calling card (nominally) came, the calling card would be added to my Calling Card Inbox.
That way, if a person that I've never interacted with before wants to correspond via e-mail, they would send a calling card message to me. If I want to accept e-mail from them, I could add them to my whitelist and let them know with a (standard) response to their calling card.
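A rough sketch of what the receiving side of such a calling card check might look like. There is no such standard; the 64-character limit comes from the suggestion above, and the function name and the "no multipart, blank body" rule are assumptions made for the example. The step of checking that the sending address is reachable is left out.

```python
from email.message import EmailMessage

MAX_SUBJECT_CHARS = 64  # limit suggested in the comment above


def is_calling_card(msg: EmailMessage) -> bool:
    """True if the message has a short subject and no body content."""
    subject = msg.get("Subject", "")
    if len(subject) > MAX_SUBJECT_CHARS:
        return False
    if msg.is_multipart():
        # Attachments or alternative parts disqualify the message (assumption).
        return False
    payload = msg.get_payload()
    # A body that is missing or only whitespace counts as empty.
    return payload is None or (isinstance(payload, str) and payload.strip() == "")
```

Messages that pass would go to the Calling Card Inbox; everything else from unknown senders would be handled by whatever the whitelist policy says.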
This isn't really all that different from the handshaking that mailing lists perform when you first subscribe to them.
So, anyone else interested in a calling card standard?
Slashdot, home of supporters of free software, free music, and free speech. Except for Moderators that disagree with you.
There are other existing antispam solutions that automatically filter out 80-95% of spam with very low false positives.
Heck, even the static filters I set up on my email program do far better than 50% (though probably not as good as spamassassin or stuff like that). Not telling you how I do it tho ;).
The proposed anti-spam clustering technique is of course a variation on whitelisting. While clever, it fails to address a problem I have not often seen addressed. Many people defend themselves from spam by obscuring their e-mail addresses in public places, and perhaps by using whitelists to prefer known senders. This may be effective for many people.
However, some of us can't avoid having a publicly available e-mail address. For example, writers such as myself rely on feedback from readers who are, in nearly all cases, strangers (and sometimes strange, but that's another story...) Avoiding false positives from strangers is very important to me. I want their messages. But, since my e-mail address is published frequently (hence no reason to hide it here), I obviously receive a ton of spam.
For the past few months I have experimented with a plug-in called BayesIt! for the Windows email reader The Bat!. As the name implies, it's a Bayesian filter. The nice thing about BayesIt! is that I could point it to my already-stuffed spam folder and train it on thousands of messages in one go. So far it has worked out rather well. No false positives, and only about 10-20 false negatives per day (out of approx. 400 spams).
Still, in the long run I support proposals that shift the economics of e-mail in ways that have minimal impact on human beings while making spam unprofitable. Changing the economic model of spam is the only sure solution; relying solely on technology will simply keep us locked in an ongoing arms race.
-Aaron
Most mailing lists and newsletters are one way - I'm not talking about discussion lists or listservs, but rather about the bot that sends me Slashdot headlines, Jakob Nielsen's Alertbox, Fred Langa's newsletter, and even commercial speech that I am signed up to and want to hear such as Komplett's weekly offers, or Ryanair's cheap flights, etc.
It'd still be bayesian, except that word frequencies and graph connectivity of sender would _both_ be considered for additional spam probability. I don't have a filter to check, but don't most Bayesian classifiers also include other metrics besides top 20 word frequency, like length or presence of attachments, etc.?
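One way to picture "both would be considered": treat the sender's graph connectivity as just one more piece of evidence with its own spam probability, and combine it with the per-word probabilities using the usual naive-Bayes product rule. This is only a sketch; the function name is made up, and how a graph score would be turned into a probability is left open.

```python
import math


def combine(probabilities):
    """Naive-Bayes style combination: prod(p) / (prod(p) + prod(1 - p))."""
    eps = 1e-9  # keep log() away from 0 and 1
    clipped = [min(max(p, eps), 1.0 - eps) for p in probabilities]
    log_spam = sum(math.log(p) for p in clipped)
    log_ham = sum(math.log(1.0 - p) for p in clipped)
    # Clamp the exponent so exp() cannot overflow with many extreme inputs.
    diff = max(min(log_ham - log_spam, 700.0), -700.0)
    return 1.0 / (1.0 + math.exp(diff))


# Hypothetical inputs: spam probabilities of the most telling words,
# plus one value derived from how well connected the sender is.
word_probs = [0.9, 0.8, 0.7]
graph_prob = 0.05  # sender sits inside the recipient's social network
score = combine(word_probs + [graph_prob])
```

With these made-up numbers the graph evidence pulls the combined score below what the words alone would give, which is the effect the comment is after.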
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category.
... half of the time, anyway.
This spam filter has 100% accuracy!
I know that THEY might know the list of people I email to, but creating a centralised structure?! I think I prefer spam.
I am just picturing some pimple-faced hacker, rubbing his hands: so, finally, Jane and John got to know each other... let's spy some more.
Or, maybe, I am just paranoid. What do you think?
Bite my shiny metal... oops... Nevermind!
This property is guarded by Smith and Wesson 3 nights a week.
You guess which three
It is not a perfect solution, but it sure cuts out a ton of crap from making it into my server.
I also came up with a great way to correctly sort 50% of email into spam/non-spam categories.
if random_number >= 0.50 it's spam.
else
it's not spam.
The libertarian solution to the failures of capitalism is to apply more capitalism til the failures are fixed.
A system I have been using for some years now beats any approach I have seen, whether it be Bayesian, blacklisting or whatever. As soon as I get spam on an email address I terminate that address, create a new one and inform all relevant parties, explaining that my address has been compromised. People understand. It works. Since November I have not received one single spam message and I get at least ten emails per day.
It works. Period. Say goodbye to spamheads forever. The occasional one that gets by can be quickly reported and blocked PERMANENTLY. The $30/yr is worth every single penny, IMHO. My spam count per year went from thousands to low forties, now at zero. It's a godsend.
"Why is spam so bad when its done via email but when you cut down a tree and print it out, its okay? You can just delete an email in 1 second, but mail just ends up the floor causing pollution." -jekz
Isn't this scheme the perfect use for the wide-ranging social network information being collected by Plaxo?
It makes sense - they certainly haven't announced a revenue stream yet, and "keeping your address book up-to-date," even in a wireless and multiplatform world, just doesn't seem like a big enough idea to justify the huge amounts of data collected.
So is that the announcement that's coming from Plaxo, the unveiling of a broad spam solution that uses 'degrees of separation' data from your address book and the address books of your friends to implement spam filtering?
If I may say, it does seem like the killer app for their unique data set.
-------
Believe me, I'm as surprised by my comment as you are.
I send you and your sister a spam. While both of you are getting the spam, to both of you I am an unknown and therefore the system would flag me. ONLY if I send the spam to you while pretending to be your sister would the system break. I would need to know both your email and the email of someone you know. This would not be impossible to harvest with viruses stealing address books, but it is not what is currently happening. Currently the email address lists used by spammers are very simple flat text files. Of course nothing complex would be needed. Simply a similar text file but now with two emails per line: the first the recipient, the second the person to forge as the sender. Simple, but more work.
So it looks like a pretty clever idea. Especially for work place email where most mail is by people you know and very little email from outside usually arrives. And even when it is done it is usually from a known domain namely a client or supplier.
Will it work? Who knows. Gotta be worth a try. Unless you want to wait for Bill Gates to fix it. We all know how well the security problems in windows were fixed eh?
There is not going to be a magic bullet that fixes spam. We will just have to use a lot of ordinary lead ones. Don't worry Bush says they are safe.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
"The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category"
Translation: For the half that it works on, it works.
So basically, it works when it works - for some reason I'm not impressed.
FOAF is an open XML/RDF standard for describing these social networks, so it seems like that would be a good way to implement this. Plus, since it uses SHA1 sums of email addresses, it would be possible to check addresses without giving them up to spammers.
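As a sketch of how the hashed-address check could work: FOAF's `mbox_sha1sum` property is the SHA1 of the address written as a `mailto:` URI, so a whitelist can be published and compared as hashes only. The helper names below are made up, and the sketch does no case normalization, so both sides have to agree on how the address is written.

```python
import hashlib


def mbox_sha1sum(address: str) -> str:
    """SHA1 hex digest of the mailto: URI, as used for foaf:mbox_sha1sum."""
    uri = "mailto:" + address.strip()
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


def is_known(address: str, whitelist_hashes: set) -> bool:
    """Check a sender against a whitelist that stores hashes, not addresses."""
    return mbox_sha1sum(address) in whitelist_hashes


# A friend publishes only the hashes of the people they know.
friends_hashes = {mbox_sha1sum("tom@example.org")}
```

Someone who already has an address can test whether it is on the list, but the list itself does not hand out addresses to harvest.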
A lot of sites like Tribe.net and my own project SongBuddy are working on integrating FOAF into the site, so that you won't have to worry about the mechanics of it unless you want to. Seems like an easy way to build these kind of white lists.
Why can't I moderate something "Wrong" or at least "Grossly Misinformed"?
Next thing I know all my email is going to have a reply-to: Kevin Bacon.
If you don't have any friends, then every e-mail that you send will come up as a false-positive, and you'll be blacklisted forever. I wrote up a good e-mail to SomethingAwful, too, only to have it returned with a spam score of 1.2!
I would hope that once you root out all the cases where it doesn't work, that all you have left is cases where it does work.
Fnord.
Note to moderators: I really try to keep things positive, but these guys should know better. Not a troll here, just felt this needed to be said.
I always suspected this was the case, but now I have hard evidence. Nature is pimped out to private interests. It used to be a voice of the scientific community.
Spam isn't science, people. It might be network warfare, but that makes it more about power than it is about knowledge. So now Nature is just another magazine.
Note that they only give one footnote, and it is a self-serving link to a preprint. "Boykin, P. O. & Roychowdhury, V. Personal email networks: an effective anti-spam tool. Preprint, http://www.arxiv.org/abs/cond-mat/0402143, (2004)."
There are real scientific journals out there. With tricks like this, Nature apparently isn't among them.
http://tinyurl.com/4ny52
The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category.
Am I the only one who read this sentence and said "huh??"
I thought sorting into the right category was THE determining factor of whether a spam filter works.
Of course 100% of the times it worked it sorted into the right category. The only stat that is important is that 50% of the total messages DIDN'T sort into the right category (the 50% that didn't work)
There are three ways one can beat the filter.
The first is trivial and certain to succeed but has a drawback for spammers: only send e-mail to single recipients. The drawback is that this puts a much higher load on their servers, since every message is sent individually.
The second method is to always include dummy addresses in the mailing list that the recipients probably have in their address books. For example, add the following names to the To: field: notifications@paypal.com and list-notication@ebay.com.
Any recipient of the spam message who has also received e-mail from eBay or PayPal will trust the message.
One can do even better by planning ahead when harvesting e-mails. For example, if you harvest a set of e-mails from a particular bulletin board, you can make note of message cliques at the time of harvesting and send messages in the same groupings. For good measure you also include the addresses of the bulletin board admins.
Third, all the spammer really has to know is one address you have gotten messages from. Thus either buy mailing lists from legitimate companies people actually do business with, or create your own loss-leader messages. For example, send out some political action alert or anything that has some value or use to most people, maybe a lottery drawing for a prize, or a discount subscription to Time magazine, so they will accept the message. The sender does not have to be the same as your spammer address. Now you know someone in the address book of the victim. Now you spam the crap out of them while including the trojan address in the To: field.
Some drink at the fountain of knowledge. Others just gargle.
TMDA certainly isn't for everyone. By sending out challenges to unknown senders, you shift the burden from yourself to the joe-job victim. Nice for you, not so nice for the poor people on the receiving ends - each spam you receive generates a spam for the joe-job victim. Not even that nice for you either, since for every spam you receive, you double the bandwidth it has consumed by generating an outgoing message. If everybody did use TMDA, our systems would all be clogged under the flurry of challenges. TMDA has its place, but that place is not as a general purpose spam-reduction technique.
My next sig will be ready soon, but subscribers can beat the rush
I would send my spam with spoofed addresses taken from the same pool of victim addresses who will receive my crap. This way, I will be generating a faked social network, defeating this algorithm...
You mean that someone has come up with a solution for Spam, while the rest of us smart people were thinking really hard about the problem for the past 5 years and could not come up with the silver bullet? Let's see...
Fatal flaw #1: With spam, you can't trust the From: header, and frequently the To: header either.
Fatal flaw #2: A "blacklist of spammers?" That's a hoot! How much disk have you got? I have an equivalent idea: since all e-mail addresses are forged, why not skip the inspection of the user's inbox, and just conjure up random combinations from /usr/share/dict/words? That would be just as effective.
- Garbage in
- apply algorithm
- Garbage out
- ???
- Profit!
(Always wanted to do one of those :)
Hey, Windows users, there is no such thing as "forward" slash, there is only slash and backslash.
But it has to be faked with the correct information for a particular recipient. You can't just put some random name there and get by this filter. Spammer has to know that Mary knows Tom. If Mary gets email from Jim, whom she doesn't know, the email is flagged. It's the pairing of From and To headers that matters, not the individual entities.
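The pairing argument above can be sketched as a lookup in a graph of who has corresponded with whom. This is a deliberately simplified illustration, not the method from the paper; the class and function names and the "friend-of-friend" rule are assumptions made for the example.

```python
from collections import defaultdict


class CorrespondenceGraph:
    """Undirected graph of addresses that have exchanged mail."""

    def __init__(self):
        self._neighbors = defaultdict(set)

    def add_exchange(self, a: str, b: str) -> None:
        a, b = a.lower(), b.lower()
        self._neighbors[a].add(b)
        self._neighbors[b].add(a)

    def contacts(self, address: str) -> set:
        return self._neighbors.get(address.lower(), set())


def classify(graph: CorrespondenceGraph, sender: str, recipient: str) -> str:
    """Judge the (From, To) pair, not either address on its own."""
    if sender.lower() in graph.contacts(recipient):
        return "known"
    if graph.contacts(sender) & graph.contacts(recipient):
        return "friend-of-friend"
    return "flagged"
```

A forged From: only helps the spammer if it names someone who is actually connected to that particular recipient, which is the point being made about Mary, Tom and Jim.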
Suppose a spammer harvests from a social network site and spoofs their source address to be from harvested addresses... it's pretty likely 2 people on the same social network site will be within each other's threshold if only the To/From/Cc headers are used...
Maybe more sophisticated techniques to include the source IP subnet or something? Some sender verification would be required.
It's an extension of whitelist mechanism with some graph theory included. ( and quite too much theory for something so simple...)
As stated in the article:
It sounds like BCC: and there is nothing special about it. It's quite widely used. Most of my spam comes without CC: or multi recipients.
It would certainly be very interesting for ISPs, because they can track many emails at a time. The main issues would be the size of the graph induced (less CPU intensive -> more MEM intensive) and privacy.
But it's definitely not for individuals.
I keep this for generating examples for bayesian analysis, that's all:
Also nice graphs... perhaps too big at the end...
You give the mailing list a special email that always goes through, or you just whitelist (i.e. add them to your "buddy" list before they send you email) the mailing list. Whitelisting is better but doesn't work if your list sends from people you've never seen previously. If the list always sends from itself (i.e. listname@listserv.com), then a whitelist is the way to go.
TMDA also supports throwaway email addresses that you can use to register at a site that sends an email confirmation. The email address will stop working after a while and the site can't spam you. Think real.com for an example of why this is necessary. You can also get throwaway email addresses from spamgourmet.com without TMDA.
I've been doing this for over a year, commercially, and it works very well. Funny, when I implemented it it seemed so obvious to me I didn't want to make a big deal about it, just added it to the service description and the options.
It's not enough by itself but it makes a huge difference. Guess the cat's out of the bag now, eh?
(Sorry, AC post, as my poor little server couldn't handle a slashdotting right now. I'm still just a small anti-spam biz, though an effective one.)
I can understand how in the beginning of the internet people would be suckered into all these sorts of spam deals, but now it's pretty widely known, and like 80% of spam hits the trash without anyone seeing it. The rest, well, it's not even opened or looked at... right into the trash too.
So all these spammers are sending more and more messages, costing more and more, but getting less response from them. Wouldn't they figure that the game is up and just go back to the cesspool they climbed out of ??
I had the idea of doing a mail server auth service, sort of like DNS, but you have to pay to register your mail server: a 100% spam-free network where registered mail servers only receive emails from others who are registered... this would allow anyone who is legit to register cable modem servers and that sort of thing, yet keep spammers out... as soon as one system gets compromised it gets removed from the list, and no other server will talk to it... a closed network system...
who wants to fund this project??
-b
I have my own domain, and run my own mail server for personal email. The ONE thing that I have done to reduce incoming spam drastically(i.e. I only get 5% as much now), is to refuse incoming connections to the mail server from any machine that does not have a valid rDNS value. I may miss email from someone, but, they'll have gotten a(n) (somewhat) informative message telling them why their email did not succeed. They can either complain to their ISP and get their rDNS fixed (like I did :-) or call me/send me a letter.
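For anyone wanting to try the rDNS check, a minimal sketch using only Python's `socket` module. The function name is made up, and the extra forward-confirmation step (the name found must resolve back to the same IP) is an assumption on my part; the comment above only asks for a valid rDNS value. The lookups are passed in as parameters so the logic can be tried without touching real DNS.

```python
import socket


def has_valid_rdns(ip, reverse_lookup=socket.gethostbyaddr,
                   forward_lookup=socket.gethostbyname_ex,
                   confirm_forward=True):
    """True if `ip` has a PTR name (and, optionally, that name maps back to `ip`)."""
    try:
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:  # socket.herror / socket.gaierror: no reverse record
        return False
    if not confirm_forward:
        return True
    try:
        _name, _aliases, addresses = forward_lookup(hostname)
    except OSError:
        return False
    return ip in addresses
```

A mail server would run this against the connecting IP and refuse the connection with an explanatory message when it returns False.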
Tom.
I have 78 filters in KMail and I have to deal w/ at most 10 pieces of spam a week not getting caught... and about 40-80 a day going straight to the trash!!!
My method is fairly simple: I put everyone I regularly communicate with, plus some keywords that suggest something is not spam or may be important, on filters that sort people into folders by group (coworkers, undergrad classmates, med school classmates, family, documents, other).
These filters come first, so that if any of these emails contain a word blocked by later filters, such as marketing, promotions, values, singles, etc., they will still get through. I then also have a filter that checks for my name in the header (which would be there if it was a reply).
The last set of filters is for blocked words in the header, such as the above-mentioned marketing, promotions... but some words are blocked from the complete message.
It took a little time to get the system of filters in place, but believe me, emptying the trash once a day sure beats having to go through all the fluff.
And it takes care of people on mailing lists too, cuz they can just make a positive filter for it.
Just my 2 cents. I'm sure I am not the first to do this, as this is the reason such filtering schemes were invented... but w/ all this talk of anti-spam stuffs... I just wanted to remind people that it's right at your fingertips for free.
Since I have stuck w/ KMail for some time I don't know any other mail software 'cept Pine... so your mileage may vary with your client.
peace my /. brothers.
As I see it, the biggest problem is that of verifying the sender. This sounds easy for a corporate relay where you can validate users from the internal network.
But what about my case? I own a domain. It does nothing but my own e-mail. Sadly, that address was available on the internet for longer than spam has been a real problem, so it gets hit hard. But the point is, the server is a linux box attached to my cable modem. I can't relay out through it. To cut back on spam, my ISP blocks SMTP out. I have to relay through their smtp server. So they have to allow me to send from any number of e-mails. Granted, some places use a login/password to tie that to a specific cable modem account, but even with that, there is no way for them to verify the validity of the address I supply.
I'm not a crypto expert, but the only way I can see it working is if that relay server can compare some key I provide with a key that it gets from the dns record for my domain. But the real trick will be making certain the key I provide can't just be copied and used again. Maybe if it is linked with the timestamp?
It's not an easy problem. And, all the SMTP servers need to agree on a standard to make it work.
Got Apathy?
The error that you are getting is telling you that you are trying to relay through a mail server, i.e. that the To email address is not associated with that mail server and that you haven't met its standards to send from the From email (in your case, it sounds like it requires you to have appropriate DNS records for the sending IP; it could also use SMTP authentication as well--same concept). All correctly configured mail servers will do this in some manner. In fact, one of the spam blocking techniques is to set the server to reject email from any server that is on one of the lists as an "open relay" (meaning that it is not properly configured to reject unproven senders to outside domains). You won't get that error if you try to send from an outside domain to one that Verizon manages.
A more common method is to check for a PTR record for the IP that is sending you the email. If it doesn't have a PTR record, then your mail server rejects the mail. Checking for an MX record is overly restrictive and will blacklist many large organizations.
There is also a method called SPF that actually does allow organizations to "whitelist" their mail servers as appropriate senders for their domain. I just found out about it today, but I have my host looking into adding the appropriate DNS entry for me. The great part about it is that it is a whitelist method at the domain level, i.e. it makes individual domains responsible for authenticating their mail sending servers. Combined with a blacklist of open relays, this allows you to at least apportion blame. If spam is sent, then that domain can fix it, because it is caused by a failure in their authentication system.
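To give a feel for what the SPF DNS entry looks like and how a receiving server might read it, here is a heavily simplified sketch. It only understands the `ip4`, `ip6` and `all` mechanisms and silently skips everything else (`a`, `mx`, `include` and so on need DNS lookups), so it is not a conforming SPF evaluator. The function name and the example record, which uses documentation address ranges, are made up.

```python
import ipaddress

RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_check(record: str, sender_ip: str) -> str:
    """Evaluate a TXT record such as 'v=spf1 ip4:192.0.2.0/24 -all' for one IP."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # default qualifier when none is written
        if term[0] in RESULTS:
            qualifier, term = term[0], term[1:]
        name, _sep, value = term.partition(":")
        name = name.lower()
        if name in ("ip4", "ip6") and value:
            matched = ip in ipaddress.ip_network(value, strict=False)
        elif name == "all":
            matched = True
        else:
            continue  # unsupported mechanism: skipped in this sketch
        if matched:
            return RESULTS[qualifier]
    return "neutral"


# Hypothetical record a domain owner might publish for their mail servers.
example_record = "v=spf1 ip4:192.0.2.0/24 -all"
```

With that record, mail from an address inside 192.0.2.0/24 passes and mail claiming the domain from anywhere else fails, which is the domain-level whitelist described above.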
and the other 50% gets through...
nice work guys!
"Just Smile and Nod." --Huck
If your Bayesian filter is intelligently tokenizing, then it should be able to see the 'CC' header. With proper training, 'ham' e-mails which happen to be CC'ed to a lot of friends or co-workers will actually start having those e-mail addresses tokenized as being pro-ham, and that should contribute to their 'haminess'.
And, like I think I understood in the article, those e-mails without a lot of CC's won't have that extra 'haminess', but you can't guarantee that they're 'spam' -- so that half will have to rely on non-social-network properties to determine its 'haminess' or 'spaminess'.
I get paid for my time, better me getting paid than some software megacorp.
But the idea of social clamps is not dead, it just must be implemented differently. How? We have already discussed it here hundreds of times:
Every message must be signed with a key deployed to some reliable key/CA server.
Where to get such a reliable CA server? Easy! The answer is actually in the article. IMHO the community of email users should sign each other's certificates, and exchange such trust tickets.
So, if I get an email signed with a key that is trusted by other friends in the same community - I accept it and read it. If it is signed with a key that I don't have any trust information about - it should be marked as "Untrusted" and wait for my free time in some low-priority mailbox. If it's signed with a key I trust myself - accept it immediately. If it's signed with a key I revoked - reject it. If it's unsigned - autorespond with advice to sign it.
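The decision rules above fit in a few lines. This sketch assumes the signature has already been verified elsewhere and only the signing key's ID is passed in (`None` for unsigned mail); the names and the order in which the rules are checked (revoked first) are my own assumptions.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"
    LOW_PRIORITY = "mark untrusted, low-priority mailbox"
    REJECT = "reject"
    AUTORESPOND = "ask the sender to sign"


def triage(key_id, my_trusted: set, my_revoked: set, friends_trusted: set) -> Action:
    """Map a message's signing key to one of the actions described above."""
    if key_id is None:
        return Action.AUTORESPOND      # unsigned message
    if key_id in my_revoked:
        return Action.REJECT           # key I revoked
    if key_id in my_trusted:
        return Action.ACCEPT           # key I trust myself
    if key_id in friends_trusted:
        return Action.ACCEPT           # key vouched for by my community
    return Action.LOW_PRIORITY         # no trust information at all
```

For example, `triage("K3", {"K1"}, {"K2"}, {"K3"})` accepts the message because a friend has signed key K3, even though I have never seen it myself.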
Less is more !
"The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category."
Yes, indeed, getting it right in every one of half of all cases is quite an advance over getting it wrong in every one of the remainder.
ZZ
How long till the spammers come up with a way of infiltrating Orkut, and inviting random people to be their friends?
Hello folks,
I've read the full article at Arxiv.org and it sounds promising. But I see no code... and the algorithm description looks much too complicated for me to bother trying to implement it just to see how well it works.
It would be really nice at this stage to have some working code to throw some real messages at.
I've scoured the authors' personal pages (which seem to be here and here)
but can't find anything there either...
Hello Misters Boykin & Roychowdhury, what about some working code?
And please don't forget: I may be lazy, but you are ugly and I can always try working harder... :-)
as subject says...