Slashdot Mirror


New Method of Spam Filtering

Alephcat writes "A simple and easily implemented scheme for combating e-mail spam has been devised by two researchers in the United States. P. Oscar Boykin and Vwani Roychowdhury of the University of California, Los Angeles use their method to exploit the structure of social networks to quickly determine whether a given message comes from a friend or a spammer. The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category. The article was published on Nature magazine's website earlier today."

326 comments

  1. My favorite filter by krog · · Score: 2, Funny

    >/dev/null

    1. Re:My favorite filter by catdevnull · · Score: 3, Insightful

      My namesake! SpamAssassin on our mail servers helps bunches. The X-headers that we add are so easy to filter. Gets about 99% of the spam. Your mileage may vary.

      --

      I might know what I'm talkin' about, but then again, this is Slashdot...
    2. Re:My favorite filter by Dimensio · · Score: 1

      Not as effective, IMO. For me, an ideal filter would be one that automatically forwards a complaint to both the ISP that owns the sending IP address and the host for any websites advertised in the junk email, and continues to send complaints until the websites are dead. Until that happens, I'm stuck with doing that manually.

    3. Re:My favorite filter by Anonymous Coward · · Score: 0

      Yeah, like how I symlink the mail file to /dev/null.

    4. Re:My favorite filter by kousik · · Score: 1

      Sigh. I get so much spam daily my /dev/null overflows.

  2. Everytime you filter spam... by Anonymous Coward · · Score: 5, Funny

    You take food away from a spammer and his children. Don't block spam, or else you hate children. You don't hate children... do you?

    1. Re:Everytime you filter spam... by etLux · · Score: 1

      Take food away from spammers? Heavens! I certainly would not want to do that! What I'd really like to do is remove all of their external appendages... including the one in their pants.

  3. Vwani Roychowdhury by Anonymous Coward · · Score: 5, Funny

    He was probably sick of people like me mistaking his name for a made up spam "from" line.

    1. Re:Vwani Roychowdhury by kc3lai · · Score: 3, Funny

      you mean "from: Anonymous Coward"?

  4. Interesting by jchawk · · Score: 5, Interesting

    It would be interesting if Google could find a way for this idea to work with Orkut.com, since users of this service are typically connected to many other people who are not spammers. :-)

  5. Easily spoofed? by Sam+Ruby · · Score: 5, Insightful

    What's to stop the From:, To:, and Cc: fields from being spoofed (like a lot of viruses do)?

    --
    - Sam Ruby
    1. Re:Easily spoofed? by cavebear42 · · Score: 4, Informative

      as i understand it, they would have to spoof to someone who you know, a virus could easily do that (after it has your address book) but not so much for spam.

    2. Re:Easily spoofed? by Anonymous Coward · · Score: 3, Informative

      The fact that competent mail admins know how to prevent such stupidity from happening.

      Ever wonder why worms use their own SMTP engine? Because those of us that are competent have one mail relay that only accepts messages from the internal domain. We prevent the worm's SMTP engine from working by having MX wildcard records to a logging box only for internal DNS - this ensures that any message sent from an internal box that gets out goes through the relay, which authenticates the user.

    3. Re:Easily spoofed? by jhunsake · · Score: 1

      Exactly. Mod this shit up. And it doesn't even have to be a virus: spammers will frequently spam from one university address to another.

    4. Re:Easily spoofed? by imbaczek · · Score: 2, Informative

      Viruses are a different kind of spam. They actually come from someone you know (or might know). Regular spam has those headers forged (and getting those right would raise the cost of a single message, which is good).

    5. Re:Easily spoofed? by SydShamino · · Score: 2, Interesting

      This certainly needs to be combined with a revamped SMTP system (or complete replacement) that enforces DNS-style From: lookups.

      So no, this certainly isn't a solution all by itself. It's the best one I've seen so far that doesn't involve more laws, though.

      Most of the other ideas surrounding DNS lookups are to enforce accurate From: lines. But then the ideas break down, with the best suggestions to be new laws to punish the sender of the spam. With the proposal here today, it can be done with technology instead of waiting for legislation.

      --
      It doesn't hurt to be nice.
    6. Re:Easily spoofed? by grofty · · Score: 1

      The point of the method is to tackle the bulk mailers. In that case, there would still be a high degree of other people in the To:, Cc: and Bcc: fields not at all related to other mail in the recipient's inbox and other folders. It's the fact that the bulk mailing sends a single message to so many people of such a diverse nature that triggers the sorting. What you describe probably falls into the roughly 50% of all mail received that does not get specifically identified.

    7. Re:Easily spoofed? by Anonymous Coward · · Score: 0

      Spam needs to be stopped at the source, not at the receiving end.

      Start prosecuting. Forged headers equate with mail fraud as far as I'm concerned, put the bastards in jail.

    8. Re:Easily spoofed? by FauxPasIII · · Score: 5, Informative

      There are two 'sender' fields that one is concerned with: The envelope-sender and the From: header. The latter can be spoofed as much as you like. The former cannot be spoofed in most cases, at least the host/domain part (the username can be spoofed if the server uses unauthenticated SMTP, which almost all do).

      A typical message would look like this:

      From spammer@baddomain.com
      From: Your friend <yourfriend@gooddomain.org>
      Subject: Re: your mail

      Buy our crap ! Click below to be removed. Blah blah.


      The first From field is the 'envelope sender' and comes entirely from the servers that have touched the mail. The rest of the fields are just a freeform part of the message, which by convention most (all?) MUA's treat in a special way to add convenient features like having the 'real name' next to your mail address in the visible From: field.
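
      As a rough illustration of the two sender fields described above, here is a minimal sketch using Python's standard email package. The function name sender_fields is made up for this example, and the code only shows how to pull the two values apart; it makes no claim about whether either one is genuine.

```python
from email import message_from_string
from email.utils import parseaddr

# The sample message from the comment above, with the mbox-style
# "From " envelope line in front of the regular headers.
RAW = (
    "From spammer@baddomain.com\n"
    "From: Your friend <yourfriend@gooddomain.org>\n"
    "Subject: Re: your mail\n"
    "\n"
    "Buy our crap ! Click below to be removed. Blah blah.\n"
)


def sender_fields(raw: str) -> tuple[str, str]:
    """Return (envelope sender, address from the From: header)."""
    msg = message_from_string(raw)
    # The parser keeps a leading "From " line separately from the headers.
    parts = (msg.get_unixfrom() or "").split()
    envelope = parts[1] if len(parts) > 1 else ""
    # parseaddr splits 'Real Name <addr>' into (real name, addr).
    _, header_addr = parseaddr(msg.get("From", ""))
    return envelope, header_addr


envelope, header_addr = sender_fields(RAW)
print("envelope sender:", envelope)
print("From: header   :", header_addr)
```

      The point of the sketch is only that the two values are stored in different places, so a filter has to decide which one it is keying on.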

      --
      25% Funny, 25% Insightful, 25% Informative, 25% Troll
    9. Re:Easily spoofed? by DR+SoB · · Score: 3, Interesting

      The issue is receiving.. Yes, you can EASILY block outbound, it's inbound that's an issue.

      "We prevent the worm's SMTP engine from working by having MX wildcard records to a logging box only for internal DNS -"

      Say what? Why wouldn't you just block outbound port 25 from anyone except YOUR SMTP server's address? If a worm has its own SMTP engine (many do, yes), then what's to stop it from doing its own MX look-ups? It would take about 4 extra lines of code to accomplish this.

      --
      Mod +5 Drunk
    10. Re:Easily spoofed? by Anonymous Coward · · Score: 0

      Why wouldn't you just block outbound port 25 from anyone except YOUR SMTP server's address?

      We do.

      what's to stop it from doing its own MX look-ups?

      Absolutely nothing. It gets the worm nowhere, though, except on our logging server for hitting the wildcard, which leads to remediation. Please try to pay attention.

    11. Re:Easily spoofed? by mlefevre · · Score: 5, Informative

      The envelope-sender can be just as easily spoofed as the From: header. If you're sending email out through your ISP or corporate email relay, that may well check that the host (or the whole address) is correct.

      If you do as most spammers do and connect directly to the receiving server, then you can feed it whatever you like in the envelope sender, and it has no way of checking whether it's genuine or not. This is what stuff like SPF can help with, but as things are currently implemented just about everywhere, the envelope-sender addresses on spam and viruses are generally forged.

    12. Re:Easily spoofed? by Anonymous Coward · · Score: 1, Informative

      The parent is overrated.

      Forging an envelope sender is trivial: "telnet mailhost 25" and break out your best SMTP rap.
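
      To make the telnet remark concrete, here is a small sketch that only builds the lines a client would type in such a session; it opens no connection and sends nothing. The function name smtp_commands and all addresses are placeholders invented for the example. The envelope sender is simply whatever the client puts after MAIL FROM, independent of the From: header inside DATA.

```python
def smtp_commands(helo_name, envelope_sender, recipient, header_from, body):
    """Return the lines of a bare-bones SMTP session as plain strings."""
    return [
        f"HELO {helo_name}",
        f"MAIL FROM:<{envelope_sender}>",  # envelope sender: client's claim
        f"RCPT TO:<{recipient}>",
        "DATA",
        f"From: {header_from}",            # header sender: also client's claim
        f"To: {recipient}",
        "",                                # blank line ends the headers
        body,
        ".",                               # lone dot ends the message
        "QUIT",
    ]


session = smtp_commands(
    helo_name="client.example.net",
    envelope_sender="anyone@example.org",
    recipient="you@example.com",
    header_from="Your friend <friend@example.org>",
    body="Hello.",
)
print("\n".join(session))
```

      Nothing in this exchange lets the receiving server verify either address on its own, which is the gap that sender-registration schemes try to close.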

    13. Re:Easily spoofed? by yerfatma · · Score: 1

      That's a nice thought, but remember how much we love the fact "The Internet can't be regulated" and that no government controls it? Spammers in different jurisdictions make that difficult. The best way to stop it at the source is to make the incentive as small as possible by either driving up the cost (in money/ time/ effort/ smarts) to successfully spam or reducing the payoff.

      So if you want to stop it at the source, knock off your Aunt Bertha who keeps buying shit from spammers and telemarketers.

    14. Re:Easily spoofed? by crymeph0 · · Score: 3, Insightful

      Easily 30% of the spam I've received over the last few months has been addressed to several people in my office (and not to anyone outside the office). I'm guessing this is a result of viruses harvesting emails off people's computers, then it's a simple matter of finding all known emails in a given domain. Would this break the system described here?

      --
      It should be illegal to say that freedom of speech should be limited.
    15. Re:Easily spoofed? by Anonymous Coward · · Score: 0

      Nothing. This method counts on spammers not knowing who your friends are. All it takes is another worm which not only sends itself to the people in your address book but also includes all the other addresses in long CC lines. It could also look for incoming emails by itself and quote them, including the CC lines. Just hope nobody who falls for stupid bait texts knows you.

    16. Re:Easily spoofed? by FauxPasIII · · Score: 3, Insightful

      > If you do as most spammers do and connect directly to the receiving server, then you can feed it
      > whatever you like in the envelope sender, and it has no way of checking whether it's genuine or not.

      Isn't it typical for the receiver to reverse-lookup the sender's IP, or at least forward-lookup whatever you hand it in the HELO to make sure you're legit ? I could be mistaken here, but that's always been my perception.

      --
      25% Funny, 25% Insightful, 25% Informative, 25% Troll
    17. Re:Easily spoofed? by DR+SoB · · Score: 2, Insightful

      "Please try to pay attention."

      I'll try..

      You're assuming too much dude.. You're assuming it's going to try and access your default DNS server, but it could be hardcoded to try any DNS server (i.e. use akadns.yahoo.com for lookups)..

      Also, some SMTP's don't even bother to do MX look-ups, they just assume it will be either:

      MAIL.[domain].[whatever]
      or
      MAIL1.[domain].[whatever]

      And it will be correct 80% of the time. (Yes I picked 80% off the top of my head, but let's just say I've seen enough mail servers to know..)..

      --
      Mod +5 Drunk
    18. Re:Easily spoofed? by Vainglorious+Coward · · Score: 4, Informative

      Isn't it typical for the receiver to reverse-lookup the sender's IP, or at least forward-lookup whatever you hand it in the HELO to make sure you're legit ?

      Some systems do this, but any sensible system will not reject solely on this basis because it breaks delivery of some legitimate messages. In particular, nowhere does it say that mail "from" a particular domain has to emanate from a particular host (there's no analogue to MX for *sending* hosts). That's what SPF and similar techniques are trying to impose - registered "senders" for a particular domain.
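
      The idea of registered senders for a domain can be sketched as a simple lookup. This is a toy model, not real SPF: actual SPF publishes its policy in DNS records with their own syntax, whereas the table ALLOWED_SENDERS, the function sender_is_registered, and every domain and network below are made up for illustration.

```python
import ipaddress

# Hypothetical registry: domain -> networks allowed to send mail for it.
# A real deployment would fetch this policy from DNS, not hard-code it.
ALLOWED_SENDERS = {
    "example.org": ["192.0.2.0/24"],
    "example.com": ["198.51.100.16/28", "203.0.113.7/32"],
}


def sender_is_registered(envelope_sender: str, client_ip: str):
    """True/False if the domain registered senders, None if it did not."""
    domain = envelope_sender.rpartition("@")[2].lower()
    networks = ALLOWED_SENDERS.get(domain)
    if networks is None:
        return None  # no policy published: the check cannot decide
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(net) for net in networks)


print(sender_is_registered("bob@example.org", "192.0.2.55"))    # in range
print(sender_is_registered("bob@example.org", "198.51.100.1"))  # not in range
print(sender_is_registered("bob@unlisted.test", "192.0.2.55"))  # no policy
```

      The three-way result matters: as the comment says, rejecting mail just because a domain has registered nothing would break delivery of legitimate messages.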

      --
      My next sig will be ready soon, but subscribers can beat the rush
    19. Re:Easily spoofed? by gnu-generation-one · · Score: 2, Interesting

      "as i understand it, they would have to spoof to someone who you know, a virus could easily do that (after it has your address book) but not so much for spam."

      And virus-infected machines are being used to send spam. Aren't they also capable of swapping email address details between machines?

      Coincidence? You'd better hope the spammers think so.

    20. Re:Easily spoofed? by peteforsyth · · Score: 1

      The problem with this system, as I understand it, is that it relies on a large number of email addresses being visible in the To: or CC: fields. All spammers would have to do to get around it is use the BCC field, or use a program that sends out the message individually to different people. Without a large number of email addresses in the header, it seems this system will not apply any rules.

    21. Re:Easily spoofed? by Anonymous Coward · · Score: 0

      You're assuming too much dude..

      you are assuming we allow external DNS lookups.

    22. Re:Easily spoofed? by AnotherBlackHat · · Score: 1

      What's to stop the From:, To:, and Cc: fields from being spoofed (like a lot of viruses do)?


      Little point in spoofing the To: or Cc: headers, but yes, spammers could spoof the From: (and the envelope from) quite easily.

      They can. They are. And what's worse, spammers don't have to do it perfectly -
      they can send each spam "from" thousands of people and see which ones get through.
      Keeping a list of from-to pairs is just as easy as keeping a list of to addresses.

      This can be fixed by digitally signing email, but if you prefer a quick fix that works with the existing legacy,
      then use the From: plus the IP address.
      (for a slightly better but more complex approach, you can group all outbound IPs for an ISP together.)
      It's possible to spoof an IP address, but it's several orders of magnitude more difficult than spoofing just the From:

      -- this is not a .sig
    23. Re:Easily spoofed? by klang · · Score: 1

      too easy ..

      the to-field will simply equal the from-field.

      I am not on my own whitelist for this very reason.

      The whitelist of friends, everything else is scanned manually, works well enough for me.

    24. Re:Easily spoofed? by Kainaw · · Score: 1

      So, the solution is to send spam through a virus that opens your address book and spams all your friends. Just tack on an attachment that says: "Open this cool [whatever]" and you'll be spreading your spam in no time.

      --
      The previous comment is purposely vague and generalized, but all of the facts are completely true.
  6. Volume by enderanjin · · Score: 4, Interesting

    If the filters are effective against only half of the emails, what is preventing spammers from doubling their load in order to control the same amount of spam getting to your inbox as they do now?

    --
    Anything in parenthesis may (not) be ignored.
    1. Re:Volume by Dukael_Mikakis · · Score: 2, Insightful

      And from the sounds of it, what makes it different from black(or white)lists? True, it's more sophisticated because it uses the whitelists of those on your whitelists, but why not just use a plain whitelist anyway?

      And how does this allow email from internet transactions or other non-social sources through? The article didn't seem to address that so clearly.

    2. Re:Volume by ComputerSlicer23 · · Score: 2, Interesting

      It'd be novel to see how this worked, when implemented at say the ISP level. Possibly at an intra-ISP level, where they ended up exchanging information.

      Then when I get a random e-mail from a friend of a friend that isn't on my white list, it's a lot more likely to show up in my filtered mail. It's an easy way of having a white list built for you. Besides, I hate maintaining a white list. Anytime someone changes e-mail addresses, I have to go play with the white list. It's not terribly convenient. I'd be much happier if they could be intelligently built by an automated system (with a weighting, and me maintaining possibly another white list).

      However, in the end, they are building a bass ackwards version of a "chain of trust". I mean, all you'd have to do is build a chain of trust of "From:" addresses you trust. However, if it is available to the public, it's probably a spammer's dream. Which means it'll need some type of method of verifying that the From: headers are legitimate. As soon as that is done, spam filtering will be pretty easy, but it will create a whole slew of problems for generating e-mails from automated systems.

      Kirby

    3. Re:Volume by Dukael_Mikakis · · Score: 1

      I'm not sure how fleshed out the whole thing is (the article wasn't terribly detailed), but it seems like this sort of thing relies on you having an up-to-date whitelist. If somebody changes their email and your whitelist isn't updated, then if that person sends you (or anybody that they know through you) an email, it won't get through because your whitelist is wrong.

      It seems to me that they are trying to use the idea of UserNames (ala Friendster or whatever) where a person's email may change daily, but your whitelist will remain updated because the User Name or the User Id is in your network. So in Friendster you can change your location, email, user name, whatever, but still be connected to your friends by some immutable Id. So what makes this different from Friendster, then?

      And I'm not sure if I follow the thing about the "cumbersomeness of other spam filters". I use a Mozilla Bayesian and it works beautifully, I only see maybe 3 spams a day, which are promptly marked "Junk".

    4. Re:Volume by ComputerSlicer23 · · Score: 1
      I'm not convinced it's a well conceived idea either. However, I think I got the "cumbersomeness of other spam filters".

      I believe the concept there is that this sorts mail into spam, known, and uncertain with virtually no computation (that's an assumption on my part). It appears you find the "From:", "To:" and "CC:" lines, parse them a little bit, update a small database, and do a small computation.

      Bayesian filtering and SpamAssassin aren't precisely light on an e-mail server. Running bogofilter on a large mail spool is rough on the box. If you could whittle down the amount of e-mail that has to go thru that process, it's a net win on spam filtering computation. I think when they say "Cumbersome", they really mean, "amount of work that is done to figure out if it's spam or not, by a computer", not the amount of work done by the human to set it up.

      Finally, Bayesian filtering is a tool of the gods. I've told everyone I know about its wonders, and how much I really think they should use it.

      Kirby

    5. Re:Volume by kirkjobsluder · · Score: 1

      And from the sounds of it, what makes it different from black(or white)lists? True, it's more sophisticated because it uses the whitelists of those on your whitelists, but why not just use a plain whitelist anyway?

      The actual article that describes the algorithm explicitly states that it is a way to create whitelists based on existing information from your mailbox.

      And how does this allow email from internet transactions or other non-social sources through? The article didn't seem to address that so clearly.

      The algorithm has a fudge factor that assigns an ambiguous score for cases where the clustering coefficient can't be calculated.
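
      For readers who want to see what a clustering coefficient test might look like, here is a rough sketch. It is one plausible reading of the idea, not the authors' published algorithm: the rule that edges join a sender to each recipient, the names build_graph, components, clustering, label and classify, and the min_size and threshold values are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(messages, me):
    """Link each sender to every recipient, leaving the mailbox owner out."""
    graph = defaultdict(set)
    for sender, recipients in messages:
        if sender != me:
            graph[sender]  # touch the key so lone senders become nodes
        for rcpt in recipients:
            if me in (sender, rcpt) or sender == rcpt:
                continue
            graph[sender].add(rcpt)
            graph[rcpt].add(sender)
    return graph


def components(graph):
    """Connected components, found with a plain depth-first walk."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(graph[node] - comp)
        seen |= comp
        result.append(comp)
    return result


def clustering(graph, comp):
    """Mean local clustering coefficient; None if no node has 2+ neighbours."""
    scores = []
    for node in comp:
        nbrs = graph[node]
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        scores.append(2 * links / (k * (k - 1)))
    return sum(scores) / len(scores) if scores else None


def label(graph, comp, min_size=4, threshold=0.1):
    """Hypothetical cut-offs: small or unscorable components stay unknown."""
    c = clustering(graph, comp)
    if len(comp) < min_size or c is None:
        return "unknown"
    return "friends" if c >= threshold else "spam"


def classify(messages, me):
    graph = build_graph(messages, me)
    return {addr: label(graph, comp)
            for comp in components(graph) for addr in comp}


MESSAGES = [
    ("alice", ["me", "bob", "carol"]),
    ("bob", ["me", "carol", "dave"]),
    ("carol", ["me", "dave", "alice"]),
    ("dave", ["me", "alice"]),
    ("bulk", ["me", "x1", "x2", "x3", "x4"]),  # strangers to each other
    ("shop", ["me"]),                          # one-off transactional mail
]
print(classify(MESSAGES, "me"))
```

      In this toy data the friends all write to one another, so their neighbours are themselves linked; the bulk sender's recipients never are, and the lone shop address falls into the ambiguous bucket the comment mentions.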

  7. It only works for half of the emails received? by Anonymous Coward · · Score: 0

    Whoopy, that is nice! I have an antispam solution that works for half of my emails at work, too - I only accept email from my company's domain! Brilliant! No external spammers can get me!!

  8. huh? by wankledot · · Score: 4, Interesting
    It only works for half... but it works great on that half!!! How is that a good filter at all?

    Of course one huge downside to this "friend of friends" approach is all the virus spam I get that's sent using someone's address book (thanks Outlook!) Guess what... all those addresses are probably whitelisted because it came from someone I "know."

    --
    My sig is blank, I typed this by hand.
    1. Re:huh? by CeleronXL · · Score: 5, Interesting

      Well you can run mail through a system like that first, pulling out the mail that is definitely not spam and shuffling it away to the Inbox. Then run it through a different kind of spam system, such as a system like SpamBayes, and you cut it down even more.

      On its own it doesn't sound like it works well, but you can couple it with already-existing systems to boost accuracy.
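
      The two-stage arrangement described above could be wired up roughly as follows. This is only a sketch: chain, social_stage and keyword_stage are invented names, the address sets are placeholders, and the keyword test is a crude stand-in for a real Bayesian filter such as SpamBayes.

```python
from typing import Callable, Optional

Verdict = Optional[str]  # "ham", "spam", or None for "no opinion"


def chain(*stages: Callable[[dict], Verdict], default: str = "ham"):
    """Build a classifier that asks each stage in turn until one decides."""
    def classify(message: dict) -> str:
        for stage in stages:
            verdict = stage(message)
            if verdict is not None:
                return verdict
        return default  # nobody had an opinion
    return classify


# Placeholder data standing in for the social-network result.
KNOWN_FRIENDS = {"alice@example.org"}
KNOWN_SPAMMERS = {"bulk@example.net"}


def social_stage(message: dict) -> Verdict:
    """Confident about some senders, silent about the rest."""
    sender = message["from"]
    if sender in KNOWN_FRIENDS:
        return "ham"
    if sender in KNOWN_SPAMMERS:
        return "spam"
    return None


def keyword_stage(message: dict) -> Verdict:
    """Crude stand-in for a statistical content filter."""
    return "spam" if "click below" in message["body"].lower() else None


classify_mail = chain(social_stage, keyword_stage)

print(classify_mail({"from": "alice@example.org", "body": "Click below!"}))
print(classify_mail({"from": "new@example.com", "body": "Click below!"}))
print(classify_mail({"from": "new@example.com", "body": "Lunch tomorrow?"}))
```

      Putting the cheap, high-confidence stage first means the heavier filter only sees the mail that the first stage could not place.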

    2. Re:huh? by nick_davison · · Score: 4, Funny

      Hey, don't knock a filter that can correctly sort mail in to two piles fifty percent of the time. CoinToss 1.0 has been a real innovation!

    3. Re:huh? by PossibleMat · · Score: 1
      I agree with the first answer to this post. What they are suggesting isn't necessarily an end solution. It's a method that can be included in packages that use a combination of methods.
      To quote:
      ...their method should prove highly effective when paired up with more sophisticated, but more cumbersome, filtering methods.
      --
      Have you Meta Meta Moderated lately?
    4. Re:huh? by Geancanach · · Score: 1

      It sorted 53% into categories of spam or not-spam. The rest of the email was uncategorized. The important thing is that the emails that were categorized were ALWAYS categorized correctly - no legitimate emails in the spam group. You could perform a separate anti-spam technique on the rest, although that separate method probably would make some errors.

    5. Re:huh? by feepness · · Score: 2, Interesting

      It only works for half... but it works great on that half!!! How is that a good filter at all?

      No, it works PERFECTLY on that half.

      Important distinction. Now instead of needing to trawl through for spam yourself to generate the Bayesian filter, you can set this to automatically generate your Bayesian filter. Not only would this be easier, but it would reduce false negatives/positives by 50%.

    6. Re:huh? by Mr_Silver · · Score: 1

      Of course one huge downside to this "friend of friends" approach is all the virus spam I get that's sent using someone's address book (thanks Outlook!)

      Whilst I don't want to look like I'm fighting for Outlook (because when it comes to virus propagation, it certainly does do that job well), I do think it's worth pointing out that it's trivially easy to extract the names and numbers out of other email clients' address books.

      In fact, if you compare it to something like, say Pine, it's actually harder to get at the numbers because you have to use the Outlook object. With Pine you can just read a plain text file.

      It also means that Microsoft can (and have) written hooks so that if something tries to access the Outlook object, Windows can jump in and stop it from doing so.

      I don't believe Linux can stop any application reading a file short of changing its permissions, which means that anything could access the Pine address book and no-one would be any the wiser.

      Of course, the problem with viruses (and the issues with Outlook/Windows/IE that helps them) is a whole lot bigger than just getting the addresses, but something I thought I'd mention.

      --
      Avantslash - View Slashdot cleanly on your mobile phone.
    7. Re:huh? by johnynek · · Score: 2, Interesting

      It is also really good at looking for false-positives or false-negatives of existing solutions (like spamassassin or crm114).

      --
      jabber: johnynek@jabber.org
    8. Re:huh? by Anonymous Coward · · Score: 0

      It only works for half... but it works great on that half!!! How is that a good filter at all?

      Mail would come out as (a) Spam (b) Not Spam (c) Indeterminate.

      50% would be in either (a) or (b). Among those in the two groups, it is accurate.

    9. Re:huh? by ajs · · Score: 1

      In other words, this would work ok as an element of a larger system that could deal with the other 50%.

      Interestingly, this is EXACTLY how SpamAssassin works, running a variety of testing engines from simple header-regexps to Bayesian analysis to DNS-blacklist checks. SA is essentially just a framework for pulling diverse testing schemes together and using all of their input to evaluate mail. This technique seems like it would be a good addition.

    10. Re:huh? by Tor · · Score: 1

      This test alone would not do, of course. But integrated into filtering software such as SpamAssassin, it would probably be one of the more reliable (high positive/negative score) tests.

      -tor

    11. Re:huh? by timeOday · · Score: 1
      ...it would reduce false negatives/positives by 50%.
      Not necessarily, that assumes the results of this filter and whatever you augment it with are uncorrelated. In practice it's likely that different anti-spam filters have difficulty with the same messages - the very clever spams, or the friend of a friend who only emails you once in a while.
    12. Re:Huh? by TheoMurpse · · Score: 2, Informative

      no what they meant was that 50% of all email messages are sorted into "friends" or "spam" correctly...the other 50% aren't sorted into either, but rather considered "undetermined"

    13. Re:huh? by goofy183 · · Score: 1

      A new method would be cool but speed is more what I'd look for. Using SpamPal http://www.spampal.org for intelligent white/black listing and DNSBL with the http://spampalbayes.sourceforge.net/ Bayesian plugin seems to be working 99.9% for me. I'm still convinced that the Bayesian text based filtering methods are THE BEST way to filter spam. A well trained filter with some intelligent rules to whitelist & blacklist email addresses works wonders.

    14. Re:Huh? by aonaran · · Score: 1

      uh huh,
      Isn't that what I said?
      It's still a nonsensical statement.

      If it didn't sort it, it didn't work, and if it worked, it sorted it.
      To say it works 50% of the time and in 100% of the times it works it works right is just silliness.

      Either it works or it doesn't. Still, 50% effectiveness is good; as an additional layer of spam protection it is certainly welcome.

    15. Re:huh? by PetWolverine · · Score: 1

      A system that's right half the time would be a coin toss, but this system is right all the time--it just only gives an answer half the time. So while it doesn't get rid of all spam, it cuts it down so that other filters have less to deal with.

      --
      I found the meaning of life the other day, but I had write-only access.
    16. Re:Huh? by jonesvery · · Score: 2, Funny
      The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category.

      Am I the only one who read this sentence and said "huh??"

      Oh, no -- makes perfect sense to me. I applied that logic to quite a few exams when I was in college: "My score on this exam is perfect...I could only come up with answers to half of the questions, but every one that I answered was correct! a+ for me!"

      My professors were the bastards who didn't understand...

      --

      * * *
      It is a dada story -- it has no moral.

  9. hm.. by arabagast · · Score: 2, Interesting

    Isn't this somewhat similar to Thunderbird's function not to mark those in your mailing list as spam?

    --
    Doolittle : ...What is your one purpose in life?
    Bomb no.20 : To explode of course.
  10. Cleaning up the gene pool by Anonymous Coward · · Score: 5, Funny

    Spammers suck, right? And their children have obviously inherited the spamming gene. So, by starving the children to death, we're preventing the spam gene from spreading. It may sound wrong, but we're actually helping society.

  11. It would be interesting... by Anonymous Coward · · Score: 1

    It would be interesting if Google could find a way for this idea to work with Orkut.com, since users of this service are typically connected to many other people who are not spammers. :-)

    ...if Google could find a way to get Orkut.com to work, first.

  12. Viruses? by AntiOrganic · · Score: 3, Interesting

    Won't this just inspire more spammers to pursue virus, trojan and spyware-oriented methods of spamming? Granted, this is significantly more difficult than just harvesting email addresses off of Usenet and web pages, but it seems like we're only one step ahead at any given time with our methods of spam prevention.

    1. Re:Viruses? by Xzzy · · Score: 2, Insightful

      > Won't this just inspire more spammers to pursue
      > virus, trojan and spyware-oriented methods of
      > spamming?

      Fine by me.. that puts them soundly into the lawbreaking category. Which means that after you track them down and actually find someone operating inside the borders of your country, you can DO something about it.

      Since the laws being passed in the US are clearly indicative that spam is and will always be in an impossible to regulate grey area, the next best solution is to make spamming so difficult that only outlaws can do it.

    2. Re:Viruses? by MoogMan · · Score: 3, Insightful

      That isn't necessarily a bad thing, forcing users to clue up on good practices regarding viruses etc. by automatically blacklisting their email address otherwise. If this is coupled with a decent system to stop the from/to/cc from being filtered then it may start solving two problems at once.

    3. Re:Viruses? by tunabomber · · Score: 1
      Won't this just inspire more spammers to pursue virus, trojan and spyware-oriented methods of spamming?

      ...which will encourage friends to educate their friends about opening strange attachments or running an unpatched version of Outlook. If you're getting tons of spam from anonymous proxies running on your friends' computers, you will be more motivated to teach them how to clean up their machines or give them the ultimatum "lose your viruses, or lose your email communication channel with me".


      Eventually only non-technical social circles (groups of people without a single techie in their address books) will suffer seriously from viruses and spam.

      --

      pi = 3.141592653589793helpimtrappedinauniversefactory71 ...
    4. Re:Viruses? by apt142 · · Score: 1

      Won't any anti-spam software? If what you are implying is that we won't be able to ever filter them all or always stay ahead of the game, I agree. But if you are implying that we should give up the ghost, I totally disagree.

      In the end, I don't think that we can solve the spam problem with a technology solution. I think the solution will have to be a sociological one. Whether it's jail time, fines or execution. But, until something gets done in that area, I'm all about filtering this crap out and making it very hard on spammers to continue.

    5. Re:Viruses? by Syberghost · · Score: 2, Funny

      Which means that after you track them down and actually find someone operating inside the borders of your country, you can DO something about it.

      Screw that; if they send even one spam to an FBI agent, they're interfering with his ability to do his job, and thus providing aid and comfort to terrorists.

    6. Re:Viruses? by AntiOrganic · · Score: 1

      Oh, I wholeheartedly agree -- it's just that the context of the article purported this new solution as a be-all-end-all spam problem solver, which just doesn't seem to be the case. Already we're seeing trojans opening people's computers as spam relays, I'm just wondering how long it takes before they just start spoofing the infected's source address as well.

    7. Re:Viruses? by j_matthews · · Score: 2
      Won't this just inspire more spammers to pursue virus, trojan and spyware-oriented methods of spamming?

      Yes. But what that does is degrade spammers not to people with annoying business models, but to criminals. This is good. Criminals can be locked up. Criminals can have restraining orders placed on them. Criminals can be fined. Yes I know that a lot of spammers use international borders to hide behind, but I don't think there is a government in the world that wants to be associated with crime syndicate protection just in case they get labelled TERRORIST or other politically correct name calling.
  13. Random number generator is just as good by Anonymous Coward · · Score: 0
    The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category.

    Hmm... Wouldn't a random number generator give you the same result?

    1. Re:Random number generator is just as good by jejones · · Score: 1

      No. Flipping a coin would trash half of the nonspam mail; the method under discussion is claimed in the article not to suffer from false positives.

    2. Re:Random number generator is just as good by Anonymous Coward · · Score: 1, Informative

      No.

      This system can vouch for half of your email, that it's either friend or spam. This means it correctly categorizes half of the email, and leaves the other half unknown.

      A random number generator could assign all of your email to friend or spam, randomly. But it wouldn't do it all correctly.

      Duh.

  14. Sounds interesting... by herrvinny · · Score: 1

    But they're going to have to make it work better, say on 75% of all email received. Hell, most legitimate mail you receive is from people/orgs you've corresponded with previously, so why doesn't it work on more than 50% of emails?

    1. Re:Sounds interesting... by rjelks · · Score: 4, Insightful

      I would agree with that in terms of personal email accounts, but for a business, new contacts are pretty important. Most companies would hope a lot of real email was from new sources.

      -

  15. What??? by Entry-Level+Loser · · Score: 1

    Spam control without charging for E-mail? This can't be - no way!

    1. Re:What??? by pether · · Score: 0

      Surely spam can't be such a big problem that you'd rather pay for email than hit the delete button once in a while.

      A well-configured SpamAssassin catches most of the spam anyway.

  16. Implementation?? by NineNine · · Score: 1

    The article is great and all, but it doesn't mention if this method is actually being implemented anywhere yet. So, yeah, great theory, but I want to see it in practice.

  17. Bugger Off! by ackthpt · · Score: 5, Interesting
    You take food away from a spammer and his children. Don't block spam, or else you hate children. You don't hate children... do you?

    You know darn well that this will only increase employment in the Spam Technology sector and is a good thing.

    Seriously, spammers are often a step ahead, and lately a lot of the spam I'm getting is masked to look like Amazon orders or closed eBay auctions. I haven't ordered anything from Amazon (USA) in ages, but I still have to peek to see if someone has cracked my account and ordered something. Just expect that the harder they are pressed, the harder spammers will press back by sinking to new lows.

    --

    A feeling of having made the same mistake before: Deja Foobar
    1. Re:Bugger Off! by Anonymous Coward · · Score: 0

      Sounds like a perfect application for digital signatures...why Amazon etc don't sign their emails and post a key somewhere is beyond me.

  18. Good idea by Schezar · · Score: 5, Interesting

    After reading this, I realized that a good 90% of the email I receive is either from someone I've had previous contact with, or else someone 1 or at most 2 degrees of separation from one of those people. I never get mail worth reading from total strangers. Anything important is always linked back to me in some way.

    It should be interesting to see how this method plays out. (Now, I don't know why I even bothered with that last sentence. Everyone says that about every new spam-filtery thing. ((Don't know why I bothered with that last sentence either. Work is slow today I suppose.)) )

    --
    GeekNights!
    Late Night Radio for Geeks!
    1. Re:Good idea by Anonymous Coward · · Score: 1, Funny
      ... someone I've had previous contact with, or else someone 1 or at most 2 degrees of separation ...

      When you get to 6, say hey to Kevin Bacon for me would ya?

    2. Re:Good idea by JustinMWard · · Score: 1

      You clearly don't have to answer email addressed to webmaster, postmaster, admin, etc.. I would say that at least a quarter of my (ham) email comes from people I've never heard of before.

      Even if it still effectively filters 50% of the remaining 75% of my email.. that's what, about 38%? SpamAssassin stops more than that before I even train its Bayesian brain. I'm also doubtful of its zero false positive claims.
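
      A quick check of the arithmetic above, as a minimal sketch. It assumes the method only ever sees the 75% of mail that comes from known contacts and gives a verdict on half of that; the variable names are illustrative and not from the article.

```python
# Share of all mail that would get a verdict, under the assumptions above.
known_sender_share = 0.75   # mail from inside the social network (assumed)
verdict_rate = 0.50         # fraction of that mail the method can classify

classified_share = known_sender_share * verdict_rate
print(f"{classified_share:.1%} of all mail gets a verdict")  # 37.5% of all mail gets a verdict
```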

      Very simply, I don't see how this could work for people who conduct business via email. You just get too much email from people who are completely outside your social network, that you absolutely cannot block.

    3. Re:Good idea by bobintetley · · Score: 1

      ...90% of the email I receive is either from someone I've had previous contact with...

      Well, that's great for you, but I run a number of F/OSS projects and 90% of the email I receive is from people I've never heard from before - and might never hear from again. I still want to talk to them though, as I get useful feedback, patches and help from the kindness of these total strangers.

      How does this thing cope with mailing lists? I am a member of a lot of mailing lists and the From address is always the person who sent the mail (rather than the list address) - won't this effectively render this method useless?

    4. Re:Good idea by michael_cain · · Score: 1
      Anything important is always linked back to me in some way.

      When I've done academic-style work in the past, some number of the documents end up being available online, along with e-mail contact information. A small but significant number of e-mails arrive whose only link to me is a common interest in a particular class of problems (and solutions). Some of these turn out to be important, but a previous linkage could be extremely difficult to establish -- e-mail from a university student in South Africa is one real-life example. Of course, almost all of these could be identified by a filter with sufficient sophistication and a database of my posted papers. Hard to work a reference to run-time feature interaction management or real-time network impairment emulation into a Viagra ad :^)

    5. Re:Good idea by MyFourthAccount · · Score: 1

      I never get mail worth reading from total strangers.

      Funny how that works. I have a small website where I advertise consultancy software development. I'm not interested to come up with complex systems for people to contact me, I'm not an "HTML programmer", I do very different kind of stuff. So I just have a (spam-safe) email link on the contact page. So the email I get from people that look at that site is probably the most important email I ever get. It pays for the bills and stuff.

      Just to show how things can be totally different for different people.

    6. Re:Good idea by adrianbaugh · · Score: 1

      I used to think that. But occasionally mail worth reading does come from perfect strangers. For example, today I received an email from someone contributing to a project I used to contribute to. He joined after I stopped contributing, so I've never heard from him before, but it's still an interesting email.

      --
      "'I pass the test,' she said. 'I will diminish, and go into the West, and remain Galadriel.'"
      - JRR Tolkien.
    7. Re:Good idea by kirkjobsluder · · Score: 1

      Even if it still effectively filters 50% of the remaining 75% of my email.. that's what, about 38%? SpamAssassin stops more than that before I even train its Bayesian brain. I'm also doubtful of its zero false positive claims.

      The stated intent of this method (and am I the only person who has actually read the full 6-page proposal?) is to reduce the computational overhead of filtering mail. The way this is intended to be used is to create whitelists and blacklists. Searching the header against a whitelist is in many cases less processor-intensive than searching the whole message multiple times for spam phrases, or running a full-text Bayesian analysis.
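
      To illustrate why a header lookup is cheap, here is a minimal sketch using Python's standard email module. The two address sets and the function name are hypothetical; the point is only that the decision needs one parsed header and two set lookups, with everything else handed to a slower content filter.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical per-user lists; a real deployment would load these from storage.
WHITELIST = {"alice@example.org"}
BLACKLIST = {"bulk@spam.example"}

def header_verdict(raw_message: str) -> str:
    """Return 'accept', 'reject', or 'unknown' from the From: header alone."""
    msg = message_from_string(raw_message)
    # parseaddr splits "Alice <alice@example.org>" into (name, address).
    _, sender = parseaddr(msg.get("From", ""))
    sender = sender.lower()
    if sender in WHITELIST:
        return "accept"
    if sender in BLACKLIST:
        return "reject"
    return "unknown"  # hand off to a slower content-based filter

raw = "From: Alice <alice@example.org>\nSubject: lunch?\n\nNoon works for me.\n"
print(header_verdict(raw))  # accept
```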

  19. this doesn't address spoofed email by alpha1125 · · Score: 3, Interesting

    What about spoofed messages from people on my list?

    Worms, from infected email systems?

    The researchers didn't address this.

    --
    Money cannot buy happiness, but can buy something soo darn close, that you can't really tell the difference
    1. Re:this doesn't address spoofed email by humuhumunukunukuapu' · · Score: 1

      how many of those [those meaning spoofed messages from known people] do you get every day? .2? .5? .00002?

      --
      i saw the baby, and the baby looked at me
  20. A two tier system? by erick99 · · Score: 4, Interesting
    I suppose you could use this as a first pass and let those go directly to the "recycle bin" or whatever deletes mail (if you really can be confident that they are all spam). Then, the balance of your email could go through whatever antispam system you use. Right now I get over 100 spam emails a day. These go into a folder and are sorted by sender so that I can quickly scan through for any "friendly" emails. It would be nice to cut down the amount that has to be manually scanned by a half. Either way, this sounds like it's going in the right direction - towards a system that is close to 100% effective (if that is truly possible).

    Happy Trails!

    Erick

    --
    http://www.busyweather.com/
    1. Re:A two tier system? by Sentosus · · Score: 1

      Ever think it could be because of the email address in your post? "by erick99 (743982) on Thursday February 19, @01:38PM (#8329171) " I really hope that this spam filtering does not force spammers to use open wireless networks as a tool..

  21. email still has to get to user by belmolis · · Score: 3, Insightful

    If I understand the technique correctly, it relies on information specific to individual users. Unless there is a way for users to export their information, that means that the filtering can only be done after the email reaches its destination, not by the ISP or central mail server. So it may be helpful to individual users, but unlike some proposed techniques, it won't cut down on total email traffic.

    1. Re:email still has to get to user by grofty · · Score: 1

      Not necessarily true. There are many filtering solutions that apply user-level filters on the mail server itself. If this filter is able to update the server's filter lists dynamically from the findings of the end user, it would work to lessen mail server congestion as well.

  22. End user's access is not the issue. by Sentosus · · Score: 3, Insightful

    For me as an ISP, I don't care if the email gets filtered between me and my customers. It hurts and costs me more for bandwidth to receive the emails, then store them, and then support the users that want me to clear their pop3 accounts when they are on dialup. Spam Filtering should take place at the Hub Cities on edge servers so it never gets to my mail server in the first place and I do not have the bandwidth charges. In exchange, I will filter all my outgoing mail on the mail server for spam outgoing. BTW, my mother likes spam. It is a good hobby of hers just to read through it. She gets very entertained by the content.

    1. Re:End user's access is not the issue. by phishtrader · · Score: 1

      Maybe we could all just use your mother as a spam filter.

    2. Re:End user's access is not the issue. by Sentosus · · Score: 1

      Won't you all just argue over who gets her first?

    3. Re:End user's access is not the issue. by Anonymous Coward · · Score: 0

      BTW, my mother likes spam. It is a good hobby of hers just to read through it. She gets very entertained by the content.

      Tell your Mom thanks for keeping the spammers in business.

  23. Re:He is practicing changing history oops by jeoin · · Score: 1

    wow wrong thread.. note to self.. less beer...

    --
    Jeoin
  24. Spam filtering by eclectro · · Score: 5, Funny


    If it doesn't use bullets, I don't want to hear about it.

    --
    Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
    1. Re:Spam filtering by DoubleD · · Score: 1

      What about a rabbit with a switchblade?

      http://www.sluggy.com/daily.php?date=031014

      --
      "He is no fool who gives what he cannot keep in order to gain what he cannot lose."
    2. Re:Spam filtering by g0at · · Score: 1

      How about this, then. It is a good idea because:

      • It filters spam
      • Shiny things
      • Etc.

      :D

    3. Re:Spam filtering by dedalus2000 · · Score: 1
      "If it doesn't use bullets, I don't want to hear about it."
      You must work in marketing.

      --
      My keyboads not woking popely.
  25. I don't always like my friends' friends by Clemence · · Score: 5, Funny

    Can't stop the friend-of-a-friend idiot who hits "reply to all."

    It might not be "spam" but I filter it now. I'll stick with my procmail filters.

    1. Re:I don't always like my friends' friends by LordK2002 · · Score: 1
      Can't stop the friend-of-a-friend idiot who hits "reply to all."
      Actually I find that these are great sources of fascinating tid-bits of information from your friend's friends, which they expected to remain private but ended up being delivered promptly to your inbox.

      K

  26. Good Start by ticklemeozmo · · Score: 2, Interesting

    This seems to be a good start, but it still requires software on the user side. And that software must work with their mail client...

    I guess it seems this is where the focus has become. While some spam can be blanketed and deleted, it's really up to the RECIPIENT to judge whether its spam or not.

    But then again, do we trust the user? Do we trust Joe and Jane (our loving SixPack couple) to make the right decision? Sure, it might be prudent in a company of 5-50, but what about 500-5000? Deploy and manage copies of these program to see if it's going right or not?

    I'm a sysadmin and I prefer the server-based solution. Blacklists, SpamAssassin, et al. Easier to fix one machine than 5000 desktops.

    Comments?

    --
    When modding "Informative", please make sure it both has a source and IS actually informative.
    1. Re:Good Start by johnynek · · Score: 1

      Actually, it can run great on the server. In a mail environment that runs something like Procmail, this algorithm can construct the relationships between *EACH* user and his contacts. As the message arrives, the algorithm can update the network. The network only changes after an email is received, so a server-based system would be great.

      The server could then make a whitelist and a blacklist for each user, without user intervention.

      I hope that something like this could help systems like Hotmail, Yahoo or AOL automate whitelist building for their users.
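
      One way to picture the per-user network described above, as a rough sketch and not the authors' code. Several things here are assumptions made for illustration: the edge rule (the sender is linked to each co-recipient), the use of an average clustering coefficient to tell friend-like components from spam-like ones, both thresholds, and every function name.

```python
from collections import defaultdict
from itertools import combinations

def build_graph(messages, owner):
    """messages: (sender, recipients) pairs taken from one user's mailbox."""
    graph = defaultdict(set)
    for sender, recipients in messages:
        sender = sender.lower()
        for rcpt in recipients:
            rcpt = rcpt.lower()
            # Leave the mailbox owner out; only links among contacts matter.
            if owner in (sender, rcpt) or rcpt == sender:
                continue
            graph[sender].add(rcpt)
            graph[rcpt].add(sender)
    return dict(graph)

def components(graph):
    """Connected components via an iterative depth-first search."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(graph[node] - comp)
        seen |= comp
        result.append(comp)
    return result

def clustering(graph, comp):
    """Average local clustering coefficient over nodes with >= 2 neighbours."""
    scores = []
    for node in comp:
        nbrs = graph[node]
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        scores.append(2 * links / (k * (k - 1)))
    return sum(scores) / len(scores) if scores else 0.0

def classify(graph, min_size=4, threshold=0.1):
    """Return (whitelist, blacklist); small components stay unclassified."""
    white, black = set(), set()
    for comp in components(graph):
        if len(comp) < min_size:
            continue  # too small to judge either way
        (white if clustering(graph, comp) >= threshold else black).update(comp)
    return white, black

owner = "me@example.org"
messages = [
    ("ann@x.org", [owner, "bob@x.org", "cat@x.org"]),
    ("bob@x.org", [owner, "ann@x.org", "cat@x.org"]),
    ("cat@x.org", [owner, "dan@x.org"]),
    ("dan@x.org", [owner, "ann@x.org"]),
    ("spam@bulk.biz", [owner, "u1@a.com", "u2@b.com", "u3@c.com"]),
]
white, black = classify(build_graph(messages, owner))
print(sorted(white))
print(sorted(black))
```

      In this toy mailbox the four friends form triangles (they mail each other), so their component scores high and is whitelisted, while the spammer's component is a star with no triangles and is blacklisted. Since the graph only changes when a message arrives, a server could rerun this per user on delivery.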

      --
      jabber: johnynek@jabber.org
  27. Even better method by Anonymous Coward · · Score: 0

    Delete everything and wait for people to contact you by phone or snail-mail to see whassssup. This method has the side-effect of giving a well-deserved cold shoulder to people you want to exclude from your social circle.

  28. Better than nothing by grendel_x86 · · Score: 1

    50% is still better than Yahoo's filtering scheme. The problem w/ spam filters is that they are in reaction to spam, so spammers will always have the upper hand. Like CAN-SPAM, spammers found loop-holes before it even went into effect.

    --
    Im glad /. isnt the real world, that would really suck..
  29. Ninnle has this... by Anonymous Coward · · Score: 0

    Ninnle Linux has had very effective spam-control in place for years! This isn't anything new at all. Wake up, Slashdot, and start noticing the developments that take place in this cutting edge distro!

  30. 'Blacklist of spammers' ?? by Chmarr · · Score: 1

    The article talks about a 'blacklist of spammers'... but... we ALL know this won't work, of course, since spam rarely, if ever, has a legitimate 'from' address.

    Also, this kind of solution will ONLY work if it's not widely used. Once it DOES become widely used, the spammers will simply update their huge network of zombie machines so that the spamming software on those machines sends spam from friends to friends, utilising the available address books and previous recipient list on the infected machine.

    In other words, while the 'friends network' will turn the EXISTING spamming procedures against them, the spammers will then turn the anti-spam software against itself, by turning the 'friends network' into a 'spamming network'.

    So... nice work, needs thought.

  31. Heading the wrong way by Muddie · · Score: 5, Interesting

    This sounds like the whole "Friends and Family" network from AT&T a few years ago, and now Verizon's "In" network thing, but with email and exclusive instead of "Free calls to friends on 'the list'".

    Pretty soon, you will have to send an MD5 hash of your DNA from a static IP address that is reversible and supply 5 references all in a PGP encrypted letter, along with a copy of your passport and birth certificate.

    When it's more work to block spam than stop it, you have to ask what is going wrong. Maybe if we somehow figured out wonderful technologies to *stop* spammers instead of blocking them, we'd be getting towards the ultimate goal. This is much like throwing money at a problem to bandage it, not fix it. The solution, however, also has to be easier for end users, who are doing nothing wrong. Why is every solution harder for end users, but just a 'bump in the road' for spammers? Am I missing something?

    1. Re:Heading the wrong way by Anonymous Coward · · Score: 1, Funny

      Maybe if we somehow figured out wonderful technologies to *stop* spammers instead of blocking them, we'd be getting towards the ultimate goal

      Such as spam-seeking missiles??

    2. Re:Heading the wrong way by Dimensio · · Score: 1

      When it's more work to block spam than stop it, you have to ask what is going wrong. Maybe if we somehow figured out wonderful technologies to *stop* spammers instead of blocking them, we'd be getting towards the ultimate goal.

      And yet people scoff at my suggestion of employing the death penalty (with painful torture) against spammers.

    3. Re:Heading the wrong way by Anonymous Coward · · Score: 0

      I mostly agree. This isn't the way to fix spam. Ultimately it needs to be killed (I mean that figuratively of course) at the source.

      This research is worthwhile though. Automating the construction of a web of trust and letting the computer do a lot of the work of figuring out who you trust, who you're indifferent about, and who you consider scum of the earth is very useful. This web could be used to rank lots of things besides spam.

    4. Re:Heading the wrong way by stewby18 · · Score: 1

      Maybe if we somehow figured out wonderful technologies to *stop* spammers instead of blocking them, we'd be getting towards the ultimate goal.

      Of course, the problem is that everybody knows that spam can't be solved by technology; legislation is the only hope.

      Except that everyone knows that legislation is hopeless, and the societal causes have to be fixed if we are ever going to end spam.

      But then again, everyone knows that it's impossible to prevent everyone from being taken in by spam, so the only way we'll stop spam is with a technological solution.

      --

      The issue is that, like you, everyone can say what's wrong with proposed method x, and that some nebulous better method y is the way to go... the problem is that no-one has yet proposed such a y that everyone agrees will actually work.

      Until the magical silver bullet is found, let's celebrate promising partial solutions--who knows which of them might evolve to become (part of) the ultimate solution?

  32. My own method by PossibleMat · · Score: 2, Interesting

    I would like to share in all humility my own method of spam filtering:

    I use a super-extra-secret e-mail that I give only to my friends. ;-)

    --
    Have you Meta Meta Moderated lately?
    1. Re:My own method by Patrik_AKA_RedX · · Score: 1

      That works great!
      Until one of your friends decides to send you a virtual birthday card or something like that.

    2. Re:My own method by leonardluen · · Score: 1

      i have found that some of my non-computer-geeky friends/family send me most of the spam i see in my inbox.

      my solution i use whitelists and mark those members of my family as spam

  33. (OT sig response) by jridley · · Score: 4, Funny

    Member of the Stop Fucking Saying 'M$' army

    Right, from now on, it's "micros~1" for me.

    1. Re:(OT sig response) by Anonymous Coward · · Score: 0

      You are aware that the "micros~1" problem only affected people who did stupid things.. like run DOS 6.22 on a Win95 machine.

      If you use the products the way they are meant to be used, you don't run into the "micros~1" problem. Get it?

    2. Re:(OT sig response) by Anonymous Coward · · Score: 0

      Tell that to my boss who had to automate some things with .bat files... some things have to be changed (like progra~1 for "program files").

      Btw, quite impressed that other people picked up on the joke I posted the other day on the lin---s story.

      (going to post this anonymously again, because I did that originally).

    3. Re:(OT sig response) by Anonymous Coward · · Score: 0

      Don't feed the trolls.

      (...ya Windows loser) ;-)

    4. Re:(OT sig response) by drinkypoo · · Score: 1
      Part of the allure of Windows is that you can run dos applications on your Windows 2003 Server machine. The level of backwards compatibility is astounding. And now you can use Microsoft Virtual PC (heh heh) to run DOS on Windows, too, though of course there's dosbox, bochs, and vmware, but the point is that the OS itself has support for binaries from long, long ago. Try running some of those old MacOS 6 binaries on OS9, I guarantee that most of them will explode. So will most of the applications written for System 7 for that matter (System 7 was the biggest pile ever.) The MICROS~1 thing is a feature, not a bug.

      But, it's still funny :)

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    5. Re:(OT sig response) by recursiv · · Score: 1
      Tell that to my boss who had to automate some things with .bat files... some things have to be changed (like progra~1 for "program files").


      Can I also tell your boss that the only reason he had to use that naming style is because "program files" has a space in it? And can I tell him that putting quotes around filenames allows you to use any long filename in a batch file, such as:

      cd "\program files"
      --
      I used to bulls-eye womp-rats in my pants
  34. Spam from Co-workers? by Titusdot+Groan · · Score: 2, Insightful
    These guys are way behind the curve. A growing percentage of the spam I get appears to be coming from my coworkers.

    These idiots have forgotten the basic rule of dealing with spammers (and other mail miscreants) which is:

    They LIE!
    They lie in the HELO, they lie in the MAIL FROM:, in the headers, etc. etc. etc.

    Any method that depends on this kind of data is doomed to a quick failure in the real world.

    1. Re:Spam from Co-workers? by johnynek · · Score: 2, Informative
      If you read the paper on the archive you will see that there is a method to deal with this problem.

      Namely, when someone joins a spam and non-spam component of the network.

      PS: This method was tested on email boxes from the "Real World", but of course, we could use more email boxes to test with. Please send me a tarball of all your email and I will tune the algorithm! :)

      --
      jabber: johnynek@jabber.org
    2. Re:Spam from Co-workers? by feepness · · Score: 1

      How do the spammers know the addresses of your co-workers?

    3. Re:Spam from Co-workers? by onion2k · · Score: 2, Funny

      Do you work for a penis enlargement company? Coz that'd explain a whole lot..

    4. Re:Spam from Co-workers? by Anonymous Coward · · Score: 0

      > These idiots have forgotten the basic rule of dealing with spammers (and other mail miscreants) which is:

      They LIE!


      You know, I was thinking the same thing. I was thinking why do people's brain shut down and believe everything they are told, once the source is identified as "researchers?" Hell yeah, a biochemical researcher is by far much better suited than most of us here at finding some chemical equation, or whatever they do which we do not, but in this field one is officially labeled a researcher, in most cases, because that's what one was after in life (the label, like those guys that get certs; hardly innovators). Some of the best minds in this here field have no need for any labels or formal recognition by traditional bodies.

      So, some "researcher" found a new method of combating spam that has obvious huge flaws which are readily apparent to anyone who didn't have their brain disabled by the secret code word "researcher." Wow... (in a very Visine/Ben Stein way)

  35. How is this different from bayesian? by blumpy · · Score: 1

    Does bayesian filtering do this to a certain degree, and more so? Ie. Checking for common tokens, including what's in the To and From fields?

  36. I foresee a nasty counter-measure to this by Maestro4k · · Score: 1

    As we're all too well aware, the spammers will find a way to counter this. Keeping in mind that they don't care one whit about how many messages they send, they'll probably just start sending out their spam more -- once for every address they're sending to, used as a from address. Sure, this filter will only let that one through, but the amount of spam E-mail will jump exponentially.

  37. Questionable usefulness by Anonymous Coward · · Score: 0

    I think this will be questionably useful for a lot of people. For instance, I get "mass emails" from friends/relatives, but I usually don't know half the people on the list - they're 2nd cousins thrice removed from my friend, I may have seen them once at my friend's wedding, but I don't know them from Adam.

    Also, as people have already mentioned, spoofed addresses (eg, from viruses reading address books and then sending them back to spammers) may render this useless quickly.

    Bayesian filtering (also some static rules) seems to catch 90%+ of my spam currently (via SAproxy). Having the software "learn" the spam vs. the ham seems to work very well for me and others I've talked to.

  38. The key is the cost by h00pla · · Score: 2, Interesting
    People who propose anti-spam measures should keep one thing clearly in mind, it seems to me. Spam will decrease as the cost of sending it increases.

    Though I'm no fan of Microsoft or Bill Gates, the solution proposed by them - one where a complicated math calculation is required for every mail they send - is on the right track because at least, in theory, it becomes expensive to send mail and therefore spammers are at a disadvantage. If this is to be a really workable solution, only time will tell - and given the MS tradition of hype ... who knows.
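
    The "costly calculation per message" idea can be sketched with a hashcash-style stamp. This is an illustration of the general technique only, not the Microsoft proposal mentioned above: the stamp format, the 16-bit difficulty, and the function names are all assumptions.

```python
import hashlib
from itertools import count

def mint_stamp(recipient: str, bits: int = 16) -> str:
    """Search for a nonce whose SHA-256 hash starts with `bits` zero bits."""
    target = 1 << (256 - bits)
    for nonce in count():
        stamp = f"{recipient}:{nonce}"
        digest = hashlib.sha256(stamp.encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return stamp

def check_stamp(stamp: str, recipient: str, bits: int = 16) -> bool:
    """Verification costs one hash, however long minting took."""
    if not stamp.startswith(recipient + ":"):
        return False
    digest = hashlib.sha256(stamp.encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - bits))

stamp = mint_stamp("friend@example.org")
print(check_stamp(stamp, "friend@example.org"))  # True
```

    The asymmetry is the point: the sender does tens of thousands of hashes per recipient on average at this difficulty, the receiver does one. Raising `bits` by one doubles the sender's expected work.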

    Schemes that make it expensive for the handlers (networks, ISPs) or the recipients, are not the way to go. After reading the article, it seems that this is just another one of those.

    --
    I've been swashdotted -- Elmer Fudd
    1. Re:The key is the cost by taustin · · Score: 1

      How much are you willing to pay to be on every mailing list you subscribe to?

    2. Re:The key is the cost by G27+Radio · · Score: 1

      I think it would be relatively simple to have a whitelist for mailing lists, and people that you expect to receive a lot of e-mail from regularly. People on your whitelist wouldn't need to do any calculations to send you an e-mail.

      BTW, I think using the processing time to strongly encrypt the e-mail would be a great idea. E-mail should be sent encrypted by default.

  39. translated into pseudo code by Luveno · · Score: 1

    for (i = 0, i++, inboxMessages.count - 1){ if (i mod 2) = 0{ deleteEmail() } }

    1. Re:translated into pseudo code by PhxBlue · · Score: 1

      Too predictable. The spammers would just send each recipient two messages. Try this:

      int main() {
          for (int i = 0; i < inboxMessages.Count; i++) {
              // Yields 0 or 1 with equal probability.
              int random_int = int(2.0 * rand() / (RAND_MAX + 1.0));
              if (random_int == 1) {
                  deleteEmail();
              }
          }
      } // end main()

      Oh, and double-check your code next time. That wouldn't compile in any C++ compiler I know about. :)

      --
      !#@%*)anks for hanging up the phone, dear.
    2. Re:translated into pseudo code by Anonymous Coward · · Score: 0

      what language did you write that in? I admit I am only familiar with languages with 'C' like syntax

      but shouldn't that be

      for(i=0; i < inboxMessages.count; i++){ .... }

      also I prefer ++i since it's more memory efficient in this circumstance, there is no need to return the previous value of i.

      yes I am at work, and yes I am bored as shit, is it that noticeable?

  40. New math? by WD · · Score: 2, Insightful

    The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category

    That has to be one of the most ridiculous statements I've heard in a while. That's like saying I've got a great new burglar alarm system. Now, it only works about half of the time, but when it does work it catches the crook with a 100% success rate!
    Who's buying?

    1. Re:New math? by Anonymous Coward · · Score: 0

      RTFA. It has 3 categories. Reject, Accept, and Don't Know. It's always accurate but 50% of the time it tells you it doesn't know and you have to do further filtering of the email.

    2. Re:New math? by PhxBlue · · Score: 1

      I had my house burgled three times in eighteen months before I had an alarm installed, which was pretty stupid in retrospect. But given the frequency of break-ins I've had to deal with, I'd be happy with a 50% success rate.

      --
      !#@%*)anks for hanging up the phone, dear.
  41. Spammers already defeat this (partially) by xleeko · · Score: 5, Interesting
    Spammers already sort addresses by site in order to take advantage of this effect. They forge the from address as someone else from your site on the theory that you know them and would whitelist them.

    In fact, this has provided me with a kind of "honeypot", since I now check for the addresses of several people who are long gone from my site. If I see their address it's gotta be spam!
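
    The honeypot check described above could look something like this minimal sketch. The retired addresses and the function name are made up for illustration; it only uses Python's standard email module.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical addresses of people who left the site long ago.
RETIRED = {"old.colleague@example.org", "former.admin@example.org"}

def mentions_retired_address(raw_message: str) -> bool:
    """True if any From/To/Cc address belongs to someone long gone."""
    msg = message_from_string(raw_message)
    headers = (msg.get_all("From", []) + msg.get_all("To", [])
               + msg.get_all("Cc", []))
    addresses = {addr.lower() for _, addr in getaddresses(headers)}
    return bool(addresses & RETIRED)

raw = ("From: Old Colleague <old.colleague@example.org>\n"
       "To: dave@example.org\n\nCheap meds!\n")
print(mentions_retired_address(raw))  # True
```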

    - Dave

    1. Re:Spammers already defeat this (partially) by Anonymous Coward · · Score: 0

      Spammers already sort addresses by site in order to take advantage of this effect. They forge the from address as someone else from your site on the theory that you know them and would whitelist them.

      Spamassassin has a function whitelist_from_rcvd, which only whitelists a From: address if the IP address or domain name also matches.

      I've received spam that claims to be from my friends, but from a wrong IP address.
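
      The idea of trusting a From: address only when it arrives via the expected relay can be sketched as below. This is not SpamAssassin's implementation; the table, the function name, and the assumption that the relay hostname has already been extracted from the Received headers are all hypothetical.

```python
# Hypothetical table: sender address -> domain its mail is expected to relay through.
TRUSTED_RELAYS = {"friend@example.org": "example.org"}

def whitelisted(sender: str, relay_host: str) -> bool:
    """Whitelist only if the sender is known AND the relay is in its domain."""
    expected = TRUSTED_RELAYS.get(sender.lower())
    if expected is None:
        return False
    host = relay_host.lower().rstrip(".")
    return host == expected or host.endswith("." + expected)

print(whitelisted("friend@example.org", "mail.example.org"))       # True
print(whitelisted("friend@example.org", "dsl-10-2-3.isp.example")) # False
```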

    2. Re:Spammers already defeat this (partially) by FattMattP · · Score: 1
      In fact, this has provided me with a kind of "honeypot", since I now check for the addresses of several people who are long gone from my site. If I see their address, it's gotta be spam!
      Same here. I use the spamassassin blacklist_to command to blacklist emails where I'm CC'd with users that don't exist.
      --
      Prevent email address forgery. Publish SPF records for y
    3. Re:Spammers already defeat this (partially) by xleeko · · Score: 1

      The blacklist_to must be a recent addition. When I last checked only blacklist_from was available (which annoyed me because of the lack of symmetry)

      It wasn't a big deal though, since a quick procmail rule did the trick.

    4. Re:Spammers already defeat this (partially) by stefanb · · Score: 1
      I was quite confused when I received spam from myself: apparently, whatever sorting had been done led to the faked message coming from my work email address and being directed at my personal account. And the local part wasn't even similar.

      While the proposed method might provide some additional heuristics for a spam filter, it certainly is not the magical cure...

  42. livejournals by Anonymous Coward · · Score: 1, Interesting

    This may be a reasonable solution to the drive-by spamming that occurs on livejournal. You can easily create a web-o-trust given the closed, friendly nature of the 'friends' networks.

  43. So it's just a very good rule, how is that bad? by Smack · · Score: 5, Informative

    According to the article, it can make a decision on 53% of the total e-mail, and divide it up into Spam or non-Spam with complete accuracy. The key is that it makes no judgement on the rest of the e-mail.

    So you could throw this as a rule into SpamAssassin with a 100 weight on Spam results and a -100 weight on non-Spam results. That could only help your filtering. With zero false-positives.
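
    Roughly what that would look like as one more scored rule, sketched in Python rather than as real SpamAssassin configuration. The verdict names, the example scores and the threshold of 5 are assumptions for illustration.

```python
# Hypothetical verdicts from the social-network test.
SPAM, HAM, UNKNOWN = "spam", "ham", "unknown"

# Extra score contributed by the network verdict; UNKNOWN adds nothing.
NETWORK_SCORE = {SPAM: 100.0, HAM: -100.0, UNKNOWN: 0.0}

def total_score(other_rule_scores, network_verdict):
    """Sum the scores of the other rules that fired, plus the network rule."""
    return sum(other_rule_scores) + NETWORK_SCORE[network_verdict]

def is_spam(other_rule_scores, network_verdict, threshold=5.0):
    """A message is spam when its total score reaches the threshold."""
    return total_score(other_rule_scores, network_verdict) >= threshold
```

    With weights that large the network verdict always wins when it has one, and the other rules decide as before when it doesn't.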

    1. Re:So it's just a very good rule, how is that bad? by GooberToo · · Score: 4, Interesting

      Or simply not process the 53% with other spam detection software, which saves on CPU! In other words, make this the first anti-spam process, whereby, half of your email gets to skip spamassassin (or whatever). The other 50%, you process as usual.
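
      Something like this, as a rough Python sketch. Both filter functions are placeholders you would have to supply; their names and return values are assumptions, not anything from the article.

```python
def classify(message, network_filter, fallback_filter):
    """Run the cheap network test first; only undecided mail hits the fallback.

    network_filter(message) is assumed to return "spam", "ham" or "unknown".
    fallback_filter(message) is assumed to return True for spam.
    """
    verdict = network_filter(message)
    if verdict != "unknown":
        # Decided mail never reaches the expensive filter.
        return verdict
    return "spam" if fallback_filter(message) else "ham"
```

      The CPU saving comes from the early return: the fallback only ever sees the mail the first stage could not decide.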

    2. Re:So it's just a very good rule, how is that bad? by GooberToo · · Score: 3, Interesting

      Oh ya, in case it's not obvious, that means up to a 50% reduction in the small percent of email which are false-positives. That means, if you have a 5% false-positive rate, you *may* see that reduced to as little as 2.5%! Technically, the reduction may actually be greater than that. The reason being, it may be that 100% of the false-positives fall into the 50% that this technique properly identifies. Needless to say, that's very exciting. It also means that it creates the possibility to allow people to lower their spam threshold without fear of creating a higher false-positive hit rate. That, in turn, means more spam identified with fewer false positives. Let's hope reality falls close to my rambling speculations here! ;)
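
      To put numbers on that speculation, here is a tiny Python sketch. The function name is made up, and the share of would-be false positives that land in the correctly classified half is an unknown you would have to measure.

```python
def remaining_false_positive_rate(base_rate, share_of_fps_resolved):
    """False-positive rate left after the first stage clears some mail outright.

    base_rate: false-positive rate of the existing filter on all mail.
    share_of_fps_resolved: fraction of would-be false positives that fall in
        the mail the first stage classifies correctly (assumed, not measured).
    """
    return base_rate * (1.0 - share_of_fps_resolved)
```

      With a 5% base rate, a share of 0.5 leaves 2.5%, and a share of 1.0 (the lucky case where every false positive falls in the decided half) leaves none.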

      Very interesting indeed!

  44. Only 50%, but no false positives by blorg · · Score: 2, Interesting

    It only works on 50%, but it claims *no false positives* on that 50%. That means that that 50% can be deleted immediately; no-one has to check in case there is a false positive. By contrast, Bayesian filters *will* produce the occasional false positive, so you have to trawl through your spam folder occasionally to check against this. If I could reduce my spam folder checking from 200 mails a day to 100, I'd be very happy.

    1. Re:Only 50%, but no false positives by ichimunki · · Score: 2, Insightful

      The reason it's not giving you any false positives is because it's giving up on about half of the attempts. In my mind those are false negatives because they require additional effort (i.e. the filter errs on the side of accepting the mails)... and at a 50% rate that's not much help. I don't think I've ever seen a Bayesian filter that was allowed to just give up on 50% of all inputs... and if it was, I'd bet good money that it wouldn't generate any false positives either.

      Paul Graham kind of got everybody thinking about statistical filtering techniques, but people haven't really picked apart his algorithm or looked at ways to tighten it up. Personally I think that path is a lot more promising.

      --
      I do not have a signature
    2. Re:Only 50%, but no false positives by DjReagan · · Score: 1

      So you layer the filtering. Use this as the first run. Throw 50% of the spam directly to /dev/null. In the stuff that gets through, you run your bayesians and your spamassassins. That filters into your spam folder, which you can then scan for false positives - it reduces by half the overall load of manual checking from the situation before, where you were just bayesian and spamfiltering.

      --
      "When I grow up, I want to be a weirdo"
    3. Re:Only 50%, but no false positives by ichimunki · · Score: 1

      And I'm saying that really doesn't help much. If the "Bayesian" algorithm were properly implemented (Paul's "Plan" was a rough draft and the implementations out there are essentially by rote), you would already be able to toss 50% or more of the email as being certainly spam... leaving you with 50% or less "probably spams" to check manually. This method does not sound promising by comparison, since it won't help you even get to the "probably" stage with the remaining emails.

      You could already accomplish a lot of this by using S::A score ranges as a type of triage system to sort incoming mail into high, medium, and low spam probabilities (not that I'd know for sure, I stopped using S::A because I got burned on a few important emails using the default settings).

      --
      I do not have a signature
  45. Algorithm? by daves · · Score: 1

    I'd like to see more about the mechanism for coming up with the score.

    Also, others have mentioned the spoofing problem with To:, From: and CC:. It would be interesting to see how well it would work with the "social network" consisting of the mail servers sending the mail, or with a combination of IP address and To:-etc information.

    --
    People who disagree with you are not automatically evil, greedy, or stupid.
  46. Scorched Earth:Cleaning up the gene pool by ackthpt · · Score: 3, Funny
    Spammers suck, right? And their children have obviously inherited the spamming gene. So, by starving the children to death, we're preventing the spam gene from spreading. It may sound wrong, but we're actually helping society.

    The Spam Gene is actually a regressive gene, not likely it appeared in the parents or offspring. Its effect is similar to fouling the nest or pissing on food before eating.

    --

    A feeling of having made the same mistake before: Deja Foobar
  47. Useless method... by Anonymous Coward · · Score: 1, Interesting

    Many people need to receive email from people they've never met, like prospective customers.

    How did this get in to Nature? There are far better anti-spam tools like spamassassin & popfile that are far more effective against spam than this technique.

    1. Re:Useless method... by William+Tanksley · · Score: 1

      If you'd read the article, you'd know. This system, unlike the well-known whitelist system, doesn't classify unknown email at all.

      How did this get in to Nature? There are far better anti-spam tools like spamassassin & popfile that are far more effective against spam than this technique.

      This technique, like all the other anti-spam techniques we have, is a tool, not a system. You don't STOP using all the other things when you start using it. This one, unlike all the others that I know of, can work well with the others because when it has a classification, it knows for SURE. If it says an email is spam, the ISP can throw the email away, and not waste time and resources storing it (and therefore not waste the user's resources doing a Bayesian scan, which is actually relatively computationally expensive). If it says an email is NOT spam, further analysis for generic spam isn't needed (although, of course, other types of categorization are often called for, as with the person who wants to block massively forwarded jokes).

      -Billy

  48. This method will ruin a cool part of the net by The+Wing+Lover · · Score: 5, Insightful

    Used to be that one of the cool things about the net was that you would get email from total strangers... "Hi, I'm from {some far away place}. I saw your {Usenet post|web page|profile on some bulletin board site} and really liked your ideas about {something}. I've also been experimenting with {something} and I have some ideas about {whatever}..."

    Now, if we only have emails from our (already existing) friends or friends of friends, then how will we ever meet anybody new?

    --

    - In Capitalist America, law violates YOU!

    1. Re:This method will ruin a cool part of the net by RobertB-DC · · Score: 1

      Used to be that one of the cool things about the net was that you would get email from total strangers... "Hi, I'm from {some far away place}. I saw your {Usenet post|web page|profile on some bulletin board site} and really liked your ideas about {something}. I've also been experimenting with {something} and I have some ideas about {whatever}..."

      This won't go away, unless you like getting email from people who "really liked" *your* web page *and* the hundreds of other web pages their spider surfed.

      Remember, since you *did* RTFA, you read that the tool can only say "yea" or "nay" to about half the mail. The messages you're describing, from people that aren't obvious friends or obvious spammers, are in that "other" 50%.

      --
      Stressed? Me? Of course not. Stress is what a rubber band feels before it breaks, silly.
    2. Re:This method will ruin a cool part of the net by saderax · · Score: 1

      Citing a post from the top of the thread:
      From what I can make out, this system graphs correspondent pairs into correspondence maps, and notes that while normal people all email each other and thus have dispersed graphs, (high clustering coefficient) spammers have a distinct pattern, e.g. 1 person emailing a few million others (low clustering coefficient). There are figures in the article that make this point well.

      To me this would imply that spammers have a high ratio of outedges to inedges... i.e. they send more than they receive. If Joe Shmoe in another country sends me an email from his normal email address, chances are his outedge/inedge ratio is going to be less than one (more mail received than sent) or close to one (equal amounts of mail sent and received), making it likely that he is not a spammer.
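
      A rough Python sketch of that ratio, assuming you already have a list of (sender, recipient) pairs from somewhere. This is just my reading of the idea, not the method from the paper.

```python
from collections import Counter

def out_in_ratios(messages):
    """Map each address to (mail sent) / (mail received).

    messages: iterable of (sender, recipient) pairs.
    Addresses that never received anything get float("inf").
    """
    sent, received = Counter(), Counter()
    for sender, recipient in messages:
        sent[sender] += 1
        received[recipient] += 1
    ratios = {}
    for addr in set(sent) | set(received):
        # Counter lookups return 0 for addresses that never appear.
        ratios[addr] = sent[addr] / received[addr] if received[addr] else float("inf")
    return ratios
```

      A sender with a ratio far above one, or infinite, would look like a spammer under this heuristic; where to draw the line is left open.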

    3. Re:This method will ruin a cool part of the net by darkmeridian · · Score: 1

      The strength of this technology is that it ranks your friend as well. If he sends out a lot of e-mail and is rebuked, then he is considered a spammer. If you are the only person that he e-mails that was not in his link, then he will not be considered to be a spammer. This is quite a remarkably simple and effective idea.

      --
      A NYC lawyer blogs. http://www.chuangblog.com/
  49. Link to the Research Paper by Nepre · · Score: 4, Informative

    The actual paper that describes this technique can be found here

  50. Centralized System good for ISPs by G4from128k · · Score: 1

    Although interesting, the system would seem to need access to a centralized database of senders and recipients (visibility into a large segment of email traffic). If the system does not have enough records on each sender's other e-mails, it cannot construct an estimate of the social network.

    The scheme might work for people inside a very big network, like AOL. The system would easily notice that one address (either inside or outside AOL) has sent inordinate numbers of emails to AOL addresses without prior traffic from those AOL addresses to that spammer/sender.

    Bottom line - this is an ISP-level solution and will never be useful to individuals or small businesses (unless they sell subscriptions to the blacklist).

    --
    Two wrongs don't make a right, but three lefts do.
    1. Re:Centralized System good for ISPs by jfengel · · Score: 1

      As another poster pointed out, SpamAssassin should look into this.

      However, the difference between SpamAssassin and an ISP is that SpamAssassin usually gets only incoming emails. My emails to you don't necessarily go through SA unless you also happen to subscribe. Perhaps SA could offer an authenticated, SPF'ed SMTP server (which would have other advantages for whitelisting, too) to individuals and small businesses whose ISPs don't participate.

      Of course, at that point the line between SpamAssassin and the ISPs begins to blur.

  51. Problem halved -- Yarright by ZakMcCracken · · Score: 2, Insightful

    The remaining half of the e-mail then has to be filtered in a more sophisticated way. But by then the scale of the problem has been cut in half.

    Solving "half" of the problem is pretty useless. Spammers -- assuming this technology is ever widely adopted -- wouldn't take long to find a way to get their messages in the unfiltered heap. The only ones to suffer damage will be the legit email senders.

    Says the Cat, "Instead of counting all the stars in the sky, you could just count half of them and multiply the number by two. You just halved the problem there."

    1. Re:Problem halved -- Yarright by Anonymous Coward · · Score: 0

      That is how the stars are counted though...

    2. Re:Problem halved -- Yarright by saiha · · Score: 2, Interesting

      This is right on the mark. I think that if this system was widely implemented then we would begin to see more email virus based spamming. Essentially using the infected people to do the spamming to all of the people in their address book. This would in a sense defeat the whitelist method.

      In response to the quote about counting the stars, you could use a monte carlo method to count a few stars in random portions of the sky to get a fairly accurate count of all the visible stars.

  52. Main problem by phorm · · Score: 1

    Email addresses aren't as strongly fixed as say, a mailing address (and even those change). Your friend may get a new address and neglect to inform everyone. Or, he may email you from his new address and it doesn't fall in the 50% because it is unknown.

    Another possibility is that somebody new contacts you that doesn't know your friends, or somebody whom you haven't talked to in a long time. I have some friends that I am in/out of contact with for year periods. What if somebody pulled your email off a business card... you'd want them to be able to contact that email. This is where whitelists are a pain, and thankfully I have a website where if I ever do implement one, I can put a responder that says "your address is unknown, please send initial email using page X on my website" for initial confirmation.

    1. Re:Main problem by kirkjobsluder · · Score: 1

      In which case, it gets put into the chunk that is filtered through other means.

  53. More fodder for the mill by daves · · Score: 2, Interesting

    The Bayesian rule is just a mechanism for combining multiple independent estimates into an overall estimate.

    This is clearly an independent estimate, and a good mechanism to improve the overall detection probability.

    What we need is a "meta-Bayesian" process that appropriately weights and combines other spam prediction estimates, not just word counts.

    --
    People who disagree with you are not automatically evil, greedy, or stupid.
    1. Re:More fodder for the mill by ceswiedler · · Score: 2, Informative

      SpamAssassin does this. They use a genetic algorithm to calculate the best weights to give all of the tests they have, where 'best' = least false positives and most accurate positives (on their 'standard' spam/ham corpus).

  54. The Joy Of Statistics by tds67 · · Score: 2, Funny
    The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category.

    So it works 100% of the time in 50% of the cases? There is only a 25% chance that I would be interested in something like this.

  55. University of California = Working on the obvious by DR+SoB · · Score: 1

    What in the crap? Are you to tell me U of Cali actually thinks this is University level material? "Block spam by checking if you've received email from sender before"? And the award for MOST BLATANT TECHNOLOGY GOES TO:::!!!

    --
    Mod +5 Drunk
  56. 50% is an F... by msimm · · Score: 1

    The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category.

    We don't need a Band-Aid. We need a real solution. This may be an interesting solution, but honestly, it's not acceptable. I really believe buddy lists are probably the way to go (i.e. white lists). At least for email going directly into your inbox, they should be approved senders or friends of approved senders. When we get a solution that can block 99.9% of all spam and can catch up with new exploits as they come up, then I'll be impressed. Everything else is just mental masturbation.

    --
    Quack, quack.
    1. Re:50% is an F... by jfengel · · Score: 1

      I monitor several info@... addresses, which receive mail on spec. Such addresses can't afford whitelists. Unfortunately, these addresses are also generally posted to web sites where they can be scraped. (I'm currently experimenting with putting them up as images, which is inconvenient and probably costs me business, but I'll see what kind of feedback I get.)

      I use Bayes filters on these addresses, they do a pretty good job, though hardly perfect. I would be interested in a technique that halved the spam and halved the false-positives. (Though I doubt it would be quite that high.)

      It is, as you point out, only a band-aid, but a band-aid keeps you from bleeding while serious medical help arrives. (OK, the analogy sucks, but you get the gist.) I have high hopes for serious help some day (SPF, prosecution of the MAY^H^H^HCAN-SPAM act, various payment systems), but in the meantime I'll take partial solutions.

    2. Re:50% is an F... by msimm · · Score: 1

      I agree with you. I'd like to see the concept of email split into two categories: secure email (white listing at the very least, maybe pgp signatures if they can be made user-friendly/transparent enough, encryption against eavesdroppers) and insecure (basically old fashioned email). This way you'd have your primary account(s) which are just for REAL communication and some secondary accounts you'd have to run filters on and deal with a little bit of junk. I would never advocate for removing our ability to receive email from anyone at all. But I'm pretty dead-set against using another lame filter system that will cut 100 of the 200 pieces of junk I receive every day.

      That said I use bluebottle.com which is now back from the dead (they use a tmda like system with the whitelist/greylist/blacklist and auto-add features). They work pretty good and are the closest thing I've seen so far.

      --
      Quack, quack.
  57. How it works - clustering coefficients by blorg · · Score: 5, Informative
    You can read an abstract, and download the full (i.e. original) article here in a variety of formats.

    From what I can make out, this system graphs correspondent pairs into correspondence maps, and notes that while normal people all email each other and thus have dispersed graphs, (high clustering coefficient) spammers have a distinct pattern, e.g. 1 person emailing a few million others (low clustering coefficient). There are figures in the article that make this point well.
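
    For anyone who wants to play with the idea, here is a minimal Python sketch of the standard local clustering coefficient on an undirected "who has exchanged mail with whom" graph. The thread doesn't give the paper's exact construction or thresholds, so treat the function names and the graph-building step as assumptions.

```python
from itertools import combinations

def build_graph(pairs):
    """Undirected graph: address -> set of addresses it has exchanged mail with."""
    graph = {}
    for a, b in pairs:
        if a == b:
            continue  # ignore mail sent to oneself
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def clustering_coefficient(graph, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = graph.get(node, set())
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs to test
    linked = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return 2.0 * linked / (k * (k - 1))
```

    Three friends who all mail each other score 1.0 each, while a sender whose recipients never write to one another scores 0.0, which is the spammer pattern described above.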

    The system would be ideal for implementation at a fairly high level, (e.g. the ISP level) where systems can aggregate email headers across many different users in order to come up with meaningful graphs. The advantage it claims of no false positives means that it would be feasible at this level.

    I'm impressed; it looks like a very clever idea. My only question concerns how this would deal with mailing lists, which must appear to it like spam?

    1. Re:How it works - clustering coefficients by stratjakt · · Score: 1

      A mailing list would have multiple folks in the To: line, which would be easy to spot automatically.

      --
      I don't need no instructions to know how to rock!!!!
    2. Re:How it works - clustering coefficients by gnu-generation-one · · Score: 2, Insightful

      "My only question concerns how this would deal with mailing lists, which must appear to it like spam?"

      Well mailing lists are, by definition, identical to spam, so far as an automated program looking at each message is concerned. Whenever there's a test of spam-filtering programs the "false positives" are mailing lists that the tester forgot to tell the spamfilter about.

      It would be useful to have some way of publishing a list of mailing lists who have permission to send you email -- I'll leave it up to the "all you need is a system of public keys..." crowd to start shouting suggestions.

      And for the people who'll suggest whitelisting based on the From field, don't forget that the spammers can easily put "bugtraq@securityfocus.com" as the sender.

    3. Re:How it works - clustering coefficients by gatekeep · · Score: 2, Insightful

      Whitelist on the from field, and enforce SPF.

    4. Re:How it works - clustering coefficients by edrugtrader · · Score: 2, Insightful

      NO NO NO NO NO NO NO.

      do not filter ANYTHING at the ISP level.

      this is not a suggestion, it is a demand.

      --
      MARIJUANA, SHROOMS, X: ONLINE?! - E
    5. Re:How it works - clustering coefficients by orthogonal · · Score: 4, Insightful

      The system would be ideal for implementation at a fairly high level, (e.g. the ISP level) where systems can aggregate email headers across many different users in order to come up with meaningful graphs. The advantage it claims of no false positives means that it would be feasible at this level.

      Yeah, but I'd consider a high-level analysis of my email headers (either sent or received) to be a violation of my privacy. Whether or not I'm mailing to kinky@alterate.life.styles.com, fringe.politcal.groups.require@free.speech.too.org , unpopular.opinions@free.thinkers.net, or falun.gong@is.banned.by.my.dictator.org, it should be nobody's business but my own.

      Someone will undoubtedly argue that since headers are sent in the clear anyway, it shouldn't matter, but keeping a database of who mails what to whom only makes abuse -- by freelance busybodies or government spies and censors -- that much the easier.

      This is a case, I think, where the threat inherent in the cure is worse than the disease.

    6. Re:How it works - clustering coefficients by wart · · Score: 1

      This new system could be used as part of a larger spam filtering system like SpamAssassin, which assigns weights based on various selection criteria. This would just be one more check in determining if a piece of mail is spam or not.

      Does Bayes classify your mail properly? Give it a higher weight. Does this new system work reliably for you? Give the new system a higher weight. If you use a whitelist for your mailing list, then spamassassin won't classify it as spam. This way you can easily let through valid mailing list messages, while other messages from senders with low clustering coefficients get marked up accordingly.

    7. Re:How it works - clustering coefficients by orthogonal · · Score: 2, Insightful

      Yeah, but I'd consider a high-level analysis of my email headers (either sent or received) to be a violation of my privacy

      And in reply to myself. ;)...

      Since the whole point of this is to build social-connection-webs, it's ideal for government crackdown via the guilt by association angle: not only can you find everybody who is emailing to dump.ashcroft@new.american.revolution.org, you can also find -- and investigate -- all the friends of the dissenter, too.

      And for anyone who isn't worried that the FBI occasionally oversteps its bounds in investigating dissent, just consider that the social affinity networks of p2p traders could also be subpoenaed: we know Joe uploads mp3s, let's subpoena his email "buddy list" and investigate all those people too.

    8. Re:How it works - clustering coefficients by mdfst13 · · Score: 2, Insightful

      As someone who used to sysadmin a mail server, I can tell you that this (saving info about who emailed who) is already required. I forget what the limit was, but we were supposed to keep the mail logs (which carry from who to who info) for at least six months. We actually archived them to our write only backup system on a regular basis. AFAIK, they stayed there forever (of course, it's anyone's guess whether or not we would have been able to retrieve them; our backup system had issues--thus the write only tag).

      This proposal does not involve collecting or saving new info. It involves *using* the existing info at a summary data level. Also, understand that it would be the *recipient's* ISP who would do this, not your ISP. This means that they could only collect info on what you send to email addresses on that server, not cross reference it with all the email that you send.

      It's also worth noting that other ISP-level SPAM filters already process this info as well. This isn't a new concept. The new part is that it is trying to use the patterns *before* putting it in the receiver's mail box rather than after it is identified as SPAM by the receiver.

    9. Re:How it works - clustering coefficients by edbarrett · · Score: 3, Funny
      We actually archived them to our write only backup system
      /dev/null?
    10. Re:How it works - clustering coefficients by boelthorn · · Score: 1

      This requires processing e-mail headers and saving who is in communication with whom. I can only speak for Germany, but I think this violates existing law (Datenschutzgesetz). Anyone familiar with law who is able to clear this up?

    11. Re:How it works - clustering coefficients by Tomble · · Score: 2, Interesting
      SPF? Very neat, hadn't heard of that before. About time somebody did something about the whole header forgeability issue - IMO that alone (unless I've misunderstood what it's designed to stop) would be enough to deal with most spam anyways.

      Before I saw your posting, I was thinking that perhaps one way to deal with it would be for a similar approach to the "social networks" and "web of trust" ones to be applied to the servers and networks themselves: each network could keep a list of mail servers on other networks that they trust to not be open relays or spam hosts, etc, and for mail sent from other servers, they could check the lists that other trusted networks keep. They could then choose to add those servers to their own lists too if they turned out to be OK. Some means would need to be made for new servers to be able to get on somebody's list, of course...

      But the point is ultimately, that dealing with the Spam issue by filtering on the content is just stupid, it's a losing battle as they keep finding new stupid ways to get past the filters, and the filters will always have some risk of blocking legitimate emails. What if I send a parody of a spam to a friend as a joke? And if we only use filters at the user's end, the burden of the traffic is still felt by our ISPs and email providers. There HAS to be a way to block it at the source.

      --
      Be careful! New moon tonight.
    12. Re:How it works - clustering coefficients by Anonymous Coward · · Score: 0

      Use anonymous remailers.

    13. Re:How it works - clustering coefficients by mrogers · · Score: 1
      not only can you find everybody who is emailing to dump.ashcroft@new.american.revolution.org, you can also find -- and investigate -- all the friends of the dissenter, too.

      In the UK they can already do this without a court order, under the Regulation of Investigatory Powers Act and emergency powers enacted after Sept. 11 2001. They can also look at web server logs, details of phone calls made and received (although not the content - whoop dee do), and find out which base stations your mobile phone communicated with (effectively allowing them to track your movements). Since all this can be done without a court order, I assume a comprehensive "social map" of the UK has already been drawn up.

      The best part is that "they" aren't just MI5 or the police - the Home Office was considering giving every government employee access to this data. Know a government employee? Know someone you'd like to stalk? Small world, isn't it?

  58. Oh, I get it. by Anonymous Coward · · Score: 0

    Their implementation sends all email to /dev/null. It works great on the half you don't want to see.

  59. for a MUCH more interesting read... by germinatoras · · Score: 2, Funny

    Try the link at the bottom of the page:
    Sniffing stools speeds diarrhoea diagnosis
    19 February 2004
    http://www.nature.com/nsu/040216/040216-13.html

  60. A better option by netfool · · Score: 1

    The average system:
    1) Accept all email
    2) Filter
    3) Hope an important email isn't filtered improperly. If it is, go digging through the trash/junk folder looking for it.
    4) Read

    A better option:
    1) Deny all except from the addresses and/or domains I have specifically chosen to accept.
    2) Read

    Wouldn't that make more sense?

    --
    Left 4 Dead Gaming Group - http://www.l4dgg.com
  61. my conclusions by WormholeFiend · · Score: 1

    After reading much of the debates here and elsewhere on spam, I think it all comes down to ignorance and stupidity.

    I mean, you can make something illegal and provide for harsh legal punishment for any activity, and some moron will still find a way to do it.

    Just look at pyramid schemes. You'd have to have been living under a rock to not know these things 1) don't work and 2) are illegal. Yet...

    I think that spam is here to stay, just based on the fact that it's impossible to eliminate ignorance and stupidity.

    Hell, in some States, despite having the death penalty, murders still happen. /rant

  62. yeah right by lemody · · Score: 1

    >The e-mail clusters can be mapped out by
    >inspecting the 'from', 'to' and 'cc' fields in a
    >user's inbox. An automated system can quickly
    >build up a blacklist of spammers, as well as a
    >'whitelist' of approved sources.

    hmm but the sender can decide whatever he wants to put on those fields...

    in business critical mails :

    1. blacklisting is never a good idea.
    2. whitelisting is only sometimes a good idea

    i am not giving any better solutions here, but this is not the ultimate one we are waiting for.

    --


    class he-man extends man!
  63. Personal vs Business contacts by sczimme · · Score: 1


    This idea could be helpful in a [relatively] closed social environment, but would be disastrous in a business environment where a fair percentage of incoming mail might actually be from strangers. We call these strangers "potential customers"...

    --
    I want to drag this out as long as possible. Bring me my protractor.
  64. SPAM IS "FREE" by Thud457 · · Score: 1

    If your time is "worthless".
    (And your bandwidth, and your storage, and your CPU...)

    --

    the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff

  65. Bigger Issue... by glpierce · · Score: 3, Insightful

    While this may work for teenagers, it has no use in the business world. In the last week, I've gotten two dozen vital emails from people I did not previously know (professors at various grad programs). In that period, I haven't gotten a single message from people I know (or who know someone I know), because I have conversations with friends face-to-face, over the phone, or through instant messages. This sort of filtering just removes the most important reason for the existence of email, which is replacing snail-mail, not replacing conversations.

    --
    G
    1. Re:Bigger Issue... by stephens_domain · · Score: 1

      I communicate with my friends through coded slashdot posts.

      --

      ..
  66. I guess that pigs have wings. by Henry+Stern · · Score: 3, Interesting

    I never thought that Slashdot would help me find papers relevant to my research!

    I think that their idea is good from a technical point of view, but very bad from a privacy point of view. I am of the opinion that gathering social network information is extremely dangerous. A pertinent example: If your friend is branded a "terrorist," then "they" can exploit the information that you have voluntarily provided to then put you on a "terrorist" watch list.

    Another example: Say that someone who knows someone that you know actually buys something from a spam. If the spammer can access the social network information, suddenly your little niche of the network is going to be aggressively spammed. After all, like minds congregate.

    There is no doubt in my mind that the black hatters will infiltrate the social network communities and use that information to spy on potential viewers. See this bugzilla thread where the folks from Atriks Professional Email Deployment Service follow SpamAssassin's development and adapt their "ratware" tool accordingly.

    The biggest problem with collecting social networks is that once the data has been gathered, it is very hard to control. Those of you using Orkut should think long and hard about it.

    In conclusion, I think that this is technically a good idea but it opens a Pandora's box.

    1. Re:I guess that pigs have wings. by jfengel · · Score: 1

      The data exists, whether you like it or not. Your ISP knows who you email, and who emails you. As long as it bills you, it has some sort of meatspace link back to you. I'm not sure if it's legal for them to be saving that data, but I suspect that it is. Encrypting the contents of your messages is easy, but developing truly anonymous drop-boxes relies on trust.

      Even anonymizing services leave some traces to you: they can record your IP address. Access it from home or the office and you can be traced. It all depends on just how hard the governments wish to work.

    2. Re:I guess that pigs have wings. by Henry+Stern · · Score: 1

      Gathering all of that information on all of those people is very difficult. However, in this situation, the user does all of the work and publishes it in a centralized location. If you're making the information publicly available, you can no longer reasonably expect it to be private.

  67. It looks like it relies on CC by DeadSea · · Score: 1
    It looks like it relies on the CC field.

    If you get a message from Bob that was also CC'ed to Alice, then it knows that you, Alice, and Bob form a cluster and are likely to be friends. Emails from Alice would be whitelisted because of this.

    To work, this means that your friends have to know each other and send out group emails using the CC field. You can see why it would only be able to whitelist about 50% of your email.

    It also appears that spammers could fool this by adding another of their addresses to the CC field, sending you spam, and then sending you spam from the other address. At that point, the other address would be whitelisted. Although it may work now, once this starts to be widely used, spammers will find ways to pollute it.
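    A rough sketch of how that clustering might look, assuming each message is reduced to its from/to/cc addresses. The function names and the sample inbox are made up for illustration and are not taken from the paper:

```python
from collections import defaultdict
from itertools import combinations

def build_graph(messages, me):
    """Link every pair of addresses that appear together on one message."""
    graph = defaultdict(set)
    for msg in messages:
        people = {msg["from"], *msg.get("to", []), *msg.get("cc", [])}
        people.discard(me)  # the mailbox owner appears on every message
        for person in people:
            graph[person]  # touching the defaultdict registers lone senders
        for a, b in combinations(sorted(people), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def cluster_of(graph, start):
    """Return every address reachable from start (one connected cluster)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node] - seen)
    return seen

# Hypothetical inbox: Bob CCs Alice, Carol writes to you and Alice, a stranger writes alone.
inbox = [
    {"from": "bob@example.org", "to": ["me@example.org"], "cc": ["alice@example.org"]},
    {"from": "carol@example.org", "to": ["me@example.org", "alice@example.org"]},
    {"from": "pills@example.net", "to": ["me@example.org"]},
]
graph = build_graph(inbox, "me@example.org")
friends = cluster_of(graph, "bob@example.org")
```

    Here Bob, Alice and Carol end up in one cluster while the lone sender stays on its own, which is also why a spammer who CCs a second address of theirs could manufacture a cluster.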

  68. Erm, not by Vainglorious+Coward · · Score: 5, Informative
    The [envelope-sender] cannot be spoofed in most cases

    Simply : untrue. It's as easy to fake the envelope sender as it is the From: header. I think you're getting confused with "Received" headers, where each mail system inserts its own bit of tracking information. The envelope-sender is completely under the control of the sender, and (usually) propagates un-modified as an email is handed between systems (indeed, one of the criticisms of SPF is that by modifying the envelope sender you break forwarding).

    --
    My next sig will be ready soon, but subscribers can beat the rush
    1. Re:Erm, not by MyFourthAccount · · Score: 2, Funny

      I think you're getting confused with "Received" headers, where each mail system inserts its own bit of tracking information

      Which, for all completeness, is now also totally useless since spammers use compromised boxen to do the dirty work for them (hence you can only track it back to some worm-infected box owned by grandma who's just been taken to the hospital with a severe cramp in the left side of the body after pressing the 'Ctrl' key 4,523,098 times when the computer said 'press any key to continue'. This, of course, after the RMA'd keyboard arrived, which yet again did not contain the 'any' key, but did come with a friendly letter clearing up the issue.)

      All seriousness aside, as an owner of a common word domain name, I get to be the target of many a spammer. Not in the To field, but in the From field.

      For said domain, I receive everything that is sent to *@mydomain.tld. I used this to keep track of which people would sell my email address. For example if I had to register with shavedpussy.com, I'd give them the email address: shavedpussy.com@mydomain.tld. Now when I get spam at that email address I know I can't trust shavedpussy.com and it hurts my feelings.

      Well, the motherfucker, fudgepacker [no, sorry, I take that back. I'm ovbiously drawing a blank, there's gotta be a better suiting swearword out there.] spammers have decided that it would be a great idea to send their crap from those owned computers, forging the From field to something like randomcrap@mydomain.tld.

      So now I get hundreds of emails a day from all those friendly mail servers around that world that Jake is Out Of the Office, and that sillybunns@telstar.com is not a Known User. I'm the most grateful person on the planet, obviously, to have been relayed this information. I think the SMTP protocol is swell and any software that automatically replies to email is a Good Thing(tm).

      So my sneaky system has been turned against me by the exact people that I was trying to defeat. Now I have to block *@mydomain.tld and specifically add any new email that I assign. I'm ecstatic, because it's not a lot of work at all and just in general I'm bored most of the time, so I can use the distraction.

      It actually didn't work that well anyways, because after receiving spam to mom845@mydomain.tld I realized that mom just couldn't resist the excitement of sending just one more eCard because this one was just too funny to not send. At least she stopped forwarding me chain-letters (which she really wasn't into, but this one was for a good cause) with all the email addresses in the To or Cc field. She's good now, she puts the addresses in the Bcc field. Of course after learning of this technique she broadcasted an email to everyone she knew to make sure that they were aware of it as well. Cc: mom452@mydomain.tld.

      The point of my story: let's say I have changed my mind about the right to bear arms. And I understand that the intention of the constitution may not be my interpretation of it and all, but times change and since spammers didn't exist when the constitution was written, I figure I'm a pretty well regulated Militia, and spammers, well, they just screw things up. (I'm still working on the wording of that a little, it's become terribly hard to interpret the part about security and stuff, especially now that Ashcroft is playing grab-ass with anyone willing to pitch in a dime to keep Patriot Act II moving along, but that's an whole nother can of worms. Speaking of worms....).

    2. Re:Erm, not by plugger · · Score: 1

      Interesting post but man, are you on speed or something? That's one hell of a detailed story.

    3. Re:Erm, not by welsh+git · · Score: 1

      >For said domain, I receive everything that is sent to *@mydomain.tld. I used this
      > to keep track of which people would sell my email address. For example if I had
      > to register with shavedpussy.com, I'd give them the email address:
      > shavedpussy.com@mydomain.tld. Now when I get spam at that email address I
      > know I can't trust shavedpussy.com and it hurts my feelings.
      > Now I have to block *@mydomain.tld and specifically add any new email that I
      > assign. I'm ecstatic, because it's not a lot of work at all and just in general I'm
      > bored most of the time, so I can use the distraction.

      I do the same thing, for the same reasons, however I use a subdomain for this - can't you do the same thing ? i.e. remove the *@mydomain.com like you have, but create a *@myfourthaccount.mydomain.com and start using that instead

      --
      Sig out of date
  69. Sorry: that link is the full pdf, here's abstract by blorg · · Score: 4, Informative
    Sorry, that is a link to the entire pdf of the article. This is the abstract, which you may as well have here if I'm posting again (on the linked page, you also have other formats available, as well as mirrors):

    We provide an automated graph theoretic method for identifying individual users' trusted networks of friends in cyberspace. We routinely use our social networks to judge the trustworthiness of outsiders, i.e., to decide where to buy our next car, or to find a good mechanic for it. In this work, we show that an email user may similarly use his email network, constructed solely from sender and recipient information available in the email headers, to distinguish between unsolicited commercial emails, commonly called "spam", and emails associated with his circles of friends. We exploit the properties of social networks to construct an automated anti-spam tool which processes an individual user's personal email network to simultaneously identify the user's core trusted networks of friends, as well as subnetworks generated by spams. In our empirical studies of individual mail boxes, our algorithm classified approximately 53% of all emails as spam or non-spam, with 100% accuracy. Some of the emails are left unclassified by this network analysis tool. However, one can exploit two of the following useful features. First, it requires no user intervention or supervised training; second, it results in no false negatives i.e., spam being misclassified as non-spam, or vice versa. We demonstrate that these two features suggest that our algorithm may be used as a platform for a comprehensive solution to the spam problem when used in concert with more sophisticated, but more cumbersome, content-based filters.
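    To make "properties of social networks" a bit more concrete: one property a tool like this could lean on is the clustering coefficient, i.e. how often your contacts also know each other. The sketch below is only an illustration of that measure on two made-up graphs; it is not the paper's exact algorithm, and the graphs and names are hypothetical:

```python
from itertools import combinations

def clustering_coefficient(graph, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = graph[node]
    if len(neighbours) < 2:
        return 0.0
    pairs = list(combinations(sorted(neighbours), 2))
    linked = sum(1 for a, b in pairs if b in graph[a])
    return linked / len(pairs)

def average_clustering(graph):
    """Mean clustering coefficient over every node in the graph."""
    return sum(clustering_coefficient(graph, n) for n in graph) / len(graph)

# Hypothetical friend circle: everybody has mailed everybody else.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob"},
}

# Hypothetical spam component: one sender, recipients who never mail each other.
spam = {
    "spammer": {"v1", "v2", "v3"},
    "v1": {"spammer"},
    "v2": {"spammer"},
    "v3": {"spammer"},
}

friend_score = average_clustering(friends)
spam_score = average_clustering(spam)
```

    A tightly knit circle scores high and a star-shaped blast scores low; anything in between would presumably land in the unclassified part the abstract mentions.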

  70. *Sigh* by NanoGator · · Score: 2, Insightful

    All this work to stop spam, and ICQ's done it for years.

    Frankly, a series of filters is probably the worst approach at stopping SPAM. It's a game of "make the filter, defeat the filter, and risk not getting important mail." Why bother? The solution lies in a different approach. Authorization. There needs to be authorization layers in order to defeat spam. We need buddy lists, we need blacklists, we need the ability to request authorization, etc.

    I realize that fixing this problem isn't a simple one given the scale in which it's used. But man, I really wish somebody'd figure out how to do the transitory work. I'm almost completely reliant on ICQ and Private Messaging on forums in order to keep up with everybody.

    --
    "Derp de derp."
    1. Re:*Sigh* by liquidsin · · Score: 1

      Boy, did I ever bork that. Here goes again:
      I know it comes up all the time when spam comes into the conversation (what's that, like 50 times a day?), but you should check out TMDA. It's not for everyone, but it's pretty much what you're describing. Whitelists and blacklists. Unknown senders get a message back telling them to reply if they want you to receive their mail. When they respond, they get added to the whitelist. Spammers don't get that, since they spoof their from address anyways. I use it at home and I *never* see spam.
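      For anyone who hasn't seen challenge-response before, the flow is roughly the following. This is a toy sketch of the general idea only; the class and method names are made up and have nothing to do with TMDA's actual code or configuration:

```python
class ChallengeResponseFilter:
    """Toy whitelist/blacklist filter that challenges unknown senders."""

    def __init__(self, whitelist=(), blacklist=()):
        self.whitelist = set(whitelist)
        self.blacklist = set(blacklist)
        self.pending = {}  # sender -> messages held until the sender confirms

    def receive(self, sender, message):
        if sender in self.blacklist:
            return "drop"
        if sender in self.whitelist:
            return "deliver"
        # Unknown sender: hold the mail and ask them to reply.
        self.pending.setdefault(sender, []).append(message)
        return "challenge"

    def confirm(self, sender):
        """Sender answered the challenge: whitelist them and release held mail."""
        self.whitelist.add(sender)
        return self.pending.pop(sender, [])

mail_filter = ChallengeResponseFilter(whitelist={"friend@example.org"})
first = mail_filter.receive("stranger@example.com", "hello, long time no see")
released = mail_filter.confirm("stranger@example.com")
second = mail_filter.receive("stranger@example.com", "thanks for adding me")
```

      A spammer with a forged from address never sees the challenge, so their mail just sits in the pending pile.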

      --
      do not read this line twice.
    2. Re:*Sigh* by feepness · · Score: 1

      This sounds cool, but how does this work with a mailing list that you subscribe to?

  71. a link to the technical pdf by 0xfc · · Score: 1

    http://www.arxiv.org/abs/cond-mat/0402143

    Geez, that article was written very poorly.
    This was the link at the bottom of the article
    giving much more technical detail.

    I will admit, on my first read, i did not quite understand it. Hopefully after reading some informative posts it will clear it up.

    and while we are discussing spam, i would like to mention spamhaus.org is a very shoddy blacklist. Their policies are anti business. I would recommend not using them anymore.

  72. Re:Want NOSPAM? Bring back the BBS by AndroidCat · · Score: 1

    But there was spam on BBSs. Any number of times people would attempt to post/mail some kind of Make Money Fast scam on my BBS. The lameness filter would almost always catch it, but they kept trying.

    --
    One line blog. I hear that they're called Twitters now.
  73. Spammers are already countering this by argent · · Score: 1

    Not only is this well known and widely implemented - for example Apple mail's junk filter automatically accepts messages from people in your address book - but spammers are already countering it. They regularly forge the source address of messages based on the sending domain. But they can do better. The viruses they're already using to relay spam have access to the information about the "network of friends" in people's address books.

    We can already do much much better than this. If you're prepared to lose a little mail from people who couldn't be bothered jumping through a minimal hoop, token-based spam blocking (such as challenge-response, signed messages, or the token RISKS posts require) can give you near 100% protection with minimal cost.

  74. Alpha and Beta errors by oneiros27 · · Score: 1
    When you're dealing with this sort of thing, you have a population of items that fall into a category, and those that don't. You are attempting to sort each element into those two categories. You therefore have four potential outcomes:
    • In the category, classified as in the category (correct)
    • In the category, classified as being not in the category (error)
    • Not in the category, classified as being in the category (error)
    • Not in the category, classified as being not in the category (correct)
    So, there are two types of 'correct' situations, and there are two types of 'incorrect' situations.
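    If it helps, here are the four outcomes counted on a tiny made-up sample, taking "the category" to be spam. The labels and data are purely illustrative:

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count the four outcomes; True means 'spam' in both lists."""
    counts = Counter()
    for is_spam, flagged in zip(actual, predicted):
        if is_spam and flagged:
            counts["spam, flagged (correct)"] += 1
        elif is_spam and not flagged:
            counts["spam, missed (error)"] += 1
        elif not is_spam and flagged:
            counts["good mail, flagged (error)"] += 1
        else:
            counts["good mail, passed (correct)"] += 1
    return counts

# Hypothetical ground truth and filter decisions for five messages.
actual = [True, True, False, False, True]
predicted = [True, False, False, True, True]
counts = confusion_counts(actual, predicted)
```

    The "good mail, flagged" bucket is the lost-mail case that the tolerance ratios are about.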

    Depending on what e-mail is being used for, it may be acceptable to one person to lose an e-mail message, if it means that there are 100 spam messages they never see. For others, it may be 1:10. For others, there may be no acceptable level of lost mail that justifies spam filtering.

    They claim that they are able to do that last item -- they can correctly identify spam, with absolutely no false positives. Of course, I have no idea how well this works on the 'remember me from high school?' type messages [I've only gotten 4 of them in 10 years].

    As for the viruses -- viruses tend to vary much less than spam messages, and are much, much easier to block, and to prevent false positives on. Although you might get some virused messages at first, once the definition is updated, they do not trash good messages. [when they're done correctly... I think there was some bad virus definition a few years back that triggered on 'p' being in the body, or something stupid like that]

    Spam is a much more subjective thing, which is why it hasn't yet been eliminated. [and yes, I suspect that all that will happen from this is that spammers will write viruses to mail your addressbook to them, so that they can write better spam. Or modify a virus, so that for each mail you send, you also send a spam to your recipient].
    --
    Build it, and they will come^Hplain.
  75. Reverse MX DNS querying by germinatoras · · Score: 3, Interesting

    I've been thinking about this method for a while - basically, you configure your SMTP server to do this:

    • MTA connects to you, gives you a MAIL FROM: xxxxx@somedomain.com
    • Your server performs a MX query for somedomain.com, getting a list of IP addresses
    • Your server compares the IP of the connecting MTA to the list of IPs in the MX records.
    • No match? Connection gets aborted.
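
    In code, the check above could be sketched like this. The DNS lookup is passed in as a function so the example stays self-contained; lookup_mx_ips and the fake table are assumptions for illustration, and a real server would resolve the MX records and their addresses through a DNS library instead:

```python
import ipaddress

def sender_domain(mail_from):
    """Extract the domain part of a MAIL FROM address."""
    return mail_from.rsplit("@", 1)[1].lower()

def accept_connection(mail_from, client_ip, lookup_mx_ips):
    """Accept only if the connecting IP is one of the sender domain's MX hosts.

    lookup_mx_ips is a hypothetical callable: domain -> list of IP strings.
    """
    allowed = {ipaddress.ip_address(ip) for ip in lookup_mx_ips(sender_domain(mail_from))}
    return ipaddress.ip_address(client_ip) in allowed

def fake_lookup(domain):
    """Stand-in for a real MX query, using documentation-range addresses."""
    table = {"somedomain.com": ["192.0.2.10", "192.0.2.11"]}
    return table.get(domain, [])

from_mx_host = accept_connection("xxxxx@somedomain.com", "192.0.2.10", fake_lookup)
from_elsewhere = accept_connection("xxxxx@somedomain.com", "203.0.113.5", fake_lookup)
```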

    This idea is clearly too simple to have not been thought of before - but I have yet to find a good explanation as to why it won't work. Verizon.net uses this exact method - try sending an SMTP message from a host that isn't listed in your domain's MX records, and you get a "550 Sorry, you aren't allowed to mail for this domain" or something comparable. How come this method isn't more widely used? Going through my own SMTP server logs shows that the vast majority of SMTP servers sending legit mail are also listed in the domain's MX records. The only price is that you require the sender and receiver to be the same within a domain - hardly an unreasonable requirement.

    1. Re:Reverse MX DNS querying by argent · · Score: 2, Informative

      This won't work because the incoming and outgoing mail servers of just about any large organization have nothing to do with each other.

      In fact one of the rules I use blocks messages that claim to come from the MXes of certain large service providers because such messages are 100% spam from spammers who already thought of your idea.

    2. Re:Reverse MX DNS querying by germinatoras · · Score: 1

      I guess I must be working with different data sets. My spam comes almost exclusively from adsl-23l3-202.dynamic.some-isp.net. Thus, from my point of view, it seemed like a great idea. Hypothesis contrary to fact, I guess. Perhaps I should post my cleartext e-mail in a Usenet forum to get a more sizeable data set.

      I agree that the mail server doing the sending and the server doing the receiving can be entirely different - but in my correspondence, I very rarely see a sending MTA which isn't also listed in the MX section of the sender's domain.

    3. Re:Reverse MX DNS querying by catdevnull · · Score: 2, Informative

      we tried to implement this very method. it had very good results in drastically reducing the spam levels we were getting. Unfortunately, it also excluded small business and .orgs who didn't have their mail servers entered correctly if at all in the DNS. Although the "unclean" but legit mail servers were only about 2-3% of the total incoming mail, it was still enough "false positives" to make us have to open up the fort again. :(

      until everyone jumps on the bandwagon of MX registration, this method won't work. Required SMTP auth would be nice--at least it would be a bit more traceable. As long as 1/10th of 1% of spammers reply to spam msgs, then those damn spammers will think it's profitable. spammers die!

      --

      I might know what I'm talkin' about, but then again, this is Slashdot...
    4. Re:Reverse MX DNS querying by athakur999 · · Score: 2, Informative

      There is already something out there that's pretty similar to what you're suggesting. It's called Sender Policy Framework.

      Basically, as part of your DNS entry, you have a record containing a list of all of the addresses that are allowed to send email on your domain's behalf. I think there was a story on Slashdot a few weeks ago about it as AOL has started using it.

      --
      "People that quote themselves in their signatures bother me" - athakur999
    5. Re:Reverse MX DNS querying by argent · · Score: 1

      Oh, I have a very good data set. I've had the same email address for over 15 years and I've been very active online.

      I used to get an enormous amount of spam from dynamic IPs (what we used to call 'direct from dialup'), but there's some very good blocklists that keep most of them from getting as far as "MAIL FROM", and I'm working on an "amber list" program that should cover most of the rest of the dynamic-IP problem.

      But I have to say I haven't tried checking to see if the outgoing mail server is *also* listed as an MX. It hadn't occurred to me to try... I don't have any backup MXes myself, for example, because unless you have a second dedicated server you're not sharing with other people also doing backup there... they just complicate the spam filtering problem and end up increasing the spam load, and I wouldn't be surprised to find that there's a lot of sites in the same situation.

    6. Re:Reverse MX DNS querying by DigitaLunatiC · · Score: 0

      If I'm not mistaken, even Road Runner does this, or something with similar end results. While I'm on campus, I'm unable to send mail from my Road Runner account. But I think this might be part of the problem with it. What if somebody is away on business and can't access their regular ISP? They would be unable to send mail whilst on the road, and that wouldn't sit well with Corporate America at all.

    7. Re:Reverse MX DNS querying by kindbud · · Score: 1

      Our outgoing mail servers are not the same as our incoming MX mail servers. Your method would reject any emails from our 5,000+ employees. I can tell you that it isn't so that Verizon.net does this, or at least they don't do it to us. Our emails are not rejected by Verizon.net, nor have I ever seen any bounces from anyone because of this.

      The method you describe would reject a lot of legitimate emails, especially from medium to large companies and medium to large ISPs who have a more complex mail system than one MX relay.

      --
      Edith Keeler Must Die
    8. Re:Reverse MX DNS querying by germinatoras · · Score: 1

      Rats. So there's a critical mass of MXes out there which either 1) aren't listed correctly in the DNS, or 2) are deliberately not listed as MXes because they are "send-only" SMTP servers. Dang.

      Well...maybe this MX test could be one consideration among several? Messages from a non-MX host would be automatically assigned some negative bias in the whole spam-filtering process, to make them more likely to be flagged as spam. I'll have to toy around with the idea.

    9. Re:Reverse MX DNS querying by complete+loony · · Score: 1

      i think the MAIL FROM: and / or HELO information (the envelope header) should be traceable back to the actual machine / connection that was used to send the email. That said the From address in the email body should allow almost anything, for example so you can send email "from" your work address while on the road and without access to the work email server.
      Why is it that most email clients use the same value for the MAIL FROM and the From in the email body? I think a trustworthy mail server should replace the FROM with say hash@their.domain and remember who the original MAIL FROM and body "From:" were to assist in reporting abuse.
      At the ISP / small business level it's not too hard to verify that the sender is who they say they are. The problem then becomes one of how to trust that other mail servers are trustworthy and honest, and how bad is their reputation for sending spam.
      I think we will eventually need an email signing process similar to the process used in HTTPS, so that the receiver can verify that the message was sent by someone with the key for the domain / user, or was transmitted through the domain's email server. I realise this would be hard to implement especially as backward compatibility must be maintained until all mail servers use an approach like this. But even if this was only implemented between the major ISP's, and all of the ISP's customers were forced to relay email through them, something needs to happen to make the original sender accountable for their abuse.
      You will always have the problem of compromised machines sending spam through legitimate channels, but we need a verifiable audit trace to make email senders accountable for their actions.

      --
      09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
    10. Re:Reverse MX DNS querying by Anonymous Coward · · Score: 0

      Too easy to get around.

      Get a throw away domain...

  76. I once had an evil idea by WormholeFiend · · Score: 3, Interesting

    to deal with open relays in China...

    I would've harvested the emails of as many members of the ruling communist party as possible, and used those relays to spam them with anti-communist propaganda. I believe the consequences would've been swift and ruthless.

    Unfortunately I can't read/write Chinese, and this idea wouldn't work in less repressive regimes...

  77. Half of the spam by roman_mir · · Score: 1

    50% of spam stopped sounds good, but what if 50% is 350 Billion email messages? Spammers only have to double their messages to go around this 'filter' to produce the same volume tomorrow as they produce today.

    What I would like to see is spam signature sharing: Spam Detection Servers (SDSs) would collect a hash of each spam email sent within a time period. Every email server would have to hold an email and check it against an SDS to confirm it is not spam before passing it on. How would these SDSs collect the signatures? Feedback from email users, blacklists, good filters etc. All email servers would have to register with the SDSs, or they become blacklisted.
    But you can probably tell me why this is not going to work, can't you?

    1. Re:Half of the spam by Anonymous Coward · · Score: 0

      Mod dis mofo up!

    2. Re:Half of the spam by Anonymous Coward · · Score: 0

      Spammers randomize each message sent by adding random words at the beginning or end of each message. So when a spam message is sent out, each copy will have a different hash.
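
      A quick way to see it, assuming the signature is a plain SHA-256 of the message body (the sample text and the signature helper are made up for illustration):

```python
import hashlib

def signature(body):
    """Exact-match signature: the SHA-256 hex digest of the message body."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

base = "Click here to order"
copy_a = base + " zebra lantern"  # random filler appended to one copy
copy_b = base + " quartz meadow"  # different filler on the next copy

same_text_matches = signature(base) == signature(base)
padded_copies_match = signature(copy_a) == signature(copy_b)
```

      Identical text always matches, but the two padded copies do not, so an exact-hash lookup never sees the same spam twice.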

      What you need to do is filter on common spam phrases in the message "For a limited time", "Click here to order" Viagra, etc. You assign a weight to each phrase. Scan the entire message and sum up the weights of the search hits. Higher the count, the more likely the message is spam.

      Currently I have a Regular expression database that contains about 900 phrases and is able to filter out 98% of all spam. I have about a 0.5% false positives. I am working on tuning the weights so that the false positives is near zero.
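
      Something along these lines, as a minimal sketch. The three patterns, their weights and the threshold are invented for the example and are nowhere near a tuned 900-phrase database:

```python
import re

# Hypothetical phrase weights; a real list would be far longer and tuned on real mail.
WEIGHTED_PATTERNS = [
    (re.compile(r"for a limited time", re.IGNORECASE), 2.0),
    (re.compile(r"click here to order", re.IGNORECASE), 3.0),
    (re.compile(r"\bviagra\b", re.IGNORECASE), 4.0),
]
THRESHOLD = 5.0  # assumed cut-off; raising it trades missed spam for fewer false positives

def spam_score(text):
    """Sum the weight of every phrase hit in the message."""
    return sum(weight * len(pattern.findall(text)) for pattern, weight in WEIGHTED_PATTERNS)

def is_spam(text):
    return spam_score(text) >= THRESHOLD

sample_score = spam_score("For a limited time only! Click here to order.")
```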

  78. Re:Sorry: that link is the full pdf, here's abstra by hpavc · · Score: 1

    web of trust + web of familiarity via correspondence?

    --
    members are seeing something, your seeing an ad
  79. public access to address books? by bob_jenkins · · Score: 1

    Wouldn't this have to make my address book public in order to work? The trend recently (among receivers of email, at least) has been to hide email addresses, not publish them and annotate them with personal information.

  80. Mailing lists / newsletters by blorg · · Score: 4, Insightful
    A mailing list would have multiple folks in the To: line, which would be easy to spot automatically.

    Not necessarily; indeed, most professional ones avoid this. Many spams do contain multiple people in the To: field, but many don't. One way or the other, I don't think this is relevant if we are trying to compare the graph of a mailing list to that of a spammer. To take an example, user slashdot-headlines@newsletters.osdn.com sends thousands of emails to people *who don't know each other*. User enlargeyourdong@hotmail.com has exactly the same pattern. How do you tell these apart?

    1. Re:Mailing lists / newsletters by sab39 · · Score: 2, Interesting

      Easy - those thousands of people who don't know each other also send email *back* to the mailing list. Only a few dummies send email back to the spammers.

      For something based on statistics, the difference would likely be very noticeable.

    2. Re:Mailing lists / newsletters by The+Dakota+Kidd · · Score: 3, Insightful

      According to the paper this article is based on, the algorithm is effective against messages with multiple recipients in the To: or Cc: headers. This means that messages coming from slashdot-headlines@newsletters.osdn.com would probably be in the unclassifiable half. Indeed, a good chunk of spam these days would be unclassifiable according to this algorithm.

      However, the whitelist that this algorithm generates would still be valid. To me, this is the real strength of the algorithm, to be able to generate a white list with no input on my part.

    3. Re:Mailing lists / newsletters by dzelenka · · Score: 1

      ..."Only a few dummies send email back to the spammers."

      This filter suggests the possibility of identifying those that DO respond to spam. Spammers, and those that hate spammers, would both be interested in that little bit of information!

      I suggest hunting them down and killing them.

      --
      Bah!
  81. bcc to all! by Datoyminaytah · · Score: 3, Insightful

    These people don't seem to realize how SMTP works. The RCPT command doesn't distinguish between types of recipients, it's up to the sending process to "play nice" and put that information in properly created headers.

    A spammer could manipulate the To and CC headers as necessary to fool filters that analyze them, without affecting the ACTUAL list of email addresses to which the email is sent.
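
    To illustrate with Python's standard library (all addresses are placeholders, and the actual send is left commented out so nothing goes on the wire): the headers live inside the message, while the RCPT list is handed to the SMTP client separately.

```python
from email.message import EmailMessage

# Headers are just text inside the message; the sender can write anything here.
msg = EmailMessage()
msg["From"] = "friend@example.org"
msg["To"] = "alice@example.org"
msg["Cc"] = "bob@example.org"
msg["Subject"] = "hello"
msg.set_content("body text")

# The envelope (what becomes the RCPT commands) is a separate list entirely.
envelope_recipients = ["victim1@example.net", "victim2@example.net"]
header_recipients = {str(msg["To"]), str(msg["Cc"])}

# Sending would look like this; sendmail() takes the envelope apart from the headers:
# import smtplib
# with smtplib.SMTP("localhost") as smtp:
#     smtp.sendmail("bounce@example.com", envelope_recipients, msg.as_string())
```

    Nothing forces the two lists to overlap, which is the whole problem for a filter that trusts To and CC.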

    I don't think spam can be stopped without replacing or overhauling SMTP, and then ceasing to support "old" SMTP. But that ain't gonna happen anytime soon. (sigh)

    --
    assert(birth_date<time-86400)
  82. The joke network would fail on this. by Anonymous Coward · · Score: 0

    You all know what I mean. Your idiot(friend, parent, coworker,or spouse) that can't help but send you mail you don't want but you are socially not in a position to refuse will get through because they also send you legit mail you DO want/need to read. POPfile sorts this out for me very well. Some how I doubt this system would do as well.

  83. The important question is: by Anonymous Coward · · Score: 0

    Bud do his daughters suck?!!

  84. Hey, our .sigs match! by cthulhubob · · Score: 1

    [nt]
    ----

    --

    In post-9/11 America, the CIA interrogates YOU!
  85. We need a 'calling card' mailing standard/protocol by Anonymous Coward · · Score: 0

    Spam is so annoying at this point that I'm going to start doing something similar to the approach used in the article and start using my saved messages and contacts in Evolution as a whitelist and just trashing all other mail.

    As a way to get on my whitelist, what I would love to have is a standard for sending unsolicited 'calling card' messages to me. For example, a calling card message could simply be a normal e-mail with an empty body and a restricted subject line (to, say, 64 characters). Assuming I could at least ping the address from which the calling card (nominally) came, the calling card would be added to my Calling Card Inbox.

    That way, if a person that I've never interacted with before wants to correspond via e-mail, they would send a calling card message to me. If I want to accept e-mail from them, I could add them to my whitelist and let them know with a (standard) response to their calling card.
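
    A bare-bones sketch of the sorting rule, assuming the 64-character limit above. The function names are made up, and the "ping the address" check is left out because it would need network access:

```python
MAX_SUBJECT_LENGTH = 64  # the "say, 64 characters" limit suggested above

def is_calling_card(subject, body):
    """A calling card is a short subject line with an empty body."""
    return len(subject) <= MAX_SUBJECT_LENGTH and body.strip() == ""

def handle_message(sender, subject, body, whitelist, card_inbox):
    """Route a message: known senders in, calling cards held, the rest trashed."""
    if sender in whitelist:
        return "inbox"
    if is_calling_card(subject, body):
        card_inbox.append((sender, subject))
        return "calling-card inbox"
    return "trash"

whitelist = {"friend@example.org"}
card_inbox = []
card_result = handle_message("newbie@example.com", "Met you at the conference", "", whitelist, card_inbox)
junk_result = handle_message("seller@example.net", "Deal", "Buy now!", whitelist, card_inbox)
```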

    This isn't really all that different from the handshaking that mailing lists perform when you first subscribe to them.

    So, anyone else interested in a calling card standard?

  86. The joke network would fail this. by nlinecomputers · · Score: 1

    You all know what I mean. Your idiot (friend, parent, coworker, or spouse) who can't help but send you mail you don't want, but whom you are socially not in a position to refuse, will get through because they also send you legit mail you DO want/need to read. POPFile sorts this out for me very well. Somehow I doubt this system would do as well.

    Note this is a double post. In a moment of stupidity I posted it AC. Someone please mod it down redundant.

    --
    Slashdot, home of supporters of free software, free music, and free speech. Except for Moderators that disagree with you.
    1. Re:The joke network would fail this. by William+Tanksley · · Score: 1

      You'd certainly still be running POPFile -- this method only creates a determination for 50% of email anyhow, and the remaining 50% still has to be classified. The nice part is that it works at the receiving ISP level, so you never have to see the "spam" part of the 50%, and your ISP doesn't have to store it.

      This would pass the "joke network" emails right on through, and POPFile would be able to work on them -- but how nice, POPFile wouldn't ALSO have to wade through all those other nonsense emails.

      This isn't marketed as a cure-all, only as an augmentation of what we do now. It's also apparently unique, since it requires relatively low overhead and can be done in a distributed way -- that is, it's practical for even large ISPs, but it doesn't require a centralized list like the blacklists do.

      -Billy

  87. 50% isn't very useful by TheLink · · Score: 1

    There are other existing antispam solutions that automatically filter out 80-95% of spam with very low false positives.

    Heck even the static filters I set up on my email program do far better than 50% (though probably not as good as spamassassin or stuff like that). Not telling you how I do it tho ;).

    --
  88. Some of us rely on e-mail from strangers by beagle72 · · Score: 5, Insightful

    The proposed anti-spam clustering technique is of course a variation on whitelisting. While clever, it fails to address a problem I have not often seen addressed. Many people defend themselves from spam by obscuring their e-mail addresses in public places, and perhaps by using whitelists to prefer known senders. This may be effective for many people.

    However, some of us can't avoid having a publicly available e-mail address. For example, writers such as myself rely on feedback from readers who are, in nearly all cases, strangers (and sometimes strange, but that's another story...). Avoiding false positives from strangers is very important to me. I want their messages. But, since my e-mail address is published frequently (hence no reason to hide it here), I obviously receive a ton of spam.

    For the past few months I have experimented with a plug-in called BayesIt! for the Windows email reader The Bat!. As the name implies, it's a Bayesian filter. The nice thing about BayesIt is that I could point it to my already-stuffed spam folder and train it on thousands of messages in one go. So far it has worked out rather well. No false positives, and only about 10-20 false negatives per day (out of approx. 400 spams).

    Still, in the long run I support proposals that shift the economics of e-mail in ways that have minimal impact on human beings while making spam unprofitable. Changing the economic model of spam is the only sure solution; relying solely on technology will simply keep us locked in an ongoing arms race.

    -Aaron

    1. Re:Some of us rely on e-mail from strangers by kirkjobsluder · · Score: 1

      However, some of us can't avoid having a publicly available e-mail address. For example, writers such as myself rely on feedback from readers who are, in nearly all cases, strangers (and sometimes strange, but that's another story...). Avoiding false positives from strangers is very important to me. I want their messages. But, since my e-mail address is published frequently (hence no reason to hide it here), I obviously receive a ton of spam.

      Um, read the actual proposal. It addresses your concerns. Email messages from strangers that are sent just to you will get through. Spam that uses a dictionary attack will not.

      Still, in the long run I support proposals that shift the economics of e-mail in ways that have minimal impact on human beings while making spam unprofitable. Changing the economic model of spam is the only sure solution; relying solely on technology will simply keep us locked in an ongoing arms race.

      The proposal points out that you don't need high filtering efficiency to make spam unprofitable.

  89. Most newsletters are one-way by blorg · · Score: 4, Insightful
    Easy - those thousands of people who don't know each other also send email *back* to the mailing list. Only a few dummies send email back to the spammers.

    Most mailing lists and newsletters are one way - I'm not talking about discussion lists or listservs, but rather about the bot that sends me Slashdot headlines, Jakob Nielsen's Alertbox, Fred Langa's newsletter, and even commercial speech that I am signed up to and want to hear, such as Komplett's weekly offers, or Ryanair's cheap flights, etc.

    1. Re:Most newsletters are one-way by Beardydog · · Score: 1

      The weekly nature might help. Spam tends to come from false and constantly changing addresses.

    2. Re:Most newsletters are one-way by benna · · Score: 2, Informative

      Another possible problem could be confirmation emails when you sign up for a mailing list or message board or something. This would be even more difficult to tell from spam than newsletters. Also you have no way of knowing the email address it will come from to add it to a whitelist.

      --
      "It is not how things are in the world that is mystical, but that it exists." -Ludwig Wittgenstein
    3. Re:Most newsletters are one-way by jonadab · · Score: 1

      > Most mailinglists and newsletters are one way

      Newsletters and announcement lists are one way, but they send mail only
      once every week/month/whatever. Spammers send 24/7 pretty much. I'm not
      sure how well the method deals with that difference, but it is potentially
      possible to devise one that does.

      Discussion lists of course are a no-brainer, because there's traffic going
      out from the users to the list.

      The other way to deal with announcement lists of course is to have the user
      add them to a whitelist when they sign up for the list. This would not be
      something you'd want to do at an ISP level (at least not for all users --
      maybe for users who knowingly sign up for it), but it would be something
      people who get a lot of spam could resort to relatively painlessly; it's
      already a multi-step process to sign up for most such lists, giving them
      your address in usually a web-form, getting a confirmation message, going
      to the special tokenized address it gives you... adding one extra step to
      that process (whitelist the sender) would be an annoyance, but for those of
      us who get a lot of spam it would be less of an annoyance than the spam.

      I must say, I am dubious about the claims of no false positives, but if it
      proves to be true, sign me up. False positives are the bane of Bayesian
      filtering -- what good is it if you have to go through the spam folder to
      make sure there aren't any false positives? At that point you might as well
      filter your real mail off into folders, let the spam fall into the inbox,
      and go through it there to check for real mail that got missed. (This is
      what I currently do, although I filter some spam off into folders using
      methods that don't get false positives, so that I don't have to look through
      them. Filtering on character set is my most effective technique for this; it
      can't possibly get false positives because the idea of a legitimate message
      in a character set I can't read is as far as I'm concerned inherently
      oxymoronic; even on the off chance that the sender is not a spammer, the
      message is still not one that I can usefully read.)
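      The character-set filter described above could look something like this. It is a minimal sketch using Python's standard `email` package; the `READABLE_CHARSETS` set and the function name are assumptions for illustration, and it only looks at charsets that the message itself declares.

```python
from email import message_from_bytes

# Assumption: the charsets this particular reader can actually read.
READABLE_CHARSETS = {"us-ascii", "utf-8", "iso-8859-1", "windows-1252"}


def has_unreadable_charset(raw: bytes) -> bool:
    """Return True if any part of the message declares a charset
    that is not in READABLE_CHARSETS."""
    msg = message_from_bytes(raw)
    for part in msg.walk():
        # get_content_charset() returns the declared charset in lower case,
        # or None when the Content-Type header declares none.
        charset = part.get_content_charset()
        if charset is not None and charset not in READABLE_CHARSETS:
            return True
    return False
```

      Messages for which this returns True go straight to the spam folder without review; messages that declare no charset at all are left alone.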

      --
      Cut that out, or I will ship you to Norilsk in a box.
    4. Re:Most newsletters are one-way by jonadab · · Score: 1

      > Also you have no way of knowing the email address it will come from to add
      > it to a whitelist.

      You almost always know the domain it's going to come from, and you almost
      always know (within a couple of minutes) when it's going to come, so you
      could set your filter to a mode where it holds probably-spam messages in
      quarantine and use a web interface to pick the one in question off the list.

      False positives that can be *predicted* aren't the problem. What worries
      me more is the ones that would surprise us. Like I said upthread, if I
      have to go through the list of all the spam messages looking for false
      positives, then the system is no better than what I have now, wherein (most
      of) my spam stays in my inbox and almost all of my legitimate mail gets
      picked up by various regex filters and sorted into folders. Then I look
      through the list of messages in the inbox and mass-move them out to a spam
      folder, but I glance over the list as I do it, checking for legit messages
      that my filters missed. Any spam filtering method that proposes to beat
      this approach has to absolutely guarantee to get zero false positives, and
      I have to believe it. Only then can I stop looking through the list for
      the occasional message that should have gone into the real mail folder.

      --
      Cut that out, or I will ship you to Norilsk in a box.
  90. It wouldn't be meta-bayesian. by Ayanami+Rei · · Score: 3, Informative

    It'd still be bayesian, except that word frequencies and graph connectivity of sender would _both_ be considered for additional spam probability. I don't have a filter to check, but don't most Bayesian classifiers also include other metrics besides top 20 word frequency, like length or presence of attachments, etc.?

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  91. Wow! Best filter ever! by Feztaa · · Score: 1

    The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category.

    This spam filter has 100% accuracy! ... half of the time, anyway.

  92. Privacy?! by kyshtock · · Score: 1
    Am I a total idiot (don't say a word!) or will they want to create models of our real-life social network? Who will hold that data? Maybe I don't want EVERYBODY to know that Joe is my friend, and from the moment he gave me Jane's address, I have meaningful conversations with Jane, while rubbing noses with Louise... damn!

    I know that THEY might know the list of people I email to, but creating a centralised structure?! I think I prefer spam.

    I am just picturing some pimple faced hacker, rubbing his hands: so, finally, Jane and John got to know each other... let's spy some more.

    Or, maybe, I am just paranoid. What do you think?

    --
    Bite my shiny metal... oops... Nevermind!
    1. Re:Privacy?! by lurker412 · · Score: 1

      Well, I RTFA and it was not clear to me just how this scheme is implemented. Specifically, it was not stated whether the social network is compiled centrally or on the client machines. Your concern is valid only if there is a central database. However, I can't think of any compelling reason to build it that way. If the data are stored locally, then I wouldn't have any privacy concerns. On the other hand, eliminating 50% of spam without false positives does not seem to be a great leap forward.

  93. WARNING by kiick · · Score: 1

    This property is guarded by Smith and Wesson 3 nights a week.

    You guess which three

  94. great in theory by warpSpeed · · Score: 1
    But in practice I find that milter-sender works wonders in screening out fake/made-up email From addresses. Of course you have to be using sendmail.

    It is not a perfect solution, but it sure cuts out a ton of crap from making it into my server.

  95. 50%??!? WOW! by Johnny5000 · · Score: 0

    I also came up with a great way to correctly sort 50% of email into spam/non-spam categories.

    if random_number >= 0.50 it's spam.
    else
    it's not spam.

    --
    The libertarian solution to the failures of capitalism is to apply more capitalism til the failures are fixed.
    1. Re:50%??!? WOW! by Datoyminaytah · · Score: 1

      That only works if 50% of your email is spam.

      --
      assert(birth_date<time-86400)
  96. A Commonsense solution by kneels_bore · · Score: 1

    A system I have been using for some years now beats any approach I have seen, whether it be Bayesian, blacklisting or whatever. As soon as I get spam on an email address I terminate that address, create a new one and inform all relevant parties, explaining that my address has been compromised. People understand. It works. Since November I have not received one single spam message and I get at least ten emails per day.

  97. One Word: Spamcop by Anonymous Coward · · Score: 0

    It works. Period. Say goodbye to spamheads forever. The occasional one that gets by can be quickly reported and blocked PERMANENTLY. The $30/yr is worth every single penny, IMHO. My spam count per Year went from thousands to low forties, now at zero. It's a godsend.

  98. Why? by Cytop1asm · · Score: 1

    "Why is spam so bad when it's done via email but when you cut down a tree and print it out, it's okay? You can just delete an email in 1 second, but mail just ends up on the floor causing pollution." -jekz

    1. Re:Why? by TheoMurpse · · Score: 1

      because of Ed McMahon and Publisher's Clearinghouse

  99. Plaxo Revealed? by Fritz+Benwalla · · Score: 2, Interesting

    Isn't this scheme the perfect use for the wide-ranging social network information being collected by Plaxo?

    It makes sense - they certainly haven't announced a revenue stream yet, and "keeping your address book up-to-date," even in a wireless and multiplatform world, just doesn't seem like a big enough idea to justify the huge amounts of data collected.

    So is that the announcement that's coming from Plaxo, the unveiling of a broad spam solution that uses 'degrees of separation' data from your address book and the address books of your friends to implement a spam filtering solution?

    If I may say, it does seem like the killer app for their unique data set.

    -------

    --

    Believe me, I'm as surprised by my comment as you are.
  100. Addressed, not send by by SmallFurryCreature · · Score: 3, Informative
    In order to break this system, the spam you received would have to have been sent by someone those people know, not just sent to a lot of people you know; that is in fact what would tip the system off that something is wrong.

    I send you and your sister a spam. While both of you are getting the spam, to both of you I am an unknown and therefore the system would flag me. ONLY if I send the spam to you while pretending to be your sister would the system break. I would need to know both your email and the email of someone you know. This would not be impossible to harvest with viruses stealing address books, but it is not what is currently happening. Currently email address lists used by spammers are very simple flat text files. Of course nothing complex would be needed. Simply a similar text file but now with two emails per line. The first the recipient, the second the person to forge as the sender. Simple but more work.

    So it looks like a pretty clever idea. Especially for workplace email, where most mail is from people you know and very little email from outside usually arrives. And even when outside mail does arrive, it is usually from a known domain, namely a client or supplier.

    Will it work? Who knows. Gotta be worth a try. Unless you want to wait for Bill Gates to fix it. We all know how well the security problems in Windows were fixed, eh?

    There is not going to be a magic bullet that fixes spam. We will just have to use a lot of ordinary lead ones. Don't worry Bush says they are safe.

    --

    MMO Quests are like orgasms:

    You may solo them, I prefer them in a group.

    1. Re:Addressed, not send by by crymeph0 · · Score: 2, Interesting

      I think this could be pretty easily beaten, and I'm surprised my spam isn't already showing this characteristic, now that I think about it...

      (all spammers, please don't read anymore below here, I don't want to give you ideas).

      In my example, I get spam sent to me and several other people at my work. It would be trivial for spammers to modify their algorithms so that instead of sending to x people in my office, they send to (x-1) people in my office, and use that last address as the "From" field. Of course, you could set up your email server to detect this (mail coming from outside claiming to be from inside). Does Exchange Server provide this kind of functionality? If not, it would be all too easy for spammers to break this method.

      --
      It should be illegal to say that freedom of speech should be limited.
    2. Re:Addressed, not send by by ogre57 · · Score: 1

      Of course, you could set up your email server to detect this (mail coming from outside claiming to be from inside). Does Exchange Server provide this kind of functionality?

      Don't know re Exchange, do know:
      Postfix: default config
      Sendmail: definitely, may be default config
      Qmail, Exim: never tried, but probably

      Guess: 80..95% of the email servers in use today can be easily config'd to reject this, and likely already are.
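      The check itself is simple in principle. Here is a rough sketch of the idea, not how any of the servers above actually implement it: `LOCAL_DOMAIN` and `INTERNAL_NET` are made-up example values, and a real mail server would apply the rule through its own configuration rather than code like this.

```python
import ipaddress
from email.utils import parseaddr

# Hypothetical site settings, for illustration only.
LOCAL_DOMAIN = "example.com"
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")


def is_forged_internal(from_header: str, client_ip: str) -> bool:
    """Return True when a message claims a sender in our own domain
    but was handed to us by a machine outside our network."""
    _, addr = parseaddr(from_header)
    domain = addr.rpartition("@")[2].lower()
    from_inside = ipaddress.ip_address(client_ip) in INTERNAL_NET
    return domain == LOCAL_DOMAIN and not from_inside
```

      Note that a rule like this would also reject legitimate users who send from outside without authenticating first, so it normally goes together with some form of authenticated submission.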

    3. Re:Addressed, not send by by PetWolverine · · Score: 1

      There is not going to be a magic bullet that fixes spam. We will just have to use a lot of ordinary lead ones.

      So I'm authorized to shoot spammers now? Yay!

      I'll just have to track a few of them down...

      --
      I found the meaning of life the other day, but I had write-only access.
  101. Impressive by fetus · · Score: 1

    "The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category"

    Translation: For the half that it works on, it works.
    So basically, it works when it works - for some reason I'm not impressed.

  102. Seems like a good use for FOAF by GeorgeH · · Score: 2, Informative

    FOAF is an open XML/RDF standard for describing these social networks; it seems like that would be a good way to implement this. Plus, since it uses SHA1 sums of email addresses, it would be possible to check addresses without giving them up to spammers.
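    For the curious, the hash FOAF uses (the `mbox_sha1sum` property) is the SHA1 of the mailbox URI, including the `mailto:` prefix. A minimal sketch, where the function name is just my own choice:

```python
import hashlib


def mbox_sha1sum(email_address: str) -> str:
    """Return the FOAF-style SHA1 hex digest of a mailto: URI."""
    uri = "mailto:" + email_address.strip()
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()
```

    Two parties can compare these digests to find out whether they know the same address without either of them publishing the address itself. It only hides addresses from people who can't guess them, though: anyone who already has a candidate address can hash it and check.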

    A lot of sites like Tribe.net and my own project SongBuddy are working on integrating FOAF into the site, so that you won't have to worry about the mechanics of it unless you want to. Seems like an easy way to build these kinds of whitelists.

    --
    Why can't I moderate something "Wrong" or at least "Grossly Misinformed"?
  103. everything has a weakness... by Cruciform · · Score: 3, Funny

    Next thing I know all my email is going to have a reply-to: Kevin Bacon.

  104. Re: So It's just a very good rule, how is that bad by Robo+Dojo · · Score: 1

    If you don't have any friends, then every e-mail that you send will come up as a false-positive, and you'll be blacklisted forever. I wrote up a good e-mail to SomethingAwful, too, only to have it returned with a spam score of 1.2!

  105. Well, gee... by mazarin5 · · Score: 1
    The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category.

    I would hope that once you root out all the cases where it doesn't work, that all you have left is cases where it does work.

    --
    Fnord.
  106. Nature is pimped. by ubiquitin · · Score: 1

    Note to moderators: I really try to keep things positive, but these guys should know better. Not a troll here, just felt this needed to be said.

    I always suspected this was the case, but now I have hard evidence. Nature is pimped out to private interests. It used to be a voice of the scientific community.

    Spam isn't science, people. It might be network warfare, but that makes it more about power than it is about knowledge. So now Nature is just another magazine.

    Note that they only give one footnote, and it is a self-serving link to a preprint. "Boykin, P. O. & Roychowdhury, V. Personal email networks: an effective anti-spam tool. Preprint, http://www.arxiv.org/abs/cond-mat/0402143, (2004)."

    There are real scientific journals out there. With tricks like this, Nature apparently isn't among them.

    --
    http://tinyurl.com/4ny52
  107. Huh? by aonaran · · Score: 1

    The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category.

    Am I the only one who read this sentence and said "huh??"

    I thought sorting into the right category was THE determining factor of whether a spam filter works.

    Of course 100% of the times it worked it sorted into the right category. The only stat that is important is that 50% of the total messages DIDN'T sort into the right category (the 50% that didn't work)

  108. HOW SPMAMMERS CAN BEAT THIS FILTER by goombah99 · · Score: 4, Interesting

    There are three ways one can beat the filter.

    The first is trivial and certain to succeed but has a drawback for spammers: only send e-mail to single recipients. The drawback is this puts a much higher load on their servers since every message is sent individually.

    The second method is to always include dummy addresses in the mailing list that the recipients probably have in their address books. For example, add the following names to the To: field: notifications@paypal.com and list-notication@ebay.com.
    Any recipient of the spam message who has also received e-mail from eBay or PayPal will trust the message.

    One can do even better by planning ahead when harvesting e-mails. For example, if you harvest a set of e-mail addresses from a particular bulletin board, you can make note of message cliques at the time of harvesting and send messages in the same groupings. For good measure you also include the addresses of the bulletin board admins.

    Third, all the spammer really has to know is one address you have gotten messages from. Thus either buy mailing lists from legitimate companies people actually do business with, or create your own loss-leader messages. For example, send out some political action alert or anything that has some value or use to most people, maybe a lottery drawing for a prize, or a discount subscription to Time magazine, so they will accept the message. The sender does not have to be the same as your spammer address. Now you know someone in the address book of the victim. Now you spam the crap out of them while including the trojan address in the To: field.

    --
    Some drink at the fountain of knowledge. Others just gargle.
    1. Re:HOW SPMAMMERS CAN BEAT THIS FILTER by luugi · · Score: 1

      You know a little too much on this.....

      --
      Think like a man of action, act like a man of thought.
    2. Re:HOW SPMAMMERS CAN BEAT THIS FILTER by kirkjobsluder · · Score: 3, Interesting

      The first is trivial and certain to succeed but has a drawback for spammers: only send e-mail to single recipients. The drawback is this puts a much higher load on their servers since every message is sent individually.

      True, this method is strongest against dictionary spam and does not work against non-dictionary spam.

      The second method is to always include dummy addresses in the mailing list that the recipients probably have in their address books. For example, add the following names to the To: field: notifications@paypal.com and list-notication@ebay.com.
      Any recipient of the spam message who has also received e-mail from eBay or PayPal will trust the message.

      Um, did you RTFA? (And perhaps most importantly, did anybody modding this article RTFA?)

      The algorithm has nothing to do with address books. Instead, it looks at friend-of-a-friend networks as identified by mail headers.

      For example, I work on a project with Bob, and Susan. A typical email message about the project will include my address, and their addresses in the header. The algorithm assumes that three first degree relationships exist:
      me-bob
      me-susan
      susan-bob

      There are also three second-degree (friend of a friend) relationships:
      me-susan-bob
      me-bob-susan
      susan-me-bob

      The high ratio of second-degree/first-degree relationships gives susan and bob a higher score (3/3=1), and puts them on the whitelist.

      With paypal.com, there is only one first-degree relationship (paypal.com-me) and no secondary relationships. The algorithm handles single relationship networks as a special case, and defines them as ambiguous.

      With a typical dictionary attack, a spam comes with 50 email addresses in the header. However, because a dictionary attack relies on sequential or randomly generated usernames, the number of recipients who are part of my social network is low. So we have 50 first-degree relationships, and let's say the spammer gets lucky and nails Susan and Bob as well. It still gets a low score. (2/50=.04)
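      One standard way to put a number on "do this address's contacts also know each other" is the local clustering coefficient from graph theory. The sketch below is only an illustration under my own assumptions, not the paper's exact algorithm and not the same ratio as the rough arithmetic above: I assume an edge links each sender to each recipient in the header, and the function names and example addresses are made up.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(messages):
    """Build an undirected graph from (sender, recipients) pairs taken
    from mail headers. Assumption: sender is linked to every recipient."""
    graph = defaultdict(set)
    for sender, recipients in messages:
        for rcpt in recipients:
            if rcpt != sender:
                graph[sender].add(rcpt)
                graph[rcpt].add(sender)
    return graph


def clustering(graph, node):
    """Fraction of a node's neighbour pairs that are linked to each other.
    Returns None for nodes with fewer than two neighbours (ambiguous)."""
    neighbors = graph.get(node, set())
    k = len(neighbors)
    if k < 2:
        return None
    linked = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
    return linked / (k * (k - 1) / 2)


# Made-up example traffic: a small project group, one dictionary spammer,
# and one notification address that only ever mails a single person.
messages = [
    ("bob", ["susan", "carol"]),
    ("susan", ["bob", "carol"]),
    ("spammer", [f"user{i}" for i in range(50)]),
    ("notifications", ["dave"]),
]
graph = build_graph(messages)
```

      In this toy data Bob's contacts all know each other, so he scores high; the spammer's 50 recipients have no links among themselves, so the spammer scores zero; and the notification address has a single relationship, which is the ambiguous special case mentioned above.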

      One can do even better by planning ahead when harvesting e-mails. For example, if you harvest a set of e-mail addresses from a particular bulletin board, you can make note of message cliques at the time of harvesting and send messages in the same groupings. For good measure you also include the addresses of the bulletin board admins.

      This is a slightly better strategy. However, this only works if you use email from a member of the clique, and limit the recipient list to members of the clique.

      But there is a serious problem with the strategy. The stated goal of the authors (did you RTFA?) is to increase the costs of spamming to the point where spamming is no longer economically profitable. Such a strategy would require research which is expensive.

      Or create your own loss-leader messages. For example, send out some political action alert or anything that has some value or use to most people, maybe a lottery drawing for a prize, or a discount subscription to Time magazine, so they will accept the message. The sender does not have to be the same as your spammer address. Now you spam the crap out of them while including the trojan address in the To: field.

      Once again, RTFA. The algorithm has nothing to do with address books. But you did raise one possible threat: spoofing. A spammer could not get integrated into my social network by offering a loss-leader (for the same reason that messages from ebay.com would not be whitelisted). A spammer could spoof a member of my social network. (For example, using Bob's address.) However, the problem here is economics. Bob would probably only be auto-whitelisted by 50 people. Thus spoofing Bob would only get you access to a small population, which defeats the entire economic rationale for spamming.

    3. Re:HOW SPMAMMERS CAN BEAT THIS FILTER by Anonymous Coward · · Score: 0

      Hey retardo boy. If you substitute mail-headers for addressbook in the original post, all your responses become nonsense.

    4. Re:HOW SPMAMMERS CAN BEAT THIS FILTER by Anonymous Coward · · Score: 0

      It is even simpler to use "from" = "to", for a while, anyway.

    5. Re:HOW SPMAMMERS CAN BEAT THIS FILTER by teklob · · Score: 1

      I don't think you quite understand the system. Sending messages to individual recipients would not help, because there would still be a large volume of mail from one address sent to many, just not in one email.
      Also, the person who said that having his email headers stored would violate his privacy should realize that there could easily be some sort of one-way hashing system that would obscure the headers to make comparisons possible but not address extraction.

  109. TMDA is definitely not for everyone by Vainglorious+Coward · · Score: 1

    TMDA certainly isn't for everyone. By sending out challenges to unknown senders, you shift the burden from yourself to the joe-job victim. Nice for you, not so nice for the poor people on the receiving ends - each spam you receive generates a spam for the joe-job victim. Not even that nice for you either, since for every spam you receive, you double the bandwidth it has consumed by generating an outgoing message. If everybody did use TMDA, our systems would all be clogged under the flurry of challenges. TMDA has its place, but that place is not as a general purpose spam-reduction technique.

    --
    My next sig will be ready soon, but subscribers can beat the rush
    1. Re:TMDA is definitely not for everyone by mdfst13 · · Score: 1

      If everyone used TMDA, then there wouldn't be non-whitelisted spam. That 60% reduction in email would more than cover the bandwidth for the once per sender authentication that TMDA requires (actually less than that, since it auto-whitelists people to whom you send email and allows you to just whitelist people). Especially considering that a TMDA challenge is relatively small if you don't return the original message.

      Combine TMDA with something like SPF sender validation ( spf.pobox.com ) and you won't even have the bounce problem unless someone's actual account was used to send the email.
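      To give a feel for what SPF sender validation checks, here is a deliberately tiny sketch. It takes an SPF record that has already been fetched (the DNS lookup is not shown) and only understands plain `ip4:` and `ip6:` mechanisms; real SPF also has `a`, `mx`, `include`, qualifiers and more, all ignored here. The function name and the example record are made up.

```python
import ipaddress


def spf_ip_match(record: str, client_ip: str) -> bool:
    """Return True if client_ip falls inside one of the ip4:/ip6: ranges
    listed in an SPF record such as "v=spf1 ip4:192.0.2.0/24 -all".
    Only unqualified (or "+") ip4/ip6 mechanisms are considered."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return False  # not an SPF record at all
    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        mech = term.lstrip("+")
        if mech.lower().startswith(("ip4:", "ip6:")):
            net = ipaddress.ip_network(mech[4:], strict=False)
            if ip.version == net.version and ip in net:
                return True
    return False
```

      A receiving server that finds no match knows the message did not come from a machine the domain vouches for, so it can refuse the mail up front instead of bouncing a challenge to the forged sender.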

    2. Re:TMDA is definitely not for everyone by Vainglorious+Coward · · Score: 1
      If everyone used TMDA, then there wouldn't be non-whitelisted spam

      I'm still trying to parse that to make sure I understand you. "non-whitelisted spam" is spam from a source that I haven't whitelisted? How is that different than the spam I have today?

      You haven't addressed my basic problem with TMDA as spam-reduction technique in that it defrays the burden of spam on the TMDA user by spreading that burden to other innocent third-parties. TMDA (indeed any anti-spam system that generates emails to purported senders) can never offer a global solution to spam, because it just spreads the mess around. It may work well for a small number of individuals, but at a cost to the wider community, tragedy-of-the-commons style.

      Compare also with email systems that detect viruses and generate emails to the purported sender - this makes a virus outbreak worse by flooding innocent third parties with alarming warnings (at best) or infective emails (in the worst cases).

      --
      My next sig will be ready soon, but subscribers can beat the rush
    3. Re:TMDA is definitely not for everyone by mdfst13 · · Score: 1

      If you are on the whitelist, you can spam (no challenge). If *everyone* used TMDA, that would be the only kind of spam (whitelisted spam), and there is no point in sending spam if it is not going to be received. Whitelisted spam will not be challenged; it's already whitelisted. Also, whitelist spam will be relatively rare, as it would put the burden on the spammer to get an address you whitelisted.

      If *everyone* used TMDA, then there would not be a user level problem with challenges for messages that you did not send, as TMDA can drop challenges from people to whom you have not sent mail.

      Even at the ISP level, remember that we now have 60% of the email bandwidth with which to play (the reduction from current spam). That should more than cover sending a challenge for every email message much less the trivial number of legitimate messages that actually get challenged (6%) by a TMDA system.

      TMDA works really well with a system like SPF. With SPF compliant domains, you can identify the sender (or at least the sending domain) accurately. If you can do that, no fake challenges to SPF compliant domains with solid authentication of SMTP users (of whatever form: SMTP Auth, etc.). Now, the fake challenges (i.e. those to people to whom you did not send mail) encourage faster adoption of SPF (and thus more authenticated senders).

    4. Re:TMDA is definitely not for everyone by Vainglorious+Coward · · Score: 1

      I really wish now I'd written "significantly more people" instead of "everyone", since of course, "everyone" using TMDA is a special case, and only applies in the utopia where there is no spam and so anti-spam measures aren't necessary anyway. We can never get to the "everyone" scenario for the reason I've already given - the burden is not being mitigated, merely increased and spread around. What you see as "encouraging faster adoption of SPF" I see as inflicting damage on innocent third parties. Of course, you may be right and I may be wrong. But I've got a dollar here says neither TMDA nor SPF will ever take hold.

      --
      My next sig will be ready soon, but subscribers can beat the rush
    5. Re:TMDA is definitely not for everyone by mdfst13 · · Score: 1

      Innocent? Like Open Relays are innocent? Domains that are unwilling to do a minimal amount of authentication support are not innocent IMO. Simply reckless and dangerous.

      I'll take your bet with the following modifications: SPF or another reverse DNS method that does the same thing (authenticate IP addresses as valid senders for a domain); take hold defined as more than 50% of the domains with MX records have SPF records.

      I hear that AOL is adopting SPF. If Hotmail and Yahoo follow, the pressure will build. Particularly with ISPs like EarthLink adopting TMDA-like challenge response systems (plus the Microsoft version that is currently being contemplated). SPF helps one of the biggest problems with the SMTP protocol: the lack of sender authentication. The biggest obstacle to its adoption is the pressure from Microsoft, et al. to develop a more expensive solution (remember the proposal to create SSL-like ID certificates?).

    6. Re:TMDA is definitely not for everyone by Vainglorious+Coward · · Score: 1

      Innocent? Like Open Relays are innocent?

      By "innocent" I was referring to the joe-job victims whose addresses are faked by spammers. As for open relays, surely you're aware that they represent only a small percentage of spam sources these days - most of the big spam runs are injected using armies of trojanned PCs (think about how you might use such an army to defeat SPF).

      I think it's fair to characterise AOL's involvement with SPF as "experimental" rather than "adopting" at this point. And the reason for this is obvious - so much spam has a faked @aol.com address (I know of some mail admins who simply reject all mail "from" aol.com; others will weight against aol.com in spamassassin). Note that adopting SPF does nothing for AOL users *receiving* spam.

      What you describe as a problem with SMTP - lack of sender authentication - is actually a design feature. Think about it this way - why should I have to be authorised by some corporation in order to send email? Are you arguing that one shouldn't be permitted to send email if you don't pay money to an ISP? Of course, the big ISPs *would* love this and are arguing about the method, but that's because who ever "owns" the method owns email and a potentially lucrative revenue stream (viz your example of the certificate proposal).

      I'm trying hard not to sound like I'm throwing my hands in the air and wailing that we can never fix the spam problem, and obviously (I hope!), I loathe spam as much as anyone. But I also recognise that this is a *difficult* problem. Lots of very bright people have put considerable effort into thinking about solutions, and no-one has yet been successful. Even reverse-DNS type methods are nothing new. I'm not suggesting that you're an anti-spam kook (and I certainly am not), but if you haven't seen it already, check out the "You Might be an Anti-Spam Kook If..." page to get a flavour of the kind of difficulties and subtleties involved in coming up with a Final Ultimate Spam Solution.

      So, we have a bet. Actually, I'm going to go crazy and offer a dollar on non-uptake of SPF and another dollar on non-uptake of TMDA. If or when I'm proven wrong, I'll start arguing over your criteria of "50% of the domains with MX records", which leaves me some wiggle room ;)

      --
      My next sig will be ready soon, but subscribers can beat the rush
  110. If I were a spammer... by nekuz · · Score: 1

    I would send my spam with spoofed addresses taken from the same pool of victim addresses who will receive my crap. This way, I would be generating a fake social network, defeating this algorithm...

  111. Fatal flaws by mccrew · · Score: 1
    The e-mail clusters can be mapped out by inspecting the 'from', 'to' and 'cc' fields in a user's inbox. An automated system can quickly build up a blacklist of spammers, as well as a 'whitelist' of approved sources.

    You mean that someone has come up with a solution for Spam, while the rest of us smart people were thinking really hard about the problem for the past 5 years and could not come up with the silver bullet? Let's see...

    Fatal flaw #1: With spam, you can't trust the From: header, and frequently the To: header either.

    Fatal flaw #2: A "blacklist of spammers?" That's a hoot! How much disk have you got? I have an equivalent idea: since all e-mail addresses are forged, why not skip the inspection of the user's inbox, and just conjure up random combinations from /usr/share/dict/words? That would be just as effective.

    1. Garbage in
    2. apply algorithm
    3. Garbage out
    4. ???
    5. Profit!
    (Always wanted to do one of those:)
    --
    Hey, Windows users, there is no such thing as "forward" slash, there is only slash and backslash.
  112. Re:Erm, not? by jmichaelg · · Score: 1
    Simply : untrue. It's as easy to fake the envelope sender as it is the From: header.

    But it has to be faked with the correct information for a particular recipient. You can't just put some random name there and get by this filter. Spammer has to know that Mary knows Tom. If Mary gets email from Jim, whom she doesn't know, the email is flagged. It's the pairing of From and To headers that matters, not the individual entities.
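
    A minimal sketch of the From/To pairing idea as this comment describes it. It is not the algorithm from the paper; the function names and the message layout (dicts with "from", "to", "cc") are assumptions for illustration, and the contact map is assumed to be built from mail already known to be legitimate.

```python
from collections import defaultdict


def build_contacts(mailbox):
    """Map each address to everyone it has appeared alongside in From/To/Cc."""
    contacts = defaultdict(set)
    for msg in mailbox:
        people = {msg["from"], *msg.get("to", []), *msg.get("cc", [])}
        for person in people:
            contacts[person] |= people - {person}
    return contacts


def is_flagged(contacts, sender, recipient):
    """Flag mail whose sender is not a known contact of the recipient."""
    return sender not in contacts.get(recipient, set())
```

    A forged sender passes only if the spammer happens to pick someone the recipient already corresponds with, which is the pairing argument made above.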

  113. hmmm by loopyfx · · Score: 2, Interesting

    suppose a spammer harvests from a social network site and spoofs their source address to be from harvested addresses... it's pretty likely 2 people on the same social network site will be within each other's threshold if only the to/from/cc headers are used...

    maybe more sophisticated techniques to include the source IP subnet or something? Some sender verification would be required.

  114. How can they publish it ? by Gads · · Score: 1
    After reading this article I was quite disappointed.
    It's an extension of the whitelist mechanism with some graph theory included (and rather too much theory for something so simple...).

    As stated in the article:
    The most obvious countermeasure is to never use multiple recipients in the To: or Cc: headers of a spam message.

    It sounds like BCC: and there is nothing special about it. It's quite widely used. Most of my spam comes without CC: or multi recipients.

    It would certainly be very interesting for ISPs, because they can track many emails at a time. The main issues would be the size of the resulting graph (less CPU intensive -> more MEM intensive) and privacy.
    But it's definitely not for individuals.

    I keep this for generating examples for bayesian analysis, that's all:
    Using this algorithm, it would be possible to generate training sets for learning algorithms in achieving accurate, automated spam filtering.

    Also nice graphs... perhaps too big at the end...
  115. Re:TMDA by mdfst13 · · Score: 1

    You give the mailing list a special email that always goes through, or you just whitelist (i.e. add them to your "buddy" list before they send you email) the mailing list. Whitelisting is better but doesn't work if your list sends from people you've never seen previously. If the list always sends from itself (i.e. listname@listserv.com), then a whitelist is the way to go.

    TMDA also supports throwaway email addresses that you can use to register at a site that sends an email confirmation. The email address will stop working after a while and the site can't spam you. Think real.com for an example of why this is necessary. You can also get throwaway email addresses from spamgourmet.com without TMDA.

  116. Been there, doing that. It works. by Anonymous Coward · · Score: 0

    I've been doing this for over a year, commercially, and it works very well. Funny, when I implemented it it seemed so obvious to me I didn't want to make a big deal about it, just added it to the service description and the options.

    It's not enough by itself but it makes a huge difference. Guess the cat's out of the bag now, eh?

    (Sorry, AC post, as my poor little server couldn't handle a slashdotting right now. I'm still just a small anti-spam biz, though an effective one.)

  117. who responds to spam still? by mcdade · · Score: 1

    I can understand how in the beginning of the internet people would be suckered into all these sorts of spam deals, but now.. it's pretty widely known, and like 80% of spam hits the trash without anyone seeing it. The rest, well, it's not even opened or looked at .. right into the trash too.

    So all these spammers are sending more and more messages, costing more and more, but getting less response from them. Wouldn't they figure that the game is up and just go back to the cesspool they climbed out of ??

    I had the idea of doing a mail server auth service, sort of like DNS but you have to pay to register your mail server: a 100% spam free network where registered mail servers only receive emails from others who are registered... this would allow anyone who is legit to register cable modem servers and that sort of thing but yet keep spammers out.. as soon as one system gets compromised it gets removed from the list, and no other server will talk to it... closed network system..

    who wants to fund this project??

    -b

  118. My filter by renfrow · · Score: 2, Interesting

    I have my own domain, and run my own mail server for personal email. The ONE thing that I have done to reduce incoming spam drastically (i.e. I only get 5% as much now) is to refuse incoming connections to the mail server from any machine that does not have a valid rDNS value. I may miss email from someone, but they'll have gotten a (somewhat) informative message telling them why their email did not succeed. They can either complain to their ISP and get their rDNS fixed (like I did :-) or call me/send me a letter.
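
    A rough sketch of such a check using Python's standard socket module. Note the assumptions: the function name is made up, and it goes one step further than the comment by also confirming that the PTR name resolves back to the connecting IP (the comment only says "valid rDNS value"). The resolver functions are parameters so the logic can be exercised without a network.

```python
import socket


def has_valid_rdns(ip, reverse=socket.gethostbyaddr,
                   forward=socket.gethostbyname_ex):
    """Return True if ip has a PTR name that resolves back to the same ip."""
    try:
        hostname, _aliases, _addrs = reverse(ip)        # PTR lookup
        _name, _aliases, addresses = forward(hostname)  # forward confirmation
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return False
    return ip in addresses
```

    In a real mail server this decision would normally live in the MTA configuration rather than in a script; the sketch only shows the logic.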

    Tom.

  119. 78 filters in KMail! by Anonymous Coward · · Score: 0

    I have 78 filters in KMail and I have to deal w/ at most 10 pieces of spam a week not getting caught... and about 40-80 a day going straight to the trash!!!

    My method is fairly simple: I put everyone I regularly communicate with, plus some keywords that suggest something is not spam or may be important, on filters that sort mail into various folders by group (coworkers, undergrad classmates, med school classmates, family, documents, other).
    These filters come first so that if any of these emails contain a word blocked by later filters, such as marketing, promotions, values, singles, etc., they will still get through. I then also have a filter that checks for my name in the header (which it would have if it was a reply).

    The last set of filters are for blocked words in the header, such as the above-mentioned marketing, promotions... but some words are blocked from the complete message.
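
    The ordering is the whole trick, so here is a small sketch of it. This is not KMail's filter format; the function name, the message layout, and the single "inbox" folder (instead of per-group folders) are simplifying assumptions.

```python
def classify(message, known_senders, my_name, header_blocked, body_blocked):
    """Apply positive filters first, then blocked-word filters."""
    sender = message["from"].lower()
    subject = message["subject"].lower()
    body = message["body"].lower()

    # Positive filters run first, so blocked words cannot override them.
    if sender in known_senders:
        return "inbox"
    if my_name.lower() in subject:
        return "inbox"

    # Words blocked only when they appear in the header (subject here).
    if any(word in subject for word in header_blocked):
        return "trash"
    # Words blocked anywhere in the message.
    if any(word in subject or word in body for word in body_blocked):
        return "trash"
    return "inbox"
```

    Swapping the two halves would send a friend's mail about "marketing" to the trash, which is why the whitelist filters have to come first.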

    It took a little time to get the system of filters in place, but believe me, emptying the trash once a day sure beats having to go through all the fluff.

    and it takes care of people on mailing lists too, cuz they can just make a positive filter for it.

    Just my 2 cents. I'm sure I am not the first to do this as this is the reason such filtering schemes were invented... but w/ all this talk of anti-spam stuffs... I just wanted to remind people that it's right at your fingertips for free.

    Since I have stuck w/ KMail for some time I don't know any other mail software 'cept Pine... so your mileage may vary with your client.

    peace my /. brothers.

  120. Chain of Trust by deinol · · Score: 1

    As I see it, the biggest problem is that of verifying the sender. This sounds easy for a corporate relay where you can validate users from the internal network.

    But what about my case? I own a domain. It does nothing but my own e-mail. Sadly, that address was available on the internet for longer than spam has been a real problem, so it gets hit hard. But the point is, the server is a linux box attached to my cable modem. I can't relay out through it. To cut back on spam, my ISP blocks SMTP out. I have to relay through their smtp server. So they have to allow me to send from any number of e-mails. Granted, some places use a login/password to tie that to a specific cable modem account, but even with that, there is no way for them to verify the validity of the address I supply.

    I'm not a crypto expert, but the only way I can see it working is if that relay server can compare some key I provide with a key that it gets from the dns record for my domain. But the real trick will be making certain the key I provide can't just be copied and used again. Maybe if it is linked with the timestamp?
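
    A sketch of just the timestamp idea, under loud assumptions: an HMAC with a shared secret stands in for a real public-key signature (which is what you would actually want if the verification key is published in DNS), and the function names and the 300-second window are made up for illustration.

```python
import hashlib
import hmac
import time


def make_token(secret: bytes, sender: str, timestamp: int) -> str:
    """Bind the sender address to a timestamp using a keyed hash."""
    msg = f"{sender}|{timestamp}".encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def verify_token(secret, sender, timestamp, token, now=None, max_age=300):
    """Accept the token only if it is fresh and matches sender and timestamp."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > max_age:
        return False  # too old (or too far in the future): likely a replay
    expected = make_token(secret, sender, timestamp)
    return hmac.compare_digest(expected, token)
```

    A copied token still works inside the freshness window, so the timestamp narrows the replay problem rather than eliminating it.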

    It's not an easy problem. And, all the SMTP servers need to agree on a standard to make it work.

    --
    Got Apathy?
    1. Re:Chain of Trust by ComputerSlicer23 · · Score: 2, Interesting
      You don't have to verify that it came from the SMTP server one would expect. You have to have something the sender can do, that no one else can, that is easy for anyone in the world to verify.

      Essentially, that is a short description of how a "Chain of Trust", or better named a "Web of Trust", works in GPG. You have people who verify that person A knows the private key A_1 that corresponds to public key A_2.

      This works even if they don't bother encrypting everything, but just digitally sign it. It's also just an anti-spam filter, so I'm even less worried about having the key be encrypted. Now, I can go sign any key with my key, rating how "trustworthy" I deem people. You get a 5 if you are really trustworthy, and a 0 if I deem you absolutely untrustworthy.

      From there, you can build layers of trust, trusting the ratings of people you trust, on and on, until you establish a relationship thru the web between you and the sender.
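
      One way to picture the layering, as a sketch only: this is not how GPG actually computes validity. It assumes the 0 to 5 ratings above are scaled to 0..1 and multiplied along a chain of signers, keeping the best chain within a few hops. The function name and data layout are invented for illustration.

```python
def derived_trust(ratings, me, target, max_depth=3):
    """Best trust score from `me` to `target` over chains of signed ratings.

    ratings maps signer -> {subject: score 0..5}. Scores are scaled to 0..1
    and multiplied along the chain, so trust decays with every hop.
    """
    best = {me: 1.0}
    frontier = [me]
    for _ in range(max_depth):
        next_frontier = []
        for signer in frontier:
            for subject, score in ratings.get(signer, {}).items():
                value = best[signer] * (score / 5)
                if value > best.get(subject, 0.0):
                    best[subject] = value
                    next_frontier.append(subject)
        frontier = next_frontier
    return best.get(target, 0.0)
```

      Anyone with no chain back to you scores 0.0, which corresponds to the "view it as suspicious" case.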

      Now the problem is that there is no marginal benefit, and it'd be very hard to get users to do this individually. So I'd suggest that the SMTP servers do this themselves. You create a web of trust that is only for SMTP servers. You register your key on the web. You send people some e-mail. Eventually, you'll e-mail the admins of the e-mail servers you communicate with regularly, asking them to review their logs and sign your key. Ask your friends, peers, clients, vendors, and/or upstream providers to sign the key, deeming you trustworthy.

      They do this, and you're on the web of trust. You find a mail that doesn't do this, view it as suspicious. You find one that is signed with an SMTP key that is known to have sent spam by someone you trust, you drop it on the floor.

      Then you can start to trust SMTP servers. It has all of the advantages of SPF, and has some type of cryptographic security, plus it doesn't allow spammers to just set up bogus SPF records and get away with it. They'll have to continuously try and get new keys that are deemed trustworthy.

      Assuming you have any friends, who have friends outside your clique, it should be relatively easy to get a foothold in the web of trust. Everybody who befriends a spammer will be deemed "untrustworthy" in short order. So you won't trust people they trust. Eventually the system should balance out. Nothing needs to change for individual users. Mail admins could communicate with each other and make the system work. About the only real problem is that it puts extra load on any mail server. Depending on the volume of mail you have, just set up 2 or 3 inbound/outbound sendmail servers that you queue to. Their sole job is to verify and/or add the digital signature/encryption to mail.

      Webs of trust are a well understood animal in GPG land. While I'm not terribly conversant with them, they are essentially a distributed rating system by which rankings and trustworthiness can be ascertained about people you've never met. Think of it as a better system, with more flexibility than Karma + Karma Modifiers + Friend/Foe on Slashdot.org

      Kirby

  121. That's not what that error means by mdfst13 · · Score: 1

    The error that you are getting is telling you that you are trying to relay through a mail server, i.e. that the To email address is not associated with that mail server and that you haven't met its standards to send from the From email (in your case, it sounds like it requires you to have appropriate DNS records for the sending IP; it could also use SMTP authentication as well--same concept). All correctly configured mail servers will do this in some manner. In fact, one of the spam blocking techniques is to set the server to reject email from any server that is on one of the lists as an "open relay" (meaning that it is not properly configured to reject unproven senders to outside domains). You won't get that error if you try to send from an outside domain to one that Verizon manages.

    A more common method is to check for a PTR record for the IP that is sending you the email. If it doesn't have a PTR record, then your mail server rejects the mail. Checking for an MX record is overly restrictive and will blacklist many large organizations.

    There is also a method called SPF that actually does allow organizations to "whitelist" their mail servers as appropriate senders for their domain. I just found out about it today, but I have my host looking into adding the appropriate DNS entry for me. The great part about it is that it is a whitelist method at the domain level, i.e. it makes individual domains responsible for authenticating their mail sending servers. Combined with a blacklist of open relays, this allows you to at least apportion blame. If spam is sent, then that domain can fix it, because it is caused by a failure in their authentication system.
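
    To make the "whitelist at the domain level" idea concrete, here is a deliberately simplified evaluator for an SPF-style record. It only understands the ip4:, ip6: and all mechanisms with their qualifiers; real SPF has more mechanisms (a, mx, include and so on) that need DNS lookups, and fetching the TXT record itself is left out. The function name is an assumption.

```python
import ipaddress


def check_spf(record, sender_ip):
    """Evaluate a simplified SPF record against a connecting IP address."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"
    ip = ipaddress.ip_address(sender_ip)
    results = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}
    for term in terms[1:]:
        qualifier = "+"  # default qualifier when none is written
        if term[0] in results:
            qualifier, term = term[0], term[1:]
        mechanism, _, value = term.partition(":")
        mechanism = mechanism.lower()
        if mechanism == "all":
            return results[qualifier]
        if mechanism in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return results[qualifier]
        # Other mechanisms would need DNS lookups; this sketch skips them.
    return "neutral"
```

    With a record such as "v=spf1 ip4:192.0.2.0/24 -all", mail from inside that range passes and everything else fails, which is what lets the receiving side pin responsibility on the domain.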

  122. so for 50% of all mail it works by huckda · · Score: 1

    and the other 50% gets through...
    nice work guys!

    --
    "Just Smile and Nod." --Huck
  123. I think Bayesian would consider this by Trinition · · Score: 1

    If your Bayesian filter is intelligently tokenizing, then it should be able to see the 'CC' header. With proper training, 'ham' e-mails which happen to be CC'ed to a lot of friends or co-workers will actually start having those e-mail addresses tokenized as being pro-ham, and that should contribute to their 'haminess'.

    And, like I think I understood in the article, those e-mails without a lot of CC's won't have that extra 'haminess', but you can't guarantee that they're 'spam' -- so that half will have to rely on non-social-network properties to determine its 'haminess' or 'spaminess'.
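
    A toy illustration of that, assuming Cc addresses are turned into tokens like "cc:friend@example.org" and scored with add-one smoothing. The names and the smoothing choice are assumptions; this is not any particular filter's implementation.

```python
from collections import Counter


def header_tokens(msg):
    """Turn each Cc address into a token such as 'cc:friend@example.org'."""
    return {f"cc:{addr.lower()}" for addr in msg.get("cc", [])}


def train(messages):
    """Count, per token, how many ham and spam messages contained it."""
    ham, spam = Counter(), Counter()
    for msg, is_spam in messages:
        (spam if is_spam else ham).update(header_tokens(msg))
    return ham, spam


def token_spam_probability(token, ham, spam, n_ham, n_spam):
    """Smoothed spam probability for one token (0.5 means no evidence)."""
    p_spam = (spam[token] + 1) / (n_spam + 2)
    p_ham = (ham[token] + 1) / (n_ham + 2)
    return p_spam / (p_spam + p_ham)
```

    An address that only ever shows up in the Cc line of good mail ends up well below 0.5, while an address never seen before stays at exactly 0.5, matching the "other half" that has to be judged on non-social-network properties.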

  124. Re:LINUX IS "FREE" by Anonymous Coward · · Score: 0

    I get paid for my time, better me getting paid than some software megacorp.

  125. It won't work, but can be improved by axxackall · · Score: 1
    Relying on "From", "To", and "Cc" fields is a bad idea when you fight spammers. Such information can be spoofed and substituted. Many friends write each other without any Cc and the like.

    But the idea of social clamps is not dead, it just must be implemented differently. How? We have already discussed it here hundreds of times:

    Every message must be signed with the key deployed to some reliable key/CA server.

    Where to get such a reliable CA server? Easy! The answer is actually in the article. IMHO a community of email users should sign each other's certificates. And exchange such trust tickets.

    So, if I've got email that is signed by a key that is trusted by other friends of the same community - I accept to read it. If it's signed with a key that I don't have any trust information for - it should be marked as "Untrusted" and wait for my free time in some low-priority mailbox. If it's signed with a key I trust myself - accept it immediately. If it's signed with a key I revoked - reject it. If it's unsigned - autorespond with advice to sign it.
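
    The rules above boil down to a small decision function. A sketch, assuming the signature has already been verified and reduced to a key ID (None for unsigned mail); the function name, return strings, and the choice to check revocation first are illustrative assumptions.

```python
def triage(key_id, my_trusted, revoked, community_trusted):
    """Decide what to do with a message based on its signing key."""
    if key_id is None:
        return "autorespond"   # unsigned: advise the sender to sign
    if key_id in revoked:
        return "reject"        # a revoked key overrides any other trust
    if key_id in my_trusted:
        return "accept"        # key I trust myself
    if key_id in community_trusted:
        return "accept"        # key trusted by friends in the community
    return "untrusted"         # no trust information: low-priority mailbox
```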

    --

    Less is more !
  126. 100% of 50% by penguiniator · · Score: 1

    "The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category."

    Yes, indeed, getting it right in every one of half of all cases is quite an advance over getting it wrong in every one of the remainder.

    --
    ZZ
  127. And let spammers kill Orkut? by grokster · · Score: 1, Insightful

    How long till the spammers come up with a way of infiltrating Orkut, and inviting random people to be their friends?

  128. Show me the code... by Anonymous Coward · · Score: 0

    Hello folks,

    I've read the full article at Arxiv.org and it sounds promising. But I see no code... and the algorithm description looks much too complicated for me to bother trying to implement it just to see how well it works.

    It would be really nice at this stage to have some working code to throw some real messages on.

    I've scoured the author's personal pages (which seem to be here and here)
    but can't find anything there either...

    Hello Misters Boykin & Roychowdhury, what about some working code?

    And please don't forget: I may be lazy, but you are ugly and I can always try working harder... :-)

  129. mod parent up by Anonymous Coward · · Score: 0

    as subject says...