Slashdot Mirror


New Method of Spam Filtering

Alephcat writes "A simple and easily implemented scheme for combating e-mail spam has been devised by two researchers in the United States. P. Oscar Boykin and Vwani Roychowdhury of the University of California, Los Angeles, exploit the structure of social networks to quickly determine whether a given message comes from a friend or a spammer. The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category. The article was published on Nature magazine's website earlier today."

107 of 326 comments

  1. My favorite filter by krog · · Score: 2, Funny

    >/dev/null

    1. Re:My favorite filter by catdevnull · · Score: 3, Insightful

      my namesake! SpamAssassin on our mail servers helps bunches. the x-headers that we add are easy to filter on. it gets about 99% of the spam. your mileage may vary.

      --

      I might know what I'm talkin' about, but then again, this is Slashdot...
  2. Everytime you filter spam... by Anonymous Coward · · Score: 5, Funny

    You take food away from a spammer and his children. Don't block spam, or else you hate children. You don't hate children... do you?

  3. Vwani Roychowdhury by Anonymous Coward · · Score: 5, Funny

    He was probably sick of people like me mistaking his name for a made up spam "from" line.

    1. Re:Vwani Roychowdhury by kc3lai · · Score: 3, Funny

      you mean "from: Anonymous Coward"?

  4. Interesting by jchawk · · Score: 5, Interesting

    It would be interesting if Google could find a way for this idea to work with Orkut.com, since users of this service are typically connected to many other people who are not spammers. :-)

  5. Easily spoofed? by Sam+Ruby · · Score: 5, Insightful

    What's to stop the From:, To:, and Cc: fields from being spoofed (like a lot of viruses do)?

    --
    - Sam Ruby
    1. Re:Easily spoofed? by cavebear42 · · Score: 4, Informative

      as i understand it, they would have to spoof to someone who you know, a virus could easily do that (after it has your address book) but not so much for spam.

    2. Re:Easily spoofed? by Anonymous Coward · · Score: 3, Informative

      The fact that competent mail admins know how to prevent such stupidity from happening.

      Ever wonder why worms use their own SMTP engine? Because those of us who are competent have one mail relay that only accepts messages from the internal domain. We prevent the worm's SMTP engine from working by having MX wildcard records to a logging box only for internal DNS - this ensures that any message sent from an internal box that gets out goes through the relay, which authenticates the user.

    3. Re:Easily spoofed? by imbaczek · · Score: 2, Informative

      Viruses are a different kind of spam. They actually come from someone you know (or might know). Regular spam has those headers forged (and getting them right would raise the cost of a single message, which is good).

    4. Re:Easily spoofed? by SydShamino · · Score: 2, Interesting

      This certainly needs to be combined with a revamped SMTP system (or complete replacement) that enforces DNS-style From: lookups.

      So no, this certainly isn't a solution all by itself. It's the best one I've seen so far that doesn't involve more laws, though.

      Most of the other ideas surrounding DNS lookups are to enforce accurate From: lines. But then the ideas break down, with the best suggestions being new laws to punish the sender of the spam. With the proposal here today, it can be done with technology instead of waiting for legislation.

      --
      It doesn't hurt to be nice.
    5. Re:Easily spoofed? by FauxPasIII · · Score: 5, Informative

      There are two 'sender' fields that one is concerned with: The envelope-sender and the From: header. The latter can be spoofed as much as you like. The former cannot be spoofed in most cases, at least the host/domain part (the username can be spoofed if the server uses unauthenticated SMTP, which almost all do).

      A typical message would look like this:

      From spammer@baddomain.com
      From: Your friend <yourfriend@gooddomain.org>
      Subject: Re: your mail

      Buy our crap ! Click below to be removed. Blah blah.


      The first From field is the 'envelope sender' and comes entirely from the servers that have touched the mail. The rest of the fields are just a freeform part of the message, which by convention most (all?) MUAs treat in a special way to add convenient features like having the 'real name' next to your mail address in the visible From: field.
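      To make the two "senders" concrete, here is a minimal Python sketch that splits the example above into its mbox-style envelope line and its parsed headers, using the standard-library `email` package. The helper name `split_envelope` is made up for illustration, and as the replies below point out, the envelope sender can be forged too, so a mismatch proves nothing on its own.

```python
from email import message_from_string
from email.utils import parseaddr


def split_envelope(raw):
    """Split an mbox-style message into (envelope sender, parsed message).

    Assumes the first line is an mbox "From " separator, as in the example
    above; everything after it is parsed as an ordinary RFC 5322 message.
    """
    first_line, _, rest = raw.partition("\n")
    if not first_line.startswith("From "):
        raise ValueError("expected an mbox 'From ' separator line")
    # The envelope sender is the first token after "From ".
    envelope_sender = first_line.split()[1]
    return envelope_sender, message_from_string(rest)


raw = (
    "From spammer@baddomain.com\n"
    "From: Your friend <yourfriend@gooddomain.org>\n"
    "Subject: Re: your mail\n"
    "\n"
    "Buy our crap ! Click below to be removed. Blah blah.\n"
)

envelope, msg = split_envelope(raw)
header_addr = parseaddr(msg["From"])[1]
print(envelope)                 # spammer@baddomain.com
print(header_addr)              # yourfriend@gooddomain.org
print(envelope == header_addr)  # False: the two "senders" disagree
```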

      --
      25% Funny, 25% Insightful, 25% Informative, 25% Troll
    6. Re:Easily spoofed? by DR+SoB · · Score: 3, Interesting

      The issue is receiving. Yes, you can EASILY block outbound; it's inbound that's the issue.

      "We prevent the worm's SMTP engine from working by having MX wildcard records to a logging box only for internal DNS -"

      Say what? Why wouldn't you just block outbound port 25 from anyone except YOUR SMTP server's address? If a worm has its own SMTP engine (many do, yes), then what's to stop it from doing its own MX look-ups? It would take about 4 extra lines of code to accomplish this.

      --
      Mod +5 Drunk
    7. Re:Easily spoofed? by mlefevre · · Score: 5, Informative

      The envelope-sender can be just as easily spoofed as the From: header. If you're sending email out through your ISP or corporate email relay, that may well check that the host (or the whole address) is correct.

      If you do as most spammers do and connect directly to the receiving server, then you can feed it whatever you like in the envelope sender, and it has no way of checking whether it's genuine or not. This is what stuff like SPF can help with, but as things are currently implemented just about everywhere, the envelope-sender addresses on spam and viruses are generally forged.
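      As a rough illustration of what SPF adds, here is a Python sketch that checks whether a connecting IP falls inside an `ip4:` range of a domain's published SPF record (SPF is now specified in RFC 7208). It is deliberately partial, under stated assumptions: the record is passed in as a string instead of being looked up as a DNS TXT record, only bare `ip4:` terms are evaluated, and the function name and the example record for gooddomain.org are hypothetical.

```python
import ipaddress


def spf_ip4_permits(record, client_ip):
    """Return True if a bare ip4: term in an SPF record covers client_ip.

    Deliberately incomplete: qualified terms (+ip4:, -ip4:, ...), include:,
    a, mx, redirect= and the final "all" default are all ignored here.
    """
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        return False
    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        if term.startswith("ip4:"):
            # strict=False tolerates host bits set in the published prefix.
            if ip in ipaddress.ip_network(term[4:], strict=False):
                return True
    return False


# Hypothetical record for gooddomain.org, using documentation address ranges.
record = "v=spf1 ip4:192.0.2.0/24 -all"
print(spf_ip4_permits(record, "192.0.2.25"))   # True: a listed sending host
print(spf_ip4_permits(record, "203.0.113.7"))  # False: some other machine
```

      The receiving server can only make this check against the IP address it actually sees on the connection, which is why a forged envelope sender stops being free once the forged domain publishes a record.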

    8. Re:Easily spoofed? by crymeph0 · · Score: 3, Insightful

      Easily 30% of the spam I've received over the last few months has been addressed to several people in my office (and not to anyone outside the office). I'm guessing this is a result of viruses harvesting emails off people's computers; then it's a simple matter of finding all known emails in a given domain. Would this break the system described here?

      --
      It should be illegal to say that freedom of speech should be limited.
    9. Re:Easily spoofed? by FauxPasIII · · Score: 3, Insightful

      > If you do as most spammers do and connect directly to the receiving server, then you can feed it
      > whatever you like in the envelope sender, and it has no way of checking whether it's genuine or not.

      Isn't it typical for the receiver to reverse-lookup the sender's IP, or at least forward-lookup whatever you hand it in the HELO to make sure you're legit ? I could be mistaken here, but that's always been my perception.

      --
      25% Funny, 25% Insightful, 25% Informative, 25% Troll
    10. Re:Easily spoofed? by DR+SoB · · Score: 2, Insightful

      "Please try to pay attention."

      I'll try..

      You're assuming too much, dude. You're assuming it's going to try to access your default DNS server, but it could be hardcoded to use any DNS server (e.g. use akadns.yahoo.com for lookups).

      Also, some SMTP engines don't even bother to do MX look-ups; they just assume it will be either:

      MAIL.[domain].[whatever]
      or
      MAIL1.[domain].[whatever]

      And it will be correct 80% of the time. (Yes, I picked 80% off the top of my head, but let's just say I've seen enough mail servers to know.)

      --
      Mod +5 Drunk
    11. Re:Easily spoofed? by Vainglorious+Coward · · Score: 4, Informative
      Isn't it typical for the receiver to reverse-lookup the sender's IP, or at least forward-lookup whatever you hand it in the HELO to make sure you're legit ?

      Some systems do this, but any sensible system will not reject solely on this basis because it breaks delivery of some legitimate messages. In particular, nowhere does it say that mail "from" a particular domain has to emanate from a particular host (there's no analogue to MX for *sending* hosts). That's what SPF and similar techniques are trying to impose - registered "senders" for a particular domain.

      --
      My next sig will be ready soon, but subscribers can beat the rush
    12. Re:Easily spoofed? by gnu-generation-one · · Score: 2, Interesting

      "as i understand it, they would have to spoof to someone who you know, a virus could easily do that (after it has your address book) but not so much for spam."

      Virus-infected machines are being used to send spam, and they're also capable of swapping email address details between machines?

      Coincidence? You'd better hope the spammers think so.

  6. Volume by enderanjin · · Score: 4, Interesting

    If the filters are effective against only half of the emails, what is preventing spammers from doubling their load in order to control the same amount of spam getting to your inbox as they do now?

    --
    Anything in parenthesis may (not) be ignored.
    1. Re:Volume by Dukael_Mikakis · · Score: 2, Insightful

      And from the sounds of it, what makes it different from black(or white)lists? True, it's more sophisticated because it uses the whitelists of those on your whitelists, but why not just use a plain whitelist anyway?

      And how does this allow email from internet transactions or other non-social sources through? The article didn't seem to address that so clearly.

    2. Re:Volume by ComputerSlicer23 · · Score: 2, Interesting
      It'd be interesting to see how this works when implemented at, say, the ISP level. Possibly even across ISPs, where they'd end up exchanging information.

      Then when I get a random e-mail from a friend of a friend who isn't on my white list, it's a lot more likely to show up in my filtered mail. It's an easy way of having a white list built for you. Besides, I hate maintaining a white list. Any time someone changes e-mail addresses, I have to go play with the white list. It's not terribly convenient. I'd be much happier if they could be intelligently built by an automated system (with a weighting, and me possibly maintaining another white list).

      However, in the end, they are building a bass ackwards version of a "chain of trust". I mean, all you'd have to do is build a chain of trust of "From:" addresses you trust. However, if it is available to the public, it's probably a spammer's dream, which means it'll need some method of verifying that the From: headers are legitimate. As soon as that is done, spam filtering will be pretty easy, but it will create a whole slew of problems for generating e-mails from automated systems.
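      The automatically built "friend of a friend" white list described here amounts to a breadth-first search over trust links with a hop limit. A minimal Python sketch, assuming each address's white list is already available as a set (how those lists would be exchanged between users or ISPs is left open, as in the comment) and that From: addresses are authentic; the function name, the `max_hops` parameter and the addresses are hypothetical.

```python
from collections import deque


def within_trust_distance(whitelists, me, sender, max_hops=2):
    """True if sender is reachable from me in at most max_hops white-list links.

    whitelists maps an address to the set of addresses that address trusts.
    """
    seen = {me}
    queue = deque([(me, 0)])
    while queue:
        addr, hops = queue.popleft()
        if addr == sender:
            return True
        if hops == max_hops:
            continue  # don't expand beyond the allowed distance
        for trusted in whitelists.get(addr, ()):
            if trusted not in seen:
                seen.add(trusted)
                queue.append((trusted, hops + 1))
    return False


whitelists = {
    "me@example.org": {"friend@example.net"},
    "friend@example.net": {"friend-of-friend@example.com"},
    "friend-of-friend@example.com": {"stranger@example.edu"},
}
print(within_trust_distance(whitelists, "me@example.org", "friend-of-friend@example.com"))  # True
print(within_trust_distance(whitelists, "me@example.org", "stranger@example.edu"))          # False
```

      Note that this trusts whatever From: address arrives, which is exactly why the comment says the headers would need to be verified first.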

      Kirby

  7. huh? by wankledot · · Score: 4, Interesting
    It only works for half... but it works great on that half!!! How is that a good filter at all?

    Of course one huge downside to this "friend of friends" approach is all the virus spam I get that's sent using someone's address book (thanks Outlook!) Guess what... all those addresses are probably whitelisted because it came from someone I "know."

    --
    My sig is blank, I typed this by hand.
    1. Re:huh? by CeleronXL · · Score: 5, Interesting

      Well you can run mail through a system like that first, pulling out the mail that is definitely not spam and shuffling it away to the Inbox. Then run it through a different kind of spam system, such as a system like SpamBayes, and you cut it down even more.

      On its own it doesn't sound like it works well, but you can couple it with already-existing systems to boost accuracy.
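      A sketch of that two-stage arrangement in Python, assuming the social-network check returns one of three answers (spam, not spam, or no verdict). Both stages are hypothetical stand-ins passed in as callables, not real SpamBayes APIs, and the 0.9 threshold is arbitrary.

```python
from typing import Callable, Optional


def two_stage_filter(
    message: str,
    social_verdict: Callable[[str], Optional[bool]],
    spam_probability: Callable[[str], float],
    threshold: float = 0.9,
) -> str:
    """Route a message to "inbox" or "spam".

    Stage 1: trust the social-network verdict whenever it gives one
    (True = spam, False = legitimate, None = undetermined).
    Stage 2: only undetermined mail is scored by the statistical filter.
    """
    verdict = social_verdict(message)
    if verdict is not None:
        return "spam" if verdict else "inbox"
    return "spam" if spam_probability(message) >= threshold else "inbox"


# Stand-in stages for demonstration only.
print(two_stage_filter("hi from a friend", lambda m: False, lambda m: 0.99))  # inbox
print(two_stage_filter("BUY NOW", lambda m: None, lambda m: 0.97))            # spam
```

      Messages the first stage decides never reach the second stage, so the statistical filter only has to cope with the undecided remainder.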

    2. Re:huh? by nick_davison · · Score: 4, Funny

      Hey, don't knock a filter that can correctly sort mail into two piles fifty percent of the time. CoinToss 1.0 has been a real innovation!

    3. Re:huh? by feepness · · Score: 2, Interesting

      It only works for half... but it works great on that half!!! How is that a good filter at all?

      No, it works PERFECTLY on that half.

      Important distinction. Now, instead of needing to trawl through spam yourself to train the Bayesian filter, you can set this up to train your Bayesian filter automatically. Not only would this be easier, but it would reduce false negatives/positives by 50%.

    4. Re:huh? by johnynek · · Score: 2, Interesting

      It is also really good at looking for false-positives or false-negatives of existing solutions (like spamassassin or crm114).

      --
      jabber: johnynek@jabber.org
    5. Re:Huh? by TheoMurpse · · Score: 2, Informative

      no, what they meant was that 50% of all email messages are sorted into "friends" or "spam" correctly... the other 50% aren't sorted into either, but rather considered "undetermined"

    6. Re:Huh? by jonesvery · · Score: 2, Funny
      The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category.

      Am I the only one who read this sentence and said "huh??"

      Oh, no -- makes perfect sense to me. I applied that logic to quite a few exams when I was in college: "My score on this exam is perfect...I could only come up with answers to half of the questions, but every one that I answered was correct! a+ for me!"

      My professors were the bastards who didn't understand...

      --

      * * *
      It is a dada story -- it has no moral.

  8. hm.. by arabagast · · Score: 2, Interesting

    isn't this somewhat similar to Thunderbird's feature of not marking people in your mailing list as spam?

    --
    Doolittle : ...What is your one purpose in life?
    Bomb no.20 : To explode of course.
  9. Cleaning up the gene pool by Anonymous Coward · · Score: 5, Funny

    Spammers suck, right? And their children have obviously inherited the spamming gene. So, by starving the children to death, we're preventing the spam gene from spreading. It may sound wrong, but we're actually helping society.

  10. Viruses? by AntiOrganic · · Score: 3, Interesting

    Won't this just inspire more spammers to pursue virus, trojan and spyware-oriented methods of spamming? Granted, this is significantly more difficult than just harvesting email addresses off of Usenet and web pages, but it seems like we're only one step ahead at any given time with our methods of spam prevention.

    1. Re:Viruses? by Xzzy · · Score: 2, Insightful

      > Won't this just inspire more spammers to pursue
      > virus, trojan and spyware-oriented methods of
      > spamming?

      Fine by me.. that puts them soundly into the lawbreaking category. Which means that after you track them down and actually find someone operating inside the borders of your country, you can DO something about it.

      Since the laws being passed in the US are clearly indicative that spam is and will always be in an impossible to regulate grey area, the next best solution is to make spamming so difficult that only outlaws can do it.

    2. Re:Viruses? by MoogMan · · Score: 3, Insightful

      That isn't necessarily a bad thing: it forces users to clue up on good practices regarding viruses etc., since their email address would otherwise get blacklisted automatically. If this is coupled with a decent system to stop the from/to/cc from being spoofed, then it may start solving two problems at once.

    3. Re:Viruses? by Syberghost · · Score: 2, Funny

      Which means that after you track them down and actually find someone operating inside the borders of your country, you can DO something about it.

      Screw that; if they send even one spam to an FBI agent, they're interfering with his ability to do his job, and thus providing aid and comfort to terrorists.

    4. Re:Viruses? by j_matthews · · Score: 2
      Won't this just inspire more spammers to pursue virus, trojan and spyware-oriented methods of spamming?

      Yes. But what that does is turn spammers from people with annoying business models into criminals. This is good. Criminals can be locked up. Criminals can have restraining orders placed on them. Criminals can be fined. Yes, I know that a lot of spammers hide behind international borders, but I don't think there is a government in the world that wants to be associated with protecting crime syndicates, in case it gets labelled TERRORIST or some other politically correct name.
  11. Bugger Off! by ackthpt · · Score: 5, Interesting
    You take food away from a spammer and his children. Don't block spam, or else you hate children. You don't hate children... do you?

    You know darn well that this will only increase employment in the Spam Technology sector and is a good thing.

    Seriously, spammers are often a step ahead, and lately a lot of the spam I'm getting is disguised as Amazon orders or closed eBay auctions. I haven't ordered anything from Amazon (USA) in ages, but I still have to peek to see if someone has cracked my account and ordered something. Just expect that the harder they are pressed, the harder spammers will press back by sinking to new lows.

    --

    A feeling of having made the same mistake before: Deja Foobar
  12. Good idea by Schezar · · Score: 5, Interesting

    After reading this, I realized that a good 90% of the email I receive is either from someone I've had previous contact with, or else someone 1 or at most 2 degrees of separation from one of those people. I never get mail worth reading from total strangers. Anything important is always linked back to me in some way.

    It should be interesting to see how this method plays out. (Now, I don't know why I even bothered with that last sentence. Everyone says that about every new spam-filtery thing. ((Don't know why I bothered with that last sentence either. Work is slow today I suppose.)) )

    --
    GeekNights!
    Late Night Radio for Geeks!
  13. this doesn't address spoofed email by alpha1125 · · Score: 3, Interesting

    What about spoofed messages from people on my list?

    Worms, from infected email systems?

    The researchers didn't address this.

    --
    Money cannot buy happiness, but can buy something soo darn close, that you can't really tell the difference
  14. A two tier system? by erick99 · · Score: 4, Interesting
    I suppose you could use this as a first pass and let those go directly to the "recycle bin" or whatever deletes mail (if you really can be confident that they are all spam). Then, the balance of your email could go through whatever antispam system you use. Right now I get over 100 spam emails a day. These go into a folder and are sorted by sender so that I can quickly scan through for any "friendly" emails. It would be nice to cut the amount that has to be manually scanned by half. Either way, this sounds like it's going in the right direction - towards a system that is close to 100% effective (if that is truly possible).

    Happy Trails!

    Erick

    --
    http://www.busyweather.com/
  15. email still has to get to user by belmolis · · Score: 3, Insightful

    If I understand the technique correctly, it relies on information specific to individual users. Unless there is a way for users to export their information, that means that the filtering can only be done after the email reaches its destination, not by the ISP or central mail server. So it may be helpful to individual users, but unlike some proposed techniques, it won't cut down on total email traffic.

  16. End user's access is not the issue. by Sentosus · · Score: 3, Insightful

    For me as an ISP, I don't care if the email gets filtered between me and my customers. It hurts, and it costs me more in bandwidth to receive the emails, then store them, and then support the users who want me to clear their pop3 accounts when they are on dialup. Spam filtering should take place at the Hub Cities on edge servers so it never gets to my mail server in the first place and I do not have the bandwidth charges. In exchange, I will filter all my outgoing mail on the mail server for outgoing spam. BTW, my mother likes spam. It is a good hobby of hers just to read through it. She gets very entertained by the content.

  17. Spam filtering by eclectro · · Score: 5, Funny


    If it doesn't use bullets, I don't want to hear about it.

    --
    Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
  18. I don't always like my friends' friends by Clemence · · Score: 5, Funny

    Can't stop the friend-of-a-friend idiot who hits "reply to all."

    It might not be "spam" but I filter it now. I'll stick with my procmail filters.

  19. Good Start by ticklemeozmo · · Score: 2, Interesting

    This seems to be a good start, but it still requires software on the user side. And that software must work with their mail client...

    I guess it seems this is where the focus has become. While some spam can be blanketed and deleted, it's really up to the RECIPIENT to judge whether its spam or not.

    But then again, do we trust the user? Do we trust Joe and Jane (our loving SixPack couple) to make the right decision? Sure, it might be prudent in a company of 5-50, but what about 500-5000? Deploy and manage copies of this program to see if it's going right or not?

    I'm a sysadmin and I prefer the server-based solution. Blacklists, SpamAssassin, et al. Easier to fix one machine than 5000 desktops.

    Comments?

    --
    When modding "Informative", please make sure it both has a source and IS actually informative.
  20. Re:Sounds interesting... by rjelks · · Score: 4, Insightful

    I would agree with that in terms of personal email accounts, but for a business, new contacts are pretty important. Most companies would hope a lot of real email was from new sources.

    -

  21. Heading the wrong way by Muddie · · Score: 5, Interesting

    This sounds like the whole "Friends and Family" network from AT&T a few years ago, and now Verizon's "In" network thing, but with email and exclusive instead of "Free calls to friends on 'the list'".

    Pretty soon, you will have to send an MD5 hash of your DNA from a static IP address that is reversible and supply 5 references, all in a PGP-encrypted letter, along with a copy of your passport and birth certificate.

    When it's more work to block spam than stop it, you have to ask what is going wrong. Maybe if we somehow figured out wonderful technologies to *stop* spammers instead of blocking them, we'd be getting towards the ultimate goal. This is much like throwing money at a problem to bandage it, not fix it. The solution, however, also has to be easier for end users, who are doing nothing wrong. Why is every solution harder for end users, but just a 'bump in the road' for spammers? Am I missing something?

  22. My own method by PossibleMat · · Score: 2, Interesting

    I would like to share in all humility my own method of spam filtering:

    I use a super-extra-secret e-mail that I give only to my friends. ;-)

    --
    Have you Meta Meta Moderated lately?
  23. (OT sig response) by jridley · · Score: 4, Funny

    Member of the Stop Fucking Saying 'M$' army

    Right, from now on, it's "micros~1" for me.

  24. Spam from Co-workers? by Titusdot+Groan · · Score: 2, Insightful
    These guys are way behind the curve. A growing percentage of the spam I get appears to be coming from my coworkers.

    These idiots have forgotten the basic rule of dealing with spammers (and other mail miscreants) which is:

    They LIE!
    They lie in the HELO, they lie in the MAIL FROM:, in the headers, etc. etc. etc.

    Any method that depends on this kind of data is doomed to a quick failure in the real world.

    1. Re:Spam from Co-workers? by johnynek · · Score: 2, Informative
      If you read the paper on the archive you will see that there is a method to deal with this problem.

      Namely, when someone joins a spam and non-spam component of the network.

      PS: This method was tested on email boxes from the "Real World", but of course, we could use more email boxes to test with. Please send me a tarball of all your email and I will tune the algorithm! :)

      --
      jabber: johnynek@jabber.org
    2. Re:Spam from Co-workers? by onion2k · · Score: 2, Funny

      Do you work for a penis enlargement company? Coz that'd explain a whole lot..

  25. The key is the cost by h00pla · · Score: 2, Interesting
    People who propose anti-spam measures should keep one thing clearly in mind, it seems to me. Spam will decrease as the cost of sending it increases.

    Though I'm no fan of Microsoft or Bill Gates, the solution they proposed - one where a complicated math calculation is required for every mail a sender sends - is on the right track because, at least in theory, it makes sending mail expensive and therefore puts spammers at a disadvantage. Whether it is a really workable solution only time will tell - and given the MS tradition of hype ... who knows.

    Schemes that make it expensive for the handlers (networks, ISPs) or the recipients, are not the way to go. After reading the article, it seems that this is just another one of those.
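    For a feel of how a per-message computational cost works, here is a hashcash-style proof-of-work sketch in Python. It is not Microsoft's actual proposal, just a well-known scheme in the same spirit: the sender searches for a nonce whose SHA-256 hash starts with a given number of zero bits (about 2**difficulty hashes on average), while the recipient checks the stamp with a single hash. Real hashcash stamps also carry a version and a date; those are omitted here, and the function names and difficulty value are illustrative.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Number of leading zero bits in a byte string."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(resource, difficulty):
    """Sender side: try nonces until SHA-256(resource:nonce) has enough
    leading zero bits. Expected cost is about 2**difficulty hashes."""
    for nonce in count():
        stamp = f"{resource}:{nonce}"
        if leading_zero_bits(hashlib.sha256(stamp.encode()).digest()) >= difficulty:
            return stamp


def verify(stamp, resource, difficulty):
    """Recipient side: one hash, plus a check that the stamp names us."""
    if not stamp.startswith(resource + ":"):
        return False
    return leading_zero_bits(hashlib.sha256(stamp.encode()).digest()) >= difficulty


# The resource is the recipient's address, so a stamp can't be reused elsewhere.
stamp = mint("friend@example.org", 16)
print(stamp, verify(stamp, "friend@example.org", 16))
```

    Each extra bit of difficulty doubles the sender's expected work. For one person sending a few messages the cost is negligible; multiplied by millions of messages it adds up, which is the cost lever the comment is describing.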

    --
    I've been swashdotted -- Elmer Fudd
  26. New math? by WD · · Score: 2, Insightful

    The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category

    That has to be one of the most ridiculous statements I've heard in a while. That's like saying I've got a great new burglar alarm system. Now, it only works about half of the time, but when it does work it catches the crook with a 100% success rate!
    Who's buying?

  27. Spammers already defeat this (partially) by xleeko · · Score: 5, Interesting
    Spammers already sort addresses by site in order to take advantage of this effect. They forge the from address as someone else from your site on the theory that you know them and would whitelist them.

    In fact, this has provided me with a kind of "honeypot", since I now check for the addresses of several people who are long gone from my site. If I see their address, it's gotta be spam!

    - Dave

  28. So it's just a very good rule, how is that bad? by Smack · · Score: 5, Informative

    According to the article, it can make a decision on 53% of the total e-mail, and divide it up into Spam or non-Spam with complete accuracy. The key is that it makes no judgement on the rest of the e-mail.

    So you could throw this as a rule into SpamAssassin with a 100 weight on Spam results and a -100 weight on non-Spam results. That could only help your filtering. With zero false-positives.
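    The arithmetic of that suggestion, sketched in Python rather than in SpamAssassin's own rule configuration (the social verdict is a hypothetical input; SpamAssassin has no such test built in). With a +100/-100 adjustment, the social verdict swamps any realistic score from the other tests whenever it fires, and leaves the score untouched when it abstains. The 5.0 threshold mirrors SpamAssassin's default `required_score`.

```python
from typing import Optional

REQUIRED_SCORE = 5.0  # SpamAssassin's default spam threshold


def combined_score(other_tests: float, social_verdict: Optional[bool]) -> float:
    """Add a large fixed weight when the social-network test reaches a verdict."""
    if social_verdict is True:    # graph analysis says spam
        return other_tests + 100.0
    if social_verdict is False:   # graph analysis says legitimate
        return other_tests - 100.0
    return other_tests            # undetermined: score unchanged


def is_spam(other_tests: float, social_verdict: Optional[bool]) -> bool:
    return combined_score(other_tests, social_verdict) >= REQUIRED_SCORE


print(is_spam(12.0, False))  # False: a known correspondent overrides a high score
print(is_spam(0.5, True))    # True
print(is_spam(6.0, None))    # True: falls back to the ordinary score
```

    Whether this really adds zero false positives depends entirely on the social verdict being as accurate as the article claims.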

    1. Re:So it's just a very good rule, how is that bad? by GooberToo · · Score: 4, Interesting

      Or simply don't process the 53% with other spam detection software, which saves CPU! In other words, make this the first anti-spam process, whereby half of your email gets to skip SpamAssassin (or whatever). The other 50%, you process as usual.

    2. Re:So it's just a very good rule, how is that bad? by GooberToo · · Score: 3, Interesting

      Oh ya, in case it's not obvious, that means up to a 50% reduction in the small percentage of email that ends up as false positives. That means, if you have a 5% false-positive rate, you *may* see that reduced to as little as 2.5%! Technically, the reduction may actually be bigger than that. The reason being, it may be that 100% of the false positives fall into the 50% that this technique properly identifies. Needless to say, that's very exciting. It also makes it possible for people to lower their spam threshold without fear of creating a higher false-positive rate. That, in turn, means more spam identified with fewer false positives. Let's hope reality falls close to my rambling speculations here! ;)

      Very interesting indeed!

  29. Only 50%, but no false positives by blorg · · Score: 2, Interesting

    It only works on 50%, but it claims *no false positives* on that 50%. That means that 50% can be deleted immediately; no one has to check in case there is a false positive. By contrast, Bayesian filters *will* produce the occasional false positive, so you have to trawl through your spam folder occasionally to check for them. If I could reduce my spam folder checking from 200 mails a day to 100, I'd be very happy.

    1. Re:Only 50%, but no false positives by ichimunki · · Score: 2, Insightful

      The reason it's not giving you any false positives is that it's giving up on about half of the attempts. In my mind those are false negatives because they require additional effort (i.e. the filter errs on the side of accepting the mail)... and at a 50% rate that's not much help. I don't think I've ever seen a Bayesian filter that was allowed to just give up on 50% of all inputs... and if it was, I'd bet good money that it wouldn't generate any false positives either.

      Paul Graham kind of got everybody thinking about statistical filtering techniques, but people haven't really picked apart his algorithm or looked at ways to tighten it up. Personally I think that path is a lot more promising.

      --
      I do not have a signature
  30. Scorched Earth:Cleaning up the gene pool by ackthpt · · Score: 3, Funny
    Spammers suck, right? And their children have obviously inherited the spamming gene. So, by starving the children to death, we're preventing the spam gene from spreading. It may sound wrong, but we're actually helping society.

    The Spam Gene is actually a recessive gene, so it's not likely to have shown up in the parents or offspring. Its effect is similar to fouling the nest or pissing on food before eating.

    --

    A feeling of having made the same mistake before: Deja Foobar
  31. This method will ruin a cool part of the net by The+Wing+Lover · · Score: 5, Insightful

    Used to be that one of the cool things about the net was that you would get email from total strangers... "Hi, I'm from {some far away place}. I saw your {Usenet post|web page|profile on some bulletin board site} and really liked your ideas about {something}. I've also been experimenting with {something} and I have some ideas about {whatever}..."

    Now, if we only have emails from our (already existing) friends or friends of friends, then how will we ever meet anybody new?

    --

    - In Capitalist America, law violates YOU!

  32. Link to the Research Paper by Nepre · · Score: 4, Informative

    The actual paper that describes this technique can be found here

  33. Problem halved -- Yarright by ZakMcCracken · · Score: 2, Insightful

    The remaining half of the e-mail then has to be filtered in a more sophisticated way. But by then the scale of the problem has been cut in half.

    Solving "half" of the problem is pretty useless. Spammers -- assuming this technology is ever widely adopted -- wouldn't take long to find a way to get their messages into the unfiltered heap. The only ones to suffer damage will be the legit email senders.

    Says the Cat, "Instead of counting all the stars in the sky, you could just count half of them and multiply the number by two. You just halved the problem there."

    1. Re:Problem halved -- Yarright by saiha · · Score: 2, Interesting

      This is right on the mark. I think that if this system were widely implemented, we would begin to see more email-virus-based spamming: essentially, using the infected people to spam all of the people in their address books. This would, in a sense, defeat the whitelist method.

      In response to the quote about counting the stars, you could use a Monte Carlo method to count the stars in a few random portions of the sky and get a fairly accurate estimate of all the visible stars.
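      The Monte Carlo idea in a few lines of Python: count a random subset of sky patches and scale up. The "sky" here is synthetic data invented for the example, and `estimate_total` is a hypothetical helper; the point is only that the estimate's relative error shrinks roughly like 1/sqrt(samples) while the work drops in proportion to the fraction of patches sampled.

```python
import random


def estimate_total(counts_per_patch, samples, seed=0):
    """Monte Carlo estimate of sum(counts_per_patch).

    Counts a random sample of patches (without replacement) and scales the
    sampled sum up by len(counts_per_patch) / samples.
    """
    rng = random.Random(seed)
    chosen = rng.sample(range(len(counts_per_patch)), samples)
    sampled_sum = sum(counts_per_patch[i] for i in chosen)
    return sampled_sum * len(counts_per_patch) / samples


# A made-up "sky": 10,000 patches, each holding 0-20 visible stars.
sky_rng = random.Random(42)
sky = [sky_rng.randint(0, 20) for _ in range(10_000)]

print(sum(sky))                    # exact count: every patch examined
print(estimate_total(sky, 2_000))  # estimate from a fifth of the patches
```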

  34. More fodder for the mill by daves · · Score: 2, Interesting

    The Bayesian rule is just a mechanism for combining multiple independent estimates into an overall estimate.

    This is clearly an independent estimate, and a good mechanism to improve the overall detection probability.

    What we need is a "meta-Bayesian" process that appropriately weights and combines other spam prediction estimates, not just word counts.
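    For two-class spam filtering, that combining step can be sketched as below, assuming each estimate is an independent probability that the message is spam and that spam and legitimate mail are equally likely beforehand (the naive Bayes simplification). A graph-based score like the one in the article would just be one more entry in the list; `combine` is a hypothetical helper name.

```python
import math


def combine(probabilities):
    """Combine independent spam probabilities with Bayes' rule (equal priors).

    Equivalent to prod(p) / (prod(p) + prod(1 - p)); each p must lie strictly
    between 0 and 1. Summing log-odds avoids underflow with many estimates.
    """
    log_odds = sum(math.log(p / (1.0 - p)) for p in probabilities)
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    odds = math.exp(log_odds)  # computed this way to avoid overflow for very negative log-odds
    return odds / (1.0 + odds)


if __name__ == "__main__":
    word_based = 0.7    # e.g. from a word-frequency filter
    graph_based = 0.95  # e.g. from a social-network score
    print(round(combine([word_based, graph_based]), 3))
```

    For example, a word-based estimate of 0.7 and a graph-based estimate of 0.95 combine to about 0.978, while estimates of 0.2 and 0.8 cancel out to 0.5.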

    --
    People who disagree with you are not automatically evil, greedy, or stupid.
    1. Re:More fodder for the mill by ceswiedler · · Score: 2, Informative

      SpamAssassin does this. They use a genetic algorithm to calculate the best weights to give all of the tests they have, where 'best' = fewest false positives and most accurate positives (on their 'standard' spam/ham corpus).
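      Whatever method produces the weights, applying them is a weighted sum compared against a threshold. A sketch with made-up rule names and scores (not SpamAssassin's real rules); the 5.0 threshold mirrors SpamAssassin's default `required_score`, and `classify` is a hypothetical name.

```python
# Hypothetical rules and weights for illustration; these are not SpamAssassin's
# actual rule names or scores. A negative weight pulls a message toward "ham".
WEIGHTS = {
    "SUBJECT_ALL_CAPS": 1.5,
    "FORGED_SENDER": 2.5,
    "BAYES_HIGH": 3.0,
    "SENDER_IN_FRIEND_NETWORK": -4.0,
}


def classify(triggered_rules, weights=WEIGHTS, threshold=5.0):
    """Sum the weights of the rules that fired; return (score, is_spam)."""
    score = sum(weights[rule] for rule in triggered_rules)
    return score, score >= threshold


if __name__ == "__main__":
    print(classify({"SUBJECT_ALL_CAPS", "FORGED_SENDER", "BAYES_HIGH"}))
    print(classify({"BAYES_HIGH", "SENDER_IN_FRIEND_NETWORK"}))
```

      The genetic algorithm's job is only to choose the numbers in the weight table; the scoring step itself stays this simple.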

  35. The Joy Of Statistics by tds67 · · Score: 2, Funny
    The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category.

    So it works 100% of the time in 50% of the cases? There is only a 25% chance that I would be interested in something like this.

  36. How it works - clustering coefficients by blorg · · Score: 5, Informative
    You can read an abstract, and download the full (i.e. original) article here in a variety of formats.

    From what I can make out, this system graphs correspondent pairs into correspondence maps, and notes that while normal people all email each other and thus form densely interconnected graphs (high clustering coefficient), spammers have a distinct pattern, i.e. one person emailing a few million others (low clustering coefficient). There are figures in the article that make this point well.
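    The clustering-coefficient idea can be sketched in a few lines. This assumes an undirected edge between a message's sender and each To:/Cc: recipient, with the mailbox owner's own address left out (it would connect to everyone); the paper's exact graph construction may differ, and the function names are hypothetical. A node's coefficient is the fraction of pairs of its neighbours that are themselves linked.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(headers, me):
    """Build an undirected correspondence graph from (sender, recipients) pairs.

    Assumption: one edge per sender/recipient pair from From, To and Cc; the
    mailbox owner's address `me` is excluded because it touches every node.
    """
    graph = defaultdict(set)
    for sender, recipients in headers:
        for rcpt in recipients:
            if me not in (sender, rcpt) and sender != rcpt:
                graph[sender].add(rcpt)
                graph[rcpt].add(sender)
    return graph


def clustering(graph, node):
    """Local clustering coefficient: share of neighbour pairs that are linked."""
    neighbours = graph.get(node, set())
    k = len(neighbours)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbours; reported as 0 here
    linked = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2.0 * linked / (k * (k - 1))


if __name__ == "__main__":
    me = "me@home.example"
    headers = [
        ("alice@a.example", [me, "bob@b.example"]),
        ("bob@b.example", [me, "carol@c.example"]),
        ("carol@c.example", [me, "alice@a.example"]),
        ("offer@bulk.example", [me, "u1@home.example", "u2@home.example", "u3@home.example"]),
    ]
    g = build_graph(headers, me)
    print(clustering(g, "alice@a.example"))     # friends who mail each other
    print(clustering(g, "offer@bulk.example"))  # hub mailing unconnected strangers
```

    In the demo, three friends who mail one another score 1.0, while a sender who mails three strangers who never write to each other scores 0.0.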

    The system would be ideal for implementation at a fairly high level (e.g. the ISP level), where systems can aggregate email headers across many different users in order to come up with meaningful graphs. The advantage it claims of no false positives means that it would be feasible at this level.

    I'm impressed; it looks like a very clever idea. My only question concerns how this would deal with mailing lists, which must appear to it like spam?

    1. Re:How it works - clustering coefficients by gnu-generation-one · · Score: 2, Insightful

      "My only question concerns how this would deal with mailing lists, which must appear to it like spam?"

      Well mailing lists are, by definition, identical to spam, so far as an automated program looking at each message is concerned. Whenever there's a test of spam-filtering programs the "false positives" are mailing lists that the tester forgot to tell the spamfilter about.

      It would be useful to have some way of publishing a list of mailing lists who have permission to send you email -- I'll leave it up to the "all you need is a system of public keys..." crowd to start shouting suggestions.

      And for the people who'll suggest whitelisting based on the From field, don't forget that the spammers can easily put "bugtraq@securityfocus.com" as the sender.

    2. Re:How it works - clustering coefficients by gatekeep · · Score: 2, Insightful

      Whitelist on the from field, and enforce SPF.

    3. Re:How it works - clustering coefficients by edrugtrader · · Score: 2, Insightful

      NO NO NO NO NO NO NO.

      do not filter ANYTHING at the ISP level.

      this is not a suggestion, it is a demand.

      --
      MARIJUANA, SHROOMS, X: ONLINE?! - E
    4. Re:How it works - clustering coefficients by orthogonal · · Score: 4, Insightful

      The system would be ideal for implementation at a fairly high level (e.g. the ISP level), where systems can aggregate email headers across many different users in order to come up with meaningful graphs. The advantage it claims of no false positives means that it would be feasible at this level.

      Yeah, but I'd consider a high-level analysis of my email headers (either sent or received) to be a violation of my privacy. Whether or not I'm mailing to kinky@alterate.life.styles.com, fringe.politcal.groups.require@free.speech.too.org , unpopular.opinions@free.thinkers.net, or falun.gong@is.banned.by.my.dictator.org, it should be nobody's business but my own.

      Someone will undoubtedly argue that since headers are sent in the clear anyway, it shouldn't matter, but keeping a database of who mails what to whom only makes abuse -- by freelance busybodies or government spies and censors -- that much the easier.

      This is a case, I think, where the threat inherent in the cure is worse than the disease.

    5. Re:How it works - clustering coefficients by orthogonal · · Score: 2, Insightful

      Yeah, but I'd consider a high-level analysis of my email headers (either sent or received) to be a violation of my privacy

      And in reply to myself. ;)...

      Since the whole point of this is to build social-connection-webs, it's ideal for government crackdown via the guilt by association angle: not only can you find everybody who is emailing to dump.ashcroft@new.american.revolution.org, you can also find -- and investigate -- all the friends of the dissenter, too.

      And for anyone who isn't worried that the FBI occasionally oversteps its bounds in investigating dissent, just consider that the social affinity networks of p2p traders could also be subpoenaed: we know Joe uploads mp3s, let's subpoena his email "buddy list" and investigate all those people too.

    6. Re:How it works - clustering coefficients by mdfst13 · · Score: 2, Insightful

      As someone who used to sysadmin a mail server, I can tell you that this (saving info about who emailed who) is already required. I forget what the limit was, but we were supposed to keep the mail logs (which carry from who to who info) for at least six months. We actually archived them to our write only backup system on a regular basis. AFAIK, they stayed there forever (of course, it's anyone's guess whether or not we would have been able to retrieve them; our backup system had issues--thus the write only tag).

      This proposal does not involve collecting or saving new info. It involves *using* the existing info at a summary data level. Also, understand that it would be the *recipient's* ISP who would do this, not your ISP. This means that they could only collect info on what you send to email addresses on that server, not cross reference it with all the email that you send.

      It's also worth noting that other ISP-level SPAM filters already process this info as well. This isn't a new concept. The new part is that it is trying to use the patterns *before* putting it in the receiver's mail box rather than after it is identified as SPAM by the receiver.

    7. Re:How it works - clustering coefficients by edbarrett · · Score: 3, Funny
      We actually archived them to our write only backup system
      /dev/null?
    8. Re:How it works - clustering coefficients by Tomble · · Score: 2, Interesting
      SPF? Very neat, hadn't heard of that before. About time somebody did something about the whole header forgeability issue. IMO that alone (unless I've misunderstood what it's designed to stop) would be enough to deal with most spam anyway.

      Before I saw your posting, I was thinking that perhaps one way to deal with it would be for a similar approach to the "social networks" and "web of trust" ones to be applied to the servers and networks themselves: each network could keep a list of mail servers on other networks that they trust to not be open relays or spam hosts, etc, and for mail sent from other servers, they could check the lists that other trusted networks keep. They could then choose to add those servers to their own lists too if they turned out to be OK. Some means would need to be made for new servers to be able to get on somebody's list, of course...

      But the point is ultimately, that dealing with the Spam issue by filtering on the content is just stupid, it's a losing battle as they keep finding new stupid ways to get past the filters, and the filters will always have some risk of blocking legitimate emails. What if I send a parody of a spam to a friend as a joke? And if we only use filters at the user's end, the burden of the traffic is still felt by our ISPs and email providers. There HAS to be a way to block it at the source.

      --
      Be careful! New moon tonight.
  37. for a MUCH more interesting read... by germinatoras · · Score: 2, Funny

    Try the link at the bottom of the page:
    Sniffing stools speeds diarrhoea diagnosis
    19 February 2004
    http://www.nature.com/nsu/040216/040216-13.html

  38. Bigger Issue... by glpierce · · Score: 3, Insightful

    While this may work for teenagers, it has no use in the business world. In the last week, I've gotten two dozen vital emails from people I did not previously know (professors at various grad programs). In that period, I haven't gotten a single message from people I know (or who know someone I know), because I talk with friends face-to-face, over the phone, or through instant messages. This sort of filtering just removes the most important reason for the existence of email, which is replacing snail-mail, not replacing conversations.

    --
    G
  39. I guess that pigs have wings. by Henry+Stern · · Score: 3, Interesting

    I never thought that Slashdot would help me find papers relevant to my research!

    I think that their idea is good from a technical point of view, but very bad from a privacy point of view. I am of the opinion that gathering social network information is extremely dangerous. A pertinent example: If your friend is branded a "terrorist," then "they" can exploit the information that you have voluntarily provided to then put you on a "terrorist" watch list.

    Another example: Say that someone who knows someone that you know actually buys something from a spam. If the spammer can access the social network information, suddenly your little niche of the network is going to be aggressively spammed. After all, like minds congregate.

    There is no doubt in my mind that the black hatters will infiltrate the social network communities and use that information to spy on potential viewers. See this bugzilla thread where the folks from Atriks Professional Email Deployment Service follow SpamAssassin's development and adapt their "ratware" tool accordingly.

    The biggest problem with collecting social networks is that once the data has been gathered, it is very hard to control. Those of you using Orkut should think long and hard about it.

    In conclusion, I think that this is technically a good idea but it opens a Pandora's box.

  40. Erm, not by Vainglorious+Coward · · Score: 5, Informative
    The [envelope-sender] cannot be spoofed in most cases

    Simply: untrue. It's as easy to fake the envelope sender as it is the From: header. I think you're getting confused with "Received" headers, where each mail system inserts its own bit of tracking information. The envelope-sender is completely under the control of the sender, and (usually) propagates un-modified as an email is handed between systems (indeed, one of the criticisms of SPF is that by modifying the envelope sender you break forwarding).

    --
    My next sig will be ready soon, but subscribers can beat the rush
    1. Re:Erm, not by MyFourthAccount · · Score: 2, Funny

      I think you're getting confused with "Received" headers, where each mail system inserts its own bit of tracking information

      Which, for all completeness, is now also totally useless since spammers use compromised boxen to do the dirty work for them (hence you can only track it back to some worm-infected box owned by grandma who's just been taken to the hospital with a severe cramp in the left side of the body after pressing the 'Ctrl' key 4,523,098 times when the computer said 'press any key to continue'. This, of course, after the RMA'd keyboard arrived, which yet again did not contain the 'any' key, but did come with a friendly letter clearing up the issue.)

      All seriousness aside, as an owner of a common word domain name, I get to be the target of many a spammer. Not in the To field, but in the From field.

      For said domain, I receive everything that is sent to *@mydomain.tld. I used this to keep track of which people would sell my email address. For example if I had to register with shavedpussy.com, I'd give them the email address: shavedpussy.com@mydomain.tld. Now when I get spam at that email address I know I can't trust shavedpussy.com and it hurts my feelings.

      Well, the motherfucker, fudgepacker [no, sorry, I take that back. I'm obviously drawing a blank, there's gotta be a better suiting swearword out there.] spammers have decided that it would be a great idea to send their crap from those owned computers, forging the From field to something like randomcrap@mydomain.tld.

      So now I get hundreds of emails a day from all those friendly mail servers around the world that Jake is Out Of the Office, and that sillybunns@telstar.com is not a Known User. I'm the most grateful person on the planet, obviously, to have been relayed this information. I think the SMTP protocol is swell and any software that automatically replies to email is a Good Thing(tm).

      So my sneaky system has been turned against me by the exact people that I was trying to defeat. Now I have to block *@mydomain.tld and specifically add any new email that I assign. I'm ecstatic, because it's not a lot of work at all and just in general I'm bored most of the time, so I can use the distraction.

      It actually didn't work that well anyways, because after receiving spam to mom845@mydomain.tld I realized that mom just couldn't resist the excitement of sending just one more eCard because this one was just too funny to not send. At least she stopped forwarding me chain-letters (which she really wasn't into, but this one was for a good cause) with all the email addresses in the To or Cc field. She's good now, she puts the addresses in the Bcc field. Of course after learning of this technique she broadcasted an email to everyone she knew to make sure that they were aware of it as well. Cc: mom452@mydomain.tld.

      The point of my story: let's say I have changed my mind about the right to bear arms. And I understand that the intention of the constitution may not be my interpretation of it and all, but times change and since spammers didn't exist when the constitution was written, I figure I'm a pretty well regulated Militia, and spammers, well, they just screw things up. (I'm still working on the wording of that a little, it's become terribly hard to interpret the part about security and stuff, especially now that Ashcroft is playing grab-ass with anyone willing to pitch in a dime to keep Patriot Act II moving along, but that's a whole nother can of worms. Speaking of worms....).

  41. Sorry: that link is the full pdf, here's abstract by blorg · · Score: 4, Informative
    Sorry, that is a link to the entire PDF of the article. This is the abstract, which you may as well have here if I'm posting again (on the linked page, you also have other formats available, as well as mirrors):

    We provide an automated graph theoretic method for identifying individual users' trusted networks of friends in cyberspace. We routinely use our social networks to judge the trustworthiness of outsiders, i.e., to decide where to buy our next car, or to find a good mechanic for it. In this work, we show that an email user may similarly use his email network, constructed solely from sender and recipient information available in the email headers, to distinguish between unsolicited commercial emails, commonly called "spam", and emails associated with his circles of friends. We exploit the properties of social networks to construct an automated anti-spam tool which processes an individual user's personal email network to simultaneously identify the user's core trusted networks of friends, as well as subnetworks generated by spams. In our empirical studies of individual mail boxes, our algorithm classified approximately 53% of all emails as spam or non-spam, with 100% accuracy. Some of the emails are left unclassified by this network analysis tool. However, one can exploit two of the following useful features. First, it requires no user intervention or supervised training; second, it results in no false negatives i.e., spam being misclassified as non-spam, or vice versa. We demonstrate that these two features suggest that our algorithm may be used as a platform for a comprehensive solution to the spam problem when used in concert with more sophisticated, but more cumbersome, content-based filters.

  42. *Sigh* by NanoGator · · Score: 2, Insightful

    All this work to stop spam, and ICQ's done it for years.

    Frankly, a series of filters is probably the worst approach at stopping SPAM. It's a game of "make the filter, defeat the filter, and risk not getting important mail." Why bother? The solution lies in a different approach. Authorization. There need to be authorization layers in order to defeat spam. We need buddy lists, we need blacklists, we need the ability to request authorization, etc.

    I realize that fixing this problem isn't a simple one given the scale in which it's used. But man, I really wish somebody'd figure out how to do the transitory work. I'm almost completely reliant on ICQ and Private Messaging on forums in order to keep up with everybody.

    --
    "Derp de derp."
  43. Reverse MX DNS querying by germinatoras · · Score: 3, Interesting

    I've been thinking about this method for a while - basically, you configure your SMTP server to do this:

    • MTA connects to you, gives you a MAIL FROM: xxxxx@somedomain.com
    • Your server performs an MX query for somedomain.com, getting a list of IP addresses
    • Your server compares the IP of the connecting MTA to the list of IPs in the MX records.
    • No match? Connection gets aborted.

    This idea is clearly too simple to have not been thought of before - but I have yet to find a good explanation as to why it won't work. Verizon.net uses this exact method - try sending an SMTP message from a host that isn't listed in your domain's MX records, and you get "550 Sorry, you aren't allowed to mail for this domain" or something comparable. How come this method isn't more widely used? Going through my own SMTP server logs shows that the vast majority of SMTP servers sending legit mail are also listed in the domain's MX records. The only price is that a domain's outgoing mail servers must also be its incoming (MX) servers - hardly an unreasonable requirement.
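    The check in the list above can be sketched as follows. DNS is deliberately left out: `resolve_mx_ips` is a hypothetical caller-supplied function mapping a domain to the IP addresses of its MX hosts, `mx_check` is likewise a made-up name, and the addresses are documentation examples.

```python
import ipaddress


def mx_check(mail_from, client_ip, resolve_mx_ips):
    """Accept only if client_ip is one of the MAIL FROM domain's MX addresses.

    mail_from is assumed to be a bare address such as "user@example.com".
    resolve_mx_ips(domain) is a hypothetical lookup returning IP strings;
    no real DNS query is made here.
    """
    domain = mail_from.rsplit("@", 1)[-1].lower()
    allowed = {ipaddress.ip_address(ip) for ip in resolve_mx_ips(domain)}
    return ipaddress.ip_address(client_ip) in allowed


if __name__ == "__main__":
    fake_dns = {"example.com": ["192.0.2.10", "192.0.2.11"]}
    lookup = lambda domain: fake_dns.get(domain, [])
    print(mx_check("alice@example.com", "192.0.2.10", lookup))    # listed MX
    print(mx_check("alice@example.com", "198.51.100.7", lookup))  # not listed
```

    This only establishes that the connecting host is an MX for the claimed domain; organisations whose outgoing servers are not their MX hosts fail it, which is the weakness the replies raise.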

    1. Re:Reverse MX DNS querying by argent · · Score: 2, Informative

      This won't work because the incoming and outgoing mail servers of just about any large organization have nothing to do with each other.

      In fact one of the rules I use blocks messages that claim to come from the MXes of certain large service providers because such messages are 100% spam from spammers who already thought of your idea.

    2. Re:Reverse MX DNS querying by catdevnull · · Score: 2, Informative

      we tried to implement this very method. it had very good results in drastically reducing the spam levels we were getting. Unfortunately, it also excluded small business and .orgs who didn't have their mail servers entered correctly if at all in the DNS. Although the "unclean" but legit mail servers were only about 2-3% of the total incoming mail, it was still enough "false positives" to make us have to open up the fort again. :(

      until everyone jumps on the bandwagon of MX registration, this method won't work. Required SMTP auth would be nice--at least it would be a bit more traceable. As long as 1/10th of 1% of recipients reply to spam msgs, those damn spammers will think it's profitable. spammers die!

      --

      I might know what I'm talkin' about, but then again, this is Slashdot...
    3. Re:Reverse MX DNS querying by athakur999 · · Score: 2, Informative

      There is already something out there that's pretty similar to what you're suggesting. It's called Sender Policy Framework.

      Basically, as part of your DNS entry, you have a record containing a list of all of the addresses that are allowed to send email on your domain's behalf. I think there was a story on Slashdot a few weeks ago about it, as AOL has started using it.
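      A deliberately tiny SPF evaluator, as a sketch only: it understands `ip4:`/`ip6:` mechanisms, the `+ - ~ ?` qualifiers, and `all`, and skips everything else in the real specification (today RFC 7208), such as `a`, `mx`, `include`, `redirect`, and macros. The `spf_check` name and the example record are hypothetical.

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_check(record, client_ip):
    """Evaluate a small subset of an SPF record against a connecting IP.

    Only ip4:/ip6: mechanisms, qualifiers and "all" are understood; every
    other term is skipped. Returns "neutral" if nothing matches.
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record
    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        has_qualifier = term[0] in QUALIFIERS
        qualifier = term[0] if has_qualifier else "+"  # "+" is the default
        mechanism = term[1:] if has_qualifier else term
        if mechanism == "all":
            return QUALIFIERS[qualifier]
        if mechanism.startswith(("ip4:", "ip6:")):
            network = ipaddress.ip_network(mechanism[4:], strict=False)
            if ip.version == network.version and ip in network:
                return QUALIFIERS[qualifier]
    return "neutral"


if __name__ == "__main__":
    record = "v=spf1 ip4:192.0.2.0/24 -all"
    print(spf_check(record, "192.0.2.25"))   # inside the listed range
    print(spf_check(record, "203.0.113.9"))  # falls through to -all
```

      A receiving server would look up the TXT record for the envelope sender's domain and pass it in; because of the skipped mechanisms, real records using `include` or `mx` would be judged incorrectly by this sketch.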

      --
      "People that quote themselves in their signatures bother me" - athakur999
  44. I once had an evil idea by WormholeFiend · · Score: 3, Interesting

    to deal with open relays in China...

    I would've harvested the emails of as many members of the ruling communist party as possible, and used those relays to spam them with anti-communist propaganda. I believe the consequences would've been swift and ruthless.

    Unfortunately I can't read/write Chinese, and this idea wouldn't work in less repressive regimes...

  45. Mailing lists / newsletters by blorg · · Score: 4, Insightful
    A mailing list would have multiple folks in the To: line, which would be easy to spot automatically.

    Not necessarily; indeed, most professional ones avoid this. Many spams do contain multiple people in the To: field, but many don't. One way or the other, I don't think this is relevant if we are trying to compare the graph of a mailing list to that of a spammer. To take an example, user slashdot-headlines@newsletters.osdn.com sends thousands of emails to people *who don't know each other*. User enlargeyourdong@hotmail.com has exactly the same pattern. How do you tell these apart?

    1. Re:Mailing lists / newsletters by sab39 · · Score: 2, Interesting

      Easy - those thousands of people who don't know each other also send email *back* to the mailing list. Only a few dummies send email back to the spammers.

      For something based on statistics, the difference would likely be very noticeable.
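      A sketch of that statistic, assuming a server can see who sent mail to whom (the `reply_ratio` name and the log format are hypothetical): a discussion list whose subscribers post back scores well above a bulk sender nobody answers. One-way newsletters would score near zero too, which is the catch raised elsewhere in this thread.

```python
def reply_ratio(sender, messages):
    """Fraction of a sender's recipients who have ever mailed that sender back.

    messages is a hypothetical iterable of (from_addr, [to_addrs]) pairs, e.g.
    taken from a mail server's logs.
    """
    recipients, repliers = set(), set()
    for from_addr, to_addrs in messages:
        if from_addr == sender:
            recipients.update(to_addrs)
        elif sender in to_addrs:
            repliers.add(from_addr)
    if not recipients:
        return 0.0
    return len(recipients & repliers) / len(recipients)


if __name__ == "__main__":
    subscribers = ["a@x.example", "b@x.example", "c@x.example", "d@x.example"]
    log = [
        ("list@lists.example", subscribers),
        ("a@x.example", ["list@lists.example"]),
        ("b@x.example", ["list@lists.example"]),
        ("spam@bulk.example", subscribers),
    ]
    print(reply_ratio("list@lists.example", log))  # two of four subscribers post
    print(reply_ratio("spam@bulk.example", log))   # nobody writes back
```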

    2. Re:Mailing lists / newsletters by The+Dakota+Kidd · · Score: 3, Insightful

      According to the paper this article is based on, the algorithm is effective against messages with multiple recipients in the To: or Cc: headers. This means that messages coming from slashdot-headlines@newsletters.osdn.com would probably be in the unclassifiable half. Indeed, a good chunk of spam these days would be unclassifiable according to this algorithm.

      However, the whitelist that this algorithm generates would still be valid. To me, this is the real strength of the algorithm, to be able to generate a white list with no input on my part.

  46. bcc to all! by Datoyminaytah · · Score: 3, Insightful

    These people don't seem to realize how SMTP works. The RCPT command doesn't distinguish between types of recipients, it's up to the sending process to "play nice" and put that information in properly created headers.

    A spammer could manipulate the To and CC headers as necessary to fool filters that analyze them, without affecting the ACTUAL list of email addresses to which the email is sent.

    I don't think spam can be stopped without replacing or overhauling SMTP, and then ceasing to support "old" SMTP. But that ain't gonna happen anytime soon. (sigh)
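    To make the envelope/header split concrete, here is a sketch that only builds the command sequence a client would send (nothing goes over the network, and dot-stuffing is omitted). The addresses are placeholders and `smtp_dialogue` is a hypothetical helper: the mail is delivered to every `RCPT TO` address, while the `To:` header is just text inside `DATA`.

```python
from email.message import EmailMessage


def smtp_dialogue(envelope_from, envelope_rcpts, header_from, header_to, body):
    """Return the client-side SMTP commands for one message (no network I/O).

    The envelope (MAIL FROM / RCPT TO) and the From:/To: headers are set
    independently; dot-stuffing of the body is omitted in this sketch.
    """
    msg = EmailMessage()
    msg["From"] = header_from
    msg["To"] = header_to
    msg.set_content(body)
    commands = [f"MAIL FROM:<{envelope_from}>"]
    commands += [f"RCPT TO:<{rcpt}>" for rcpt in envelope_rcpts]
    commands += ["DATA", msg.as_string(), "."]
    return commands


if __name__ == "__main__":
    for line in smtp_dialogue(
        envelope_from="bounce@sender.example",
        envelope_rcpts=["a@example.net", "b@example.net", "c@example.net"],
        header_from="friend@example.org",
        header_to="a@example.net",
        body="Only one recipient is named in the headers.",
    ):
        print(line)
```

    Here b@example.net and c@example.net receive the message exactly as a Bcc would, and neither the From: nor the To: header matches the envelope.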

    --
    assert(birth_date<time-86400)
  47. Some of us rely on e-mail from strangers by beagle72 · · Score: 5, Insightful

    The proposed anti-spam clustering technique is of course a variation on whitelisting. While clever, it fails to address a problem I have not often seen addressed. Many people defend themselves from spam by obscuring their e-mail addresses in public places, and perhaps by using whitelists to prefer known senders. This may be effective for many people.

    However, some of us can't avoid having a publicly available e-mail address. For example, writers such as myself rely on feedback from readers who are, in nearly all cases, strangers (and sometimes strange, but that's another story...) Avoiding false positives from strangers is very important to me. I want their messages. But, since my e-mail address is published frequently (hence no reason to hide it here), I obviously receive a ton of spam.

    For the past few months I have experimented with a plug-in called BayesIt! for the Windows email reader The Bat!. As the name implies, it's a bayesian filter. The nice thing about BayesIt is that I could point it to my already-stuffed spam folder and train it on thousands of messages in one go. So far it has worked out rather well. No false positives, and only about 10-20 false negatives per day (out of approx. 400 spams).

    Still, in the long run I support proposals that shift the economics of e-mail in ways that have minimal impact on human beings while making spam unprofitable. Changing the economic model of spam is the only sure solution; relying solely on technology will simply keep us locked in an ongoing arms race.

    -Aaron

  48. Most newsletters are one-way by blorg · · Score: 4, Insightful
    Easy - those thousands of people who don't know each other also send email *back* to the mailing list. Only a few dummies send email back to the spammers.

    Most mailing lists and newsletters are one way - I'm not talking about discussion lists or listservs, but rather about the bot that sends me Slashdot headlines, Jakob Nielsen's Alertbox, Fred Langa's newsletter, and even commercial speech that I am signed up to and want to hear such as Komplett's weekly offers, or Ryanair's cheap flights, etc.

    1. Re:Most newsletters are one-way by benna · · Score: 2, Informative

      Another possible problem could be confirmation emails when you sign up for a mailing list or message board or something. These would be even more difficult to tell from spam than newsletters. Also you have no way of knowing the email address they will come from to add it to a whitelist.

      --
      "It is not how things are in the world that is mystical, but that it exists." -Ludwig Wittgenstein
  49. It wouldn't be meta-bayesian. by Ayanami+Rei · · Score: 3, Informative

    It'd still be bayesian, except that word frequencies and graph connectivity of sender would _both_ be considered for additional spam probability. I don't have a filter to check, but don't most Bayesian classifiers also include other metrics besides top 20 word frequency, like length or presence of attachments, etc.?

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  50. Plaxo Revealed? by Fritz+Benwalla · · Score: 2, Interesting

    Isn't this scheme the perfect use for the wide-ranging social network information being collected by Plaxo?

    It makes sense - they certainly haven't announced a revenue stream yet, and "keeping your address book up-to-date," even in a wireless and multiplatform world just doesn't seem like a big enough idea to justify the huge amounts of data collected.

    So is that the announcement that's coming from Plaxo, the unveiling of a broad Spam solution that uses 'degrees of separation' data from your address book and the address books of your friends to implement a spam filtering solution?

    If I may say, it does seem like the killer app for their unique data set.

    -------

    --

    Believe me, I'm as surprised by my comment as you are.
  51. Addressed, not sent by by SmallFurryCreature · · Score: 3, Informative
    In order to break this system, the spam you received would have to have been sent by someone those people know, not just sent to a lot of people you know. That, in fact, is what would tip the system off that something is wrong.

    I send you and your sister a spam. While both of you are getting the spam, to both of you I am an unknown and therefore the system would flag me. ONLY if I send the spam to you while pretending to be your sister would the system break. I would need to know both your email and the email of someone you know. This would not be impossible to harvest with viruses stealing address books but is not what is currently happening. Currently email address lists used by spammers are very simple flat text files. Of course nothing complex would be needed. Simply a similar text file but now with two emails per line. The first the recipient, the second the person to forge as the sender. Simple but more work.

    So it looks like a pretty clever idea. Especially for work place email where most mail is by people you know and very little email from outside usually arrives. And even when it is done it is usually from a known domain namely a client or supplier.

    Will it work? Who knows. Gotta be worth a try. Unless you want to wait for Bill Gates to fix it. We all know how well the security problems in windows were fixed eh?

    There is not going to be a magic bullet that fixes spam. We will just have to use a lot of ordinary lead ones. Don't worry Bush says they are safe.

    --

    MMO Quests are like orgasms:

    You may solo them, I prefer them in a group.

    1. Re:Addressed, not sent by by crymeph0 · · Score: 2, Interesting

      I think this could be pretty easily beaten, and I'm surprised my spam isn't already showing this characteristic, now that I think about it...

      (all spammers, please don't read anymore below here, I don't want to give you ideas).

      In my example, I get spam sent to me and several other people at my work. It would be trivial for spammers to modify their algorithms so that instead of sending to x people in my office, they send to (x-1) people in my office, and use that last address as the "From" field. Of course, you could set up your email server to detect this (mail coming from outside claiming to be from inside). Does Exchange Server provide this kind of functionality? If not, it would be all too easy for spammers to break this method.
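      One way to do the check described above, as a sketch: flag mail whose From: domain is internal but whose connecting IP is outside the organisation's own networks. The domain, networks, and function name below are hypothetical placeholders, and this says nothing about what Exchange Server itself offers.

```python
import ipaddress

# Hypothetical site configuration: the organisation's mail domain(s) and the
# networks its own mail servers connect from.
INTERNAL_DOMAINS = {"example.com"}
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.0.2.0/24"),
]


def claims_internal_from_outside(from_addr, client_ip):
    """True if the From: domain is internal but the connecting host is not."""
    domain = from_addr.rsplit("@", 1)[-1].lower()
    if domain not in INTERNAL_DOMAINS:
        return False
    ip = ipaddress.ip_address(client_ip)
    return not any(ip in net for net in INTERNAL_NETWORKS)


if __name__ == "__main__":
    print(claims_internal_from_outside("boss@example.com", "203.0.113.5"))  # forged
    print(claims_internal_from_outside("boss@example.com", "10.1.2.3"))     # internal
```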

      --
      It should be illegal to say that freedom of speech should be limited.
  52. Seems like a good use for FOAF by GeorgeH · · Score: 2, Informative

    FOAF is an open XML/RDF standard for describing these social networks, it seems like that would be a good way to implement this. Plus, since it uses SHA1 sums of email addresses it would be possible to check addresses without giving them up to spammers.

    A lot of sites like Tribe.net and my own project SongBuddy are working on integrating FOAF into the site, so that you won't have to worry about the mechanics of it unless you want to. Seems like an easy way to build these kind of white lists.
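    FOAF's `mbox_sha1sum` property is the SHA-1 of the mailbox's `mailto:` URI, so a whitelist check against someone's published hashes could look like the sketch below (function names hypothetical). One caveat: anyone who already has a candidate address can hash it and test for a match, so the hashes hide addresses from harvesting but not from confirmation.

```python
import hashlib


def mbox_sha1sum(address):
    """FOAF-style mailbox hash: hex SHA-1 of the "mailto:" URI for an address."""
    return hashlib.sha1(("mailto:" + address.strip()).encode("utf-8")).hexdigest()


def in_published_network(sender, published_hashes):
    """Check a sender against hashes a friend published, without raw addresses."""
    return mbox_sha1sum(sender) in published_hashes


if __name__ == "__main__":
    friends = {mbox_sha1sum("alice@example.org"), mbox_sha1sum("bob@example.net")}
    print(in_published_network("alice@example.org", friends))
    print(in_published_network("stranger@example.com", friends))
```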

    --
    Why can't I moderate something "Wrong" or at least "Grossly Misinformed"?
  53. everything has a weakness... by Cruciform · · Score: 3, Funny

    Next thing I know all my email is going to have a reply-to: Kevin Bacon.

  54. HOW SPAMMERS CAN BEAT THIS FILTER by goombah99 · · Score: 4, Interesting

    There are three ways one can beat the filter.

    The first is trivial and certain to succeed but has a drawback for spammers: only send e-mail to single recipients. The drawback is that this puts a much higher load on their servers, since every message is sent individually.

    The second method is to always include dummy addresses in the mailing list that the recipients probably have in their address books. For example, add the following names to the To: field: notifications@paypal.com and list-notication@ebay.com.
    Any recipient of the spam message who has also received e-mail from eBay or PayPal will trust the message.

    One can do even better by planning ahead when harvesting e-mails. For example, if you harvest a set of e-mails from a particular bulletin board, you can make note of message cliques at the time of harvesting and send messages in the same groupings. For good measure, you also include the addresses of the bulletin board admins.

    Third, all the spammer really has to know is one address you have gotten messages from. Either buy mailing lists from legitimate companies people actually do business with, or create your own loss-leader messages. For example, send out some political action alert or anything that has some value or use to most people, maybe a lottery drawing for a prize or a discount subscription to Time magazine, so they will accept the message. The sender does not have to be the same as your spammer address. Now you know someone in the address book of the victim, and you spam the crap out of them while including the trojan address in the To: field.

    --
    Some drink at the fountain of knowledge. Others just gargle.
    1. Re:HOW SPAMMERS CAN BEAT THIS FILTER by kirkjobsluder · · Score: 3, Interesting

      [i]The first is trivial and certain to succeed, but it has a drawback for spammers: only send e-mail to single recipients. The drawback is that this puts a much higher load on their servers, since every message is sent individually.[/i]

      True. This method is strongest against dictionary spam and does not work against non-dictionary spam.

      [i]The second method is to always include dummy addresses in the mailing list that the recipients probably have in their address books. For example, add the following names to the To: field: notifications@paypal.com and list-notication@ebay.com.
      Any recipient of the spam message who has also received e-mail from eBay or PayPal will trust the message.[/i]

      Um, did you RTFA? (And, perhaps most importantly, did anybody modding this comment RTFA?)

      The algorithm has nothing to do with address books. Instead, it looks at friend-of-a-friend networks as identified by mail headers.

      For example, I work on a project with Bob and Susan. A typical email message about the project will include my address and their addresses in the header. The algorithm assumes that three first-degree relationships exist:
      me-bob
      me-susan
      susan-bob

      There are also three second-degree (friend-of-a-friend) relationships:
      me-susan-bob
      me-bob-susan
      susan-me-bob

      The high ratio of second-degree to first-degree relationships gives Susan and Bob a high score (3/3 = 1) and puts them on the whitelist.

      With paypal.com, there is only one first-degree relationship (paypal.com-me) and no second-degree relationships. The algorithm handles single-relationship networks as a special case and defines them as ambiguous.

      With a typical dictionary attack, a spam comes with 50 email addresses in the header. However, because a dictionary attack relies on sequential or randomly generated usernames, few of the recipients are part of my social network. So we have 50 first-degree relationships, and let's say the spammer gets lucky and nails Susan and Bob as well. It still gets a low score (2/50 = .04).
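
      The informal "second-degree/first-degree" ratio above resembles the local clustering coefficient: the fraction of a node's contacts that are linked to each other. The Python sketch below computes that coefficient on a graph built from header addresses. It is an illustration, not the authors' code, and it assumes each From: address is linked to every To:/Cc: address on the same message, which may differ from the article's exact construction. The addresses (me, bob, susan, spammer, user0...) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_graph(messages):
    """Link each sender to every To/Cc recipient (assumed construction)."""
    graph = defaultdict(set)
    for sender, recipients in messages:
        for rcpt in recipients:
            if rcpt != sender:
                graph[sender].add(rcpt)
                graph[rcpt].add(sender)
    return graph

def clustering(graph, node):
    """Fraction of a node's contact pairs that are themselves linked."""
    neighbors = graph.get(node, set())
    k = len(neighbors)
    if k < 2:
        return None  # zero or one contact: ambiguous, like paypal.com above
    links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
    return links / (k * (k - 1) / 2)

messages = [
    # The Bob/Susan project thread: everyone replies to everyone.
    ("me", ["bob", "susan"]),
    ("bob", ["me", "susan"]),
    ("susan", ["me", "bob"]),
    # A single notice: one first-degree relationship only.
    ("paypal.com", ["me"]),
    # Dictionary spam: one sender, 50 recipients, only 3 of them known.
    ("spammer", ["me", "bob", "susan"] + [f"user{i}" for i in range(47)]),
]
graph = build_graph(messages)
for node in ("bob", "susan", "paypal.com", "spammer"):
    print(node, clustering(graph, node))
```

      Here bob and susan score 1.0, paypal.com comes back as None (ambiguous), and the spammer's hub scores 3/1225, roughly 0.002, because almost none of its 50 recipients know each other.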

      [i]One can do even better by planning ahead when harvesting e-mails. For example, if you harvest a set of e-mails from a particular bulletin board, you can make note of message cliques at the time of harvesting and send messages in the same groupings. For good measure, also include the addresses of the bulletin board admins.[/i]

      This is a slightly better strategy. However, it only works if you send from the address of a clique member and limit the recipient list to members of that clique.

      But there is a serious problem with this strategy. The stated goal of the authors (did you RTFA?) is to raise the cost of spamming to the point where it is no longer economically profitable. Such a strategy would require research, which is expensive.

      [i]Or create your own loss-leader messages. For example, send out a political action alert or anything else that has some value or use to most people, maybe a lottery drawing for a prize or a discounted subscription to Time magazine, so they will accept the message. The sender does not have to be the same as your spammer address. Now you spam the crap out of them while including the trojan address in the To: field.[/i]

      Once again, RTFA. The algorithm has nothing to do with address books. But you did raise one possible threat: spoofing. A spammer could not get integrated into my social network by offering a loss leader (for the same reason that messages from ebay.com would not be whitelisted). A spammer could, however, spoof a member of my social network (for example, by using Bob's address). The problem here is economics: Bob would probably be auto-whitelisted by only about 50 people, so spoofing Bob would only get you access to a small population, which defeats the entire economic rationale for spamming.

  55. hmmm by loopyfx · · Score: 2, Interesting

    Suppose a spammer harvests addresses from a social networking site and spoofs the source address using harvested addresses... It's pretty likely that two people on the same social networking site will be within each other's threshold if only the To/From/Cc headers are used...

    Maybe more sophisticated techniques could include the source IP subnet or something similar? Some form of sender verification would be required.
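
    One hypothetical way to fold the source subnet into the social graph is to key each node by the pair (From: address, network the connection came from), so a spoofed From: sent from an unrelated network becomes a different node. A minimal Python sketch using the standard ipaddress module; the function name, the /24 (IPv4) and /48 (IPv6) prefix lengths, and the example addresses are all assumptions for illustration.

```python
import ipaddress

def sender_identity(from_addr, source_ip, v4_prefix=24, v6_prefix=48):
    """Pair the From: address with the network the connection came from."""
    ip = ipaddress.ip_address(source_ip)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False masks off the host bits, e.g. 192.0.2.17/24 -> 192.0.2.0/24
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return (from_addr, str(network))

genuine = sender_identity("bob@example.org", "192.0.2.17")
roaming = sender_identity("bob@example.org", "192.0.2.200")
spoofed = sender_identity("bob@example.org", "203.0.113.5")
print(genuine == roaming)  # same /24, so the same node
print(genuine == spoofed)  # different network, so a different node
```

    The trade-off is that people who legitimately send from many networks (laptops, webmail, phones) get split into several nodes, so this would need some sender verification on top, as suggested above.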

  56. My filter by renfrow · · Score: 2, Interesting

    I have my own domain and run my own mail server for personal email. The ONE thing I have done that reduced incoming spam drastically (I now get only 5% as much) is to refuse incoming connections to the mail server from any machine that does not have a valid reverse DNS (rDNS) entry. I may miss email from someone, but they'll have gotten a (somewhat) informative message telling them why their email did not go through. They can either complain to their ISP and get their rDNS fixed (like I did :-) or call me/send me a letter.

    Tom.
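
    The comment doesn't say how the check is implemented. A minimal Python sketch, assuming "valid rDNS" means the connecting IP has a PTR name that the standard library's socket.gethostbyaddr can resolve. The optional forward-confirmed check (FCrDNS, where the PTR name must resolve back to the same IP) is a common stricter variant, not something the comment describes. The lookups are injectable so the logic can be exercised offline; mail.example.org and the reply wording are placeholders, and 554 is the code RFC 5321 lets a server send in place of the 220 greeting when it declines a connection.

```python
import socket

def has_valid_rdns(ip, reverse=socket.gethostbyaddr, forward=None):
    """True if `ip` has a PTR name; optionally require forward confirmation."""
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    if not hostname:
        return False
    if forward is None:
        return True
    # Stricter variant (FCrDNS): the PTR name must resolve back to the same IP.
    try:
        _name, _aliases, addrs = forward(hostname)
    except OSError:
        return False
    return ip in addrs

def greeting(ip, **lookups):
    """Banner sent on connect: 220 to proceed, 554 to refuse with a reason."""
    if has_valid_rdns(ip, **lookups):
        return "220 mail.example.org ESMTP ready"
    return f"554 {ip} has no valid reverse DNS; please ask your ISP to add a PTR record"
```

    In live use you would pass forward=socket.gethostbyname_ex to enable the stricter check.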

  57. Re:Chain of Trust by ComputerSlicer23 · · Score: 2, Interesting

    You don't have to verify that it came from the SMTP server one would expect. You need something the sender can do that no one else can, and that is easy for anyone in the world to verify.

    Essentially, that is a short description of how a "Chain of Trust", or better named a "Web of Trust", works in GPG. You have people who verify that person A knows the private key A_1 that corresponds to public key A_2.

    They don't even have to bother encrypting everything; they can just digitally sign it. This is also just an anti-spam filter, so I'm even less worried about encryption. Now, I can sign any key with my own key, rating how "trustworthy" I deem its owner. You get a 5 if you are really trustworthy, and a 0 if I deem you absolutely untrustworthy.

    From there, you can build layers of trust, trusting the ratings of people you trust, and so on, until you establish a path through the web between you and the sender.
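
    The layered trust described above can be sketched as a bounded-hop relaxation over signed ratings. This is an illustrative model, not how GPG computes key validity (GPG instead counts marginally and fully trusted signatures). The sketch assumes ratings on the 0 to 5 scale above, scores a path as the product of rating/5 along it, keeps the best path found for each key, and stops after max_hops hops. The names propagate_trust and ratings, and the example keys, are hypothetical.

```python
def propagate_trust(ratings, me, max_hops=3):
    """Best trust in [0, 1] per key over signing paths of at most max_hops.

    ratings[signer][subject] is the signer's 0..5 rating of the subject's key.
    """
    trust = {me: 1.0}
    for _ in range(max_hops):
        # Relax from the previous round's scores only, so round i can reach
        # keys at most i signatures away (Bellman-Ford style).
        updated = dict(trust)
        for signer, t in trust.items():
            for subject, rating in ratings.get(signer, {}).items():
                candidate = t * rating / 5.0
                if candidate > updated.get(subject, 0.0):
                    updated[subject] = candidate
        trust = updated
    return trust

ratings = {
    "me": {"alice": 5, "bob": 4},
    "alice": {"carol": 5},
    "bob": {"carol": 5, "spamco": 5},
    "carol": {"dave": 3},
}
print(propagate_trust(ratings, "me"))
```

    Note that in this simple model a key vouched for by a trusted signer (spamco here) inherits that trust; penalizing people who befriend spammers, as proposed below, would need an extra rule.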

    Now, the problem is that there is no marginal benefit, and it would be very hard to get users to do this individually. So I'd suggest that the SMTP servers do this themselves. You create a web of trust that is only for SMTP servers. You register your key on the web and send people some e-mail. Eventually, you e-mail the admins of the e-mail servers you communicate with regularly, asking them to review their logs and sign your key. Ask your friends, peers, clients, vendors, and/or upstream providers to sign your key, deeming you trustworthy.

    Once they do this, you're on the web of trust. If you receive mail that isn't signed this way, view it as suspicious. If it is signed with an SMTP key that someone you trust knows to have sent spam, drop it on the floor.
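
    The mail-handling rule above reduces to a small decision function. A minimal Python sketch, assuming signature verification has already been done upstream and yields the signing key's id (or None for unsigned mail), that per-key trust scores in [0, 1] come from some web-of-trust computation, and that flagged holds keys that someone you trust has reported as spam sources. The names classify and Verdict, the example key ids, and the 0.5 threshold are hypothetical choices.

```python
from enum import Enum

class Verdict(Enum):
    ACCEPT = "accept"
    SUSPICIOUS = "suspicious"
    DROP = "drop"

def classify(signer_key, trust, flagged, threshold=0.5):
    """Decide what to do with a message given its (already verified) signer."""
    if signer_key is None:
        return Verdict.SUSPICIOUS  # unsigned mail
    if signer_key in flagged:
        return Verdict.DROP  # known spam source, even if otherwise trusted
    if trust.get(signer_key, 0.0) >= threshold:
        return Verdict.ACCEPT
    return Verdict.SUSPICIOUS  # signed, but not trusted enough

trust = {"mx.friend.example": 0.9, "mx.stranger.example": 0.1}
flagged = {"mx.spam.example"}
print(classify("mx.friend.example", trust, flagged))
```

    Checking the flagged set before the trust score means a spam report from someone you trust overrides any trust the key picked up elsewhere.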

    Then you can start to trust SMTP servers. This has all of the advantages of SPF, adds some cryptographic security, and doesn't allow spammers to just set up bogus SPF records and get away with it. They'll have to continuously try to get new keys that are deemed trustworthy.

    Assuming you have any friends who have friends outside your clique, it should be relatively easy to get a foothold in the web of trust. Everybody who befriends a spammer will be deemed "untrustworthy" in short order, so you won't trust the people they trust. Eventually the system should balance out. Individual users don't need to change anything; mail admins could communicate with each other and make the system work. About the only real problem is that it puts extra load on every mail server. Depending on the volume of mail you have, just set up 2 or 3 inbound/outbound sendmail servers that you queue to, whose sole job is to verify and/or add the digital signature/encryption to mail.

    Webs of trust are a well-understood animal in GPG land. While I'm not terribly conversant with them, they are essentially a distributed rating system by which rankings and trustworthiness can be ascertained about people you've never met. Think of it as a better system, with more flexibility than Karma + Karma Modifiers + Friend/Foe on Slashdot.org.

    Kirby