Slashdot Mirror


The Next Step in Fighting Spam: Greylisting

Evan Harris writes "I've just published a paper on a new and unique spam blocking method called "Greylisting". The best thing about it other than achieving better than 97% effectiveness in blocking spam, is that it practically eliminates the main problem of other solutions: the false-positive. There's even source code for an example implementation written as a perl filter for sendmail, along with instructions for installing, so you can get up and running quickly."

24 of 481 comments

  1. your first mistake by frieked · · Score: 4, Insightful

    I'm going to try to say this as nicely as possible and without trolling:
    You have just rendered Greylisting pretty useless by making it open source. Spammers are much smarter than you think and what you have basically done is shown them what they need to do in order to get around Greylisting. That's just my take on the issue, maybe I'm wrong but I doubt it.

    --

    I have often regretted my speech, never my silence.
    -Xenocrates
    1. Re:your first mistake by Soko · · Score: 4, Insightful

      I'm going to try to say this as nicely as possible and without trolling:

      Not trolling at all - you have a legitimate (though perhaps misguided) problem with this method.

      You have just rendered Greylisting pretty useless by making it open source. Spammers are much smarter than you think and what you have basically done is shown them what they need to do in order to get around Greylisting. That's just my take on the issue, maybe I'm wrong but I doubt it.

      So, the spammers themselves will be of significant help in debugging and helping to fix the code so they can't circumvent it, won't they? OSS means anyone who finds how the greylist script is beaten can figure out a fix and post it. Sounds like the best thing to do IMHO.

      Soko

      --
      "Depression is merely anger without enthusiasm." - Anonymous
  2. 1 false positive is not acceptable. by Pop n' Fresh · · Score: 3, Insightful

    This isn't very reassuring:

    "it practically eliminates the main problem of other solutions: the false-positive."

    What does 'practically eliminates' mean? If it gives false positives at all, it is just as useless as all those 'other solutions'.

    --
    *This page intentionally left pointless*
    1. Re:1 false positive is not acceptable. by dasmegabyte · · Score: 3, Insightful

      Maybe we don't want them to be so accurate.

      I get these chain emails from my brother. They are always some funky scheme to get money that won't work. I'd love to just delete them...but if I do this, he tells my mom I don't answer his email.

      She then laces into me like you would not believe...blah blah blah he's your brother and you should love him. I don't need that grief...so instead I respond with a "not interested, no cash right now." Keeps the family happy.

      I could see it being more important than this, though. Your boss sends you direct mail HE received and appends a "Should we do this" to the bottom. Or, worse, your marketing team constructs a direct mailing that fails your spam filter (no comments from the peanut gallery...obviously this is a good thing to find out, but this is not the way to find it out). Missing that one email could make somebody VERY angry and put you in danger. I have had messages from my boss/CEO/etc go into my junk folder and found them when cleaning it out.

      It is correct for the spam engine to label these as spam email. It would be incorrect for it to delete them before they got to you. And so I subscribe to the school of thought that a single false positive makes any spam filter absolutely worthless. It is very easy to delete a message that gets through the filter. It is impossible to resurrect a mailing you never even knew you got.

      --
      Hey freaks: now you're ju
  3. Time critical by Synithium · · Score: 5, Insightful

    Time critical mailing will go out the window. I can see how this might make any corporate user irate. The same thing goes for challenge-response; the time delay in the business world is unacceptable.

    This would be great for personal mail, but that's about it. ISPs would have the same problems with it because their business-class users most likely use the same servers as their consumer-class users.

    1. Re:Time critical by IncohereD · · Score: 4, Insightful

      How often do you get time critical e-mail from someone you've never received e-mail from before?

      some guy telling you to BUY THIS NOW != time critical.

      your wife telling you to BUY THIS NOW == time critical, and in theory, your wife == whitelisted (or blacklisted, depending on personal preference).

  4. security through obscurity, again? by dh003i · · Score: 4, Insightful

    If they can get around it by looking at the source, then something was wrong with it, waiting to be exploited. Might as well fix it.

    1. Re:security through obscurity, again? by SuiteSisterMary · · Score: 3, Insightful

      The way to get around this, of course, being that you send each email twice. In other words, run through your database, then run through your database. Same IP addy, same sender, same recipient. As far as the MTA's concerned, it's retrying. Boom.

      --
      Vintage computer games and RPG books available. Email me if you're interested.
    2. Re:security through obscurity, again? by SillySlashdotName · · Score: 4, Insightful

      I see that, in fine /. tradition, you didn't RTFA.

      From the article: "If we have never seen this triplet before, then refuse this delivery and any others that may come within a certain period of time with a temporary failure." (emphasis added)

      Later in the article it goes into much more detail about the delay, how long to delay if the triplet has not been seen before, life time of the whitelist, etc.

      It also talks about configuring the times - they mention the default delay is 1 hour, but that their records suggest that 1 minute would have caught 99% of the same spam messages - "The data collected during testing showed that more than 99% of the mail that was blocked with the tested setting of 1 hour would still have been blocked with a delay setting of only 1 minute. At that point, having a larger initial delay will definitely help, as it gives time for other blocking methods to act. For this reason, it is suggested that at least a one hour delay value be kept as a default, since spammers will start adapting as soon as this method becomes known and starts being used." (again, emphasis added)

      RTFA!

      --
      Acts of massive stupidity are almost never covered by warranty. --me.
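The triplet rule quoted from the article in the comment above can be sketched in a few lines of Python. This is only a minimal illustration, not the paper's Perl filter: the `Greylist` class, its `check` method, and the in-memory dictionary are assumptions made for the example, and it leaves out the whitelist lifetime and record expiry that the article discusses. It also shows why sending every message twice back to back does not help, since the second attempt still lands inside the delay window.

```python
from dataclasses import dataclass, field

DELAY_SECONDS = 60 * 60  # the one-hour default delay mentioned in the thread


@dataclass
class Greylist:
    """Hypothetical in-memory greylist keyed on (IP, sender, recipient)."""
    delay: int = DELAY_SECONDS
    first_seen: dict = field(default_factory=dict)  # triplet -> time of first attempt

    def check(self, ip: str, sender: str, recipient: str, now: float) -> str:
        """Return an SMTP-style verdict for one delivery attempt at time `now`."""
        triplet = (ip, sender, recipient)
        if triplet not in self.first_seen:
            # Never seen before: remember it and refuse with a temporary failure.
            self.first_seen[triplet] = now
            return "451 temporary failure, try again later"
        if now - self.first_seen[triplet] < self.delay:
            # Retried too soon: still inside the initial delay window.
            return "451 temporary failure, try again later"
        # Retried after the delay: behaves like a real MTA, so accept.
        return "250 accepted"


if __name__ == "__main__":
    g = Greylist()
    args = ("192.0.2.1", "alice@example.com", "bob@example.org")
    print(g.check(*args, now=0))      # first attempt is refused
    print(g.check(*args, now=5))      # immediate resend is refused too
    print(g.check(*args, now=3600))   # retry after the delay is accepted
```

A sender that simply runs through its database twice in a row gets two temporary failures; only a sender that queues the message and comes back after the delay gets through.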
    3. Re:security through obscurity, again? by blakestah · · Score: 4, Insightful

      RTFA!

      There is no magical waiting period or re-try period that cannot be trivially coded around. And, with good money on the line, will be trivially coded around.

      You don't get it. Really smart people are getting paid a whole lot of money to make programs to exploit every possible crack in the way we send email. There is no general rule to spammers, except that it is a lot of money and they are very clever. Little bandaids are not going to stop this one - there needs to be a much more fundamental change. And I am not talking about laws against spam - I am talking about changes in the protocols we use to send email.

    4. Re:security through obscurity, again? by SacredNaCl · · Score: 3, Insightful

      As stated, the only reason the hour works right now is because the spammers don't see this in the wild. Re-running your database script an hour later isn't a big deal.

      I disagree. When you are sending 250,000,000 emails a day -- restarting that script IS a big deal. It would, in effect, make them have to do the entire thing twice. That's a pretty big hit on their resources.

      --
      Freedom is merely privilege extended unless enjoyed by one and all.
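To put a rough number on the "do the entire thing twice" point above, here is a small back-of-the-envelope calculation. The 250,000,000 messages per day figure comes from the comment; the function name and the assumption that every message needs exactly two delivery attempts are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_rate(messages_per_day: int, attempts_per_message: int) -> float:
    """Average delivery attempts per second needed to keep up with the daily volume."""
    return messages_per_day * attempts_per_message / SECONDS_PER_DAY


if __name__ == "__main__":
    once = required_rate(250_000_000, 1)
    twice = required_rate(250_000_000, 2)
    print(f"one attempt per message:  {once:,.0f} attempts/second")
    print(f"two attempts per message: {twice:,.0f} attempts/second")
```

The sketch ignores the cost of keeping state between the two passes, which a fire-and-forget sender does not have to track at all today.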
  5. Re:Questions by sulli · · Score: 3, Insightful
    1 hour is the time proposed. Completely unacceptable unless the whitelist works.

    Since most personal users are on dialup or dynamic IPs, unless the mail client can upload the whitelist in a trusted fashion (or the MTA remembers what users the client sent messages to!), this won't work.

    Do any mail clients include whitelist-collection? Mail.app for OS X does collect all addresses you've sent to, but I've never seen any tool to upload it somewhere.

    --

    sulli
    RTFJ.
  6. spam.....hrmmm by chef_raekwon · · Score: 5, Insightful

    with all of these solutions to spam..and all of the spam now flooding mail servers...

    isn't it time to change the specification (RFC) and possibly the manner in which our current system works? i haven't come up with anything yet, but surely there must be some sort of handshaking/secure type connection that could be used - - some sort of postage (free) that is encrypted into the mail, that states that it is genuine....kind of like the hologram on those windows cds...

    i dunno. file this story under redundant.

    --
    We're like rats, in some experiment! -- George Costanza
  7. I'm not sure about this... by BiteMeFanboy · · Score: 3, Insightful
    These applications appear to adopt the "fire-and-forget" methodology

    I thought it was generally understood that most spam was sent by abusing open relays, thus hiding its origin. This could be wrong. However if it's not, those figures aren't applicable. Nor is spam going to be diverted since an open relay is generally running a regular mta and will attempt a retry. For instance, if qmail were running on an open relay and was abused by a spammer it would try again and again with an increasing delay (calculated logarithmically if memory serves) between attempts. So the mail will still get through.

    When you further consider that if a spammer hits an open relay and hammers your mailserver from it and all of the "triplets" are new, you're increasing your traffic, because all of that mail will be attempted again.
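The retry argument above can be made concrete with a small simulation. This is a sketch under stated assumptions: the schedule below (gaps growing as `base * n**2`) is a made-up example of "increasing delay between attempts" and is not claimed to be qmail's actual schedule, and all function names are hypothetical.

```python
from typing import List, Optional


def retry_times(base: int, attempts: int) -> List[int]:
    """Seconds after the first attempt at which a hypothetical relay retries.

    The interval grows with each attempt (here quadratically).
    """
    return [base * n * n for n in range(1, attempts + 1)]


def first_accepted(retries: List[int], greylist_delay: int) -> Optional[int]:
    """Time of the first retry that falls outside the greylist delay, if any."""
    for t in retries:
        if t >= greylist_delay:
            return t
    return None


def rejected_connections(retries: List[int], greylist_delay: int) -> int:
    """Connections that end in a temporary failure: the first attempt plus early retries."""
    return 1 + sum(1 for t in retries if t < greylist_delay)


if __name__ == "__main__":
    schedule = retry_times(base=300, attempts=4)
    print("retries at:", schedule)
    print("accepted at:", first_accepted(schedule, greylist_delay=3600))
    print("rejected connections:", rejected_connections(schedule, greylist_delay=3600))
```

Under these assumptions the relayed message is eventually delivered, and the receiving server handles several extra connections per new triplet before that happens, which is the traffic increase the comment describes.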

  8. Poor use of statistics by GGardner · · Score: 4, Insightful
    The data in this article claims that 1% of all corporate mail servers in the UK allow open relaying, down from 91% in 1997. For all we know, the total number of corporate e-mail servers has grown by a factor of 100 (or more) in the last six years, meaning that perhaps there are more open relays now.

    The article also doesn't measure the amount of spam coming through those relays. Even if there are only 10 open relays in the UK at any one time, it still might be possible for all of the spam to be coming through them.

    Certainly, closing down open relays is a good thing, but lowering the percentage of open relays doesn't prove anything about the source of spam.
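The percentage-versus-count point above is easy to check with arithmetic. The 91% and 1% figures and the factor-of-100 growth come from the comment; the 1997 server count of 10,000 is purely a made-up number for illustration, as is the function name.

```python
def open_relay_count(total_servers: int, open_percent: int) -> int:
    """Absolute number of open relays given a server count and a whole-number percentage."""
    return total_servers * open_percent // 100


if __name__ == "__main__":
    servers_1997 = 10_000            # hypothetical starting population
    servers_now = servers_1997 * 100  # the "grown by a factor of 100" scenario

    relays_1997 = open_relay_count(servers_1997, 91)
    relays_now = open_relay_count(servers_now, 1)

    print("open relays in 1997:", relays_1997)
    print("open relays now:    ", relays_now)
    print("more open relays now?", relays_now > relays_1997)
```

In this scenario the share of open relays collapses while the absolute count goes up slightly, so the percentage alone says little about how much spam could be relayed.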

  9. Easy for end-users, sure. by Medievalist · · Score: 5, Insightful
    Just encode your e-mail address on web pages & don't sign up to any dubious mailing lists.
    Many of us must maintain contact addresses in the global whois database - so that people can contact us when something is broken.

    Look at it this way: you can stop crank calls by unlisting your phone numbers. But you can't unlist the hospital, the ambulance service, the fire department, etc.

    We're not all end-users. Some of us are the plumbers.
  10. Delaying email by one hour! by pjrc · · Score: 5, Insightful
    From the linked paper:

    An hour is short enough that in most cases, users will not notice the delay.

    I'm wondering how I'm going to explain that to a new customer over the phone who says "I'll just email that file right now so we can go over it together".

    1. Re:Delaying email by one hour! by vidarh · · Score: 4, Insightful
      Agreed. I've been involved in operating a larger (hundreds of thousands of active users) mail system a couple of years ago, and users would complain if their mail took more than seconds. We had to upgrade our system at one point because rapid growth had made mail delivery take a couple of minutes on average, and it caused bad publicity - a lot of users had a clear expectation that e-mail should be delivered in a few seconds and that if it didn't something was wrong.

      I think changing that perception of e-mail as near instant will be incredibly hard. And if you succeed it will just move even more traffic over to the IM networks and cause spamming of IM networks to escalate instead.

    2. Re:Delaying email by one hour! by pjrc · · Score: 3, Insightful
      Sadly, you have missed the central point about the necessity of timeliness of email delivery and instead focused on using FTP rather than attachments.

      Even if FTP were a solution, it does nothing to answer a new customer who says "I just heard about you and I'm excited about your products. Wanted to call and ask you some questions. I sent an email about 10 minutes ago with an outline of the project we're doing where you guys could really help out, have you had a chance to look it over yet".

      There's a limitless number of these important common customer relationship scenarios, where the expectation of all parties involved is that email is delivered in under 1 minute and typically 5-10 seconds. And there are an infinite number of scenarios other than sales and customer service/relations where people quite reasonably expect email to be delivered in seconds.

      Focusing on using FTP isn't just the wrong answer, it's not even an answer at all to the problem of email delivery taking an order of magnitude longer than users expect and depend upon.

      But as others have pointed out, most users don't have access to FTP servers to receive files. Most corporate firewalls would prohibit users from setting up a FTP server. I would guess that almost any employee behind a corporate firewall wanting to somehow receive a file from a new customer via FTP who attempted to ask a sysadmin would get the answer "just have them send it as an attachment". FTP is simply not a viable protocol for customers and salespeople (or most others) to use to pass files back and forth.

      Aside from not solving the unacceptable delay and the inappropriateness of using FTP, there is the problem of bad attitude. Specifically:

      Explain this to your user. You can just tell them that... [snip]

      Where did "new customer" turn into "user"? The word "user" in this context is often spoken in the tone of an overworked, grumpy sysadmin whose personal view of his priorities is decoupled from the larger organization's mission (usually taking care of customers, selling products, operating efficiently, and so on).

      In this particular example, what is important is that the new customer wants to talk with someone about solving his problems. That someone is me, and I want to impress him, sell him something that will truly meet his needs, and hopefully turn him from "new customer" into "repeat customer" or even "loyal customer". THAT is what is important, and getting the customer's file quickly and easily with minimal hassle is merely a tool that enables the truly important work to happen.

      Not having the email for 1 hour means I'll either have to call him back in an hour, while he probably calls some competitors and shops around. Often times people will buy from the first friendly, knowledgeable person who goes to some effort to help them.... searching until they find that person/company. Delaying response to a new customer by 1 hour would put me at a competitive disadvantage.

      Or we'll have to proceed without it (FTP is not an option), leading to frustration as he explains material that would have been much better delivered as a file. Maybe it would go ok, maybe not. But it's starting the whole process "on the wrong foot".

      Then again, if your business is being a grumpy sysadmin where you have (captive) "users" rather than "customers", maybe delaying new email conversations is a big advantage which is not offset by any impact in "responsiveness" because it's already intentionally low.

  11. One good point about this proposal by Anonymous Coward · · Score: 5, Insightful

    It deals with spam at the server level. All the wonderful user-level solutions don't do jack to stop spam from being sent. Look at the numbers the spammers show for return rate, and look at how fast spam programs can go, and you'll see that the only solutions that will work are those that make it expensive to send spam. Anything else will just make the spammers send more spam to try and get the hit rate they need.

  12. co-evolution by 73939133 · · Score: 3, Insightful

    During the initial testing of Greylisting, it was observed that the vast majority of spam appears to be sent from applications designed specifically for spamming. These applications appear to adopt the "fire-and-forget" methodology.

    Spam guards and spam co-evolve. Since greylisting is easy to get around by spammers, if it becomes widespread, spammers will take measures to avoid it, and the net result will be a lot of extra traffic.

    In fact, the impact of this kind of system on mail could be pretty bad if widely adopted: large amounts of mail may end up being held up in delivering servers, and "informative" messages sent by helpful mail systems (about "temporary failures") may end up creating more junk mail than they avoid.

  13. Re:Published a paper? by vidarh · · Score: 4, Insightful
    To me publishing a paper in a peer reviewed journal instead of on the web would mean that I'd expect the audience to be reduced to a ridiculously small fraction of people that might be interested. If I wanted to publish something I'd do it on the web first, and if it stacks up people I respect would start talking about it and link to it.

    Yes, I realize that for "serious" science people still expect things to be published in peer reviewed journals, but in most cases I can't help but think that getting the article out there would be more useful. Sure, peer review is important, and somewhere to look for some kind of verification of the value of a paper is useful. But I much prefer the Research Index way, where I can get a good indication of the value of a paper by looking at how many people have cited a paper and WHO have cited a paper.

    Anyway, pretending that putting up a document on a website is somehow less publishing a paper than having it printed in a journal, is just plain elitist. You should probably be a bit more critical of papers that are published that you don't know have been through a proper review, especially if you're not a domain expert yourself, but being aware of the source is something that you always need to be.

  14. SpamAssassin by ajs · · Score: 3, Insightful

    The comments in this paper about other systems ignore one of the oldest and largest SPAM filters: SpamAssassin.

    SpamAssassin can also be used at the MTA-level, and while this tool might be an interesting test to integrate with SA, its claim that other systems cannot feed back to the sender that their mail has been blocked is flat-out wrong.

    Most people do not do this because you are almost certainly getting this mail through a relay, and that relay is going to get the SMTP temporary error and try to send a warning to the user who sent it. Spammers regularly slam my home mail server by using my address as the "From" in an entire batch of spam. It's pretty seriously annoying to get that deluge of junk, and it's not really necessary. If your spam system just identifies spam and lets the user (or sysadmin) decide how to deal with it based on how "spamish" it is, you get a much more reasonable behavior.

    I junk thousands of pieces of spam every week, and I *never* junk valid mail. Yes, I do have some spam in my inbox. Most of it is tagged as potential spam, and I delete that after cursory inspection of the from addresses. Some of it is missed, and the overhead that I suffer having to identify that myself is amazingly low compared to not being able to read my mail prior to SA.

    Check out SA. The latest version is pretty impressive, and if this "new" technique (I don't think the idea of tracking connection quality is very new, it's certainly done in SA to some extent) turns out to be useful... well SA works on much the same principle as Perl: There's More Than One Way To Do It. Bayes, Blacklists, Whitelists, Obfuscation detection, Checksum trackers, you name it, SA uses it. None of these techniques gets to say "this is spam", they all just get to poke a message in the direction of being spam or non-spam. This leads to something far more reliable than any one technique.
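The "every technique just pokes the message in one direction" idea above can be sketched as additive scoring. This is not SpamAssassin's code or its rule set: the rule names, the weights, the string checks, and the threshold value below are all invented for illustration.

```python
from typing import Callable, List, Tuple

# Each hypothetical rule is (name, test, weight). Positive weights push toward
# spam, negative weights push toward non-spam; no single rule decides alone.
RULES: List[Tuple[str, Callable[[str], bool], float]] = [
    ("shouting_offer", lambda m: "BUY THIS NOW" in m, 2.5),
    ("sender_blacklisted", lambda m: "from: spammer@example.com" in m.lower(), 3.0),
    ("sender_whitelisted", lambda m: "from: boss@example.com" in m.lower(), -5.0),
]

THRESHOLD = 5.0  # assumed cut-off for tagging a message as potential spam


def score(message: str) -> float:
    """Sum the weights of every rule that matches the message."""
    return sum(weight for _, test, weight in RULES if test(message))


def classify(message: str) -> str:
    """Tag rather than delete, so the user decides what to do with tagged mail."""
    return "tag as potential spam" if score(message) >= THRESHOLD else "deliver"


if __name__ == "__main__":
    for msg in (
        "From: spammer@example.com\nSubject: BUY THIS NOW",
        "From: boss@example.com\nSubject: fwd: BUY THIS NOW, should we do this?",
        "From: friend@example.com\nSubject: lunch?",
    ):
        print(f"{score(msg):+.1f}  {classify(msg)}")
```

Tagging instead of deleting matches the point made earlier in the thread: a wrongly tagged message can still be found, while a silently dropped one cannot.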

  15. Re:Bayesian Filtering by anti$pam · · Score: 4, Insightful

    The key is to make spammers not make money!

    If people start adopting anti-spam technologies we would reduce the return spammers get from sending spam. Reduce this enough and the spamming business will no longer be profitable.

    POPFile is great. I've also used SAProxy (http://saproxy.bloomba.com/) under Windows and it works great too.

    Again, the idea is not to eliminate all spam, but to reduce the return rate, and therefore the money made by spammers.
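The return-rate argument above (and the earlier point that only making spam expensive to send will work) can be written as a one-line profit model. Every number below is hypothetical and chosen only to show the shape of the argument; the function names are made up for the sketch.

```python
def profit(messages: int, response_rate: float,
           revenue_per_response: float, cost_per_message: float) -> float:
    """Spammer's profit for one campaign under a very simple linear model."""
    return messages * (response_rate * revenue_per_response - cost_per_message)


def break_even_rate(revenue_per_response: float, cost_per_message: float) -> float:
    """Response rate below which every additional message loses money."""
    return cost_per_message / revenue_per_response


if __name__ == "__main__":
    messages = 1_000_000
    revenue = 20.0   # hypothetical revenue per response
    cost = 0.001     # hypothetical cost per message sent

    unfiltered = profit(messages, 0.0001, revenue, cost)
    # Suppose filtering hides 90% of the messages from potential responders.
    filtered = profit(messages, 0.00001, revenue, cost)

    print(f"profit without filtering: {unfiltered:,.2f}")
    print(f"profit with filtering:    {filtered:,.2f}")
    print(f"break-even response rate: {break_even_rate(revenue, cost):.6f}")
```

In this model, sending more messages only helps while the response rate stays above the break-even point; below it, more volume means a bigger loss. Raising the cost per message, for example by forcing a second delivery attempt, moves the break-even point up as well.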