The Next Step in Fighting Spam: Greylisting
Evan Harris writes "I've just published a paper on a new and unique spam blocking method called "Greylisting". The best thing about it other than achieving better than 97% effectiveness in blocking spam, is that it practically eliminates the main problem of other solutions: the false-positive. There's even source code for an example implementation written as a perl filter for sendmail, along with instructions for installing, so you can get up and running quickly."
I'm going to try to say this as nicely as possible and without trolling:
You have just rendered Greylisting pretty useless by making it open source. Spammers are much smarter than you think and what you have basically done is shown them what they need to do in order to get around Greylisting. That's just my take on the issue, maybe I'm wrong but I doubt it.
I have often regretted my speech, never my silence.
-Xenocrates
I like the idea though. Since SMTP is broken anyway, why not use another of it's features in a new way to help filter unwanted email. Keep up the good work!
most spam today is sent through open relays. Those relays will simply retry the delivery no matter which software the spammer uses, so the method won't work.
The Next Step in the Spam Control War: Greylisting
By Evan Harris
Copyright 2003, all rights reserved.
Introduction
This paper proposes a new and currently very effective method of enhancing the abilities of mail systems to limit the amount of spam that they recieve and deliver to their users. For the purposes of this paper, we will call this new method "Greylisting". The reason for choosing this name should become obvious as we progress.
Greylisting has been designed from the start to satisfy certain criteria:
1. Have minimal impact on users
2. Limit spammers ability to circumvent the blocking
3. Require minimal maintenance at both the user and administrator level
User-level spam blocking, while somewhat effective has a few key drawbacks that make its use in the continuing spam war undesirable. A few of these are:
1. It provides no notice to the senders of legitimate email that is falsely identified as spam.
2. It places most of the costs of processing the spam on the receivers side rather than the spammers side.
3. It provides no real disincentive to spammers to stop wasting our time and resources.
As a result, Greylisting is designed to be implemented at the MTA level, where we can cause the spammers the most amount of grief.
For the purposes of evaluating and testing Greylisting, an example implementation has been written of a filter that runs at the MTA (Message Transfer Agent) level. The source for this example implementation is available as a link below, and as other implementations or additional utility code become available, they will also be linked.
Greylisting has been tested on a few small scale mail hosts (less than 100 users, though with a fairly diverse set of senders from all over the world, and volumes over 10,000 email attempts a day), however it is designed to be scalable, as well as low impact to both administrators and users, and should be acceptable for use on a wide range of systems, including those of very large scale. Of course, performance issues are very dependent on implementation details.
The Greylisting method proposed in this paper is a complimentary method to other existing and yet-to-be-designed spam control systems, and is not intended as a replacement for those other methods. In fact, it is expected that spammers will eventually try to minimise the effectiveness of this method of blocking, and Greylisting is designed to limit options available to the spammer when attempting to do so.
The great thing about Greylisting is that the only methods of circumventing it will only make other spam control techniques just that much more effective (primarily DNS and other methods of blacklisting based on IP address) even after this adaptation by the spammers has occurred.
The Greylisting Method
High Level Overview
Greylisting got it's name because it is kind of a cross between black- and white-listing, with mostly automatic maintenance. A key element of the Greylisting method is this automatic maintenance.
The Greylisting method is very simple. It only looks at three pieces of information (which we will refer to as a "triplet" from now on) about any particular mail delivery attempt:
1. The IP address of the host attempting the delivery
2. The envelope sender address
3. The envelope recipient address
From this, we now have a unique triplet for identifying a mail "relationship". With this data, we simply follow a basic rule, which is:
If we have never seen this triplet before, then refuse this delivery and any others that may come within a certain period of time with a temporary failure.
Since SMTP is considered an unreliable transport, the possibility of temporary failures is built into the core spec (see RFC 821). As such, any well behaved message transfer agent (MTA) should attempt retries if given an appropriate temporary failure code for a delivery attempt (see below for discussion of issues concerning non-conforming MTA's)
First I heard of it was from Landon Noll and Mel Pleasant. It is noted in brief as one of the techniques in this plan to end spam (though their plan, which did include the triplets, is not laid out in full there.)
It is a worthwhile technique for a little while, and if spammers were rational, would be worthwhile for some time to come. But spammers are not rational, and already this technique is not as useful as would be hoped.
Do a Google Search for Tempfailing especially in ASRG to see statistics etc.
Just encode your e-mail address on web pages & don't sign up to any dubious mailing lists.
I haven't received 1 single spam in recent months from doing this!
This isn't very reassuring:
"it practically eliminates the main problem of other solutions: the false-positive."
What does 'practically eliminates' mean? If it gives false positives at all, it is just as useless as all those 'other solutions'.
*This page intentionally left pointless*
Time critical mailing will go out the window. I can see how this might make any corporate user irate. The same thing goes for challenge-response, the time delay in the business world is unacceptable.
This would be great for personal mail, but that's about it. ISPs would have the same problems with it because their business-class users most likely use the same servers as their consumer-class users.
If they can get around it by looking at the source, then something was wrong with it, waiting to be exploited. Might as well fix it.
social sciences can never use experience to verify their statemen
with all of these solutions to spam..and all of the spam now flooding mail servers...
isn't it time to change the specification (RFC) and possibly the manner in which our current system works? i haven't come up with anything yet, but surely there must be some sort of handshaking/secure type connection that could be used - - some sort of postage (free) that is encrypted into the mail, that states that it is genuine....kind of like the hologram on those windows cds...
i dunno. file this story under redundant.
We're like rats, in some experiment! -- George Costanza
I thought it was generally understood that most spam was sent by abusing open relays, thus hiding it's origin. This could be wrong. However if it's not, those figures aren't appllicable. Nor is spam going to be diverted since an open relay is generally running a regular mta and will attempt a retry. For instance, if qmail were running on an open relay and was abused by a spammer it would try again and again with an increasing delay (calculated logarithmically if memory serves) between attempts. So the mail will still get through.
When you further consider that if a spammer hits an open relay and hammers your mailserver from it and all of the "triplet's" are new, you're increasing your traffic, because all of that mail will be attempted again.
How does the effectiveness of Greylisting compare with what others are seeing with existing techniques (such as Bayesian filtering)? Is it a false positives problem, such as digests and opt-in mailing lists getting incorrectly tagged as spam?
Dr. Rick
- "It's such a fine line between clever and stupid" (Nigel Tufnel)
- Zort! (Pinky)
Doels this mean all public crypto algorithims are useless?
Keep the Classic Slashdot.
The best idea I've seen in YEARS was to have people start using a specific, original poem as their signatures. Then, the author granted license to anyone who WASN'T sending spam. Therefore, they could sue any spammer for copyright infringement if they used it, and you could train your mail filter to look for the signature. Once spamassassin took it up, it pretty much snowballed. See story here
-Looking for a job as a materials chemist or multivariat
Question about this system: if it sends a temporary unavailable message or whatever it does, does it log the original message? Where I'm going with this is what happens if a legitimate message is blocked but never resent? Most anti-spam software allows you to view the spam folder, or something equivalet, to check for false positives. How are false positives handled here?
This post cannot be rebroadcast without the express written constent of Major League Baseball.
How about in the spirit of RFC 3514 (the evil bit) we create a spam header in email. Spam will set this header so we can easily filter it out.
Where? To me, publishing a paper means your writing appeared in some peer-reviewed journal (where the "peers" are acknowledged as domain experts). What you did was put up a web page. With a donation link at the bottom.
For others looking for a solution, try POPFile. Open source, cross platform, gives me 96% accuracy.
One more thing: "practically eliminates" is not the same as "eliminates".
The article also doesn't measure the amount of spam coming through those relays. Even if there are only 10 open relays in the UK at any one time, it still might be possible for all of the spam to be coming through them.
Certainly, closing down open relays is a good thing, but lowering the percentage of open relays doesn't prove anything about the source of spam
Look at it this way: you can stop crank calls by unlisting your phone numbers. But you can't unlist the hospital, the ambulance service, the fire department, etc.
We're not all end-users. Some of us are the plumbers.
The Next Step in Fighting Spam: Death Penalty
http://use.perl.org
But what happens when you try to have an email
discussion about stopping spam, and someone in
the discussion says "Well, I filter out any
message with the words viagra or penis..."?
Does that get flagged as spam and discarded too?
Spammers are notoriously resiliant. Within a few days/weeks/nanoseconds the spammers would realize they need to retry after a delay, and they would stop with the fire-forget mentality.
I wish your plan would work but I just don't think it will.
Plus the spammers can get their viagra at wholesale cost!
I don't want to be here.
I'm wondering how I'm going to explain that to a new customer over the phone who says "I'll just email that file right now so we can go over it together".
PJRC: Electronic Projects, 8051 Microcontroller Tools
According to this article (June 12), open relays at least in the corporate environment are becoming hard to find, requiring spammers to find new ways. In 1997, 91% of mail servers tested were open; as of a year ago only 1%. ISP and home machines apparently weren't tested.
This doesn't really say what's actually being used by spammers, but it's a sign of improvement. At the least, it narrows the pool of available relays. Continuing progress will increase the spam pressure on those remaining, which will in turn make it more likely that they'll be fixed.
The article also doesn't say what spammers might use as an alternative. From something else I read recently (don't recall where), mail viruses that take over users' machines are rapidly becoming the tool of choice. There are a lot more of them than mail servers, so it makes sense for the spammers. It does put them in a more dangerous position WRT the law. IMHO (IANAL), using a virus to exploit someone's machine for profit is almost certainly illegal under existing law.
It's easier to be a result of the past, but more fun to be a cause of the future! http://www.spacefinancegroup.com/
It deals with spam at the server level. All the wonderful user-level solutions don't do jack to stop spam from being sent. Look at the numbers the spammers show for return rate, and look at how fast spam programs can go, and you'll see that the only solutions that will work are those that make it expensive to send spam. Anything else will just make the spammers send more spam to try and get the hit rate they need.
The registrar I use (jumpdomain.com) has a clever hack for despamming WHOIS contact email. Basically they change your published contact address once a week. The published address i automatically generated, looks like gibberish, and forwards to your real address. If someone wants to contact you by looking up your address by WHOIS and writing to you, it works fine. But if they add the address to a mailing list, it stops working in a week. That has eliminated almost all my WHOIS spam. Good scheme.
We should grab some of the guys who get 1000+ spams per day, point them to the physical location of the spammers, and then step back. I can guarantee you that vigilante justice is entirely appropriate here, considering we want the gov to step back from the 'net instead of entering new "secret arrests of spammers"(?) laws.
find the people who sent it, and send them a message saying "I'm Ron Jeremy... I don't think you *want* me to have another 3 inches"
I'm currently using Bogofilter (and looking into CRM114) and getting better than 99% accuracy (about 1 in 200 false negatives at the moment) and very very few false positives (maybe 2 in 5000 messages).
Of course these are MUA level filters (and yes, I know, I've already "paid" with bandwidth to download the spam) - however since the proposed "greylister" would have to be installed as the MTA at major ISPs (as the authors note) I'm not convinced that is more likely to get widespread adoption than the various sorts of adaptive client-based filtering now available, particularly as it requires a database to back the method up.
As far as I am concerned the major factor in a spam filter should be zero false positives - personally I don't mind reviewing one or two spams a week but I get really annoyed if I were to lose a real message (note the two false positives I have sent to date with bogofilter contained forwarded sales pitches along with a message).
-- Nothing unusual happened today
During the initial testing of Greylisting, it was observed that the vast majority of spam appears to be sent from applications designed specifically for spamming. These applications appear to adopt the "fire-and-forget" methodology.
Spam guards and spam co-evolve. Since greylisting is easy to get around by spammers, if it becomes widespread, spammers will take measures to avoid it, and the net result will be a lot of extra traffic.
In fact, the impact of this kind of system on mail could be pretty bad if widely adopted: large amounts of mail may end up being held up in delivering servers, and "informative" messages sent by helpful mail systems (about "temporary failures") may end up creating more junk mail than they avoid.
I get 98-98.5% accuracy with POPfile. I get about 200 mails a day, of which around 30% spam. I get about 1 false negative a day, and maybe 2 or 3 false positives a month. It's a personal solution and as such is much more attractive to me than something server-based which has to be installed by a [typically VERY uncooperative] BOFH.
I use it experimentally for general mail classification (business/personal/a variety of mailing lists etc., all in all 7 buckets) on my home machine, and it works fine in these conditions too, although the accuracy is a bit lower (around 95%).
What I do is set my MTA (well, actualy someone I'm using someone else's mail server) to forward all the mail sent to my domain to my main account. That way, whenever I sign up for anything or give away any email address I use a unique address.
Oddly enough, I've been totaly promiscuous with these throw-away email addresses, but I've only got one SPAM from a company I actualy bought stuff for. So far no one has sold any of the address, and all the websites I've posted too either arn't scanned or are protecting the address well.
OTOH, my 'main' address are spammed constantly. rrr.
autopr0n is like, down and stuff.
You know... All of these fancy 'spam-fighting' methods are just a waste of time, when all you really need is a properly configured smtp server and some good free realtime blackhole lists. After making simple changes, I went from receiving 5-6 spams a day down to 1 in a year and a half. With no complaints of 'false-positives'. There have been instances where the senders mail server was misconfigured, but after contacting them and explaining the situation they were invariably helpful. All I did was make sure that only mail sent from fqdn to valid local accounts, and such were allowed there is an ok tutorial on basic psotfix configurattion herehere.... and a great one Here.
I've seen some comments that say (paraphrasing) "For real SPAM filtering use <POPFile|Spamassassin|...>", but these missed the point (or perhaps didn't read the paper?). This method is a "first-pass" filter, getting rid of e-mails for which no redelivery attempt will be made. The second-pass filter should still be implemented for everything that gets through the first pass. From the paper:
"The Greylisting method proposed in this paper is a complimentary method to other existing and yet-to-be-designed spam control systems, and is not intended as a replacement for those other methods. In fact, it is expected that spammers will eventually try to minimise the effectiveness of this method of blocking, and Greylisting is designed to limit options available to the spammer when attempting to do so."
sig != null
Personally, I only accept messages that are 10 MB in size or larger. If you want to email me, please be sure to include a huge block of random text at the end of your email or else I'll never see it.
I don't get any spam using this approach, because the spammers don't send big messages. And if *everyone* ignored small messages, spammers would have to close up shop because they could not afford to send millions of big messages.
(This is a joke. But you could do this at the SMTP level, by automatically replying to any sender who is not on your personal whitelist with the response: "Hey you, if you're real, send me back a HUGE reply!" And the SMTP server could cheerfully delete the last 99% of the first-time oversized email you get. I should write my own anti-spam paper and get mad Slashdot cred. Nah, I'm too lazy.)
You'll notice that the US is the #1 country Top 3 are:
- The United States, with over 80,000 open relays
- Korea and Japan pretty much tied at +15,000 each
- Japan, at just under 10,000
That's more than everyone else combined!I have written an essay on ending spam. The idea is to associate a second piece of information that goes along with your e-mail address. This 'secret' can be used for anything you want, such as blocking anyone who does not get the secret right.
Or we could all abandon SMTP and move to my jabber-based email "solution".
What? Where is it? Oh, I'm still working on it. You can send and receive, but buddy lists are not implemented yet.
I quite like the idea of this greylisting, but it seems a lot of spam is nowadays being sent as DSPAM (cf DDOS) or Distributed Spam. A spammer infects a load of broadband machines with a simple trojan, and then calls upon a number of the trojans to send an email spam via that machines normal MTA (ie for most windows machines it uploads to the ISPs mail servers).
I know this is happening as some complete bastard seems to be doing this using my domain as a "From:" address (well, [random-word]@schmerg.com), meaning that I'm been getting about 30 or 40 bounce messages a day for the last 2 or 3 months now. And although the odd sending IP is repeated, mostly they're all from different IP addresses. And of course I'm getting perfectly valid looking bounce messages from perfectly reasonable companies (and only a couple of abusive replies so far).
Now the problem is that the email is being uploaded to thier (non-open-relay) ISP's mailserver that will retry properly, and anyone else looking at the IP address will see a perfectly reasonable IP (the spammer seems to gave infected a lot of AT+T customers, ComCast customers, etc.). So short of blocking spam on subject, this spam is harder to prevent in the first place.
I've semi-automated a process to report the infected machines (that provoked a bounce message) to the appropriate ISP, and seem to havign some success in getting the ISPs to contact their customers, but I think this new form of spamming using a distributed attack will be particularly hard to block.
Anyone with a great idea (or who knows more about this scheme, or the identity of the twat behind it) I'd love to hear from you...
--
T
I spent a lot of money on booze, birds and fast cars. The rest I just squandered. - George Best
Evidence has shown that email harvesters don't even *try* to parse even the simplest email obfuscation techniques (kevin at fury dot com, or changing the @ to an html entity).
These are widespread practices and yet spammers don't care about spending the effort to reach that 5% who so adamantly don't want to be reached.
Unless greymail is used by more than a quarter of *all* email readers, history has shown that the spammers just won't care.
Kevin Fox
It seems one side effect of this approach is that it records the sender and recipient of every email sent through a particular mail server.
Sound familiar to anyone?
Anything that makes spammers do extra work seems to make it more likley that they will have patterns that can be observed and then blocked before mail really goes anywhere...
"There is more worth loving than we have strength to love." - Brian Jay Stanley
The comments in this paper about other systems ignore one of the oldest and largest SPAM filters: SpamAssassin.
SpamAssassin can also be used at the MTA-level, and while this tool might be an interesting test to integrate with SA, its claims that other systems cannot feed back to the sender that their mail has been blocked is flat-out wrong.
Most people do not do this because you are almost certainly getting this mail through a relay, and that relay is going to get the SMTP temporary error and try to send a warning to the user who sent it. Spammers regularly slam my home mail server by using my address as the "From" in an entire batch of spam. It's pretty seriously annoying to get that deluge of junk, and it's not really necessary. If your spam system just identifies spam and lets the user (or sysadmin) decide how to deal with it based on how "spamish" it is, you get a much more reasonable behavior.
I junk thousands of pieces of spam every week, and I *never* junk valid mail. Yes, I do have some spam in my inbox. Most of it is tagged as potential spam, and I delete that after cursory inspection of the from addresses. Some of it is missed, and the overhead that I suffer having to identify that myself is amazingly low compared to not being able to read my mail prior to SA.
Check out SA. The latest version is pretty impressive, and if this "new" technique (I don't think the idea of tracking connection quality is very new, it's certainly done in SA to some extent) turns out to be useful... well SA works on much the same principal as Perl: There's More Than One Way To Do It. Bayes, Blacklists, Whitelists, Obfuscation detection, Checksum trackers, you name it, SA uses it. None of these techniques gets to say "this is spam", they all just get to poke a message in the direction of being spam or non-spam. This leads to something far more reliable than any one techniqe.
I once heard a story once that is probably false (you never know), but contains an interesting idea on how to end spam.
It says that Mississippi tried to outlaw "flag burning", and the law was struck down as unconstitutional. So the Mississippi legislature responded by limiting the maximum penalty for assaulting a person who is engaged in flag burning to a $25 fine.
This sounds like a fine idea on how to handle the spam problem.
Create a law that states that the maximum penalty for physically assaulting a spammer is a $100 fine. I know more than a few people who'd be willing to pony up and take a whack at them.
Though this law would probably be far more satisfying than sensible.
I really like the idea, and I think it will work wonders (if you are willing to accept your nearly-instant email now having approx. 1 hour delay).
However, here is how the spammers will adapt:
MailBlast 2.0 will send each mail TWICE. The first time to get the triplet on the greylist, the second time to actually send it. Or a little more sophisticated, it will run in one-hour blocks, and at the end of the hour, re-run the previous hours emails. No queue or other real-SMTP server functionality necessary.
- For the complete works of Shakespeare: cat
I found a copy of the final draft online: Learning Spam: Simple techniques for freely-available software. The paper covers several machine learning techniques. The particular one I'm talking about here is the information-theoretic clustering and neural network approach.
the sender level.
Like, say, putting a maximum limit on the number and size of e-mails that can be sent out a day.
Gather studies on how many people send 1 to n e-mails a day, and how many people send out e-mails of 1 byte to n bytes in size.
My guess is, it's a pretty distorted curve, with maybe a few thousand people -- of all those online -- sending out millions of e-mails a day. The maximum most "normal" people will send a day is probably 100 (and that's a large over-estimate).
social sciences can never use experience to verify their statemen
Here are my latest thoughts on winning the spam war.
:]
I've submitted it to Slashdot. They rejected it. Tell me what you think. I'd like reactive approaches to get discussed a bit more. If you do too, submit this to Slashdot
...which it couldn't, IMHO, due to the ... tenacity...of the spam community, I still think people would take issues with the proposed "1 hour delay". There are plenty of times when a company will send you an e-mail while you're on the phone with them (for example, RMA requests for damaged equipment), or perhaps you've forgotten the password to a particular news site and need to have it sent to you. Having to wait an hour for something that used to be near instantaneous is a less than ideal solution. My vote's for an overhaul of STMP. We'll be better off in the long run.
When their numbers dwindled from 50 to 8, the dwarves began to suspect Hungry.
Pardon me for being clueless, but I don't see in this concept a description of what happens when a posting from a listserv or other e-mail list goes to a new subscriber. Mail bounces back to the listserv, right?
Well, the first e-mail to a news subscriber is often the e-mail required to confirm subscription. No reply, and the subscriber is plonked.
Sounds suboptimal to me.
So, you whitelist the listserv machines... until one of them has to change IP addresses. Whoops! No umpteen bazillions of e-mail messages go no where.
I'm sure that listserv admin would find the idea suboptimal about this time.
Nice idea, but No Slack, as far as I can see.
There is nothing wrong with yr Internet. Do not attempt to adjust the picture. We are controlling the transmission - NSA
Aside from the obvious of getting the authorities to crack down on the existing illegal activities (relay hijacking, violation of TOS of ISPs, header forging, etc.) which is the only true solution, I think there are much better approaches than this "greylisting" method.
The problem with the greylist method is it still slows down mail service, and potentially more than the relay blacklist features. The objective here is that end-user/networks should not be penalized in the fight against spam. We already waste too many resources, and according to my latest mail server stats, more than 65% of our inbound mail is UCE. I'm fed up with more than half my e-mail bandwidth being crap my users didn't request so more resource allocation on a local level in the fight against spam is counterproductive!
Here's a very clever, much more practical method I cound recently.
A company is Canada has set up what it calls SORBS: Spam and Open Relay Blocking System.
What's different from their blacklist is that they maintain "honeypots" strategically located around the Internet. These are servers they specifically set up as inbound mail relays, but never for legitimate purposes. If the servers get [select] mail activity, it's assumed to not be legitimate and it flags the source as a potential spammer... it makes a lot of sense. You create a domain name, but don't promote it in any legitimate manner, and/or you seed spam lists with these e-mail addresses and then let the spammers send to your key systems around the internet and *bam*, they're identified in real time, and then added to a blacklist.
I really like this idea. Like any other system, it has the potential for abuse but the beauty is the identity of the honeypot systems is kept secret, so it's very difficult for anyone other than spammers to exploit the network.
Open-source crypto works because the secret isn't the algorithm, it's the keys. In this case, the secret is the algorithm. The entire scheme can be circumvented by someone who knows how it works.