A Timeline Of Spam And Antispam

The war on spam/drugs by Anonymous Coward · 2003-04-27 14:09 · Score: 1, Informative

Not that we should not pursue anti-spam countermeasures, but spam will never fully go away. It's like warez, it's like mp3s, it's like drugs, it's like this, that and everything. You can try but you'll never really get a hold on it. Minimise it as much and as conveniently as you can, but as soon as you start spending ages trying to outlaw it you will find you've wasted more time than it would have taken to delete the spam and move on with your life.

We owe a lot to anti-spam fighters by bigberk · 2003-04-27 14:10 · Score: 5, Informative

Anti-spam activists go to a lot of trouble to help locate and identify people and groups responsible for flooding the net with spam (or who provide spamware to misinformed laypeople). These same do-gooders are often sought out by spammers, sued by groups of them, and have their privacy invaded (release of home phone, address) in an effort to scare them into shutting up.

I am not kidding here. Take a look at some of the projects that scare the hell out of professional spammers:

Spamhaus keeps an exhaustive list of major spam operations.

SPEWS lists areas of the Internet that have frequently been used for spamming, including detailed evidence files and histories of ISPs that turn a blind eye to spam.

Spamware vendor list has a listing of sites that sell spamming software -- without which we would have little or no spam.

How I've Cut Down My Spam by MBCook · 2003-04-27 14:19 · Score: 4, Informative

My e-mail address is plastered all over the internet, and I don't feel like changing it. I have been getting more and more spam, but I've got it pretty much under control. For the record I get 20-30 e-mails a day, only 0-2 of which are ham. Here is my little anti-spam journey.

First I ignored it. This worked for a while, but my patience didn't grow nearly as fast as the spam volume (I've been on the net for years, so I remember when spam was a rare occurrence). These are only the major things. I've tried others here and there.

Next I started using MS Outlook's built-in spam catcher. This is basically a blacklist that you maintain and can easily add things to. This actually worked somewhat well, but as the use of forged addresses (and just plain random ones) grew, this became less effective.

Next I started to use SpamNet. I used this up until about last week. This used to be somewhat effective, and in the last month or so has been almost completely effective. This is the most wonderful anti-spam device I've used. It was great near the end of the beta. But now it's out of beta and I'm not going to pay $5 a month to stop something I shouldn't get in the first place. Sorry Cloudmark.

When Spamnet started, it was pretty effective, but still left a decent amount to be desired. So I searched around and found SAProxy. This program lets you run Spamassassin on Windows, and the combination of this and Spamnet worked wonders. As Spamnet got better, this became more or less useless.

Unfortunately, I had to get rid of Spamnet, due to the aforementioned monthly fee. So now all I have is SAProxy. It does work great, and it does get better with each new release. Now only about 3 messages a day get through, which is quite fantastic. Only 5% or so of the spam I get gets through. I could set the limit lower (to catch more spam) but right now I don't have to worry about it catching ham (it never has for me) and I don't want to have to start wading through my spam folder to check for ham. I thought I was using this stuff to not have to do that in the first place?

So in short, I'm now using SAProxy and quite happy. If there was a free version of Spamnet, I'd use it, but there isn't. If you're on Windows and have a supported e-mail client, get SAProxy, and save yourself a huge headache.

So what will I use next? I've been thinking of setting up a perl script to automatically find the home address of people who spam me and sending them a few ICBMs with notes attached like "HOW TO WIN AT EBAY WITH FREE CHEAP ICBMS THAT INCREASE YOUR SEXLIFE AND GROW HAIR."

--
Comment forecast: Bits of genius surrounded by a sea of mediocrity.

And Now... by Michael's+a+Jerk! · 2003-04-27 14:22 · Score: 5, Informative

According to This Site, the earliest spam was sent by DEC in 1978.

Einar Stefferud, a longtime net hand, reports that DEC announced a new DEC-20 machine in 1978 by sending an invite to all ARPANET addresses on the west coast, using the ARPANET directory, inviting people to receptions in California. They were chastised for breaking the ARPANET appropriate use policy, and a notice was sent out reminding others of the rule.

Interestingly, a young Richard Stallman argued that spammers had every right to send spam.

--

I'm not Seth.

Re:And Now... by jnana · 2003-04-27 14:34 · Score: 2, Informative

Interestingly, a young Richard Stallman argued [templetons.com] that spammers had every right to send spam.
But he retracted in the very next email:
Well, Geoff forwarded me a copy of the DEC message, and I eat my words. I sure would have minded it! Nobody should be allowed to send a message with a header that long, no matter what it is about.

Re:There's a Reasonable, Albeit Draconian Solution by dbitter1 · 2003-04-27 14:41 · Score: 2, Informative

After all, if they could lock away Mitnick (sp?) for over 5 years for downloading a few files, why can't they lock away a virus author or spammer for operating without a permit?

Simple.

Money.

Mitnick's foes' lawyers claimed billions of dollars (that's lawyer dollars, not real dollars, of course) of damage to the people padding the politicians' pockets.

When spam gets there, we could count on the jack-booted thugs raiding a place or two in the night. Unfortunately, the spammers are getting richer, and trying to make laws that favor them...

--
For us carnivores, "Sucking the marrow out of life" isn't a transcendentalist philosophy but a practical instruction.

Several problems by ajs · 2003-04-27 14:47 · Score: 2, Informative

First off, the article is WAY behind the times on anti-spam techniques. SpamAssassin's statistical techniques far outstrip the simplistic features discussed. For example, it mentions obfuscation techniques, and yet SA is known to detect almost all of them one way or another, and even when it doesn't it catches the mail because it's in Razor2, comes from a BLed site, has obviously forged bits, doesn't look like valid mail to Bayes, etc, etc, etc.

Second, the article is also a bit naive on several points regarding blacklists. Many blacklists are good and useful, many are not. But taken as a whole, they present a spectrum of data that can be interpreted through a number of classical techniques that are applied to noisy data sources. Trusting any one BL or a small list is almost always a mistake; you need to build a sample set and determine who you trust and how much. SA does this, but it would be easy enough to build a BL-only SA-like tool for high-speed analysis on high-volume ISPs and pipe-providers.
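
To make the "who you trust and how much" idea concrete, here is a minimal sketch of a BL-only scorer. It is not how SpamAssassin actually weights its rules. The list names and trust weights are invented for illustration, and treating the lists as independent noisy witnesses is a simplifying assumption.

```python
from typing import Dict, Iterable

# Hypothetical trust weights: a rough guess at how often a listing on
# each blacklist has turned out to be real spam. All names and numbers
# here are made up for the example.
TRUST: Dict[str, float] = {
    "bl-a.example": 0.9,
    "bl-b.example": 0.5,
    "bl-c.example": 0.2,
}


def blacklist_score(listed_on: Iterable[str],
                    trust: Dict[str, float] = TRUST) -> float:
    """Combine blacklist hits into one score between 0 and 1.

    Each hit is treated as independent evidence, so the combined score
    is 1 - product(1 - weight). Lists we know nothing about count as 0.
    """
    miss = 1.0
    for name in set(listed_on):
        miss *= 1.0 - trust.get(name, 0.0)
    return 1.0 - miss
```

A hit on one weak list barely moves the score, while hits on several lists add up, which is the point of not trusting any single BL outright.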

I'm getting worried that the problem of spam eradication is starting to look like the most divisive problem the net has faced to date. There are an awful lot of angry people, and those pitchforks and torches are starting to point in some very "infrastructurish" directions. Articles like this one really don't help much....

Re:Just like anti-virus... by kimgh · 2003-04-27 16:24 · Score: 2, Informative

Spam is not going to stop. It will continue despite laws and regulations which do not apply world-wide and are difficult or impossible to enforce.

Oh, I dunno. Fax SPAM was effectively stopped by law; is there any reason to believe that an effective Federal law won't work to at least reduce the volume?

Larry Lessig's proposal for a law, which is actually being introduced by my own Representative, Zoe Lofgren, may very well reduce the flow.

I would like to see that law include provisions for going after companies that hire spammers, rather than just the spammers themselves. I don't believe that there is such a provision in the current proposal, but it's been a few weeks since I read it, so I might be wrong. But that might be a helpful addition, if it's not already there.

Finally, I read recently that there are only about 180 major spammers responsible for most of the spam we get. 180 people is not an impossible number to arrest, charge, and shut down. The remaining bit players will probably dry up if the major guys and gals are gone...

Re:Interesting Perspective by Anonymous Coward · 2003-04-27 17:15 · Score: 1, Informative

I agree with you completely.

However, I did see one paper on this which was submitted to the IETF ASRG which was pretty neat on relatively new methodologies to eliminate spam.

You can find it here - Eliminating Spam: Protocol and Infrastructure Changes.

Re:HELO forging and detecting by dmeranda · 2003-04-27 21:53 · Score: 3, Informative

I too have noticed that the vast majority of spammers now seem to forge the HELO/EHLO greeting. And as most non-spammers don't, this is actually a wonderful way to catch them. I've even seen them send the IP address of my secondary mail gateway in hopes that my primary mail server would fully trust it (obtained probably by looking up my MX records). I run a mail gateway for a corporate domain and get on average 30 to 40 thousand spams per day. Using sendmail with its milter programming interface I put the HELO greeting through a very strict check. For those contemplating doing the same...

Per RFC 2821, the HELO greeting string should be either the FQDN of the sending hostname, or the IP address of the sending system in SMTP syntax (e.g., [1.2.3.4] or [IPV6:abcd::1234]).
Most spammers don't even bother with a domain name, using a random greeting like "sqss7e". If it doesn't have a domain, throw it away. Same if you see an IP address without the [] brackets; it's another dumb spammer that can't read the RFCs.
Sometimes spammers don't even hide their spammy-sounding names in the HELO greeting even though they go to a lot of trouble to make up legitimate From headers. A good regular expression check for common words like "offers" or "optin" in the HELO greeting can work wonders (but use caution).
When checking if a spammer is forging your own address, be sure to check for ALL hostnames under your domain (say you have acme.com, then check for both "acme.com" and "*.acme.com", and use a case-insensitive comparison). Also check for ALL your possible IP addresses even if you don't use them all. A remote site using your own IP or hostname is never legitimate.
If you are running a gateway, you need to treat outbound versus inbound messages differently. This can usually be done by checking the connecting IP address to see if it is one of yours. Also be sure to check for 127.*.*.* and ::1 (IPv6).
Be aware that some mail clients are broken and don't send a conforming HELO greeting; this includes Mozilla (see Bug 68877). So don't be too aggressive with your HELO checks for mail originating from the inside of your organization.
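
For anyone who wants to see the checks above in one place, here is a rough sketch in Python rather than as a real sendmail milter. The function name, the rejection strings, and the 192.0.2.0/24 network are all made up for illustration; acme.com is the example domain from above, and the word list is just the two words mentioned.

```python
import ipaddress
import re

OUR_DOMAINS = ("acme.com",)                              # example domain
OUR_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]    # hypothetical range
SPAMMY = re.compile(r"offers|optin", re.IGNORECASE)      # use with caution


def _literal_ip(helo):
    """Return the IP inside a bracketed address literal, or None."""
    if not (helo.startswith("[") and helo.endswith("]")):
        return None
    inner = helo[1:-1]
    if inner.upper().startswith("IPV6:"):
        inner = inner[5:]
    try:
        return ipaddress.ip_address(inner)
    except ValueError:
        return None


def _is_ours(ip):
    """True for loopback (127.*.*.*, ::1) or any address in our networks."""
    return ip.is_loopback or any(ip in net for net in OUR_NETWORKS)


def check_helo(helo, client_ip):
    """Return a rejection reason, or None if the greeting passes."""
    if _is_ours(ipaddress.ip_address(client_ip)):
        return None  # inside mail: stay lenient, some clients are broken
    literal = _literal_ip(helo)
    if literal is not None:
        return "claims our IP or loopback" if _is_ours(literal) else None
    try:
        ipaddress.ip_address(helo)
        return "bare IP without brackets"
    except ValueError:
        pass
    name = helo.lower().rstrip(".")
    if "." not in name:
        return "no domain in greeting"
    if any(name == d or name.endswith("." + d) for d in OUR_DOMAINS):
        return "claims one of our hostnames"
    if SPAMMY.search(name):
        return "spammy word in greeting"
    return None
```

The connecting IP is checked first so that inside clients never hit the strict rules; everything after that only applies to remote senders.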

One last note about Forged AOL Spam after talking to one of their postmasters...all their legitimate mail by corporate policy is always sent from within the *.aol.com or *.aol.net domains. This will be in both the HELO as well as a reverse DNS lookup of the connecting IP address. If you don't see this in the HELO and DNS but you see a MAIL FROM for aol.com, it's probably spam.
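
As a sketch only, the AOL rule could look something like the following. The function name is made up, the reverse DNS name is assumed to have been looked up by the caller, and requiring BOTH the HELO and the reverse DNS to match is my reading of the rule, so loosen it if that turns out too strict.

```python
AOL_SUFFIXES = (".aol.com", ".aol.net")


def looks_like_forged_aol(mail_from, helo, rdns_name):
    """True if MAIL FROM says aol.com but the sending host does not look like AOL.

    rdns_name is the reverse DNS name of the connecting IP, or None if
    the lookup failed; doing that lookup is left to the caller.
    """
    sender_domain = mail_from.rsplit("@", 1)[-1].strip("<>").lower()
    if sender_domain != "aol.com":
        return False  # rule only applies to mail claiming to be from AOL

    def is_aol(host):
        host = (host or "").lower().rstrip(".")
        return host.endswith(AOL_SUFFIXES)

    return not (is_aol(helo) and is_aol(rdns_name))
```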

I wish more big ISPs would provide public information about how to better detect forged mail claiming to come from their sites. For instance if I see a MAIL FROM *@yahoo.com, then should the connecting IP address always be from a *.yahoo.com host? Some ISPs like Hotmail seemingly always add in a known predictable header whose absence indicates spam. But I can't reliably make these calls unless the ISPs provide that information. Also, beware that some semi-legitimate sites, like Monster.com, forge the sending address on purpose; so if you want to receive resumes you may need to whitelist them.

Re:Interesting Perspective by letxa2000 · 2003-04-28 05:54 · Score: 2, Informative

Agreed. Bayesian all by itself is not perfect. But Bayesian can do 95% of the work reliably, and a little extra filtering can take care of the rest.

I personally advocate Bayesian along with some simple keyword filters that contain mostly known spamvertised domains or spam sources. If it is kept up-to-date that helps.

It's been a few months now, and it's gotten worse. Much of my spam seems to be one-liners like "Here's that URL we were looking for: ..." Others contain mis-spellings in common spam-related words, and slip by the filters.

First, with a sufficiently large corpus the mis-spellings shouldn't slip through. The fact that they slip through means your Bayesian filter is still "learning." At some point, "VIAGRA" might be a 98% chance of spam but V1AGRA will essentially be a 100% chance of spam. The mis-spellings often make it easier to detect spam with confidence and the rest of the email should generally be enough to let Bayesian calculate a good spam percentage.
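
To see why the mis-spelling ends up scoring higher than the real word, here is a simplified per-token estimate. It is a sketch in the spirit of Graham-style filters, not his exact formula, and the counts in the usage note are invented.

```python
def token_spam_prob(spam_hits, ham_hits, n_spam, n_ham):
    """Estimate the chance a message is spam given that it contains a token.

    spam_hits / ham_hits: number of spam / ham messages containing the token.
    n_spam / n_ham: total messages in each corpus (both must be > 0).
    The result is clamped to [0.01, 0.99] so one token is never absolute.
    """
    spam_rate = spam_hits / n_spam
    ham_rate = ham_hits / n_ham
    if spam_rate + ham_rate == 0:
        return 0.5  # token never seen: no evidence either way
    p = spam_rate / (spam_rate + ham_rate)
    return min(0.99, max(0.01, p))
```

With made-up counts, token_spam_prob(490, 10, 1000, 1000) for "VIAGRA" comes out at about 0.98 because the word does show up in a little ham, while token_spam_prob(50, 0, 1000, 1000) for "V1AGRA" hits the 0.99 cap because nobody legitimate ever types it.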

The one-liners can be caught by improving the Bayesian filter itself: Perhaps a new characteristic considered by Bayesian is "Is the message 1 line long?" or "Is the message 2 lines long?" or "Is more than 40% of the body of the message used to convey an HTTP address?" Things like this are valuable characteristics that will help Bayesian catch even 1-liners. Perhaps 90% of your 1-line messages that have an http reference in them are spam--that's something Bayesian can work with.
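
One way this might be done is to turn each characteristic into a pseudo-token and feed it to the filter alongside the ordinary words. The token names below are invented, and the URL pattern is deliberately crude; this is a sketch, not part of any existing filter.

```python
import re

URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)


def structural_tokens(body):
    """Return pseudo-tokens describing the shape of a message body."""
    tokens = []
    lines = [ln for ln in body.splitlines() if ln.strip()]
    if len(lines) == 1:
        tokens.append("*one-line*")
    elif len(lines) == 2:
        tokens.append("*two-lines*")
    text = body.strip()
    # Fraction of the body (by characters) taken up by HTTP addresses.
    url_chars = sum(len(m.group(0)) for m in URL_RE.finditer(text))
    if text and url_chars / len(text) > 0.40:
        tokens.append("*mostly-url*")
    return tokens
```

The filter then learns a spam probability for "*one-line*" or "*mostly-url*" from your own mail exactly as it does for any word.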

Marking the ones that slip through as Junk causes more problems with false-positives.

Really shouldn't.

Plus, it's fairly easy for a spammer to tweak his message against a relatively common corpus. I believe that most people would come to the same conclusions as to whether or not something was spam -- and thus an "average" corpus is trivial to create, and tweak your spam against.

While pretty much everyone will agree on what IS spam, not everyone will agree on what isn't--and that's what's great about Bayesian. Sure, they might avoid the word "Viagra" or "slut" but the headers themselves can be damning, the fact that they have 15 images being loaded off an external site is damning, and the fact that a message with a 60-character body consists of a 30-character HTTP address is also probably damning. They're not going to know I have a best-friend named Fred (which is something that will lower the spam score when it is found in my email). As Paul Graham said, if spammers have to stop using all the words (viagra, porn, slut, etc.) and techniques (images loaded from external servers) that they are using to make their pitch, they're going to be significantly limited in what they can say.

If it gets to a point where they totally mangle their emails with SMS-like substitutions to convey their message, you can also add new characteristics for Bayesian: "Are more than 40% of the tokens unknown?" "Are more than 50% of the tokens unknown?" You can assume that if you have a halfway-decent corpus and more than X% of the tokens in an incoming message are unknown, that may be a good indication of a spammer trying to use mangled words to get their message across.
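
A minimal sketch of that unknown-token characteristic, assuming the corpus vocabulary is available as a plain set of lowercase tokens. The tokenizer and the pseudo-token names are made up for the example.

```python
import re

TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def unknown_fraction(body, known_tokens):
    """Fraction of the tokens in body that the corpus has never seen."""
    tokens = TOKEN_RE.findall(body.lower())
    if not tokens:
        return 0.0
    unknown = sum(1 for t in tokens if t not in known_tokens)
    return unknown / len(tokens)


def mangling_tokens(body, known_tokens):
    """Return pseudo-tokens for the 40% and 50% unknown-token thresholds."""
    frac = unknown_fraction(body, known_tokens)
    out = []
    if frac > 0.40:
        out.append("*unknown>40%*")
    if frac > 0.50:
        out.append("*unknown>50%*")
    return out
```

With a small or young corpus almost everything is unknown, so these only become useful once the filter has seen a fair amount of your mail.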

Sure, Bayesian as proposed by Paul might not be the final solution. But the countermeasures that spammers use will end up being such that the simple use of those countermeasures will probably be something which can be considered a characteristic of the message which will be further used in identifying it as spam.

In my opinion, the trick will be keeping Bayesian "up to date" in terms of identifying new characteristics that can be used to identify spam. For now, tokens in the message are sufficient.
