Paul Graham on Fighting Spam

This is not news ... by dougmc · 2002-08-16 04:20 · Score: 5, Informative

The statistical approach is not usually the first one people try when they write spam filters. Most hackers' first instinct is to try to write software that recognizes individual properties of spam.

And he's correct. A few years ago, most spam filters did look for individual properties of spam.

BUT, now, the best spam filters out there already use statistical properties. Spamassassin does this, for example, and it works *extremely* well. Before I found Spamassassin, I had a huge procmial recipe that used it's scoring mechanism to do basically the same thing -- but of course spamassassin does it better, so I switched :)

Major geek bias there... by Kaa · 2002-08-16 04:21 · Score: 5, Funny

From the article:

Based on my corpus, "sex" indicates a .97 probability of the containing email being a spam, whereas "sexy" indicates .99 probability. And Bayes' Rule, equally unambiguous, says that an email containing both words would, in the (unlikely) absence of any other evidence, have a 99.97% chance of being a spam.

Hmm.... take an average adult geek and yes, an email mentioning sex or sexy can go to /dev/null immediately without as much as a second glance... :-)

On the other hand if you run the statistics on email of an average horny teenager, the probabilities might get a bit different.

--

Kaa
Kaa's Law: In any sufficiently large group of people most are idiots.

This approach is very easy to defeat by Bazzargh · 2002-08-16 04:23 · Score: 5, Interesting

Here's how: the spam should be written as a 'multipart/alternative' with an html version of the spam as the primary alternate. The text version contains an innocuous message intended to pass the statistical spam filter. The spam message is entirely contained as an /image/ within the html. The text of the spam becomes invisible to the reader but not to the poor schmuck who gets the email.

I'm guessing here that the inclusion of a single image tag in the html is unlikely to trigger the spam filter, and supplying a wealth of evidence that the email is 'not' spam in the unseen alternate text will let the letter through.

Re:This approach is very easy to defeat by pmz · 2002-08-16 04:51 · Score: 5, Insightful

The spam message is entirely contained as an /image/ within the html.

Thankfully, my e-mail client is set up to not render any HTML in an e-mail. I have yet to send back any information to a spammer via specially-coded image tags and am proud of it.

HTML-based e-mail is fundamentally insecure and really should be used by no one (except those who simply don't care about privacy). Go here to learn just what a spammer--or anyone who sends you an HTML-based e-mail--can learn about you with just one "click" of your mouse.

Yes, the spammer can learn what browser version you use, what OS you use, and even what city you live in (via the traceroute). An unusually savvy spammer could use this information to install spyware via known exploits in certain browsers and operating systems.

In short, HTML e-mail is damn scary knowing that so many people us it not knowing just how much information they are giving away for free!

--
Healthcare article at Kuro5hin

Comment removed by account_deleted · 2002-08-16 04:26 · Score: 5, Interesting

Comment removed based on user account deletion

Another way to stop Spam by mr.nicholas · 2002-08-16 04:26 · Score: 5, Interesting

Having had the same email address since '93, I receive close to 1000 spams per day to my personal account (which is also aliased from root/postmaster/webmaster).

I've tried everything under the planet to reduce the amount that I see in my mailbox; SpamAssassin being one of the best so far. But even that lets through quite a bit (around 10%).

So I decided to attack it from a different angle. I wrote a series of perl-scripts that I plunked into my procmail file.

The scripts work by checking the address of the sender each time a message is received. That address is looked up in a database. If it exists in the db, and it's marked as "authorized", it's just passed into my mailbox.

If it's marked as denied, /dev/null.

If it's never been seen before, an authentication message is sent to the sender asking them to reply to it to authorize themselves. If that authmessage is bounced back, a db entry is made as "denied".

If it's replied to in a normal fashion, that email is marked as "authorized" and any queued up mail from that person is pushed out.

The concept is that spam will almost never have a valid reply-to; so it will bounce and be marked as denied.

Even if the email doesn't bounce, no spammer alive will reply to it; so after 30 days, that email is marked as "denied".

Since I've set this up (for myself and my 10-year-old son who receives porn in his box (grrr!!!!)), it has worked flawlessly. The "real" email is unharmed, while the spam is stopped.

Oh, and I have a web-based control page so that users can manually add email addresses (for lists and such).

This week, for the first time in YEARS, I don't have spam in my mailbox anymore.

Hurray!

No if I can only stop those damned dictionary-based scanning of my servers, I'll be set. Thank the gods that I don't have metered service.

Re:Another way to stop Spam by LX.onesizebigger · 2002-08-16 05:23 · Score: 5, Interesting

Even if the email doesn't bounce, no spammer alive will reply to it; so after 30 days, that email is marked as "denied".

I've seen similar solutions before, and they are all nice and dandy except for one application: when communicating with businesses. What happens when you order a Widget from Acme, Inc. and Acme sends you your confirmation by e-mail? Your script bounces a question, and Acme's mail server either bounces back at you, making it look like it was spam in the first place, or simply doesn't respond at all.

The system implies that anything not sent by a human being is spam. This is not necessarily the case today. A lot of today's e-mail communications are auto-generated.

To truly combat spam, it must be fought at the source. One step closer to that would be to integrate a standardized response to the type of message you send out in mail protocols. The problem with this is that all Joe Spammer would have to do is to point his reply-to to a valid business site.

This brings us to the next point. Forged headers are easy to detect by software and have few (although it would be wrong to say no) legitimate applications. I cannot personally understand why it is not standard operation for mail servers to recognize and bounce messages with forged headers. Sure, it would increase processing load, but if done by all servers, more spam would be stopped closer to the source, meaning less spam to process for all.

Or am I pulling a thinko here? Anybody?

--
I for one welcome our new SCOviet Russian overlords to whom all our base are belong.

Misleading by RainbowSix · 2002-08-16 04:29 · Score: 5, Interesting

He isn't fighting spam, he is filtering it. There is a difference. Filtering still costs in bandwidth. Fighting it would eliminate the source and free up the gigabytes of bandwidth lost for this marketing purpose.

Filtering is fine for now, but ultimately it must be fought and defeated.

--
--------
It's OK to be social, just don't tell anyone about it.

Re:Ok, that is hot.... by Plutor · 2002-08-16 04:36 · Score: 5, Insightful

1) [...] A nice Hoo-ha to anyone who says there are no practical applications of lisp based languages. (except haskell...which personally, i think sucks! [...])

You ridicule people who dismiss the usefulness of your personal "favorite" language, and then you dismiss the usefulness of one particular language that you happen to dislike? That's a bit hypocritical.

3) [...] what happens when a few smart spammers get their hands on this analysis[?]

Paul covers this. First, he suggests that each user's filters should be personalized, so that any spammer would not be able to circumvent everyone's filters. Second, the filters would be continually learning, possibly dumping older words from the corpus in favor of newer ones. And third, even if a spammer put at the end of his spam "describe describe describe describe", this still wouldn't work; the basic premise of the filter is that the spammer HAS to tell you what he's selling, and in the process of doing that, gives himself away as a spammer.

False positives... by dillon_rinker · 2002-08-16 04:44 · Score: 5, Funny

From the article:

In the spam filtering business, false positives are your biggest worry...Based on my corpus, "sex" indicates a .97 probability of the containing email being a spam, whereas "sexy" indicates .99 probability...an email containing both words would have a 99.97% chance of being a spam.

False positives could be a HUGE problem in this case...imagine the agony if you missed this email from your wife: "I'm feeling REALLY sexy today - meet me at the motel off 12th street at noon for some lunch-hour sex!"

Bullshit! by www.sorehands.com · 2002-08-16 05:18 · Score: 5, Insightful

Another spammer lie.

Freedom of speech is not the freedom to tresspass on my computer equiptment, use my resources for me to listen to your advertising!

This is not a prohibition on your paying your moneyto spread your advertising. This is a prohibition on you spending my money to spread your advertising.

Commercial speech does have some constitutional protection, but not to the same level as non-commercial speech. But even with pure political speech, there is no requirement for me to pay for your speech.

As for hitting the delete key, at that point, you have already tied up at least 2 of my computers used my disk storage, my time, my bandwidth without paying for it.

If you want to spam, no problem, just pay me in advance.

--
Fight Spammers!

At the risk of sounding like a broken record... by Guppy06 · 2002-08-16 06:32 · Score: 5, Interesting

Senator Mary Landrieu
724 Hart Senate Office Building
Washington, DC 20510-0001

Dear Senator Landrieu:

Earlier this month the Federal Communications Commission (FCC) issued a record fine of nearly $5.4 million to Fax.com for transmitting unsolicited advertisements via fax machine (ie. "junk faxing"). Coincidentally, the Associated Press published a series of three articles covering the state of unsolicited e-mail advertising ("spam"). I'm left wondering how the FCC can come down hard on junk faxers but how spammers (arguably of a lower moral class) are allowed to continue to operate nearly unmolested.

The law Fax.com was found to be guilty of breaking is Section 227 of Title 47 of the United States Code. The relevant text follows:

Restrictions on the use of automated telephone equipment:

It shall be unlawful for any person in the United States (...) to use any to use any telephone facsimile machine, computer, or other device to send an unsolicited advertisement to a telephone facsimile machine(.)

It is my understanding that the reasoning behind this law is based on the ownership of resources. Fax machines are purchased and maintained at the owner's expense and only the owner's expense. An unsolicited advertisement sent to this fax machine amounts to nothing less the use of these expensive resources without prior consent. In effect "junk faxing" is considered theft and as such the offenders are held accountable by law.

What does this have to do with spam? In my opinion, everything.

Receiving an e-mail is by all accounts more expensive than receiving a fax. While several companies are now offering stand-alone e-mail clients, I have yet to see one of those with a lower price tag than a fax machine. But even if their price tags were the same, an e-mail station requires that the owner not only pay a monthly fee for a telephone line but also a second monthly fee for the e-mail account itself.

Of course not even an end client is enough to receive an e-mail. The e-mail account you would be paying for is maintained on a very large (and very expensive) e-mail server, complete with its dedicated (and pricey) connection to the internet. There is simply nothing comparable to an e-mail server in the faxing domain. While a bank of fax machines doesn't cost more than the price of the machines and their associated telephone lines, the price a dedicated e-mail server and the associated connections can easily resemble that of a small car.

So why is it that the FCC is given free reign to crack down on junk faxers but spammers are free to consume valuable equipment they do not own?

If you are familiar with the AP articles I mentioned earlier you will know that spam is steadily eliminating the usefulness of e-mail itself. It has been estimated that spam accounts for up to 80% of the e-mail traffic to major e-mail domains such as Hotmail and Yahoo, a problem that their respective owners are all but powerless to fix. As more and more internet resources are tied up by these advertisements, the owners of these resources have had to resort to cutting off offending service providers from the rest of the internet entirely. Customers are finding themselves unable to use the internet access they have paid for simply because another customer of that same provider is abusing theirs.

But even then the providers are powerless to drop spammers. Spammers in the recent AP articles have proudly boasted of the way they outright defraud unsuspecting internet service providers when signing up for an account. And when the provider threatens action, the spammer threatens the provider with legal action. In recent months a spammer was even successful in receiving a legal injunction against their service provider, preventing the provider from stopping the spammer from abusing their resources.

I have little problem with receiving advertisements through the U. S. Postal Service. I know that the delivery cost for every article in my mailbox has been entirely paid by the sender. And while I am not happy with the current situation with telemarketers (I must pay for local telephone service before I have the "privilege"of being contacted by telemarketers), I must grudgingly admit that the state and federal laws designed to restrict telemarketing have been mostly successful. But I am not happy about paying several thousand dollars for a computer and $20.00 a month simply to have my e-mail account flooded to capacity with advertisements for products and services I have no interest in (and preventing legitimate e-mail from reaching me in the process). I am sure that you yourself have been bombarded with advertisements for websites featuring "nasty teens" or "penis enhancement." I have noticed that your office no longer maintains an e-mail address accessible to the public.

The examples of spam I mentioned in the last paragraph bring me to another point: I have noticed on your website your stated commitment to enforcing decency laws on the internet, to protecting children from access to objectionable material on the internet. It should be obvious by now to even the most casual of internet users that the biggest offender in this area is the spammer. While a user must actively attempt to locate a website in order to find such material on the world wide web, the mere existence of an e-mail account all but guarantees that the owner will have such material delivered to them on a daily (if not hourly) basis.

In my opinion the solution to this problem is very simple: expand 227 U. S. C. 47 to prohibit unsolicited e-mail advertisements in exactly the same way it prohibits unsolicited fax advertisements. Nothing more, and certainly nothing less.

I have seen some ineffective bills drift through both houses of Congress that are written to allow unsolicited messages so long as they have an "opt-out" mechanism. Ignoring the fact that such legal loopholes would essentially negate the law entirely (can you prove that you tried to opt out?), it quite literally sickens me the way some of your fellow members of Congress feel that spam is somehow an issue dealing with the freedom of speech. The mere existence of the internet and the supposed changes it has on how business and the legal system work (even though such "changes" have been shown to be a lie) have helped to convince these poor fools that people should somehow have a right to use and abuse the property of others. Does my neighbor have the constitutional right to break my kneecap so long as they provide me with the ability to "opt out" of future kneecappings?

The United States Constitution guarantees that all citizens are free to say what they want. It does not guarantee a soapbox upon which they can say it. Just as I am not guaranteed the right to have a billboard on Interstate 10, spammers should not have the "right" to use the resources of others simply because they're there.

Expanding 227 U. S. C. 47 to include e-mail is an extremely important issue to me and I hope with your stated interests on your website that it is also an important issue to you as well. I know that you are up for re-election this November and I intend to find out how your competitors feel on the issue as well.

12 of 675 comments (clear)