The New Yorker On Spam
aqk notes an article in the Aug. 6th New Yorker giving an up-to-date survey of the spam problem. The New Yorker may not be exactly the MSM, but it is pretty influential. The author got only one fact wrong that I noticed: Canter and Siegel's seminal spam was propagated through Usenet and not email. Still, it's a good look at the history of spam and the scale of the problem today. The amount of spam that "spam king" Robert Alan Soloway, indicted under the CAN-SPAM Act, is accused of sending over a period of four years is now pumped out about every 30 seconds, around the clock, around the world.
This article is a great short history on spam but no new information was presented to me here (and judging from the summary neither did it shed light on anything new to you).
I laugh at either of these hopes because the average person already deals with spam daily (my relatives began reaching out to me years ago for ways to keep it away from my younger cousins) and we have a different mindset than businessmen & marketers.
The article mentions the epic article by Paul Graham entitled "A Plan for Spam." It may look long and arduous but I heavily recommend you read that. I will not forget reading that article nor will Slashdot. I think it helps more for the "mainstream media" to publish things like this for their readers.
Yes, it has code in it. Yes, it requires a bit of a priori knowledge in some places (pun intended). But, you know, a lot of times the best stuff comes from outsiders and I personally think that newspapers should develop a 'tech section' where they can throw off the mittens & grade school knowledge that need to be on in order to handle your average reader. I know many newspapers have entire sections devoted to sports--sometimes even just one particular sport if it's in season! I've seen many newspapers have 'articles/ads' for new automobiles, why not new technology? I know Popular Mechanics is
Which brings me back to an important point: you're not going to change anyone's mind. Everyone knows about it and if you think that Wall Street businessmen are going to pick up the New Yorker & their jaws will drop when they read this article, you're sadly mistaken. If you think marketers will read this and say "My God, I need to start thinking about what I'm doing to the networks of the world," you're deluding yourself.
What we need is an article that causes people to seriously ask themselves how we can keep e-mail free and uncensored while at the same time stopping spam. When my aunt asked me about it, she was worried about her daughter using the internet and opening a spam message to see a guy with his legs split around a phallic-looking cactus in an ad for Viagra. I showed them how to use Thunderbird instead of Outlook Express and how to turn on the junk mail filter. I also pointed out how vulnerable you leave yourself to spam if you print your e-mail address in plain text on the internet. They never had a problem with it again.
So while this article is informational, it does nothing practical for the reader. I realize--and I think a lot of people will agree with me--that the best way to stop spam is to stop clicking on it and show others how to do the same. The 0.001% response will dry up and spammers will drop off. Articles on how to configure yourself to spot spam would probably be the best thing mainstream media could print--sure would have helped my relatives!
My work here is dung.
I run a mail server for our 5-person business. I left at 8pm last night and got in at 6:45am this morning. During that time, 191 messages were blocked due to the content of the headers. 1,799 connection attempts were rejected due to being on RBLs or part of my block list (182,910 entries). 351 pieces of spam still got through that and got caught by the filter, which I went through by hand to verify that none of it was valid for users. I just finished going through everyone's inboxes to make sure nothing got through. Wanna know how many valid pieces of mail for all 5 users? 17. 17 out of the potential 2,341 attempted mail deliveries within an 11-hour time span.
Just because your inbox doesn't have a lot of spam doesn't mean someone out there isn't making sure you see it that way.
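For anyone who wants to check the numbers in the post above, here is a rough sketch of the arithmetic. The variable names are made up for illustration. The three spam counts add up to exactly 2,341, so the sketch assumes the 17 valid messages are counted on top of that total; the post does not say either way.

```python
# Counts quoted in the post above (one overnight window, 5-user server).
blocked_by_headers = 191      # rejected because of header content
rejected_by_blocklist = 1799  # rejected via RBLs or the local block list
caught_by_filter = 351        # accepted, then flagged by the content filter
valid = 17                    # legitimate messages actually delivered

# The three spam categories sum to the total quoted in the post.
attempts = blocked_by_headers + rejected_by_blocklist + caught_by_filter


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


print("total attempts:", attempts)
print("block list share (%):", share(rejected_by_blocklist, attempts))
print("content filter share (%):", share(caught_by_filter, attempts))
print("valid mail vs. attempts (%):", share(valid, attempts))
```

On these figures the block list does roughly three quarters of the work, and valid mail amounts to well under 1% of what hits the server.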
No kidding. I admin a medium sized ISP. We have 8 (soon to be 9) distributed servers dedicated to email.
3 load balanced e-mail filtering appliances, at the Internet facing edge. (Basically, BSD boxes running postfix, spamassassin, clamav, policyd, DCC checks, RBL and a few other checkers and daemons I'm forgetting.) They get about 90% of our spam.
2 load balanced postfix boxes, running policyd on our outgoing mail. They will greylist any naughty customers with a zombie that has sent too much. Also, they do inbound user verification with LDAP: if spam has BCCed an invalid recipient or two, reject. Add another layer of anti-virus on the way to the customers. This catches another 8-9%. I'm guessing around 1% gets through.
1 DCC server, because we exceeded the threshold for being able to use free DCC long ago. (I'll admit it's a bit underused.)
1 MTA running exim for the hosted domains. This has spamassassin, and a few other services, supplementary to everything in front of it. I'd say it gets most of the rest for those with hosted domains.
1 big bad 8x processor pop server that runs webmail and pop for the customers. It does no spam checking, because it could never handle the load, just stores what we think is not spam for the customers, around 25,000 accounts.
By comparison, we need one (1) production provisioning server, not counting backups. It handles minor things like DHCP for 15,000 customers.
Now you have an idea of what your ISP spends its money and resources on. There is no small industry selling you solutions to fight the SPAM.
You seem to be lumping several very different techniques together, thinking it's all about content filtering.
Content filtering costs a lot of CPU; greylisting and stuttering (replying 1 byte at a time) cost our end very little.
The cited techniques are likely to save you significant costs by discarding the obvious cases at the gateway and letting your computation-heavy content filtering deal with five percent or less of the load it is handling at the moment.
All I can say is read the articles. You really don't need the kind of extra gear you are imagining.
-- That grumpy BSD guy - http://bsdly.blogspot.com/
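For readers who have not met greylisting, the idea mentioned above can be sketched in a few lines. This is only an illustration of the general technique, not how policyd, spamd or any other real daemon implements it. The class name, the defaults and the choice of reply code 451 are assumptions made for the example. The first delivery attempt from an unknown (client IP, sender, recipient) triplet gets a temporary failure, and a retry after a minimum delay is accepted. Most spam zombies never retry.

```python
import time


class Greylist:
    """Toy greylist keyed on (client IP, sender, recipient). Hypothetical sketch."""

    def __init__(self, min_delay=300, max_age=86400, clock=time.time):
        self.min_delay = min_delay  # seconds a sender must wait before a retry counts
        self.max_age = max_age      # seconds after which a triplet is forgotten
        self.clock = clock          # injectable so the example can fake time
        self.first_seen = {}        # triplet -> timestamp of first attempt

    def check(self, client_ip, sender, recipient):
        """Return an SMTP-style reply code: 250 to accept, 451 to defer."""
        key = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        seen = self.first_seen.get(key)
        if seen is None or now - seen > self.max_age:
            # Unknown or expired triplet: record it and ask the sender to retry.
            self.first_seen[key] = now
            return 451
        if now - seen < self.min_delay:
            # Retried too quickly: keep deferring.
            return 451
        # Retried after the minimum delay: behaves like a real MTA, so accept.
        return 250


# Usage with a fake clock, so no real waiting is needed.
fake_now = [0]
gl = Greylist(min_delay=300, max_age=3600, clock=lambda: fake_now[0])
triplet = ("192.0.2.1", "alice@example.org", "bob@example.com")
print(gl.check(*triplet))  # first attempt
fake_now[0] = 400
print(gl.check(*triplet))  # retry after the minimum delay
```

The check is one dictionary lookup per connection, which is the sense in which it costs the receiving end very little compared with content filtering. The cost that remains is that legitimate mail from a new sender is delayed until the sending server retries.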
You know, I actually do use greylisting. And a lot of other techniques, too.
They all add up, and they really do require a lot of extra hardware.
Do you have any clue what percentage of the bandwidth I pay for is going to the initial TCP packets from hosts I drop immediately? I'll give you a hint: It's a lot.
I guess... I've heard serious discussion from people at large sites of what goes into their spam filtering. I'd guess they're not morons; in many cases, I know that they are quite intelligent, and have a lot of experience, and put a lot of time into learning about this stuff.
They think it's expensive. Hell, the mere fact that there are people who are putting in full-time jobs at this proves that it's expensive.
It's not solved. As long as it's taking a measurable number of people working full-time to "solve" it, it's not solved. It'll be solved when we no longer have to spend huge chunks of bandwidth on it, no longer lose mail to it, no longer have mail delayed by it -- you do know that greylisting often delays legitimate mail, right? -- and otherwise no longer have to pay for it. Until then, it's not solved, we just have workarounds that are more tolerable than not having the workarounds.
My blog: http://www.seebs.net/log/ --- My iPhone/iPad app: http://www.seebs.net/seebsfrac/
The author's source material is a great short history of spam, too: I didn't read anything new on the early history of spam in the New Yorker because I'd already read it elsewhere. Yet the New Yorker author only obliquely referenced his source materials when he mentioned Brad Templeton (EFF chairman, etc.) via a quote. If I were the editor for that article I'd have pushed for more research credit to be given.
Brad Templeton's collection of essays on spam includes: