Spam Solutions from an Expert
Mod N writes "SecurityFocus has posted a nice survey of anti-spam technologies by spam expert Neal Krawetz, in which he delves deeply into the specifics and pitfalls of the numerous proposed solutions. Krawetz makes it obvious that securing the email infrastructure is a very complex problem that many of the current (simple) solutions can't solve alone."
Excuse me, what? Where's the proof? That's quite a brave statement to be making considering I've never seen this cracked, ever.
I challenge someone to find an automated response to C/R.
I did hear of a theory where C/R was being cracked by taking the C/R image, posting to a porn session, and letting a seeing person do the work. However, I've yet to witness this in practice. Show me the automated response to C/R that exists beyond a blog theory, and I'll believe. Until then, I hardly consider it "marketing hype".
The truth is 90% of spam comes from open relays, that is, SMTP servers that can be tricked (a bit like lying to a 5 year old) into accepting and sending out massive amounts of mail. Simply blocking open relays using the Open Relay Database at http://www.ordb.org/ or another open relay checking utility will save you lots of time if you run your own mailserver. When we can basically negate the usefulness of open relays to spammers, they will then have to rely on their own bandwidth for the most part, provided they cannot compromise other "closed" relays.
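For anyone wondering what "checking against an open relay database" looks like mechanically: DNS-based blocklists are queried by reversing the octets of the connecting IP and looking the result up under the list's zone. A minimal sketch follows. The zone name `dnsbl.example` and the `resolve` callable are placeholders of mine, not anything ORDB publishes, and no real DNS lookup is made.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up when checking `ip` against a blocklist zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str, resolve) -> bool:
    """`resolve` is any callable returning a truthy value when the name resolves.

    It is injected so the sketch stays offline; a real mail server would do
    an A-record lookup here.
    """
    return bool(resolve(dnsbl_query_name(ip, zone)))


# Hypothetical stand-in for a DNS lookup: one address is "listed".
fake_zone_data = {"25.2.0.192.dnsbl.example"}
print(is_listed("192.0.2.25", "dnsbl.example", fake_zone_data.__contains__))
```

Most mail servers have this built in as a config option, so you would rarely write it yourself.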
I am in full support of using the broad-powered, freedom crushing Patriot Act in apprehending and imprisoning spammers. We might as well get some good out of it.
A nice foolproof system, while a bit of a hassle, would effectively remove spam. PGP uses a white list of sorts that only allows people who have your public key to send you encrypted messages. This in a sense could be done with email. Someone wants to send you an email, and has your email address. They send a small request to your mail server (1-2 KB in size) with their name, email address, and name of their mail server. The mail server holds this information and notifies you that a new sender is awaiting access. You then:
1. Verify the identity of the sender and okay them; the sender is then given the return request and is notified that they will be allowed to send emails.
2. Deny the sender, and all their emails will be bounced back.
Yes, spoofing problems still exist, but this system could be expanded, and guess what, you only receive email from people you want to, and the mail server acts as the first point of defense.
This would require more complex and smarter mail servers, but it would make the every day user's life so much more simple.
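The request/approve/deny flow above can be sketched in a few lines. All class and method names here are made up for illustration, and the sketch does nothing about the spoofing problem already mentioned.

```python
from dataclasses import dataclass, field


@dataclass
class AccessRequest:
    name: str
    address: str
    mail_server: str


@dataclass
class Mailbox:
    pending: dict = field(default_factory=dict)   # address -> AccessRequest
    allowed: set = field(default_factory=set)
    denied: set = field(default_factory=set)

    def request_access(self, req: AccessRequest) -> str:
        """The small request a new sender submits before any mail is accepted."""
        if req.address in self.allowed:
            return "allowed"
        if req.address in self.denied:
            return "denied"
        self.pending[req.address] = req  # held until the owner decides
        return "pending"

    def decide(self, address: str, approve: bool) -> None:
        """The owner's choice: option 1 (approve) or option 2 (deny)."""
        self.pending.pop(address)  # KeyError if no request is waiting
        (self.allowed if approve else self.denied).add(address)

    def accept_mail(self, sender: str) -> bool:
        """Mail from anyone not explicitly approved gets bounced."""
        return sender in self.allowed


box = Mailbox()
box.request_access(AccessRequest("Ann", "ann@example.org", "mail.example.org"))
box.decide("ann@example.org", approve=True)
print(box.accept_mail("ann@example.org"), box.accept_mail("bulk@example.net"))
```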
ce n'est pas un Sig.
My free anonymous (as in they can only be traced back to a common e-mail account on my server) e-mailer uses a simple quiz to keep spammers out.
The form page records the IP address of the visitor along with the question number they were given in a file named with the IP address. That number is never sent to the client. When they hit submit the file of their IP is opened, the question number is read in and the answer given by the user is compared to the stored answer. The file is then deleted and if the answer was correct the e-mail is sent. Otherwise it's not.
This forces my custom form to be used to be able to send the e-mails. And it's not possible to simply keep refreshing the submit page to keep sending the message.
And the challenge is in the form of old riddles and a couple new ones like "what's your favorite color?"
Things a bot would never get but that anyone who knows how to use Google can. Someone would have to program a custom bot with the answers in order to even attempt to spam. And even then since everything goes through my mail server nobody is going to sneak garbage past me for long and I know who your ISP is.
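Roughly, the quiz gate described above works like the sketch below. To keep it self-contained I've swapped the per-IP files for an in-memory dict (an assumption, not how the original does it), and the riddle is a placeholder.

```python
import secrets


class QuizGate:
    def __init__(self, questions):
        self.questions = questions  # list of (question text, expected answer)
        self.issued = {}            # ip -> question number; never sent to the client

    def show_form(self, ip: str) -> str:
        """Pick a question for this visitor and remember which one server-side."""
        number = secrets.randbelow(len(self.questions))
        self.issued[ip] = number
        return self.questions[number][0]

    def submit(self, ip: str, answer: str) -> bool:
        """Check the answer once. The record is deleted either way, so
        refreshing the submit page cannot send a second message."""
        number = self.issued.pop(ip, None)
        if number is None:
            return False  # the form was never loaded, or was already used
        return answer.strip().lower() == self.questions[number][1]


gate = QuizGate([("What has keys but can't open locks?", "piano")])
print(gate.show_form("203.0.113.9"))
print(gate.submit("203.0.113.9", "Piano"))   # first submit is checked
print(gate.submit("203.0.113.9", "Piano"))   # a replay finds no record
```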
I also include a disclaimer with every e-mail. It'd be quite silly for me not to.
Ben
Work Safe Porn
Well, at the risk of sounding like a broken record, SMTP itself is the problem -- it's badly broken, security-wise, and needs to be fixed. It's going to be painful to move to a new mail standard, or to change SMTP so that it's not broken, but that's what needs to happen to stop spam. Thankfully, our friends the Russian Mafia and the ever-growing number of Windows zombie machines are making spam levels so great that, sometime soon, spam will represent such a large percentage of e-mail traffic that fixing SMTP will be necessary, not just something mail admins like myself wish for.
BTW, does anybody have a good figure on what percentage of all e-mail spam represents these days? I'm talking about *all* traffic, too, not just what ends up in peoples' Inboxes after all the filtering going on out there has done its job.
How To Get Humans To Mars
I am not recommending mailblocks; I believe there is a SourceForge project called TMDA which does the same thing. Having said that, my experience comes from using mailblocks:
...
-cr deadlock: This does not exist because when you e-mail someone in a challenge and response system, it automatically assumes they are friendly. So if they have a challenge and response system, its challenge will make it into your inbox, because you e-mailed them first.
-automated systems: He is correct here. Personally I hate when friends submit my e-mail to third parties without my consent so I do not mind missing these e-mails. I have caught a few while searching my pending folder, and I inform my friends I'd rather have them e-mail me directly.
-interpretation challenge: I believe he is wrong here because of a fundamental issue. When dealing with spam filters, the onus of working out refinements is left to the spamee, to make sure they filter out all spam. If a spammer adds a new technique, they get around the filter. With challenge systems, you have a few methods waiting as backup. When a spammer finally figures out how to read your words through AI, you simply change the challenge system and they are back to square 1 in trying to figure out how to defeat it. As long as you have a few methods waiting in the wings, the spammers can easily be defeated, and have huge amounts of work to do.
If you doubt this, write an AI system to defeat Hotmail's GIFs. Now what if the next day, instead of showing a word, they show you a picture of 3 fire trucks and 2 police cars and ask you how many police cars are in the picture, etc.
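To make the "-cr deadlock" point concrete, here is a toy model of two people who both run challenge/response. This is my own sketch of the behaviour described, not mailblocks or TMDA code, and the names are illustrative.

```python
class ChallengeResponseInbox:
    def __init__(self):
        self.trusted = set()   # anyone we have mailed, or who answered a challenge
        self.pending = {}      # sender -> held messages awaiting a challenge reply
        self.inbox = []

    def send(self, recipient: str) -> None:
        """Mailing someone marks them as friendly, which is what avoids deadlock."""
        self.trusted.add(recipient)

    def receive(self, sender: str, message: str) -> str:
        if sender in self.trusted:
            self.inbox.append((sender, message))
            return "delivered"
        self.pending.setdefault(sender, []).append(message)
        return "challenged"

    def challenge_answered(self, sender: str) -> None:
        """Release held mail once the sender completes the challenge."""
        self.trusted.add(sender)
        for message in self.pending.pop(sender, []):
            self.inbox.append((sender, message))


alice, bob = ChallengeResponseInbox(), ChallengeResponseInbox()
alice.send("bob")                                  # Alice mails Bob first
print(bob.receive("alice", "hello"))               # Bob's system challenges Alice
print(alice.receive("bob", "please confirm"))      # the challenge reaches Alice
```

Because Alice's system already trusts Bob, his challenge lands in her inbox instead of triggering a counter-challenge.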
-Nuke the moon
Was out to lunch with three colleagues today and the subject of anti-spam measures came up.
I managed to appall the one from Berkeley by suggesting that the most practical solution was probably a moderate-size bomb.
B-)
But seriously:
In an arms race, weapons eventually defeat armor. Spam will continue until two real-world things are BOTH brought to bear on spammers:
- Economics
- Muscle
If a governmental solution applying both is not forthcoming soon, I predict that there WILL be vigilantism.
In fact we're already seeing it.
For instance: Subscribing the Detroit area spammer and his lawyer to enough real-world junkmail lists to bury his bills and other US Mail correspondence in several daily truckloads of catalogues and other solicitations.
Soon to come: Retaliatory information-war software directed at DDoSer / spammer zombi-net machines. (As discussed in a recent Slashdot article.)
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
I don't bother getting too deep into downloading too many 'new improved!...' filters. I block entire damn countries/netblocks. Besides, I don't know anyone in Korea, Brazil, China, or any other one of the massive spamming countries. I configure postfix to filter out a lot, and the minute I receive one spammed message, I always whois -h whois.apnic/arin/ripe/lacnic the offender and block their entire range. I also have SpamAssassin running and I have to admit I get about maybe... maybe... 4 spams a week, not kidding. Again though, this is my personal machine.
block return-icmp (8) in proto tcp from 24.76.0.0/14 to any port = 25
block return-icmp (3) in proto tcp from 81.208.64.0/18 to any port = 25
block return-icmp (4) in proto tcp from 163.121.163.0/22 to any port = 25
block return-icmp (4) in proto tcp from 82.77.83.0/24 to any port = 25
block return-icmp (4) in proto tcp from 61.247.224.0/19 to any port = 25
block return-icmp (4) in proto tcp from 217.132.0.0/17 to any port = 25
block return-icmp (4) in proto tcp from 62.103.204.32/27 to any port = 25
block return-icmp (4) in proto tcp from 210.111.224.0/17 to any port = 25
block return-icmp (4) in proto tcp from 144.135.0.0/8 to any port = 25
block return-icmp (4) in proto tcp from 195.166.224.0/18 to any port = 25
block return-icmp (4) in proto tcp from 61.228.0.0/8 to any port = 25
block return-icmp (4) in proto tcp from 207.144.229.0/24 to any port = 25
block return-icmp (4) in proto tcp from 193.252.22.160/28 to any port = 25
block return-icmp (4) in proto tcp from 200.0.0.0/8 to any port = 25
block return-icmp (4) in proto tcp from 209.202.192.0/18 to any port = 25
block return-icmp (4) in proto tcp from 83.32.0.0/8 to any port = 25
block return-icmp (4) in proto tcp from 68.38.64.0/8 to any port = 25
block return-icmp (4) in proto tcp from 219.240.0.0/10 to any port = 25
block return-icmp (4) in proto tcp from 195.57.218.0/25 to any port = 25
block return-icmp (4) in proto tcp from 129.79.245.98 to any port = 25
block return-icmp (4) in proto tcp from 24.150.0.0/19 to any port = 25
block return-icmp (4) in proto tcp from 24.205.28.0/21 to any port = 25
block return-icmp (4) in proto tcp from 220.116.0.0/8 to any port = 25
block return-icmp (4) in proto tcp from 200.128.0.0/9 to any port = 25
block return-icmp (4) in proto tcp from 212.81.64.0/17 to any port = 25
block return-icmp (4) in proto tcp from 32.10.58.0/19 to any port = 25
block return-icmp (4) in proto tcp from 210.183.110.0/20 to any port = 25
block return-icmp (4) in proto tcp from 134.196.0.0/16 to any port = 25
block return-icmp (4) in proto tcp from 24.60.88.0/23 to any port = 25
block return-icmp (3) in proto tcp from 24.190.8.0/24 to any port = 25
block return-icmp (2) in proto tcp from 24.98.77.0/23 to any port = 25
block return-icmp (2) in proto tcp from 24.173.29.0/23 to any port = 25
block return-icmp (2) in proto tcp from 205.206.176.0/23 to any port = 25
block return-icmp (2) in proto tcp from 172.128.0.0/10 to any port = 25
block return-icmp (2) in proto tcp from 200.171.99.0/24 to any port = 25
block return-icmp (2) in proto tcp from 200.171.97.0/22 to any port = 25
block return-icmp (2) in proto udp from 200.171.97.0/22 to any port = 25
block return-icmp (2) in proto tcp from 68.62.80.128/25 to any port = 25
block return-icmp (2) in proto udp from 68.62.80.128/25 to any port = 25
block return-icmp (2) in proto tcp from 218.76.0.0/17 to any port = 25
block return-icmp (2) in proto udp from 218.76.0.0/17 to any port = 25
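If you want to sanity-check what a list like this actually covers before loading it, Python's standard `ipaddress` module can test membership. A small sketch using three of the ranges above. Note that an entry such as 144.135.0.0/8 has bits set beyond its prefix. With `strict=False` Python reads it as 144.0.0.0/8, a full /8. Whether the packet filter interprets it the same way is an assumption I have not verified.

```python
import ipaddress

# A few entries copied from the rules above; strict=False tolerates
# addresses with bits set beyond the prefix length.
BLOCKED = [
    ipaddress.ip_network(entry, strict=False)
    for entry in ("24.76.0.0/14", "144.135.0.0/8", "129.79.245.98")
]


def is_blocked(ip: str) -> bool:
    """True if `ip` falls inside any blocked range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED)


for net in BLOCKED:
    print(net, net.num_addresses)   # shows how wide each rule really is
print(is_blocked("24.77.1.1"), is_blocked("144.1.2.3"), is_blocked("129.79.245.99"))
```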
MoFscker
One "solution" which seems to be missing from this article is the "verify each stage" solution. You know, close down all open relays and implement a C-R system between the mail client and the server (password authentication to send?) and perhaps between servers too (a public-key challenge before transfers between servers, e-mail transferred in bulk after said challenge for speed reasons). The idea being not so much to make spam disappear, but to make all e-mail clearly and easily traceable so that no spammer would want to keep operating, and allowing any spammer who continues to operate to be tracked down.
Perhaps one of those SMTP fixes or SMTP alternatives mentioned at the end implements this idea? Anyone have more info?
As stated in the article's summary, the main problem with most spam filters is the need for constant maintenance. We need a solution that requires ZERO maintenance by the joe-users, and yet is cost-effective enough to implement.
My ISP seems to have a so-called "Watch Dog" spam filter, where they actually hire people to read spams and filter them manually, that's probably the most effective way to filter spam, but I wonder if it is cost-effective though.
Prior to this October, telemarketing calls were a national scourge. Amazingly, since we signed up for the Do-Not-Call list, we've only received 2 illegal calls. I'm rather surprised, in fact, at the relatively uniform acquiescence to this law. While a law against spam, which comes from all corners of the earth and is more anonymous, will be harder to enforce, some law with real teeth may be a good start.
Type "two dead spammers" into Google. You might even get a link back to Slashdot where it was covered. (Stock spammers likely killed by their business .. partners.)
One line blog. I hear that they're called Twitters now.
First, Microsoft should bite the bullet and by default not execute executable attachments in email. They should also not obfuscate certain file extensions such as .pif.
Second, companies selling products via spammers should be held equally guilty as the spammers themselves.
The only thing that will work in the end is some sort of distributed reputation management system. To a certain extent that is what RBLs do, except they are on or off. SpamAssassin does offer shades of grey to the RBLs (differing weights to each one).
To a certain extent this is what we already do in real life. We 'judge a book by its cover' as a first pass (for example people will often walk past a beggar in the street completely ignoring them) and then include other factors. How polite they appear, where they are from, recommendations from friends etc
All other mechanisms suffer from a determined spammer being able to get around them as the article pointed out. Any mechanism that prevents some spammers makes things more lucrative for the rest.
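The "shades of grey" idea comes down to summing per-list weights and comparing the total to a threshold, rather than rejecting on any single listing. A sketch with invented list names and weights. The real SpamAssassin rule names and scores are not reproduced here.

```python
def spam_score(listed_on, weights):
    """Sum the weight of every blocklist the sender appears on."""
    return sum(weights.get(name, 0.0) for name in listed_on)


def is_spam(listed_on, weights, threshold=5.0):
    """On/off RBL use rejects on any hit; weighted use needs enough evidence."""
    return spam_score(listed_on, weights) >= threshold


# Hypothetical lists: one trusted, one decent, one noisy.
WEIGHTS = {"list_a": 3.0, "list_b": 2.5, "list_c": 0.5}

print(spam_score({"list_a", "list_c"}, WEIGHTS), is_spam({"list_a", "list_c"}, WEIGHTS))
print(spam_score({"list_a", "list_b"}, WEIGHTS), is_spam({"list_a", "list_b"}, WEIGHTS))
```

A distributed reputation system would presumably feed many more signals than blocklists into the same kind of sum.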
while his credentials certainly would put him in a far better position to know these things than i am... i find his death and doom attitude annoying... he doesn't really address the parts of anti-spam that do work.. he glosses over them, and then hypes the parts that are broken.. without any sort of proof. if i were to mod the article it would probably get something like +2 Informative, -2 Overrated, -1 Flamebait and -1 Troll
And, no, I should not have used the goddamn Preview mode first.
One proposed solution I would love to see getting more attention is SPF ("Sender Policy Framework"), which allows each domain admin to specify their email sending policy using existing infrastructure.
See the SPF site or read this month's Linux Journal to find out more.
Executive summary of SPF: Just use DNS to specify where mail from your domain may originate from. If everyone used this, we could have domain blacklists that actually work.
Do an "nslookup -type=txt psychogenic.com" to see an example entry. And if you manage any domains, please consider doing the same.
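For a feel of what a receiving server does with such a TXT record, here is a heavily simplified evaluator. It only understands `ip4:` and `all`. Mechanisms that need further DNS lookups (`a`, `mx`, `include` and so on) are skipped, so it will give wrong answers for records that rely on them. The record string is a made-up example, not the one published for any real domain.

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(record: str, sender_ip: str) -> str:
    """Evaluate a tiny subset of SPF against the connecting IP address."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record at all
    addr = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # a mechanism with no prefix means "pass"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        if term.lower().startswith("ip4:"):
            if addr in ipaddress.ip_network(term[4:], strict=False):
                return QUALIFIERS[qualifier]
        elif term.lower() == "all":
            return QUALIFIERS[qualifier]
        # anything else would need DNS and is ignored in this sketch
    return "neutral"  # nothing matched


record = "v=spf1 ip4:192.0.2.0/24 -all"   # hypothetical policy
print(check_spf(record, "192.0.2.10"))
print(check_spf(record, "198.51.100.7"))
```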
I had a chat with a Veep that was hired on to a company I used to work at. Very down to earth guy, very friendly. We got to talking about spams and semi-legitimate emailings to customers, etc.
He had one very interesting tidbit; stick with me for a sec here. Most companies outsource their semi-legit stuff because they get reported as spammers and whatnot, or it bogs down their email server/network, etc. No surprise there. However, the interesting tidbit is that the outsourcing companies turn around and outsource to Indian firms for handling the bounces. There's literally a room full of people in India, sitting there answering those challenge/responses and updating the client's customer email list (unlike spammers, it really is in their best interests to minimize failed deliveries). It sounds "expensive", but it's not, considering how few people use challenge/response systems. Further, a reasonably smart human can get familiar with all the various systems quickly (an hour or two, I'd guess, tops) and probably process close to a message every few seconds with a client program set up to do that limited functionality smoothly. Best part: if your client does several mailings, unless the recipient goes in and removes you, you're clear for future emailings.
Please help metamoderate.
What is wrong with using a combination of a hashcash type approach in conjunction with cryptographic signing to address the shortcomings of both?
Thus the following rules for the user:
If an incoming email is cryptographically signed by someone on your whitelist, accept it.
If an incoming email has made hashcash payment, accept it. The user then decides whether to accept future signed messages from the sender.
Other incoming mail is returned to sender instructing them to make hashcash payment.
Sign all outgoing messages, and also generate hashcash if you haven't previously sent to the user.
How this affects the downsides:
Mailing lists: Would generate hashcash payment for the subscription process, but regular mail messages are just cryptographically signed (i.e. independent of the number of subscribers).
Unequal taxation: May still be a concern if your machine isn't up to the task of signing the bulk of your outgoing messages.
Robot armies: Users (should) quickly notice if their machine is burning the CPU generating hashcash tokens and address the problem.
Legal robot armies: I don't see what the problem is here -- the sender is still having to pay to generate the tokens, so the economics of spam are changed.
Automated abuse: Hashcash payment is required for all initial messages, so generating countless certs doesn't help.
Usability: Crypto signing is done with self-signed certs (e.g.: PGP) so no central CA is needed.
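The receiving-side rules above can be sketched as follows. The stamp format (`resource:counter` hashed with SHA-1) is a simplification of mine and not the real hashcash stamp layout. Signature checking is assumed to happen elsewhere, so `signer` stands for an already-verified key ID.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(resource: str, bits: int = 12) -> str:
    """Burn CPU until a stamp for `resource` hashes to enough leading zero bits."""
    for counter in count():
        stamp = f"{resource}:{counter}"
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp


def verify(stamp: str, resource: str, bits: int = 12) -> bool:
    """Checking costs one hash, however long minting took."""
    return (stamp.startswith(resource + ":")
            and leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits)


def triage(signer, stamp, recipient, whitelist, bits: int = 12) -> str:
    if signer is not None and signer in whitelist:
        return "accept"                       # signed by someone on the whitelist
    if stamp is not None and verify(stamp, recipient, bits):
        return "accept"                       # first contact, payment made
    return "bounce: hashcash required"        # everything else goes back


stamp = mint("bob@example.org", bits=8)
print(stamp, triage(None, stamp, "bob@example.org", set(), bits=8))
```

A real deployment would also need to reject reused stamps, which this sketch does not do.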
Well, to prove identity you could cryptographically sign mails. When the recipient gets the signed mail, they do a key lookup and verify that the signed mail was signed with the correct private key.
Now, how do you handle the situation where spammers are generating thousands of keys? Well, the spammers are forced to waste some CPU time, but that's trivial for them. They're also polluting key registries with their garbage - that's a big negative.
However, in terms of trustworthiness, the spammer probably hasn't gotten all his keys signed by somebody else who is of a "trusted" ranking. Even more likely, much of the signed mail you do get will either be known to you (ie, you've signed their keys) or will be known to people you know (ie, someone you know has signed somebody else's key.)
Mind you, this is no replacement for other types of filtering (ie, SpamAssassin with Bayes, etc.) but it would make whitelisting useable against spammers who forge e-mails, UNLESS the spammers know the private key of the poor slob that they're impersonating.
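The "known to you, or known to people you know" test is a short breadth-first search over who has signed whose key. A sketch with made-up key IDs. The two-hop limit is my assumption, and real signature verification is left out entirely.

```python
from collections import deque


def is_trusted(key: str, my_key: str, signatures: dict, max_hops: int = 2) -> bool:
    """`signatures` maps a signer's key ID to the set of key IDs it has signed."""
    seen = {my_key}
    queue = deque([(my_key, 0)])
    while queue:
        current, hops = queue.popleft()
        if current == key:
            return True
        if hops == max_hops:
            continue  # too far from me to count as trusted
        for signed in signatures.get(current, ()):
            if signed not in seen:
                seen.add(signed)
                queue.append((signed, hops + 1))
    return False


SIGS = {
    "me": {"alice"},
    "alice": {"carol"},
    "carol": {"dave"},
    "spammer1": {"spammer2"},   # spammers can sign each other all day long
}
print(is_trusted("carol", "me", SIGS))      # friend of a friend
print(is_trusted("spammer2", "me", SIGS))   # no path from my key
```

Thousands of freshly generated keys don't help the spammer unless someone I trust signs one of them.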
Oddly enough the spammer's name was "Fagin", as in the Oliver Twist villain, and he was born with that name.
The big problem with mail filters, as the article mentions, is that they need to be updated when new spam technologies appear... and there's also a lot of false positives... I gave SPAMfighter a try (from www.spamfighter.com) and although it was a bit worse at finding spam (At first), I never got any false positives. The way it works is that the "filters" are actually some kind of hash that users submit whenever they block or unblock an email (it analyses the whole content I think, not just the text). So if a new type of spam technique appears, the users will just block it. And unlike many other client-side plugins, it actually works on Outlook Express.
Another one I recommend is Spambayes... but there's the problem with false positives. All the other ones I've tried are utter crap.
Best regards,
Alex Ionescu
Relsoft Technologies
I have an interesting idea to force ISPs to crack down on spamming customers...
This basically works only if the spamming ISP is from your country. Which is why blacklisting of foreign IPs is still necessary.
But for domestic ISPs who don't rein in spamming, someone should post the 800 numbers of ISPs that don't crack down on spamming. Put up a web site listing the 800 numbers of the ISPs that are top-ranked in harboring spammers. Most of them have 800 numbers.. if everyone calls these ISPs and complains, or at least takes up air time, it costs them money, and money seems to be the only thing that motivates these companies.
I could not justify my existence if I were a turkey farmer. Would I terminate myself? Undoubtably, yes.
Humans do them lal teh tiem.
... and remember the Slashdot story a few weeks ago where a computer spam filter was MORE accurate than the human testers. (Yeah, it probably was spam filter reads whole message vs. human reads only subject, but still ...)
So you can't just block someone after one mistake.
You just have to get your computer program better than the average typo occurrence.
Oh
I think there are many tasks where a well trained computer program will perform even better than the average human.
I have discovered a truly remarkable proof for my post which this sig is too small to contain.
So what about this:
You start with a central certificate authority. I know, I know, bottlenecks. But you only need them to issue keys to (or sign the keys of) about 100 (or 1000?) servers. The signing authority has to be central, but the *revocation* authority does not. That's the key here.
So those servers can sign the keys of 1000 servers of their own and so on.
So my mail server tries to send your server an email. Your server checks if my key is signed by someone who is signed by someone who is signed by the CA. It also checks against its nightly downloaded revocation list. If everything is good, the mail goes through. Very little processor time, and very little bandwidth.
Suppose someone issues a key to a dishonest server? Well, enough people issue complaints and the issuer's key gets revoked. Or some automated SpamAssassin-type thing that auto-revokes the key after enough spams get spotted. No more spam from them, and maybe next time the admins are more careful.
This totally eliminates (i think) the threat of zombie SMTP servers on DSL and open relays.
Then the ball is in the court of the ISPs and server hosters (those with their own email keys) to keep spammers out locally. SSL login for SMTP? Sure. C/R for each email sent through them? Whatever. Send anything over their open relay? Not for long.
Sounds reasonable to me. It makes it easier for the end user I think, and minimizes spam.
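The check the receiving server would run ("signed by someone who is signed by someone who is signed by the CA", minus anything on the revocation list) can be sketched like this. No actual cryptography is performed. `issuer_of` simply records who signed whose key, and every name is illustrative.

```python
def chain_is_valid(server: str, issuer_of: dict, root: str, revoked: set,
                   max_depth: int = 10) -> bool:
    """Walk from the sending server up to the root, failing on any revoked key."""
    current = server
    for _ in range(max_depth):
        if current in revoked:
            return False      # a revoked issuer invalidates everything below it
        if current == root:
            return True
        current = issuer_of.get(current)
        if current is None:
            return False      # unsigned key, or a chain that never reaches the root
    return False              # chain too long (or a loop)


ISSUER_OF = {
    "isp-a": "root-ca",
    "isp-b": "root-ca",
    "mail.small.example": "isp-a",
    "zombie.dsl.example": "isp-b",
}
nightly_revocations = {"isp-b"}   # isp-b signed too many spammers
print(chain_is_valid("mail.small.example", ISSUER_OF, "root-ca", nightly_revocations))
print(chain_is_valid("zombie.dsl.example", ISSUER_OF, "root-ca", nightly_revocations))
```

One thing the sketch makes obvious: revoking an issuer also cuts off every honest server it signed, which is presumably the incentive for admins to be careful.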
Any suggestions?
Muerte
This totally eliminates zombie SMTP servers on cable lines spewing spam.
I didn't see any mention of a pretty good solution that i've run across:
Every time a message hits a server from a sender that it has never met before, it sends a TEMPFAIL back instead of accepting the message. All real MTAs will try again with whatever their retry delay is set to, and usually for about 4 days. If the server gets the same message being delivered again, it accepts it and adds the sender to a whitelist where it never has to 'ask questions' of this sender again.
The reason that this would work, at least for now, is that spammers mostly use badly written MTAs (or something akin to an Expect script posing as an MTA). Their software doesn't know how to deal with a TEMPFAIL and never tries again. All real MTAs will try again within a few minutes. Good times.
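A sketch of that logic, for the curious. The `min_delay` guard (ignore retries that come back too quickly) is my own addition and not part of the description above, and time is passed in explicitly so the example stays deterministic.

```python
class Greylist:
    def __init__(self, min_delay: int = 300):
        self.min_delay = min_delay   # seconds a retry must wait (assumed value)
        self.first_seen = {}         # (sender, message id) -> time of first attempt
        self.whitelist = set()       # senders that have proven they retry

    def check(self, sender: str, message_id: str, now: float) -> str:
        if sender in self.whitelist:
            return "accept"          # never ask questions of this sender again
        key = (sender, message_id)
        first = self.first_seen.get(key)
        if first is None:
            self.first_seen[key] = now
            return "tempfail"        # i.e. answer with a 4xx SMTP reply
        if now - first < self.min_delay:
            return "tempfail"        # retried too fast to look like a real queue
        del self.first_seen[key]
        self.whitelist.add(sender)
        return "accept"


grey = Greylist(min_delay=60)
print(grey.check("ann@example.org", "msg-1", now=0))     # first sight
print(grey.check("ann@example.org", "msg-1", now=120))   # a real MTA retrying
print(grey.check("ann@example.org", "msg-2", now=121))   # now whitelisted
```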
From what I understand, rewriting SMTP to fix most (if not all) of the spam loopholes is no problem (Am I seriously glossing over some big details here?). The trouble is that people want a 100% effective, immediately pluggable solution. If new email clients support both the old and new SMTP protocols, and use the new one as a default, it will be just a matter of time before there's a critical mass of clients and ISPs that are using the new one.
Once this critical mass is reached, boom, everyone is required to use the new protocol, and any email that uses the old one is immediately dumped way upstream, before it can start hogging bandwidth everywhere.
I'm aware that if my idea is so great, how come it hasn't been implemented?? Feel free to pick holes....
Buses stop at a bus station
Trains stop at a train station
On my desk there's a workstation....
- SPF to deal with forged headers
- White- & Blacklists for people we already know.
- Challenge/Response for people we don't know yet.
- Bayesian filters
- Special tokens for web sites that let you send a news item to a friend's email by attaching a brief signed personal message (that includes the date and title of the news article to prevent replay attacks) that grants a one-time pass through the filters and C/R.
These tools can be used in various combinations: During the 'transitional phase' of SPF, source addresses that lack SPF records in DNS would go through challenge/response as an alternative. The challenge email could even include URLs with FAQs about how to implement SPF, handy for forwarding to your mail administrator.
Those tokens might be treated by the Bayesian filter as just one more hint as to whether something is spam. The preprocessor might replace a validated signature with:
which might not boost the rating of the email at all, if prior spampasses from this same friend have generally ended up manually marked 'spam' by the recipient.
[100% ISO 646 Compliant]
SVM, ERGO MONSTRO.
What I'd rather see is every e-mail transmitted be digitally signed.
When the e-mail client is set up, it could generate a GPG key set to use for signing the e-mail.
The recipient's computer, if verification is required, could send a standardized e-mail back to the sender's computer asking for the sender's public GPG key. If and when it arrives, check the digital signature and either deliver the e-mail or /dev/null it.
By caching the keys, you really wouldn't even have to have a white list. Or, more accurately, the white list would be by digital signature rather than the Reply-to or From address.
This could even be implemented on the server itself and with better results.
When adding the user, create a GPG key for that user on the server.
Require authorization for each incoming e-mail that is to be relayed. Digitally sign the e-mail with that key if the sender has not already done so on the client side.
The recipient's server or the recipient's client may then request the public key. If the public key used was the server's key used on behalf of the client, then return that. Otherwise, send the request on to the client for his public key.
Of course, this could be abused, but then the e-mail addresses have to be real and could then be used for blocking.
The traffic itself should be relatively small. The data portion of the request would just identify the public key desired based on what was used on the message (sender's key maintained by the server or the sender's key maintained by the client) and the data portion of the response would contain that id and the key.
For those who use multiple e-mail clients, allowing the server to handle the key would be preferable since the multiple clients would generally use different keys.
If the cached public key for that user failed, a request for the public key would be sent in case the public key had been changed. If the new key was different, the cached public key could be expired after a set period of time (in case there were any yet to be delivered e-mails from the old key around) and the new public key added to the cache.
You'd have the benefits of challenge-response systems without the users being annoyed.
One problem with challenge response systems is with mailing lists. With this method, there would be no problem since the mailing list's server would react to requests for the public key by providing it.
This would also take care of the automated e-mail case, say when you place an order and the sender sends an e-mail telling you the order has been fulfilled.
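The key caching and rollover described above might look something like this sketch. Everything here is hypothetical: `key_used` stands for the fingerprint of whatever key verified the message's signature (the GPG check itself is out of scope), `fetch_key` stands for the standardized "send me your public key" request, and the grace period length is an arbitrary choice.

```python
class KeyCache:
    def __init__(self, grace_seconds: int = 7 * 24 * 3600):
        self.grace = grace_seconds
        self.current = {}    # sender -> cached key fingerprint
        self.retiring = {}   # sender -> (old fingerprint, time it expires)

    def accepts(self, sender: str, key_used: str, fetch_key, now: float) -> bool:
        if self.current.get(sender) == key_used:
            return True                      # cache hit, no round trip needed
        old = self.retiring.get(sender)
        if old and old[0] == key_used and now < old[1]:
            return True                      # late mail signed with the previous key
        fresh = fetch_key(sender)            # cached key failed: ask again
        if fresh is None or fresh != key_used:
            return False                     # deliver to /dev/null
        if sender in self.current:
            # Keep the old key for a while in case mail signed with it is still in transit.
            self.retiring[sender] = (self.current[sender], now + self.grace)
        self.current[sender] = fresh
        return True


cache = KeyCache(grace_seconds=100)
print(cache.accepts("ann@example.org", "k1", lambda s: "k1", now=0))    # first contact
print(cache.accepts("ann@example.org", "k2", lambda s: "k2", now=10))   # key changed
print(cache.accepts("ann@example.org", "k1", lambda s: "k2", now=50))   # old key, in grace
```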
Non-forgeable From-addresses would be nice, but the most critical emails that I send or receive are when email is broken and/or one of us is not in a position to be able to use the normal channels.
It's like phoning the phone company to report that your phone is out of order.
It's like a backup system that works perfectly as long as you don't need it.
The from-address is where the email claims to be from. It should be easily forgeable. If I am using someone else's computer to send a quick note, I should be able to send it, from me, without messing up the computer's settings.
The headers also include where the email came from, at least the last leg of the trip. The headers should be blatantly obvious when mail is delivered. Otherwise it's like the postman delivering the letter inside and keeping the envelope.
The problem with spam is not that it is unsolicited, nor that it is commercial. The problem is that there is far too much of it, and it is being sneaky about delivering it. Spam is socially unacceptable and the solution will be social, not technical. For the technical side, the email client needs to distinguish between what it knows and what the email purports to be. For HTML emails, it would help to see which domains are referenced by the email. The difference between the malware running loose now and the Unix Honor Virus is that with the latter you can see what is going on. Anything that pretends to be other than what it is is up to no good. Anything that encourages this pretense (hint, hint Microsoft) is encouraging the malware. Anything that calls something secure when it has only secured part of it is encouraging the malware. A tar-paper shack with a steel security door is not secure.