Spam Solutions from an Expert
Mod N writes "SecurityFocus has posted a nice survey of anti-spam technologies by spam expert Neal Krawetz, in which he delves deeply into the specifics and pitfalls of the numerous proposed solutions. Krawetz makes it obvious that securing the email infrastructure is a very complex problem that many of the current (simple) solutions can't solve alone."
Good overview, all things considered. I would like to add to one of his conclusions (from part 1):
This conclusion is correct, but why is it considered a stopping point? Mail admins: get off your collective butts and add encryption and authentication to your mail servers! The author also forgot to mention that server-side certificates are not necessary for SMTP; SMTP+AUTH addresses this quite nicely. Note that such measures are not necessary for most users. Home users who use their ISP's mail server don't have to implement any of this, since the ISP can already account for the user. Let us not forget that "most users" do not have the e-mail needs that many Slashdot readers do. For those needing roaming access and multiple addresses, use IMAPS and SMTP+SSL+AUTH.
The linked article is part 2; Part 1 is here.
Rock that crushes, Paper & Scissors that don't matter.
The year 2000 called, they miss your opinions.
In other words, your data is so out of date as to be positively misleading.
Open relays are dead. Open proxies are so 2003.
All the cool kids are using virus-distributed trojans these days, some of 'em proxies, some dedicated spamware.
That's what a lot of these Bayesian analyzers attempt to do. They statistically learn your patterns from which emails you like and which you don't, and then try to "intelligently" discard the bad ones for you. Obviously the worry exists (mostly for companies) that good email may get stopped, but in my experience that's very uncommon, as long as the user has taught the spam bot/blocker properly.
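For anyone curious what "statistically learning your patterns" looks like, here is a minimal naive-Bayes-style sketch. It is not how SpamBayes or any particular filter actually combines evidence; the class name `TinyBayes`, the whitespace tokenizer, and the add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy spam scorer: counts how many spam/ham messages contain each word."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}  # number of messages seen per label

    def train(self, text, label):
        # Count each distinct word once per message.
        self.counts[label].update(set(text.lower().split()))
        self.totals[label] += 1

    def spam_probability(self, text):
        # Start from the (smoothed) prior odds of spam vs. ham.
        log_odds = math.log((self.totals["spam"] + 1) / (self.totals["ham"] + 1))
        for word in set(text.lower().split()):
            # Add-one smoothing so unseen words never give a zero probability.
            p_spam = (self.counts["spam"][word] + 1) / (self.totals["spam"] + 2)
            p_ham = (self.counts["ham"][word] + 1) / (self.totals["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


f = TinyBayes()
f.train("cheap pills buy now", "spam")
f.train("buy cheap watches", "spam")
f.train("meeting at noon", "ham")
f.train("lunch at noon tomorrow", "ham")
print(f.spam_probability("cheap pills"))       # well above 0.5
print(f.spam_probability("meeting tomorrow"))  # well below 0.5
```

The "teaching" part is just calling `train` every time the user marks a message as spam or not; the more of your own mail it sees, the better the word counts reflect what you consider junk.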
video game, ecchi, bbs and classic computing fans unite to eat sushi
- I run SpamAssassin and ClamAV on my server and check all inbound mail against a series of RBL lists; and
- All mail POP'd into my Outlook (yeah, I gotta use it - no flames!) gets checked using the free-and-excellent SpamBayes.
Works in the background with damn-near zero false positives, and doesn't require Microsoft-pushed e-mail postage, changes in the e-mail RFCs or anything else. The tools are out there. If you use them, spam isn't nearly as much of an issue as the press makes it out to be.
*Well not everyone in the Real World anyway -- here on /. we all run our own boxes, right?
"It was a summer's tale: Just a boy, his Linux, and a head full of dreams..."
There are plenty of tasks that you can do that computers find nearly impossible. Facial recognition is a good one. Humans do it easily all the time. Computers are trying, but still screw it up badly. Musical recognition is another one. A human can easily pick out individual instruments in a piece, and can tell that the song is the same even if it is a completely different orchestration and mix (like a remix, for example). Computers are confounded by this, even when they break something into component sine waves. Pragmatic language interpretation is my favourite. Even when people speak non-literally and indirectly, you still have no trouble with their meaning. You can also tell which level of meaning they want, and successfully decode the other levels if asked. Computers are lucky if they can get the literal, direct meaning out of a sentence, never mind anything else.
So, just because a human can do it doesn't mean a computer can. I don't know about any of these image schemes; I've never played with them. However, if you make it sufficiently hard for it to distinguish characters from background, and one character from another, it's screwed. Computers have trouble with the fuzzy and incomplete information that humans are so good with.
Also remember it needs to be feasible to do in a reasonable time. Maybe you develop some whiz-bang image recognition program that can take amazingly distorted text and figure it out. If it takes 5 minutes to process a box, it does you no good anyway: too much time to be worth it for this use.
On a related note: the current Microsoft anti-spam solution, Email Caller ID, is being boycotted.
You are so right. I use a few on all my servers and they work; cbl.abuseat.org works wonders at cutting down on the trojan spam.
I've also set up my own private RBL. Any spam that makes it through the public ones has its originating IP added, with no hope of ever getting off the list, since no contact info is sent and spammers have no clue where the RBL is housed.
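For those who haven't looked under the hood: a DNS-based blocklist is queried by reversing the octets of the sender's IPv4 address and looking that name up under the list's zone; an answer means "listed". A rough sketch follows. The function names are made up for illustration, and a real mail server would do this inside the MTA rather than in a script.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name a blocklist lookup would query for an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone resolves the query name (needs network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No such name: the address is not on this list (or DNS is unreachable).
        return False


print(dnsbl_query_name("192.0.2.99", "cbl.abuseat.org"))
```

A private list works the same way; you simply run the zone yourself and never publish where it lives.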
Just this morning I was forwarded the dynamic ranges from Shaw Cable here in Canada. We were getting hammered by the infected fools there and I complained to them to at least close port 25; instead they sent me the ranges I can safely block. Sweet. Now to work on Telus.
Obviously, not what you were talking about: it was fraud more than spam, and the spammer didn't suffer, but... that's certainly violence resulting from spam affliction. (Also, note from this article: According to State Department figures (PDF), 25 murders or disappearances of Americans abroad have been directly linked to 419 fraud.)
There are no trails. There are no trees out here.
What happens when someone on your whitelist opens an attachment that automatically sends email from their account, signing it? Now you have a spam that has been legitimately sent from your friend's account.
I created a C/R anti-spam system myself, but gave up on it and turned to Spambayes for two main reasons:
1.) I was losing challenges in others' spam filters
2.) I would still get emails from whitelisted folks when they were infected with an email worm.
If you're interested, I blogged about my switch from C/R to Bayesian filtering here.
I could not justify my existence if I were a turkey farmer. Would I terminate myself? Undoubtably, yes.
Here ya go, this will help you keep out Shaw's residential customers ...
8 .0.0/160 /13
24.64.0.0/13
24.76.0.0/14
24.80.0.0/13
24.10
24.109.0.0/18
24.109.64.0/19
68.144.0.
Those ranges are safe to block; they have other ranges for the static business clients.
Of course, another simple step the ISP can take is to block outgoing SMTP entirely for those ranges, except to their own mail servers.
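If you'd rather check connecting IPs against ranges like these in your own scripts, a sketch along these lines should do. It uses only the entries from the list above that survived intact; the garbled or truncated ones are left out rather than guessed at, and `is_blocked` is just an illustrative name.

```python
import ipaddress

# Only the complete CIDR entries from the list above.
BLOCKED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "24.64.0.0/13",
        "24.76.0.0/14",
        "24.80.0.0/13",
        "24.109.0.0/18",
        "24.109.64.0/19",
    )
]


def is_blocked(ip: str) -> bool:
    """Return True if the address falls inside any blocked range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in BLOCKED_RANGES)


print(is_blocked("24.65.1.2"))   # inside 24.64.0.0/13
print(is_blocked("192.0.2.1"))   # not in any listed range
```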
I wouldn't block anybody after their first mistake. However, there comes a point where too many mistakes indicate either a robotic attempt that isn't learning from its errors, or a really stupid human who likely can't compose a useful e-mail either.
Many spammers who are trying to beat a Bayes filter are either using misspellings of their most spammy words, or large lists of random dictionary words to try to lower their score. However, a countermeasure to that would be to factor in the results of a spell check and grammar check. Some errors can be tolerated; however, having too many misspellings and too many word groups that can't possibly be a proper sentence should raise the score enough to counteract the attempts to lower it, and then some.
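Here is a rough sketch of the spell-check half of that idea (grammar checking is left out). The word list, the 20% tolerance, and the penalty weight are all invented for illustration; a real filter would use a proper dictionary and tune the numbers against real mail.

```python
def misspelling_ratio(text, dictionary):
    """Fraction of words in the text that are not in the dictionary."""
    words = [w.strip(".,!?;:\"'()") for w in text.lower().split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in dictionary)
    return unknown / len(words)


def adjusted_score(base_score, text, dictionary, tolerance=0.2, weight=5.0):
    """Raise the spam score only for misspellings beyond the tolerated share."""
    excess = max(0.0, misspelling_ratio(text, dictionary) - tolerance)
    return base_score + weight * excess


# Hypothetical tiny dictionary, for demonstration only.
WORDS = {"buy", "cheap", "pills", "now", "hello", "meeting", "at", "noon"}
print(adjusted_score(1.0, "Hello, meeting at noon.", WORDS))  # unchanged
print(adjusted_score(1.0, "buy ch3ap p1lls now", WORDS))      # pushed upward
```

Because only the excess over the tolerance is penalized, the occasional typo from a real person costs nothing, which matches the "some errors can be tolerated" part.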
I challenge someone to find an automated response to C/R.
Students at Berkeley have already beaten the C/R system set up by Yahoo!, and with a selection of 191 different versions of text obfuscation they achieved a 92% success rate. On much more detailed images, with random background textures and overlaid text, they achieved only a 33% success rate, but I am sure with time they would be able to do better.
In a paper published by Greg Mori and Jitendra Malik, they explain the methods used to defeat the system. For the full write-up you can visit their site on Breaking a Visual CAPTCHA.
We don't have that many clients using our mail server, but one noticed one day that mail from him to friends was bouncing. He reported this and we discovered that we were on SpamCop's RBL.
I did a quick audit of the mail server, fearing we'd been hijacked, but found no evidence anywhere of spam going out.
Being generally sympathetic to RBLs, I was eager to get to the bottom of this and cooperate with whatever needed to be done to prove our innocence.
But I found it extremely frustrating to find any information on the SpamCop web site. I found some references stating that to refute a listing you must reply to the email that SpamCop sent you; I searched and searched, but we had received no mail from SpamCop.
As I spent a precious day trying to figure out what to do, as mysteriously as we'd been listed, our IP disappeared from spamcop's list.
To this day I don't know what happened, but I have a somewhat more bitter taste in my mouth regarding the arbitrary power of RBLs.
(Though I still tend to put more blame on any system that blindly obeys a single RBL. I think SpamAssassin is more democratic in that each listing only adds to a score, and an IP has to be on multiple block lists before it goes over a threshold. This gives spammers more lead time before they are blocked, but also prevents any single RBL from wielding absolute power... a sort of check-and-balance.)
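To make the check-and-balance idea concrete, here is a toy version of threshold scoring. The zone names, weights, and threshold are invented for the example and are not SpamAssassin's actual rules or scores.

```python
# Hypothetical per-list weights; no single list reaches the threshold alone.
RBL_WEIGHTS = {
    "rbl-one.example": 2.0,
    "rbl-two.example": 2.0,
    "rbl-three.example": 1.5,
}
SPAM_THRESHOLD = 5.0  # illustrative cutoff


def rbl_score(listed_on, weights=RBL_WEIGHTS):
    """Sum the weights of every block list the sending IP appears on."""
    return sum(weights.get(zone, 0.0) for zone in listed_on)


def is_spam(content_score, listed_on, threshold=SPAM_THRESHOLD):
    """Flag the message only when content and list evidence together cross the line."""
    return content_score + rbl_score(listed_on) >= threshold


print(is_spam(0.5, ["rbl-one.example"]))     # one listing is not enough
print(is_spam(0.5, list(RBL_WEIGHTS)))       # several listings together are
```

With weights like these, a mistaken listing on one RBL costs you a couple of points instead of bouncing all your mail.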
* Mailing lists. [...] if there is a way for legitimate mailing lists to bypass the challenge, then spammers can equally bypass the challenge.
Hashcash is generated for the mailing-list address. The recipient would add the mailing-list to their list of addresses they accept mail as, and a spammer can not send to the list without including hashcash. So the limitation for mailing-lists is that the spammer can send mail to many people (the list subscribers) for the cost of one stamp; if he sends directly he has to send one stamp for each recipient.
* Robot armies [of 0wned machines].
Clearly someone with lots of owned systems can send lots of mail, but still less mail than they could without hashcash.
* Legal robot armies. [...] Large spam groups can afford purchasing hundreds of systems for distributing a computational cost.
They can do this (and it doesn't matter whether it's legal or not, btw; they'll do it anyway), but it will cost them more per mail, so they will send fewer mails and be economically incentivized to target their mails by buying demographic data etc. (e.g. so you would be less likely to receive spam in languages you can't read, or on topics you are not interested in).
Another aspect is that legitimate users do not send mails to lots of new recipients; most email exchanges are conversations over a period of time with sends and receives. Some of the hashcash based systems use hashcash only for introductions, and exempt recipients from hashcash after that based on crypto tokens (or just whitelists) (eg CAMRAM, TMDA do this).
The argument here is that hashcash can be set to higher cost as it is only borne once per new recipient for normal users.
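For readers who have not seen how a stamp works, here is a simplified proof-of-work sketch in the spirit of hashcash. The stamp layout (`1:bits:resource:counter`) is a cut-down assumption for illustration; the real format also carries a date and random salt, and real deployments use far more bits than the 12 used here.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(resource: str, bits: int = 12) -> str:
    """Search for a counter whose stamp hashes to at least `bits` leading zero bits."""
    for counter in count():
        stamp = f"1:{bits}:{resource}:{counter}"
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp


def verify(stamp: str, resource: str, min_bits: int) -> bool:
    """Cheap check: right recipient, enough claimed work, and the hash backs it up."""
    try:
        version, claimed_bits, stamp_resource, _counter = stamp.split(":")
        claimed = int(claimed_bits)
    except ValueError:
        return False
    if version != "1" or stamp_resource != resource or claimed < min_bits:
        return False
    return leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= claimed


stamp = mint("list@example.org", 12)
print(stamp, verify(stamp, "list@example.org", 12))
```

Minting costs the sender roughly 2^bits hash attempts on average, while verifying costs the recipient one hash. Because the stamp names the recipient address, a stamp minted for a mailing-list address covers that one list post, while direct mail needs a fresh stamp per recipient.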