Spam Catchers Block Latest Crypto-Gram
An anonymous reader writes "Bruce Schneier sent out a note about SpamAssassin and possibly other spam filters blocking his excellent Crypto-Gram newsletter. Fortunately you can get it here (early no less!)." Schneier's email reads, in part "Tomorrow I will be sending out the February CRYPTO-GRAM, as I do on the 15th of every month. In the process of creating this month's Crypto-Gram, I discovered that SpamAssassin thinks that this issue is spam, probably because of certain links and descriptions of scams in the text. I have anecdotal evidence that other spam filters block Crypto-Gram as well. ... I'd apologize for the inconvenience, but I'm not sure what I could do to make it less so -- I don't intend to alter my content to accommodate spam filters."
block that important e-mail I was waiting for on enlarging my....never mind, I have to check my e-mail now.
That's why most good spam blockers (especially OS X's Mail.app) use their filters but compare the senders to a whitelist so that your friends can send you whatever they want to. If you've been receiving CRYPTO-GRAM for a while, it should be on your whitelist, and the blocker should just let it by.
But you don't always want to get everything people send you (everybody has those people who send you things they think are funny but you just can't stand). So there should be levels of "friendship" in the whitelist, so that some senders can be considered dubious (their mail shouldn't be deleted like spam, but perhaps placed in a different "Uninteresting" folder).
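The tiered whitelist suggested above could be sketched roughly like this. Everything here is hypothetical: the addresses, the folder names, the `Trust` levels, and the `looks_like_spam` flag (which stands in for whatever the content filter decided) are illustration only, not how Mail.app or any other client actually works.

```python
from enum import Enum


class Trust(Enum):
    """Hypothetical levels of 'friendship' for a whitelisted sender."""
    TRUSTED = "trusted"   # always deliver, whatever the content filter says
    DUBIOUS = "dubious"   # never delete, but keep out of the main inbox


# Hypothetical whitelist; a real client would build this from an address book.
WHITELIST = {
    "newsletter@example.com": Trust.TRUSTED,
    "joke-forwarder@example.com": Trust.DUBIOUS,
}


def route(sender: str, looks_like_spam: bool) -> str:
    """Pick a folder for a message, checking the whitelist before the filter."""
    trust = WHITELIST.get(sender.strip().lower())
    if trust is Trust.TRUSTED:
        return "Inbox"
    if trust is Trust.DUBIOUS:
        return "Uninteresting"
    # Unknown sender: fall back on the content filter's verdict.
    return "Spam" if looks_like_spam else "Inbox"
```

With this arrangement a whitelisted newsletter lands in the inbox even when the content filter objects, and the friend with the unfunny forwards is filed away rather than deleted.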
Lack of eloquence does not denote lack of intelligence, though they often coincide.
SpamAssassinAssassin could look at the folder where you put your filtered mail and learn what to pull back out, and flush the rest to /dev/null.
I'm sure Paul Graham will be glad to write it in lisp.
Or, of course, we could just do what the obvious solution is: get in a P.O. Box, send out spam for herbal viagra and penis enlargement, and when you get the checks in the mail HUNT THE CUSTOMERS DOWN AND KILL THEM.
It's simple, really.
obPlug: This is why I created Trustic.
...if I put "hot teens go crazy for debt-free viagra while earning $$$ from home" in the middle of some fine Shakespeare, that will get flagged as spam.
eMerchant of Venice. Act I Scene IV, right?
Aside from the spot-on comments that people have made regarding adding a whitelist entry for Crypto-Gram (an obvious candidate for whitelisting if there ever was one, given that it frequently discusses spam, scams, and probably even includes text straight out of some spams), here is my initial analysis and response to him.
Oh, first one other comment: SpamAssassin does not block content. SpamAssassin only flags probable spam. What the site or user does with that flag is their own business. Some mail administrators misuse SpamAssassin to block email, but we do not recommend blocking email. Really.
------
[...] One false positive (or a related set of false positives) is not really a statistically useful sample size. To get to a high rate of filtering, most filters do have some false positives. You can get fewer false positives with customization of one form or another (personalized Bayes training, whitelists, rules, automatic learning algorithms). Our goal (everyone's goal, I think) is to get the best ratio of false positives to false negatives. It's a difficult balance sometimes and some legitimate content has a harder time.
On to the data:
I checked your newsletter with two versions of SpamAssassin: the current stable version (2.44) and the very-soon-to-be-released development version (2.50).
A score of 5.0 is the default threshold to be flagged as spam.
In SA 2.44, your mail receives a score of 3.20 (2.40 as I received it, but I believe the score would be about 3.20 for most people). That's on the high side, but has a bit to go before being flagged as spam. The score is the same with network tests (DNS blacklist tests and Razor).
In SA 2.50, your message would probably receive a score of 1.90 without network tests and 1.00 with network tests. Note that the test scores may change a bit before the final release of 2.50, but those are better scores, more what we like to see for non-spam content. They would be even lower when using Bayes (part of SA 2.50). Those lower scores are not unexpected because... well, 2.50 is better. :-)
Based on these results, it's not clear to me why yesterday's newsletter was flagged as spam. Some possibilities:
Can you give me more information about the false positive that you experienced or was reported to you?
Thanks.
Dan
------
If I find out more of interest before the thread is closed to comments, I'll try to post a follow-up to my post.
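To make the numbers in the reply above concrete, here is a minimal sketch of the threshold comparison. It only flags and never blocks, in keeping with the point made earlier. The scores and the 5.0 default come from the email; the function name is made up, and treating a score exactly equal to the threshold as flagged is an assumption.

```python
DEFAULT_THRESHOLD = 5.0  # default score needed to be flagged, per the email


def is_flagged(score: float, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Return True if a message score reaches the spam threshold.

    Assumption: a score equal to the threshold counts as flagged.
    """
    return score >= threshold


# Scores reported in the email for the February Crypto-Gram.
reported_scores = {
    "SA 2.44": 3.20,
    "SA 2.50 without network tests": 1.90,
    "SA 2.50 with network tests": 1.00,
}

for label, score in reported_scores.items():
    headroom = DEFAULT_THRESHOLD - score
    verdict = "flagged" if is_flagged(score) else "not flagged"
    print(f"{label}: score {score:.2f}, {verdict}, {headroom:.2f} below threshold")
```

None of the three reported scores reaches the default threshold, which is why the reply asks for more details about the reported false positive; a site that had lowered its threshold could see a different result.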
My primary mailbox is with a small, local ISP. I can't buy broadband from them, so I get my connectivity via cablemodem. I do have a mailbox in the cablemodem company domain -- that's the one I give out when I expect abuse. (I do it this way because I expect to be dealing with that ISP long after the cable vendor has either ceased to exist or has treated me badly enough that I left.)
So I want my outbound mail to appear to have come from the ISP. Setting Reply-To is usually adequate, but not always -- when a human is looking for the address, they could easily grab the wrong one. And it creates potential confusion I don't want to create. So I set my from address to name@isp.com.
I can't relay through the ISP's relays, because I'm outside of their IP range. (If they did some form of authenticated SMTP, such as SMTP-after-POP, they could let me.) And the cable vendor's mail relays won't send mail out with some other domain name on it. So I send everything out directly, no relays.
If you look at many headers, I suspect you'll find that I'm not the only one forging my From: address for legit reasons. The presence of the X-Authentication-Warning header some MTAs add correlates fairly weakly with spam. (Some details of it -- e.g. no valid reverse DNS for the sending machine's IP -- could be useful indicators.)
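For anyone who wants to look at their own headers, a rough sketch using Python's standard `email` package follows. The sample message, the wording of its warning header, and the `header_signals` name are invented for illustration. As noted above, neither signal is strong evidence of spam on its own, so the function only reports what it finds.

```python
from email import message_from_string
from email.utils import parseaddr


def header_signals(raw_message: str) -> dict:
    """Report two weak header hints; neither one proves a message is spam."""
    msg = message_from_string(raw_message)
    from_addr = parseaddr(msg.get("From", ""))[1].lower()
    reply_to = parseaddr(msg.get("Reply-To", ""))[1].lower()
    return {
        # Some MTAs add this header when the sender address was set by hand.
        "has_auth_warning": msg.get("X-Authentication-Warning") is not None,
        # A Reply-To that differs from From is common in legitimate mail too.
        "reply_to_differs": bool(reply_to) and reply_to != from_addr,
    }


# Invented sample message; the warning text is illustrative only.
SAMPLE = (
    "From: name@isp.com\n"
    "To: friend@example.org\n"
    "X-Authentication-Warning: host.example.net: user set sender to name@isp.com\n"
    "Subject: hello\n"
    "\n"
    "Body text.\n"
)

print(header_signals(SAMPLE))
```

A filter would presumably want to combine hints like these with others (reverse DNS of the sending machine, for instance) rather than act on any single one.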