The Growing Field Guide To Spam Techniques
Aneusomy writes "From ActiveState: 'Compiled by Dr. John Graham-Cumming, a leading anti-spam researcher and member of the ActiveState Anti-Spam Task Force, the ActiveState Field Guide to Spam is a selection of the tricks spammers use to hide their messages from filters, providing examples taken from real-world spam messages.' The hope is that ActiveState and others can contribute to continually expand this guide, so that anti-spam filters improve."
I use Thunderbird, and found it to be a good system.
Before that I used POPFile, but it blocked some good mail. That was reason enough to drop it.
Try Ctrl-+ in Mozilla or Mozilla Firebird.
One purpose of hiding text is to fool anti spam filters.
Let's say that everything between '[' and ']' is visually hidden. I can send you the message:
Fre[dom for th]e pen[ and th]is enl[ist l]argement.
The 'filter' will see:
Fredom for the pen and this enlist largement.
The user will see:
Free penis enlargement.
Cheers,
--fred
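To make the example above concrete, here is a minimal Python sketch. The '[...]' hiding convention is the made-up one from the post, and the function names are invented for illustration; real spam does the hiding with HTML tricks instead.

```python
import re

# Made-up convention from the post: anything inside [...] is invisible to the reader.
HIDDEN = re.compile(r"\[[^\]]*\]")

def filter_view(message: str) -> str:
    """What a naive filter sees: the markers vanish but the hidden text stays."""
    return message.replace("[", "").replace("]", "")

def reader_view(message: str) -> str:
    """What the reader sees: the hidden spans are removed entirely."""
    return HIDDEN.sub("", message)

msg = "Fre[dom for th]e pen[ and th]is enl[ist l]argement."
print(filter_view(msg))
print(reader_view(msg))
```

A filter that wants to score what the human sees would need something like reader_view before it starts looking for words.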
I think the purpose is to vary the hidden text to fool anti-spam systems which rely on blocking mail based on signatures of the message body.
If you send 150,000 messages which say "Free Porn Here", systems such as Brightmail are going to quickly generate one signature for the mail and block most of it. If, however, you have the following example (using the fictional HTML HIDE tag):
Free [HIDE] from your meeting at 10:30 [/HIDE] porn [HIDE] cate suggested meeting for coffee [/HIDE] here [HIDE] I will be in work late today [/HIDE]
The message is still displayed in the browser as "Free porn here". However, filters such as those used by Mac Mail and Mozilla may not pick it up as junk because the hidden words look like real email. If you change the hidden sentences every 100 emails then the signature based spam blocking systems won't pick it up as every signature is different and (in this example) you are using real words.
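A rough Python sketch of why this beats signature matching. It assumes the signature is simply a hash of the raw body (real systems such as the one named above are surely more elaborate), and it reuses the fictional HIDE tag; the function names are invented.

```python
import hashlib
import re

# The fictional HIDE tag from the post, written with square brackets.
HIDE = re.compile(r"\[HIDE\].*?\[/HIDE\]", re.DOTALL)

def signature(body: str) -> str:
    """Assumed signature scheme: a plain hash of the message body."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def visible_text(body: str) -> str:
    """Drop the hidden spans and collapse whitespace to get what the reader sees."""
    return " ".join(HIDE.sub(" ", body).split())

a = ("Free [HIDE] from your meeting at 10:30 [/HIDE] porn "
     "[HIDE] cate suggested meeting for coffee [/HIDE] here")
b = ("Free [HIDE] I will be in work late today [/HIDE] porn "
     "[HIDE] see you at lunch [/HIDE] here")

print(signature(a) == signature(b))                              # raw bodies differ
print(visible_text(a), "|", visible_text(b))                     # same visible message
print(signature(visible_text(a)) == signature(visible_text(b)))  # signatures agree again
```

Hashing the visible text instead of the raw body puts both variants back under one signature, which is the argument for filtering on what the reader actually sees.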
One of the best solutions to this I have seen is KMail, which displays HTML mail as text and lets you click a button to render it as HTML. This doesn't stop the spam, but it does give you the ability not to see many images you would rather not see at 10am on a Monday morning, and it lets you stop web bugs (images embedded in HTML which can be used to indicate successful message delivery).
Hormel Foods has this to say on the subject
"We do not object to use of this slang term to describe UCE (unsolicited commercial email), although we do object to the use of our product image in association with that term. Also, if the term is to be used, it should be used in all lower-case letters to distinguish it from our trademark SPAM, which should be used with all uppercase letters."
so....
"SPAM" is Pork and Ham
"spam" is unsolicited email
"SPAM SPAM SPAM SPAM
SPAM SPAM SPAM SPAM
Lovely SPAM, wonderful SPAM!"
is a Monty Python song
I think you misunderstood my point. I do receive valid e-mail as HTML-only on occasion. That mail has however _never_ had any content that couldn't be presented as clearly and easily in plain text, which is what I was getting at.
This amounts to little more than an annoyance in itself, but means that I can't filter mail by throwing away everything of type text/html. If it comes from a commercial company (while still being valid) they are less likely to see my money again.
This post is free (as in cheese in a mousetrap).
So a fat lot of good all those HTML tricks do you, eh spammers? (Are spammers stupid? Yes! It's Rule #3.)
One line blog. I hear that they're called Twitters now.
This will all be blindingly obvious to most readers of /., but just for the record:
Don't use your personal email address for anything online. Don't post to Usenet with it, don't use it to register for anything, don't ever use it where there's any chance of it being sold to a third party or picked up by a web crawler. Use a free throwaway web-based account like Hotmail or Yahoo; that's what they're for. I have a verizon.net primary email address, and I've never received a single piece of spam at it.
However, I still have a forward-only email address from my university circa 1992. Back then, there was no spam and that address has to be on every spammer's list on the planet. I still get a legitimate email every year or two, but spam outnumbers these by at least 10,000 to 1. SpamAssassin does a surprisingly good job of identifying the garbage.
I also use a proxy to surf the web, as well as a large hosts file that reroutes requests to adservers to 127.0.0.1:80, combined with a utility that returns a transparent 1x1 gif to any request on port 80. And of course I use mozilla to block pop-ups and whatnot. I'm so used to surfing in this way that I always recoil in horror when I have to use IE on a naked, unprotected box. How on earth can anyone stand it?
As for more traditional types of spam such as telemarketers, there's the national do not call list. It's free, so there's nothing to lose. You'll also want to check out the many excellent resources at the Junkbusters website. One of the most useful features is a Junkbusters Declare page, which builds custom form letters for you that you can use to opt out of Direct Marketing Association junkmail, as well as telling your financial institutions, etc., not to sell your name to third parties. I used it, it's painless, and my privacy is protected.
Of course, it would be much better if we didn't have to jump through hoop after hoop just to get through the day without being pestered by morons.
My Bayesian filter analyzes the message in raw text, including any HTML tags. A handful of HTML "enhanced" spams might make it through the first few times until I classify the new messages as junk. Once that happens the filter learns that random HTML tags increase the chances of it being spam and it's off to the junk pile.
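For anyone wondering what "analyzes the message in raw text, including any HTML tags" could look like, here is a small Python sketch. The tokenizer, the TokenStats class and the neutral 0.5 for unseen tokens are all assumptions for illustration, not how any particular mail client does it.

```python
import re
from collections import Counter

TAG = re.compile(r"<\s*/?\s*([a-zA-Z][a-zA-Z0-9]*)[^>]*>")
WORD = re.compile(r"[A-Za-z0-9']+")

def tokenize(raw):
    """Treat HTML tag names as tokens alongside ordinary words."""
    tags = ["<%s>" % name.lower() for name in TAG.findall(raw)]
    words = [w.lower() for w in WORD.findall(TAG.sub(" ", raw))]
    return tags + words

class TokenStats:
    """Hypothetical per-token bookkeeping for a Bayesian-style filter."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def train(self, raw, is_spam):
        tokens = set(tokenize(raw))  # count each token once per message
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def spam_probability(self, token):
        s = self.spam[token] / max(self.n_spam, 1)
        h = self.ham[token] / max(self.n_ham, 1)
        if s + h == 0:
            return 0.5  # never seen: assume neutral
        return s / (s + h)

stats = TokenStats()
stats.train('<font color="white">lunch</font> buy now', is_spam=True)
stats.train("lunch at noon?", is_spam=False)
print(stats.spam_probability("<font>"), stats.spam_probability("lunch"))
```

Once a few tag-stuffed spams have been classified by hand, the tags themselves carry a high spam probability, which is the learning effect described above.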
The main problem that OCR would solve is when the text is contained in an image file, but it really wouldn't solve it. OCR would break down for the same reason that the new wave of "a word appears in distorted text in this image, type that word below to proceed" tests that some sites are beginning to use works: picking text out of an image file can be a very tricky problem if that image wasn't made for readability (as most web graphics aren't). Rather, I'd argue that the very presence of one big image & no supporting text is a strong spam indicator, and you can go with that assumption without having to bring in the heavy OCR machinery (which might or might not be right anyway).
I've been thinking that, if the idea is for spam filters to work on what the human sees, then the natural tool to use would be the standard html renderer that already is fine tuned for turning html (even wacky html) into rendered text. Rather than OCR, find a way to hook Gecko or KHTML into SpamAssassin and take it from there.
The problem with this though is the same as the OCR problem, though I'm guessing not as extreme: embedding a full featured html engine inside of a network level spam filter is a massive amount of overhead to add to a process that needs to be able to handle massive realtime throughput.
A more clever approach is to skip it and say that HTML itself is a spam indicator, if not an absolute one. But then there's a fine line to be found in determining which HTML mails are kosher & which aren't without resorting to a very heavy & still imperfect solution like Gecko or OCR. If it's all an image, trash it, but anything in between is going to take some strategy (and anything in between shouldn't need OCR).
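The "it's all an image" test can be done without Gecko or OCR. A minimal Python sketch using the standard library's html.parser; the function name and the 20-character threshold are arbitrary assumptions, and a real filter would treat the result as one indicator among many rather than a verdict.

```python
from html.parser import HTMLParser

class _Summary(HTMLParser):
    """Counts <img> tags and the amount of text in an HTML body."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1

    def handle_data(self, data):
        # Note: this also counts text inside <script> or <style>, which is fine for a sketch.
        self.text_chars += len(data.strip())

def looks_image_only(html_body, min_text_chars=20):
    """True when the body has at least one image and almost no supporting text."""
    summary = _Summary()
    summary.feed(html_body)
    summary.close()
    return summary.images > 0 and summary.text_chars < min_text_chars

spam = '<html><body><a href="http://example.invalid/"><img src="offer.gif"></a></body></html>'
newsletter = ('<html><body><img src="logo.gif"><p>Here is this month\'s update '
              'on the project, with release notes below.</p></body></html>')
print(looks_image_only(spam), looks_image_only(newsletter))
```

This only covers the easy end of the range; the "anything in between" cases still need some other strategy.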
DO NOT LEAVE IT IS NOT REAL
Seems like there are two likely reasons. First, they get paid to deliver emails, so removing a name from their list reduces the number of emails they send and the number of dollars they get paid. Second, they get paid for click-throughs, and a certain fraction of recipients, completely independent of whether they're interested in the "product" or not, will click a link, if only by accident. Dropping names from their list reduces the number of these unintentional click-throughs and takes dollars out of the email marketer's pocket.
You miss the point.
Yes, it assesses the email on the basis of "15 bad words", but it also assesses on the "15 good words" or words that indicate it's legitimate.
Chances are they have only one or two of the "bad" words (penis, viagra, v*i*a*g*r*a, etc...). Perhaps fewer once they munge it so that things are broken up into pieces. The HTML tricks are all designed so that the filter doesn't realize that you have one of the "bad" words split up into sections.
The insertion of "good" text is designed to try to trip 2-3 "nonspam" indicators, thus causing the filter to pass the mail as "good".
The insertion of the "good" text also serves, if you use a Bayesian filter, to "poison" your filter so that legitimate mail using those same words has a tendency to get tagged as spam.
It's a three-pronged attack:
#1 -- munge out the bad words
#2 -- drop in "innocent" text to make it look legit
#3 -- send in such volume that the "innocent" text gets poisoned in the filter and starts causing false positives.
What they're really after, of course, is number 3; if they can cause enough false positives, people will turn off the filters again. That's why they think nothing of sending the same spam 500 times to the same person in three days: when they are using a technique like this, every spam that gets filtered and tagged as spam furthers goal #3.
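Prongs #1 and #2 can be shown with a little arithmetic. This Python sketch assumes the filter keeps the 15 tokens whose probabilities are furthest from 0.5 and combines them with the usual naive Bayes product rule; the probabilities are invented numbers and the function name is made up.

```python
from math import prod

def combined_score(token_probs, n=15):
    """Combine per-token spam probabilities using the n most 'interesting' tokens."""
    # Clamp so a single 0.0 or 1.0 cannot cause a division by zero.
    clamped = [min(max(p, 0.01), 0.99) for p in token_probs]
    interesting = sorted(clamped, key=lambda p: abs(p - 0.5), reverse=True)[:n]
    spam = prod(interesting)
    ham = prod(1.0 - p for p in interesting)
    return spam / (spam + ham)

# After munging, only a couple of "bad" tokens are still recognisable.
munged = [0.99, 0.9]
# The same message padded with three "innocent" words that usually mean real mail.
padded = munged + [0.05, 0.05, 0.05]

print(round(combined_score(munged), 3))
print(round(combined_score(padded), 3))
```

With these made-up numbers the padded message drops from near-certain spam to well under 0.5. Prong #3 is the same arithmetic in reverse: every padded spam you train on pushes the "innocent" words toward the spam side.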
I still say the best way to deal with spammers is with a good old non-technical solution: a two-by-four upside the head.
the Spammers MAY make money by selling an occasional Penis Enlarger, but they REALLY make money by selling LISTS!
Lists of VALID email addresses.
These lists are SOLD to people trying to actually sell things. "Clean" lists with valid email addresses.
The people who BUY these lists or services want as FEW bounces as possible.
This is one reason why I get gobs of these new spams that are really nothing more than spam filter tests. They are trying to figure out what gets through and also trying to poison the filters so they can claim a higher percentage throughput for the stuff they REALLY want to deliver.
The problem is not people who actually BUY the stuff advertised, the problem is the people who buy the LISTS of email addresses or the services of a spammer thinking that they are using some sort of valid "Direct Marketing" service.
I built a small website for a client and after it was up, he wanted me to find him a way to advertise the site VIA EMAIL! He wanted me to go find a spammer, and PAY the spammer to send his ad to millions of valid email addresses.
He saw absolutely nothing wrong with this. He thought it was no different than buying a snail mail mailing list and sending out thousands of flyers....but Cheaper!
and he was not selling Penis Enlargers, he was selling Printing Services!
It's the Buying of the LISTS and spam SERVICES that's the big problem! Not the people who are actually buying the stuff in the spam.
It's like the Gold Rush, there may be no Gold anymore, but that won't stop people from heading out west to try, and when they get there, they find some nice vendors who are more than happy to sell them all the tools they need to pan for gold. Whether they find gold is immaterial, there's a steady flow of customers buying supplies. The ones that give up, just go away. Plenty more suckers lined up outside to buy pans, picks and shovels.
If SpamAssassin did nothing but content analysis, that might work. But, SpamAssassin (by default) also checks several real-time blacklists and uses Bayesian filtering.
I've found that it's the combination of all of these factors that identifies almost every spam. I've had only two or three spams slip through in the 3-4 months since I installed SpamAssassin, with no false positives.
You might want to google for "spam" + "DHVP", "DMP", "RMX", "DRIP" or "SPF"
The closest would probably be DHVP.
DHVP checks that the HELO from the sender either has a special "This is valid" record in DNS,
or that an MX record for the HELO string matches the IP address,
or some superset of the HELO's fully qualified domain name has an MX that matches the IP address.
We don't do this because it has a high false positive rate.
Even if you personally would accept 5% of your email being discarded as "non-conforming",
an ISP can't accept that high a false positive rate and stay in business.
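A rough Python sketch of the three HELO checks as described in this post only; it is not taken from any DHVP document. DNS is faked with a dictionary, the "This is valid" record is modelled as a plain set of names, and all function and parameter names are invented.

```python
def candidate_domains(fqdn):
    """The HELO name itself plus its parent domains, stopping before the TLD."""
    labels = fqdn.strip(".").lower().split(".")
    return [".".join(labels[i:]) for i in range(max(len(labels) - 1, 1))]

def helo_plausible(helo, client_ip, mx_ips, valid_markers=frozenset()):
    """
    mx_ips: dict mapping a domain to the set of IPs its MX hosts resolve to
            (a stand-in for real DNS lookups).
    valid_markers: HELO names that publish the special "This is valid" record
            (how that record is actually encoded is not specified here).
    """
    name = helo.strip(".").lower()
    if name in valid_markers:
        return True
    # Check the HELO name's own MX first, then each parent domain's MX.
    return any(client_ip in mx_ips.get(d, ()) for d in candidate_domains(name))

mx = {"example.com": {"192.0.2.10"}}
print(helo_plausible("mail.example.com", "192.0.2.10", mx))  # inbound MX host: passes
print(helo_plausible("mail.example.com", "192.0.2.99", mx))  # separate outbound relay: fails
```

The second call is the false positive problem in miniature: a perfectly legitimate outbound relay that is not also listed as an MX gets rejected.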
-- this is not a
I also use IP blacklists (locally compiled and various RBLs) but this is becoming less effective as the spam gangs are moving to using their own army of proxies rather than the traditional exploitation of open relays or throw-away accounts. I'm not saying that ISPs shouldn't be responsible for what emanates from their networks, but these trojaned users are a very different kettle of fish than spammers having "pink contracts" with spam-friendly ISPs.
My next sig will be ready soon, but subscribers can beat the rush
+4, insightful?
I beg to differ!
While this system is not perfect and, yes, it may cause some headaches for most, having sendmail match the MX record to the IP of the sending server would eliminate almost 100% of all the SPAM that I have encountered in the last 3 months.
You're right, this system is not perfect, and would cause a *lot* of headaches for almost all users (or at least, us admins).
Firstly, it creates a lot of technical headaches.
The way I see it, the only way I could send email under your proposed system would be through a relay whose IP address was the same as the server listed in the domain's MX record, right?
So, in order to send email from myaddress@somedomain.com, my MTA has to have the same IP address as somedomain.com's mail exchanger?
Not. Gonna. Work.
I send mail from several different physical locations (home, work, etc), as several different addresses/domains. This means in order to send email as my home address while I'm at work, I'd have to send through my home ISP's mail relay. Which I can't do, because I'm not on their network (and they don't have an open relay, to prevent *spam*).
I also send email as being from a couple of domains I own, but I send this email thru whatever system I happen to be on (ISP or work, whatever), as my domain just points at things, rather than running a full-time MTA just to deliver my email..
Not to mention the fact that most ISPs I can think of would have more than one server in charge of mail, and it would be possible, if not likely, that the outgoing mail relay is a different machine than the one that accepts incoming mail (ie, the one in the MX record).
But let's just assume, for argument's sake, that everything was working as you outline. Everyone sends mail thru a relay whose IP corresponds to the domain they're sending from.
All I need to do to send spam is get an account at an ISP, let's say I get username foo at ISP isp.com. Now I dial up, and send a big bunch of spam, from false.address@isp.com. So your domain/mx/ip check works ok, but it's still a false address. Sure, my IP address will be in the headers, but how different is that from the current situation?
Next you'll be suggesting that to combat terrorism, before getting on a plane passengers should have to pass a 1/2 hour series of tests with questions like 'are you a terrorist?' and 'Is this flight for: a) business; b) pleasure; or c) terrorism?'
Not going to make it any harder for the terrorists (except the really dumb ones), but a big pain in the ass for Joe Citizen.
(sorry, in a bit of a ranting mood)
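For the record, the MX-versus-IP argument above fits in a few lines of Python. DNS is faked with a dictionary, the addresses are documentation-range placeholders, and the function name is made up.

```python
def sender_matches_mx(mail_from, client_ip, mx_ips):
    """Accept only if the connecting IP is one of the sender domain's MX hosts."""
    domain = mail_from.rsplit("@", 1)[-1].lower()
    return client_ip in mx_ips.get(domain, set())

# Stand-in for DNS: domain -> IPs of its mail exchangers.
mx_ips = {
    "isp.com": {"192.0.2.25"},
    "somedomain.com": {"203.0.113.8"},
}

# Spammer with a dial-up account relays through the ISP under a made-up local part.
print(sender_matches_mx("false.address@isp.com", "192.0.2.25", mx_ips))

# Legitimate user sends as their own domain through the relay at work.
print(sender_matches_mx("myaddress@somedomain.com", "198.51.100.5", mx_ips))
```

The forged address passes and the honest one fails, which is the whole objection.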