Slashdot Mirror


The Growing Field Guide To Spam Techniques

Aneusomy writes "From ActiveState: 'Compiled by Dr. John Graham-Cumming, a leading anti-spam researcher and member of the ActiveState Anti-Spam Task Force, the ActiveState Field Guide to Spam is a selection of the tricks spammers use to hide their messages from filters, providing examples taken from real-world spam messages.' The hope is that ActiveState and others can contribute to continually expand this guide, so that anti-spam filters improve."

21 of 321 comments

  1. Does making this public help spammers? by Anonymous Coward · · Score: 4, Insightful

    Just a thought, but....

    Making public the methods used to intercept and filter spam will always mean spammers are one step ahead. If they know the strategy of those trying to stop them, that only helps them.

    Is there a better way?

    1. Re:Does making this public help spammers? by GigsVT · · Score: 3, Insightful

      This is an interesting question. It's similar to the security vulnerability full-disclosure arguments, but with a couple of differences. For one, a spammer who is using a technique is broadcasting how to do it to nearly everyone anyway.

      It's also different from security in that the spammer has no motivation to keep the method secret: it's worthless unless it is used to send spam. Contrast that with the security disclosure problem, where there is a large motivation to keep a vulnerability secret and use it covertly on specific targets.

      I'm leaning toward the idea that this really won't help spammers much, but with the caveat that it really doesn't help spam filter writers much either, since looking at the spams you get would make it obvious what techniques were being used anyway.

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
    2. Re:Does making this public help spammers? by Anonymous Coward · · Score: 1, Insightful

      Bayesian filtering is currently considered by many the best spam filtering mechanism. Since the detailed data set is different for everybody, and it learns from spam and non-spam messages, the only way a spammer could avoid Bayesian filters would be to either customize spam for each recipient (not practical) or make spam messages look a lot like normal messages (making them much less intrusive, but also impossible to filter through any mechanism other than a whitelist). See Paul Graham's spam pages for further info.

      Security through obscurity would be pointless. Unless you are using a spam filter you wrote yourself and aren't going to give to anyone else, it won't help.

      Even if you offered a filtering service without giving the filtering program to anyone (to prevent reverse-engineering), spammers could always use the service as an oracle to figure out ways around it through trial and error.
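
      For anyone wondering what "learns from spam and non-spam messages" looks like, here is a minimal sketch. It assumes a plain naive Bayes model with add-one smoothing over whitespace-separated words, which is a textbook simplification and not the exact formula from Paul Graham's pages. The class name and training messages are made up for illustration.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy naive Bayes spam scorer trained on one user's own mail."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(text.lower().split())
        self.messages[label] += 1

    def _log_likelihood(self, tokens, label):
        counts = self.counts[label]
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        total = sum(counts.values())
        # Add-one smoothing so an unseen word does not zero out the score.
        return sum(math.log((counts[t] + 1) / (total + vocab + 1)) for t in tokens)

    def spam_probability(self, text):
        """P(spam | words); needs at least one trained message of each kind."""
        tokens = text.lower().split()
        n = sum(self.messages.values())
        log_spam = math.log(self.messages["spam"] / n) + self._log_likelihood(tokens, "spam")
        log_ham = math.log(self.messages["ham"] / n) + self._log_likelihood(tokens, "ham")
        return 1 / (1 + math.exp(log_ham - log_spam))


f = TinyBayes()
f.train("cheap mortgage free offer", "spam")
f.train("free video tape offer", "spam")
f.train("meeting notes for the project", "ham")
f.train("lunch on friday with the project team", "ham")
```

      Because the word counts come from each user's own mailbox, two users end up with different scores for the same message, which is the property the comment above relies on.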

    3. Re:Does making this public help spammers? by bhanafee · · Score: 2, Insightful

      How do you know there have been no false positives? Are you reading your spam?

  2. Re:Dirty Little Secret by Surak · · Score: 0, Insightful

    ...and...? Linux is widely available, reliable, robust and free. If there were no Linux, spammers would just use some other system. So you can't really say that Linux and Linus Torvalds are responsible or liable for spam.

    A friend of mine worked for a professional spamming outfit that was exclusively Microsoft-based. It's not like it hasn't or can't be done. It's just generally cheaper and easier to do stuff on Linux.

  3. HTML mail is evil by trikberg · · Score: 5, Insightful

    Most of the tricks in the article (yes, I read it) require the mail to be in HTML format. If mail were not HTML, filters would be much more effective.

    I don't remember ever receiving an e-mail that actually had any content requiring it to be HTML. It would be pretty simple to set up a mail server to bounce any incoming (or outgoing for that matter) HTML mail with a friendly notice that the server does not accept HTML mail, asking the sender to please try again using ASCII. The problem is that there are plenty of people who have no idea what they are supposed to do at that point.

    Also I wonder if it could be effective for filters to detect whether such obfuscation is used rather than try to parse the contents and filter based on that. Many of the methods used are pretty obvious if you try to detect that specifically.
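
    A rough sketch of what "detect the obfuscation itself" might mean, assuming the only signal checked is a tag or comment wedged directly between two letters. The pattern and function name are illustrative, and legitimate mail such as `<b>bold</b>text` would also trip it, so the count is a hint, not a verdict.

```python
import re

# A tag or comment sitting directly between two letters, e.g. Via<!-- x -->gra.
SPLIT_WORD = re.compile(r"[A-Za-z](?:<!--.*?-->|<[^>]*>)+[A-Za-z]", re.DOTALL)


def count_split_words(html):
    """Count places where markup interrupts a run of letters."""
    return len(SPLIT_WORD.findall(html))
```

    A filter could feed this count to its scoring step alongside the message text instead of trying to reconstruct the hidden words.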

    --
    This post is free (as in cheese in a mousetrap).
  4. Interesting article by WegianWarrior · · Score: 3, Insightful

    Who can possibly resist if the word "Free" is in red and bold? Well, me for starters. Still, this one line of the article is taken from the opening, describing a more serious problem: the fact that much spam uses so-called 'enchanted email', that is, HTML mail. For all the other bad things about that, the one thing I find most sinister is that it is easy to have the HTML code pull a picture or something from a remote server, thus making it easy to validate your e-mail address (logically, if you open the mail, the address they sent it to is active). In short, banning 'enchanted email' would lessen the amount of spam, as well as the bandwidth it steals.

    Apart from that I got a chuckle out of the fact that spammers now seem to be speaking 1337:
    Ze Foreign Accent
    What: Replace letters with numbers or use nonsense accents
    Example from the wild:

    V1DE0 T4PE M0RTG4GE

    Fántástìç -- eárn mõnéy thrôugh unçõlleçted judgments
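
    For what it's worth, both examples can be undone before filtering. This is a minimal sketch assuming a small, made-up digit-to-letter table and Unicode decomposition to drop accents. It also mangles genuine numbers ("10 items" becomes "io items"), so it would have to sit alongside the original text, not replace it.

```python
import unicodedata

# Assumed digit-for-letter substitutions; real spam uses many more.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t"})


def normalize(text):
    """Strip accents, undo common digit substitutions, and lowercase."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return no_accents.translate(LEET).lower()
```

    With this table, `normalize("V1DE0 T4PE M0RTG4GE")` gives `"video tape mortgage"`.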

    The best spam filter, with the fewest false positives, is the one that most people of common sense have between their ears. Anything else is merely sorting your mail according to a fixed set of rules.

    --
    Everything in the world is controlled by a small, evil group to which, unfortunately, no one you know belongs.
  5. Re:"Tricks?" by Oddly_Drac · · Score: 2, Insightful

    Anyone else tickled by the fact that downloading the whitepaper requires an email address?

    --
    Oddly Draconis
    Too cynical to live, too stubborn to die.
  6. Re:Why bother? by mumblestheclown · · Score: 1, Insightful
    ISPs filter, people read. AOL filters, joe AOL buys herbal viagra.

    Make sense now?

  7. Re:Render the HTML then use OCR by Zocalo · · Score: 2, Insightful
    Alternatively, you could just make the HTML parser aware of the tricks via some easily extensible mechanism and run the spam content detector on the output. For example:
    1. Receive HTML email
    2. Remove any HTML comments
    3. Remove any "non-standard" tags
    4. Remove any redundant tags ( Via<B></B>gra )
    5. Remove...
    6. Pass remnants to content filtering app.
    On the other hand, any HTML email with an excessive HTML comment to content ratio is almost certainly spam anyway, and should probably be discarded as a result.
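
    Steps 2 to 4 above could be sketched roughly like this. The whitelist of "standard" tags is an assumption for the example, step 5 is left out because the list above leaves it open, and regular expressions are a fragile way to handle real-world HTML, so treat this as an illustration only.

```python
import re

# Assumed whitelist of "standard" tags for the example.
ALLOWED = {"a", "b", "br", "i", "p"}

COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
EMPTY_PAIR = re.compile(r"<(\w+)\b[^>]*>\s*</\1>", re.IGNORECASE)
TAG = re.compile(r"</?(\w+)[^>]*>")


def keep_allowed(match):
    """Keep a tag only if its name is on the whitelist."""
    return match.group(0) if match.group(1).lower() in ALLOWED else ""


def clean_html(html):
    """Return the remnants that would be passed on to the content filter."""
    html = COMMENT.sub("", html)        # step 2: remove HTML comments
    html = TAG.sub(keep_allowed, html)  # step 3: remove non-standard tags
    html = EMPTY_PAIR.sub("", html)     # step 4: remove redundant empty pairs
    return html
```

    Keeping the rules as a list of patterns is one way to get the "easily extensible mechanism": a new trick means one more substitution.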
    --
    UNIX? They're not even circumcised! Savages!
  8. Re:What a waste of effort by Mostly+a+lurker · · Score: 2, Insightful
    Perhaps we should ignore the spammers and target the 0.1% of idiots who actually reply

    It seems logical, but the economics of spam are such that even one sale per million e-mails gives them a big profit. No matter how many idiots you can reach to discourage from replying, there are still going to be some who fall through the cracks.

    I do not think spam will ever be eliminated entirely. Eventually, though, mechanisms will be put in place to allow the situation to be brought under control. Perhaps something along the lines of ...

    1. Most regular e-mail using encryption.

    2. Spam detection of unencrypted e-mails built into the Internet infrastructure itself at various levels. The objective would be to identify spam attacks as soon (and as close to the original source) as possible. Methods analogous to those used today for control of DDoS attacks would then be employed.

  9. Re:Getting worse by Anonymous+Custard · · Score: 2, Insightful

    HTML rendering was added to Pine only fairly recently. Given the quantity of HTML spam out there, it might have been a mistake.

    I think that spam filters should perform HTML rendering before processing the message, or at least strip out anything in <sneaky tags> before analyzing a message. There's no excuse for something as simple as "via<invisible comment when html rendered>gra" getting through a filter.

  10. Re:Why do they try to trick the filters? by Urkki · · Score: 3, Insightful
    They don't want it, but some of them might read some of it, if the subject is just right. And some of these might fall for it. If it's just 1% and 1%, and you send ten million spams, that's already 1000 successful messages.
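
    The arithmetic, as a quick check. The two 1% rates are the illustrative guesses from the paragraph above, not measured figures, and the function name is made up.

```python
def expected_responses(sent, read_rate, response_rate):
    """Expected number of recipients who read a message and then fall for it."""
    return sent * read_rate * response_rate


hits = expected_responses(10_000_000, 0.01, 0.01)
```

    Ten million messages times 1% readers times 1% takers comes to about 1000, so even much smaller rates still leave a spammer with some responses.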

    And then of course quite a few people use filters provided by others (like their ISP), since it's easy and spam is somewhat bothersome to them, but they still aren't totally pissed about it and might read some.

    And of course, the less spam gets through filters, the more likely it is that this "successful" spam gets read, if users' mailboxes aren't filled with it. So it's competition between spammers, survival of the most evil, so to say. And I suppose also when marketing spamming services, being able to say "we know how to send mail to all AOLers" is prolly helpful...

  11. Re:insider help is the key. by Anonym0us+Cow+Herd · · Score: 2, Insightful

    that was worth millions to 'em.

    I am skeptical that spammers have millions.

    If you really could get rich as a spammer, then everyone would be doing it. It would be too good to be true. Sort of like free P2P music. Everyone would be doing it.

    If they had millions, there are far more effective ways to advertise whatever legitimate product that people are buying in such volume as to make them their millions. Or were you referring to millions of Iraqi Dinars?

    --
    The price of freedom is eternal litigation.
  12. The one thing I never got was... by jdvernon1976 · · Score: 4, Insightful

    Why DON'T spammers remove us from their lists when we ask? They're working REALLY REALLY hard (with all the filtering, header forging, etc.) to send mail to people that don't want it. If they would just target their email to those who had indicated that they wanted it, and removed those of us who had indicated we didn't, they'd save themselves a lot of grief, as measured in legal and technical hassle.

    Granted, it's easier for them to ignore the "remove me"s, but is the trouble saved in 'not removing' >= the trouble spent in 'getting past spam filters'?

    Besides, if the mails were targeted to those that THOUGHT their penis was small and needed extension....doesn't that mean it's not spam anymore? And wouldn't that make their click-through (or whatever) rate higher, therefore making their own attractiveness as a bulk emailer greater to their customers?

    I'm just thinkin' here...

  13. Re:My approach by bklock · · Score: 2, Insightful

    Using Text Classification techniques in a spam filter is overall a good idea. (Bayesian systems are only one system for text classification, but they seem to be getting all the attention when it comes to spam)

    The problem, though, is that they don't work on raw text. The text must first be 'featurized', using either a Feature Selection or Feature Extraction algorithm.

    The 'Bayesian' part of anti-spam filters is pretty robust, and should theoretically be able to handle almost all tricks spammers throw at them, but the current state of Feature Selection is pretty embryonic.

    All of the tricks in the article fool the tokenizers currently used into producing features inconsistent between spams. No consistency == No classifier. The problem is that an email is not a 'bag of words', but we classify them as if they are.

    What we need, is to extract features which are more similar to the types of features a human looking at the message would use to make the spam / not-spam determination.

    There is a lot of ongoing research in this general area, but to the best of my knowledge, nothing has made it into spam filters yet.

    In the meantime, a lot can be gained by running the Feature Selector / Bayesian filter on the email after it's been rendered. Ideally, the filter needs to see exactly what the user will. Anything less is a disconnect between the two that will allow spammers to get messages past the filters to the user.

    One good feature that could be extracted from an email and fed to a filter would be statistical analysis results of rendered vs. non-rendered text in the email. Look at the amount, type, and distribution of non-rendered text, etc. in spam vs. ham.
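
    A minimal sketch of one such feature, assuming "non-rendered" means HTML comments only. Text hidden with styling, tiny fonts, or scripts is not handled, and the class and function names are made up for the example.

```python
from html.parser import HTMLParser


class VisibilityCounter(HTMLParser):
    """Tally characters a reader would see versus characters hidden in comments."""

    def __init__(self):
        super().__init__()
        self.visible = 0
        self.hidden = 0

    def handle_data(self, data):
        self.visible += len(data.strip())

    def handle_comment(self, data):
        self.hidden += len(data)


def hidden_ratio(html):
    """Fraction of text characters that never reach the reader's screen."""
    counter = VisibilityCounter()
    counter.feed(html)
    counter.close()  # flush any text still buffered by the parser
    total = counter.visible + counter.hidden
    return counter.hidden / total if total else 0.0
```

    The ratio is a single number per message, so it can be handed to the classifier as one more feature next to the word tokens.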

  14. MX records by MeNeXT · · Score: 2, Insightful
    I always wondered why we do not confirm that the sending IP matches the MX record of a domain.



    1. Most of the SPAM sent today has this little problem, where the sending server does not resolve to the IP which is listed in the header.



    2. It will permit people to first map a domain to an IP. (This makes it harder for a SPAMMER because now he needs to register a domain.) Once the domain is used to SPAM it can then be blocked. All blocked domains can be easily maintained in a list and shared by ISPs.



    3. Time is money. Moving domains from one ISP to another does not help the SPAMMER. The domain is blocked and the IP is identified. The SPAMMER has to be able to activate multiple domains, multiple DNS servers and such. The patterns will be easier to identify and it will be easier to block SPAM by either blocking the domain or the DNS server or all the IPs of a certain offending ISP.



    4. In order to acquire a domain a payment transaction must occur. This can be traced if it's a credit card. ISPs who accept cash without ID or who continually HOST SPAMMERS can be blocked. The work involved to acquire a domain may increase the costs of a domain but I am sure that this will enable people to assign responsibility.



    While this system is not perfect and, yes, it may cause some headaches for most, having sendmail match the MX record to the IP of the sending server would eliminate almost 100% of all the SPAM that I have encountered in the last 3 months. We would still need to keep the existing anti-spam practices in place.


    When SPAMMERS find a way around this we can then address that issue when it's time.
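
    A sketch of the check being proposed, with made-up lookup tables standing in for real MX and A queries (Python's standard library has no MX resolver, so the DNS data is an assumption, using reserved example names and addresses). It is not how sendmail is configured.

```python
# Hypothetical DNS data standing in for real MX and A lookups.
MX_HOSTS = {"example.com": ["mx.example.com"]}
ADDRESSES = {"mx.example.com": "192.0.2.10", "out.example.com": "192.0.2.20"}


def sender_matches_mx(sender_domain, connecting_ip, mx_hosts=MX_HOSTS, addresses=ADDRESSES):
    """True if the connecting IP belongs to one of the sender domain's MX hosts."""
    hosts = mx_hosts.get(sender_domain, [])
    mx_ips = {addresses[h] for h in hosts if h in addresses}
    return connecting_ip in mx_ips
```

    Note that `out.example.com` in this table is a separate outbound relay for the same domain: mail arriving from 192.0.2.20 fails the check even though it is legitimate, which is the main risk of the test.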

    --
    DRM? No thanks, I'll just get it somewhere else...
    1. Re:MX records by Anonymous Coward · · Score: 3, Insightful
      I always wondered why we do not confirm that the sending IP matches the MX record of a domain.

      Because this isn't a reliable test.

      1. Most of the SPAM sent today has this little problem, where the sending server does not resolve to the IP which is listed in the header.

      Pay attention to your email some time. Lots of legitimate email doesn't match, either. Many companies and most hosting companies use one server for incoming mail - the server the MX record points to - and another for outgoing - one which doesn't have an MX record.

      2. It will permit people to first map a domain to an IP. (This makes it harder for a SPAMMER because now he needs to register a domain.) Once the domain is used to SPAM it can then be blocked. All blocked domains can be easily maintained in a list and shared by ISPs.

      Except that most spammers don't use servers under their control, anyway, so this test wouldn't work.

      3. Time is money. Moving domains from one ISP to another does not help the SPAMMER. The domain is blocked and the IP is identified. The SPAMMER has to be able to activate multiple domains, multiple DNS servers and such. The patterns will be easier to identify and it will be easier to block SPAM by either blocking the domain or the DNS server or all the IPs of a certain offending ISP.

      Which also doesn't work, because the spammers don't use their own servers.

      4. In order to acquire a domain a payment transaction must occur. This can be traced if it's a credit card. ISPs who accept cash without ID or who continually HOST SPAMMERS can be blocked. The work involved to acquire a domain may increase the costs of a domain but I am sure that this will enable people to assign responsibility.

      A theory beloved of fascists and quick-fix pipe dreamers, but never actually proven to work in the real world. In fact, I don't know where this has ever worked, period.

      While this system is not perfect and, yes, it may cause some headaches for most, having sendmail match the MX record to the IP of the sending server would eliminate almost 100% of all the SPAM that I have encountered in the last 3 months. We would still need to keep the existing anti-spam practices in place.

      Then what's the freaking point? For me, and for most people I know, this would block about 40% of all *email*, spam and non-spam. The other 60% also includes spam and regular email, so you're not doing anything positive. And the current techniques, constantly improving as more and better filtering techniques become available (e.g. Bayes) already stop 99.9% of the spam I or my users receive. What else do you need? Why make sweeping changes like this to catch .1% or less of spam, particularly with the damage it would do to legitimate email?

      Amazing how all the people making these "brilliant" suggestions couldn't manage a real-world mailserver to save their soul. Running Sendmail on your home Linux box doesn't make you a mail admin.

  15. Re:Easy Solution by Technician · · Score: 3, Insightful

    spam filter render the HTML

    NEVER! Why would I want my client or server to validate my address by visiting their site to fetch some visual? I'd rather have it show up as a dead letter, unopened and deleted.
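
    That said, a filter can look at the markup without fetching anything. A minimal sketch, assuming only `src` and `background` attributes are checked and only `http`/`https` URLs count as remote; the class and function names are made up.

```python
from html.parser import HTMLParser


class RemoteReferenceFinder(HTMLParser):
    """Collect remote URLs a mail client would request when rendering the message."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in ("src", "background") or not value:
                continue
            if value.lower().startswith(("http://", "https://")):
                self.urls.append(value)


def remote_references(html):
    """List remote resources referenced by the message; nothing is downloaded."""
    finder = RemoteReferenceFinder()
    finder.feed(html)
    finder.close()
    return finder.urls
```

    The list itself is useful evidence: a message whose images carry per-recipient query strings can be scored as suspicious while the address stays unconfirmed.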

    --
    The truth shall set you free!
  16. No, no, no... look at this another way by RT+Alec · · Score: 3, Insightful

    This article highlights why I have stopped using filters altogether. End-user filters address the symptom, not the cause. The problem with even the best filter is the mail is already there, taking up space, hogging bandwidth, and the filter is churning CPU cycles to hopefully deal with it. My mail server uses 3 RBLs (blacklists), and one I have programmed myself (rbl.restongeek.com). I get no false positives, and only a trickle of spam that gets through. I also get some small pleasure reviewing my server logs of the rejected mail, where the reject happened before any of the actual data was transmitted (see my /. journal for a sample).

    Of the anti-spam legislation currently being proposed, the most important clauses are those that deal with forged headers and illegal use of other servers (relay rape). Once such laws are in place, blacklists will become even more effective, because spammers will have fewer places to run and hide (if they sell something from the U.S.A.).

    One final piece of the solution is to get ISPs to act responsibly, and block egress traffic on port 25 for dynamic IP addresses (look up many of my previous posts for more detail on this point). Again, combined with blacklists, this will reduce spam tremendously, not just in your inbox but in your (and your ISP's) bandwidth.
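
    For readers who have not seen how a blacklist lookup works, here is a rough sketch of the usual DNS convention: reverse the IPv4 octets, append the list's zone, and treat any answer as "listed". The zone in the usage note is a placeholder and the function names are made up; this is not the configuration of the server described above.

```python
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up: reversed IPv4 octets plus the list's zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def is_listed(ip, zone):
    """True if the blacklist zone returns an address record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```

    For example, `dnsbl_query_name("192.0.2.99", "dnsbl.example.org")` builds the name `99.2.0.192.dnsbl.example.org`. Because the lookup needs only the connecting IP, a server can reject before the message body is ever sent.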

  17. Re:Render the HTML then use OCR by Anonymous Coward · · Score: 1, Insightful

    That's a better idea but the spammers will find a way around it. If the spammer learns the filter then he can program (in JavaScript, VBScript) to display the text using a timer so when the spam filter takes the snapshot it won't get the spam. Is there any way around this?