Slashdot Mirror


Armoring Spam Against Anti-Spam Filters

moggyf points to a BBC article about how spam can be successfully tweaked to slip past current filtering methods, excerpting "To find out how to beat the filters Mr Graham-Cumming sent himself the same message 10,000 times but to each one added a fixed number of random words. When a message got through he trained an 'evil' filter that helped to tune the perfect collection of additional words." iluvspam adds "It's an interview with POPFile author John Graham-Cumming that summarizes his talk at the recent MIT Spam Conference. You can still listen to the technical details here (choose the Afternoon 1 session, he starts about 75 minutes in)."

35 of 511 comments

  1. infinite monkeys by bluelip · · Score: 5, Funny

    So the ultimate spam protection mechanism would be an infinite number of monkeys typing my list of words to associate w/ spam. :)

    --

    Yep, I never spell check.
    More incorrect spellings can be found he
    1. Re:infinite monkeys by AllUsernamesAreGone · · Score: 4, Funny

      We better watch out for slashdot comments appearing in spam now.. ;)

    2. Re:infinite monkeys by Jonas+the+Bold · · Score: 5, Funny

      You kids and your monkeys

      In my day we didn't have monkeys. We had to filter spam by hand. And we liked it!

      You kids and your infinite monkeys... Shakespeare wouldn't have used monkeys were he alive today. He would have rolled up his sleeves and written Hamlet the right way!

      Damn kids..

      --
      Everything seemed to be going so nice
      'till the end of all beings punched right through the ice
    3. Re:infinite monkeys by TheDigitalRaven · · Score: 5, Funny

      Hands? Them're luxury! When I were a lad, hands were summat only posh people had. The rest of us had to make do with paws which hadn't evolved fully yet, and we had to filter all of our spam from each mailbox manually, but we had to go to the mailbox - across a river of lava, mind - to collect each message but couldn't filter it until we got back. We'd sort spam twenty six hours a day, getting up two hours before going to bed, and had to eat cold poison while we were doing it. And we had to pay for the privilege of being allowed to filter our own!

    4. Re:infinite monkeys by letxa2000 · · Score: 5, Insightful
      I'm not sure I understand why they think this is a problem with Bayesian filtering. Basically, they're saying that if a spammer sends you the same message thousands of times but inserts a few slightly different words each time, and if the thousands of messages get through the Bayesian filter to the user, and if the user doesn't disable HTML bugs on his email client, then we have a problem...?

      First, if the spammer sends thousands of copies of the same message and just changes the "extra words" that he is testing, it will take very little time for Bayesian to adapt to the rest of the message. Suddenly, the rest of the message that previously contained non-spammy words will be considered very spammy and will overwhelm the "extra words" that each message contains. Each time the message is caught as spam, the probability that any future tests get through--regardless of the "extra words"--will be reduced even further.

      Second, as the article said, it's a lot of work on the part of the spammer. They'd have to send out thousands of messages to each target to "sniff them out", and most of those wouldn't even be effective, since most of them would be caught by filters, and of those few that got through, very few would load the HTML bugs to identify themselves.

      Finally, it assumes that those that are using Bayesian filters are filtering their email but leaving their security (inasmuch as HTML bugs) wide open. While there may be some people that use Bayesian and leave HTML bugs active, it has to be a small minority.

      In short, it seems to me they've "found" a way to get around Bayesian that won't work, so to speak. I just don't see the problem.... ??
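
      A minimal sketch of the adaptation argument above, assuming a toy per-token scoring scheme with clamped probabilities. It is not POPFile's actual code; the class name ToyFilter, the 0.4 score for unseen tokens, and the sample messages are all illustrative assumptions.

```python
import math
from collections import Counter


class ToyFilter:
    """Toy Bayesian-style filter that counts, per class, how many messages contain each token."""

    def __init__(self):
        self.docs = {"spam": 0, "ham": 0}
        self.seen = {"spam": Counter(), "ham": Counter()}

    def train(self, text, label):
        self.docs[label] += 1
        self.seen[label].update(set(text.lower().split()))

    def token_prob(self, tok):
        # Fraction of spam / ham messages that contained this token.
        s = self.seen["spam"][tok] / max(self.docs["spam"], 1)
        h = self.seen["ham"][tok] / max(self.docs["ham"], 1)
        if s + h == 0:
            return 0.4  # unseen token: mildly innocent (assumed default)
        return min(0.99, max(0.01, s / (s + h)))  # clamp so no token is decisive alone

    def spam_prob(self, text):
        ps = [self.token_prob(t) for t in set(text.lower().split())]
        spam_like = math.prod(ps)
        ham_like = math.prod(1 - p for p in ps)
        return spam_like / (spam_like + ham_like)


f = ToyFilter()
f.train("meeting agenda for the wireless project review", "ham")
f.train("cheap pills buy now", "spam")

body = "special offer click this link today"
probe = body + " wireless project agenda"  # fixed body plus ham-flavoured padding
before = f.spam_prob(probe)

# Every copy that is caught (or marked by the user) is fed back as spam.
# The varying padding words are left out here for brevity.
for _ in range(5):
    f.train(body, "spam")
after = f.spam_prob(probe)

print(f"before: {before:.6f}  after: {after:.6f}")
```

      In this toy model the six body tokens reach the 0.99 ceiling once copies have been trained as spam, so they outvote the three padding words. The argument only holds if the early copies are actually caught or marked as spam.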

    5. Re:infinite monkeys by Sique · · Score: 4, Insightful

      Second, as the article said, it's a lot of work on the part of the spammer. They'd have to send out thousands of messages to each target to "sniff them out", and most of those wouldn't even be effective, since most of them would be caught by filters, and of those few that got through, very few would load the HTML bugs to identify themselves.

      This is exactly the point. Most of the spam examples will die out because they have an ineffective collection of non-spam words. But a few will survive, and you can now train your own Bayesian filter on the versions of spam that generated webbug hits. After a while some words will shine prominently in your Bayesian filter database for being very effective at slipping through Bayesian spam filters.

      Basically you are fighting the dote with itself. And yes, you can automate the process. Just take your everyday spam (penis enlargement, unsecured credit, Nigerian business opportunities...), take a dictionary and then randomly mix dictionary words into your spam messages and send them out to your email database. Create a website to get the webbug hits and associate every spam message with a hash of the random dictionary words to identify successful sets of anti spam words.
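
      The bookkeeping described above can be sketched as a local simulation, which is roughly what a filter author would run to probe their own filter. Nothing is sent anywhere: the "webbug hit" is replaced by a stand-in rule (a message passes if it contains at least two of the victim's ham words), and the names word_set_id and run_trial and the w0..w49 dictionary are hypothetical.

```python
import hashlib
import random
from collections import Counter


def word_set_id(words):
    """Short, order-independent ID for one set of padding words."""
    joined = " ".join(sorted(words)).encode("utf-8")
    return hashlib.sha256(joined).hexdigest()[:12]


def run_trial(dictionary, victim_ham_words, n_messages=5000, k=5, seed=0):
    rng = random.Random(seed)

    # "Send" n_messages variants, remembering which words each ID stands for.
    sent = {}
    for _ in range(n_messages):
        words = rng.sample(dictionary, k)
        sent[word_set_id(words)] = words

    # Stand-in for webbug hits: the variant got through if it happened to
    # contain at least two words the victim's filter treats as ham.
    hits = Counter()
    for words in sent.values():
        if len(victim_ham_words.intersection(words)) >= 2:
            hits.update(words)
    return sent, hits


dictionary = [f"w{i}" for i in range(50)]
victim_ham = {"w3", "w17", "w42"}
sent, hits = run_trial(dictionary, victim_ham)
top_words = [w for w, _ in hits.most_common(3)]
print(top_words)
```

      The words that dominate the hit tally are the ones worth defending against, which is also why the result is specific to one victim's word list.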

      --
      .sig: Sique *sigh*
    6. Re:infinite monkeys by Theresa1 · · Score: 5, Funny
      cold poison ?! you lucky buggers.

      We were so poor we had to eat spam.

      --
      This is a manual signature virus. Copy to your signiture file and help me spread.
    7. Re:infinite monkeys by Tripster · · Score: 5, Funny

      Don't know about you but my wife won't let me have one!

    8. Re:infinite monkeys by Jeremi · · Score: 4, Informative
      Ever \/\/0|\|D3R why you're getting so much spam with obfuscated words?


      Nope, because my Bayesian filter works just as well for 0bfu5c4t3d words as it does for properly spelled ones. They are all just sequences of letters, and anything that is deliberately misspelled is going to become identified as spammy very quickly.


      Or why you're getting so much spam where the text content is contained primarily in images rather than plaintext?


      Nope, because I have images turned off by default in my mail viewer. If a stranger wants me to read his email, he'll need to send it as plain text, because (as you point out) HTML email with images is used as a spam vector and little else.


      BTW, this article explains why there will never be a filtering-based solution to solving spam until SMTP itself is made more secure.


      Funny, my Bayesian filter is working fine at this very moment. Who should I believe, your article or my own eyes?


      Jeremy

      --


      I don't care if it's 90,000 hectares. That lake was not my doing.
  2. Ok fuck it by tomstdenis · · Score: 5, Funny

    I will pay 1000$ to anyone who seeks out and beats the living daylights out of a spammer. With as many pics on the web as possible for posterity.

    Screw these filters and shit. Start creaming spammers worldwide and they'll think twice about it.

    Tom

    --
    Someday, I'll have a real sig.
    1. Re:Ok fuck it by nigelc · · Score: 5, Funny

      Ahh, an international terrorist proposing an attack. We should be invading Canada any day now...

      --


      Cthulhu Barata Nikto
    2. Re:Ok fuck it by swb · · Score: 4, Insightful

      Another example of people assuming that EVERYBODY lives in the USA or is under US law...

      The solicitation was made on a server located in the US. I don't doubt that Ashcroft would consider that US jurisdiction, regardless of the physical location of the poster.

      There's a lot of guys in dog cages at Guantanamo Bay who've NEVER been to the US. I'm not so sure these days that when the US government is pissed off at you, where you are and where you did something matter a whole lot.

    3. Re:Ok fuck it by AdamD1 · · Score: 4, Funny

      Is that illegal? After all he's not 'threatening' the spammer, he's merely presenting an offer he was pretty sure this guy was asking to receive. And besides: He can certainly "opt-out" at any time by choosing not to spam... ;)

      --
      Because I can! [Brainrub.com]
    4. Re:Ok fuck it by FreeUser · · Score: 4, Funny

      At a minimum, he would be arrested if he came to the states. However, if someone actually went through with the crime, I'm sure Canada would be willing to extradite him. Canada doesn't want maniacs running around free, anymore than the US does.

      That assumes that beating the shit out of a SPAMmer is a "maniacal" act. I would argue that it is a perfectly rational course of action, and indeed a public service.

      Canada's Finlandization by the US might compel it to hand the guy over anyway, but certainly not for fear of having maniacs run loose (unless you count our troops poised on their border to enforce US Political Correctness Bush Style abroad). :-)

      [ Disclaimer required by Our Surveillance State: the preceding was a joke (c.f. humor). ]

      --
      The Future of Human Evolution: Autonomy
  3. Obligatory POPFile Link by rmohr02 · · Score: 5, Interesting

    POPFile, maintained by John Graham-Cumming, is the best spam filter I've used. There may be small flaws with the fundamental concept of Bayesian filters, but POPFile still blocks all my spam.

  4. Tch tch... by supersam · · Score: 5, Insightful

    Didn't they know something as simple as...

    "Make it idiot-proof, and someone will make a better idiot"

  5. The only way by GuyinVA · · Score: 4, Informative

    As technology gets more complicated, so does the spam. The only way to protect yourself is to not give out your address. Period. Heck, I don't even give my work e-mail address to my parents.

    1. Re:The only way by junkymailbox · · Score: 4, Funny

      I don't give out my work address to anyone .. and it's not because I fear spam.. :)

    2. Re:The only way by Quill_28 · · Score: 4, Funny

      >The only way to protect yourself is to not give out your address. Period.

      Ummm.... then what good is it?
      Do you just e-mail yourself? :-)

  6. Here's a sneaky one... by Channard · · Score: 4, Interesting

    Mozilla's filtering catches most spam for me, but some gets through. However, the only one that actually fooled me was quite a sneaky one - headed RE: Question from E-Bayer or whatever the actual subject is where you E-Bay something. Given that I sell on E-Bay, the spammers must have taken a gamble that enough people would read the subject and deem it worth looking at.

    1. Re:Here's a sneaky one... by pclminion · · Score: 5, Informative
      People just have to realise that filtering based on content doesn't work, and will never work, until perhaps we have strong AI.

      That's an overly strong statement to make, and even a little bit irritating to people like myself who actually implement statistical content filters, natural language systems, etc.

      If you are equating "content based filtering" to "Bayesian filtering" then you really only understand 1% of the current state of document classification. Bayesian filtering is all the rage right now because it's a linear time algorithm (i.e., implementable on PC hardware). There are document classification schemes that will eat Bayesian for lunch, which are not appropriate for email filtering at this time because of their computational cost. But with continual progress on the algorithms, new methods for reducing search spaces via extremely clever sense-similarity heuristics, and with computers doubling in speed every 18 months, it's closer than you think.

      The spam/ham problem is what data mining researchers would call a "toy problem." You want us to classify documents into only two classifications? Only two? Piece of cake. The problem is, you want us to do it on PC hardware where it isn't feasible to run O(n^2) or O(n^3) machine learning algorithms.

      Let the researchers continue what they're doing. People are just now starting to apply SVMs and other cool techniques to the problem of spam filtering. You'd be amazed at how many of the well-known data mining and statistical NLP researchers have not even thought of using their arsenal against spam.

      It's coming, please be patient.

  7. Re:Hmmm... by somethinghollow · · Score: 5, Insightful

    Like many other academic studies, such as skinning humans alive to see how long they can live, I think this one should only be placed into the right hands.

    It's a pisser that spammers now have another tool to circumvent filters; on the other hand, the people who write the filters know exactly what a spammer would do to make "better" spam.

    The question is: who will implement first?

  8. my spam filter by SkArcher · · Score: 4, Insightful

    if Message header = "type = text/html" then send to "Spam"

    It works a treat :)

    The other trick I have found useful is the CamelCase nature of my name - spammers tend to mail me either as skarcher or SKARCHER, and both trip filters on my mailbox.
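
    Both rules can be sketched with Python's standard email package. This is only an illustration under assumptions: choose_folder and wrong_case are hypothetical names, the example.org addresses are placeholders, and the case check is applied to both the display name and the local part because the comment does not say which one the spammers get wrong.

```python
from email import message_from_string
from email.utils import getaddresses


def wrong_case(candidate, expected):
    """True if candidate is the expected name but with different capitalisation."""
    return candidate.lower() == expected.lower() and candidate != expected


def choose_folder(raw_message, my_name="SkArcher"):
    msg = message_from_string(raw_message)

    # Rule 1: a top-level text/html message goes straight to Spam.
    # (A multipart message with an HTML part is not caught by this check.)
    if msg.get_content_type() == "text/html":
        return "Spam"

    # Rule 2: mail addressed to a re-cased version of the CamelCase name.
    for display_name, addr in getaddresses(msg.get_all("To", [])):
        local_part = addr.split("@", 1)[0]
        if wrong_case(display_name, my_name) or wrong_case(local_part, my_name):
            return "Spam"
    return "Inbox"


html_mail = "To: SkArcher@example.org\nContent-Type: text/html\n\n<b>hi</b>"
plain_mail = "To: SkArcher@example.org\nContent-Type: text/plain\n\nhi"
lower_mail = "To: skarcher@example.org\n\nhi"
upper_mail = "To: SKARCHER <someone@example.org>\n\nhi"

for raw in (html_mail, plain_mail, lower_mail, upper_mail):
    print(choose_folder(raw))
```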

    --

    An infinite number of monkeys will eventually come up with the complete works of /.
  9. Re:Great by stevesliva · · Score: 5, Funny
    Guess which words all tomorrow's SPAM will contain...
    Touch my wireless Berkshire Marriott?
    --
    Who do you get to be an expert to tell you something's not obvious? The least insightful person you can find? -J Roberts
  10. Headline tone by Faust7 · · Score: 4, Funny

    Armoring Spam Against Anti-Spam Filters

    That description sounds too noble for an activity like this. More appropriate headlines would be Making Spam Slick as Owlshit or Infusing Spam with Satanic Strength.

  11. Re:Hmmm... by JohnGrahamCumming · · Score: 5, Informative

    If people working in anti-spam don't try to break their own filters the spammers will do it for them and we'll be worse off.

    There's a direct analogy with cryptographic techniques where breaking them is most of the work... that way we know that they are secure.

    John.

  12. Line Noise by 4of12 · · Score: 4, Informative

    A previous story talked about the noise level of spam increasing.

    And a very entertaining NYT article that is in the process of expiring.

    The upshot is that spam is being forced to look more and more like line noise. It will probably become less and less effective as the message has to submerge to the point where people can't recognize it.

    --
    "Provided by the management for your protection."
  13. Duh by Ricin · · Score: 4, Informative

    Of course I can break my own Bayesian filtering.

    What matters is that while one person's spam might be very similar to another person's spam, their ham isn't. At best, it would require a semi-personal approach to sneak in spam. That's why you need to continually train your filter in the first place. Rinse and repeat, that's what it's all about.

    What's being described is not really a flaw, but rather a saturation point at which it's time to retrain your filter and perhaps even start over with a new database. The old one gets too much 'noise' after some time.

    They do point out one thing, albeit from the spammers' POV: Bayesian filtering is a continuous process and not an end-all solution. It requires fresh input and gets less effective if you keep old crud around for too long and if you train it too much on virtually the same spam/ham.

    It's still a much better solution than blacklists.

  14. Really don't understand it. by The+I+Shing · · Score: 4, Insightful

    I've said this before, but I'll say it again. I really don't understand why all this even happens.

    When I'm going through the webmail access to my spam-bait accounts (the ones that are listed on my websites that I don't bother retrieving with my POP email client anymore because of hundreds of spams a day to each), if I'm fooled into opening one up, most likely because of it having a subject header that might be someone legitimate, the moment I see that the message body says anything spammy I immediately click the Delete button. I imagine everyone else in the world is doing the same thing.

    It's gotten to the point where the preoccupation of spamming is just to get past filters, the result of which is that the message is grumblingly deleted by the irritated recipient. Who out there is saying, "Oh, look, this message got past all my spam filters and contains a lot of jumbled, garbled nonsense text alongside a plug for herbal penis enlarging pills. This must be legitimate. Now, where's my credit card,"? Do the spammers think that we're all clones of Dilbert's pointy-haired manager?

    Spamming is not only irritating, it's pointless. Who is paying these people to spam us? Are people actually buying penis enlarging pills and patches, herbal viagra, mortgage refinancing, credit repair kits, or any of that stuff? Enough to put millions of dollars a month into the hands of career spammers?

    I'm hopelessly at sea in this matter.

    --
    You are in error. No-one is screaming. Thank you for your cooperation.
    1. Re:Really don't understand it. by tbmaddux · · Score: 4, Funny
      Are people actually buying penis enlarging pills and patches, herbal viagra, mortgage refinancing, credit repair kits, or any of that stuff?
      Let me take a moment to tell you my sad story. I was in desperate need of penis enlargement, and so I did start ordering those pills. But they proved hard to swallow, and the patches were itchy, and I had an allergic reaction to the herbs in the herbal viagra. Unfortunately, I bought so much of this stuff that I had to refinance my home, and the bank wouldn't approve my loan because of all the penis purchases on my credit cards. So as a desperate last measure, I ordered some credit repair kits, but that didn't work either!

      Fortunately, this story has a happy ending! As I wrote this message, some polite people in West Africa contacted me and I think they are going to get me out of this financial mess.

      --
      Can't you see that everyone is buying station wagons?
  15. Re:That's dedication... :( by kris_lang · · Score: 5, Informative

    Yes, it's dedication to research. He sent himself the 10k messages to see if he could outwit his own Bayesian filtering of spam messages. He effectively deduced that if the incoming message can be similar enough to items that have been specifically marked non-spam by the end-user of the Bayesian spam filter, it will not be marked as spam.

    There's a cunning recursiveness to this which is at that fine line between clever and stupid. The difficulty is, as he also deduces, that each person's Bayesian rules for spam vs. nonspam are unique, so it will take many attempts to infer the pass-through words that create a false negative and let the spam through. The one step that people are missing is that if the evil spammer wishes to work on spamming a domain (both in the internet sense and in the "domain of expertise/specialization" sense) she can tailor the pass-through words to the market. If she's sending spam to Intel or AMD corporate addresses, then lithography might be the magic word; if she's spamming Xilinx, then FPGA will route through the Bayesian filter; if she's spamming Dave Barry, then debenture and fish falling from the sky might help spam make it through. Natalie may or may not make it through a /.'er's filter, but including slashdot in the subject or as the name usually will. And the ease of this lies in that tailoring the open sesame words to a market will probably open the doors to all of the e-mail recipients at a domain, particularly if the spam filtering is done at the mail-server level and not at the end-user level. Thus rather than having to send 10k messages to a single user to crack open the spam doors, sending those 10k messages to multiple users at a domain and analysing which ones get through will effectively open the floodgates for all of the users at that internet domain. And using the concept of a priori probability distributions makes the hunt for these sesame words {[tm] /me :) } easier by limiting the dictionary to be searched to the keywords of the field/domain about to be spammed. That is what makes this dangerous.

    The counterattack from the corporate mail-server will be to look for these similarly unique messages being sent to multiple users.
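
    One way a mail server might look for such near-duplicates is to compare word shingles between messages and flag any message that closely resembles mail sent to several different recipients. The sketch below assumes that approach; the function names, the 0.5 threshold and the three-recipient cutoff are illustrative choices, and the pairwise comparison is quadratic, so a real server would need something cheaper.

```python
def shingles(text, n=3):
    """Set of overlapping n-word sequences in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}


def jaccard(a, b):
    """Overlap between two shingle sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag_bulk(messages, threshold=0.5, min_recipients=3):
    """messages is a list of (recipient, text) pairs.

    Returns the indices of messages that closely resemble mail sent to at
    least min_recipients distinct recipients (the message itself included).
    """
    sigs = [shingles(text) for _, text in messages]
    flagged = set()
    for i, sig_i in enumerate(sigs):
        recipients = {
            messages[j][0]
            for j, sig_j in enumerate(sigs)
            if jaccard(sig_i, sig_j) >= threshold
        }
        if len(recipients) >= min_recipients:
            flagged.add(i)
    return flagged


body = "buy our amazing herbal pills today at a very special low price"
mailbox = [
    ("alice", body + " lithography wafer"),
    ("bob", body + " fpga routing"),
    ("carol", body + " debenture fish"),
    ("dave", "lunch at noon on friday in the usual place"),
]
flagged = flag_bulk(mailbox)
print(sorted(flagged))
```

    The padding words change only the last couple of shingles in each copy, so the shared body keeps the similarity high even though no two messages are identical.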

  16. Re:"and can be combated." by GMontag · · Score: 5, Funny

    but how do you combat the spammer?

    1. Find spammer

    2. Kill spammer

    3. Become hero of the interweb

    4. Write book from prison

    5. ???

    6. Profit!

    Your question is exactly why the death penalty belongs on the street, not in prison.

  17. How NOT to get SPAM 201 - a more practical guide by djrogers · · Score: 4, Insightful
    • 1) Register a domain (come on, they're cheap now)
    • 2) Get an email address from your ISP or other provider (yahoo, fastmail.fm etc) that is complex and convoluted - no names or words
    • 3) Set up mail redirection with Zoneedit, redirection.net etc. with a catchall to your new mailbox.
    • 4) Use a different email address every time you must sign up for anything (e.g. amazon.com@newdomain.com)
    • 5) Filter on sent to headers at first sign of compromised id, or if the volume for a particular id gets too heavy and you're tired of client side filtering, set a specific redirection for it to sample@sample.com (do a whois on sample.com if you're curious).
    • 6) Enjoy the same spam free mailbox I've had for 2 years...
    Also helpful is to change your reply-to address every few months and give your friends different addresses based on how clueful they are.
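
    Steps 4 and 5 amount to a small lookup on the recipient address. A minimal sketch, assuming the catch-all domain from the example above (newdomain.com) and a hand-maintained set of compromised aliases; alias_for, classify and the "burned" set are hypothetical names, not features of any redirection service.

```python
from email.utils import parseaddr

DOMAIN = "newdomain.com"  # placeholder catch-all domain from the example


def alias_for(site):
    """Address to hand out when signing up at a given site."""
    return f"{site}@{DOMAIN}"


def classify(to_header, burned):
    """Decide what to do with a message based on which alias it was sent to."""
    _name, addr = parseaddr(to_header)
    local, _, domain = addr.lower().partition("@")
    if domain != DOMAIN:
        return "unknown"
    # An alias in the burned set has leaked to spammers, and the alias
    # itself names the site that leaked it.
    return "discard" if local in burned else "deliver"


burned = {"amazon.com"}
print(alias_for("amazon.com"))
print(classify("amazon.com@newdomain.com", burned))
print(classify("Bob <friends@newdomain.com>", burned))
```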
    --
    Think outside the... Hey, where'd the friggin' box go?
  18. I agree, there is no problem. by khasim · · Score: 4, Informative

    He managed to, randomly, find words that were high in _HIS_ "ham" list.

    He could have saved himself a lot of time and trouble and just looked in that file.

    And that file will be different for EVERY installation. So the words he found ("Berkshire", "Marriott", "wireless", "touch" and "comment") would NOT get spam past MY filter.

    So, the spammers have to keep (and update) a word list for EVERY PERSON on their lists.

    Which means that, with an incredible amount of effort, the spammers will be able to get spam to the people least likely to purchase a product from a spammer.

    There is no problem.

    1. Re:I agree, there is no problem. by WuphonsReach · · Score: 4, Interesting

      So, the spammers have to keep (and update) a word list for EVERY PERSON on their lists.

      That's one of the strengths of pushing bayesian filtering to as close to the final recipient as possible. Millions of customized bayesian scoring databases are much more difficult to defeat than a single centralized database. Bayesian databases are pretty much maintenance free, as long as the junk/not-junk/might-be user-interface is intuitive and makes life as easy for the user as possible.

      There is some value in putting the bayesian filtering at a workgroup level, where it helps that there's a bit of shared knowledge and everyone in the group pretty much agrees on their personal definition of what is/isn't spam. However, once you get past around 10-25 people, I'd say that bayesian is going to start becoming ineffective due to either over-zealous users, or overly-broad ham/spam classifications.

      What I'd be interested in is a bayesian that works both on the individual level and the workgroup level. With some sort of flag/switch/setting that tells the engine how much to consider the workgroup database as opposed to my personal database. This would be useful when adding a new member to the group: initially they'd rely heavily on the group's opinion as to what is ham/spam, but as time goes on it would adapt to their choices (as well as the group database slowly adapting to everyone else's).
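
      One possible shape for that switch is a per-token blend whose weight shifts toward the personal database as the user trains it. This is a sketch of the idea only, not a feature of any existing filter; the function names, the half_point of 50 messages and the example probabilities are assumptions.

```python
def personal_weight(n_personal, half_point=50):
    """0.0 for a brand-new user, 0.5 after half_point trained messages, approaching 1.0."""
    return n_personal / (n_personal + half_point)


def blended_spam_prob(token, personal, group, n_personal, half_point=50, unseen=0.5):
    """Mix the personal and workgroup spam probabilities for one token.

    personal and group map token -> spam probability. A token missing from
    the personal database falls back to the group's opinion entirely.
    """
    p_group = group.get(token, unseen)
    p_personal = personal.get(token)
    if p_personal is None:
        return p_group
    w = personal_weight(n_personal, half_point)
    return w * p_personal + (1 - w) * p_group


group_db = {"refinance": 0.9}     # the workgroup sees this word mostly in spam
personal_db = {"refinance": 0.1}  # this user handles mortgages, so it is ham

for n in (0, 50, 950):
    print(n, round(blended_spam_prob("refinance", personal_db, group_db, n), 3))
```

      A new member starts out with the group's verdict and drifts toward their own as they classify more mail. Updating the group database itself is a separate step that is not shown here.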

      --
      Wolde you bothe eate your cake, and have your cake?