Slashdot Mirror


Armoring Spam Against Anti-Spam Filters

moggyf points to a BBC article about how spam can be successfully tweaked to slip past current filtering methods, excerpting "To find out how to beat the filters, Mr Graham-Cumming sent himself the same message 10,000 times, but to each one added a fixed number of random words. When a message got through he trained an 'evil' filter that helped to tune the perfect collection of additional words." iluvspam adds "It's an interview with POPFile author John Graham-Cumming that summarizes his talk at the recent MIT Spam Conference. You can still listen to the technical details here (choose the Afternoon 1 session; he starts about 75 minutes in)."
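The padding experiment described above can be sketched with a toy word-based filter. This is a minimal illustration under stated assumptions, not POPFile's classifier and not Graham-Cumming's "evil" filter: the class `ToyBayesFilter`, the helper `fraction_through`, the Laplace smoothing, the equal spam/ham priors and the 0.5 threshold are all hypothetical choices made for the sketch. It shows why appending enough "hammy" words can drag a spam message's combined score below the threshold.

```python
import math
import random
from collections import Counter


class ToyBayesFilter:
    """Hypothetical toy naive Bayes word filter (not POPFile's algorithm)."""

    def __init__(self):
        self.spam_counts = Counter()  # number of spam messages containing each word
        self.ham_counts = Counter()   # number of ham messages containing each word
        self.n_spam = 0
        self.n_ham = 0

    @staticmethod
    def _words(text):
        return set(text.lower().split())

    def train(self, text, is_spam):
        if is_spam:
            self.spam_counts.update(self._words(text))
            self.n_spam += 1
        else:
            self.ham_counts.update(self._words(text))
            self.n_ham += 1

    def _log_odds(self, word):
        # Laplace-smoothed ratio P(word | spam) / P(word | ham), in log space
        p_spam = (self.spam_counts[word] + 1) / (self.n_spam + 2)
        p_ham = (self.ham_counts[word] + 1) / (self.n_ham + 2)
        return math.log(p_spam / p_ham)

    def spam_score(self, text):
        # Sum per-word evidence (naive independence, equal priors), clamp to
        # avoid overflow in exp(), then map the log-odds back to a probability
        total = sum(self._log_odds(w) for w in self._words(text))
        total = max(-500.0, min(500.0, total))
        return 1.0 / (1.0 + math.exp(-total))


def fraction_through(flt, message, vocabulary, n_words, trials, rng, threshold=0.5):
    """Hypothetical harness: send `trials` copies of `message`, each padded with
    `n_words` random words from `vocabulary`, and return the fraction that
    score below `threshold` (i.e. get through the filter)."""
    passed = 0
    for _ in range(trials):
        padding = " ".join(rng.choice(vocabulary) for _ in range(n_words))
        if flt.spam_score(message + " " + padding) < threshold:
            passed += 1
    return passed / trials
```

Because each distinct word contributes its own evidence, a spammer who learns which words a filter considers strongly "ham" can outvote the spammy words, which is the weakness the article describes.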

12 of 511 comments

  1. Obligatory POPFile Link by rmohr02 · · Score: 5, Interesting

    POPFile, maintained by John Graham-Cumming, is the best spam filter I've used. There may be small flaws with the fundamental concept of Bayesian filters, but POPFile still blocks all my spam.

  2. Great by Polkyb · · Score: 3, Interesting

    I don't mind him trying to defeat the filters, if it comes up with a method of improving them, but the BBC should be shot for including the words that made it through.

    Guess which words all of tomorrow's SPAM will contain...

    --
    I've never shoed a horse, but I once told a donkey to piss off!
  3. Here's a sneaky one... by Channard · · Score: 4, Interesting

    Mozilla's filtering catches most spam for me, but some gets through. However, the only one that actually fooled me was quite a sneaky one, headed "RE: Question from E-Bayer", or whatever the actual subject line is when someone asks about something you're selling on E-Bay. Given that I sell on E-Bay, the spammers must have gambled that enough people would read the subject and decide it was worth opening.

  4. Mainstream Media Coverage by Anonymous Coward · · Score: 3, Interesting

    I hate to see mainstream media coverage of this practice. I have started to get a lot of these spams lately.

    Typically they include a large image at the top, which is the entire intended content of the message, and then a bunch of dictionary words at the bottom. It's basically impossible to filter these out unless you filter out ALL HTML e-mail, because they don't contain any typical spam text.

  5. Re:nice name by JohnGrahamCumming · · Score: 3, Interesting

    Yes, that's a constant problem for me (and anyone else named Cumming or Cummings in the world). For example I can't get a Hotmail email account because of my name, but I did manage to sign up an account using the name Ivana Watch-Teens-Give-Head :-)

    John.

  6. Re:Discovering Keyword Demographics by Alien54 · · Score: 3, Interesting
    [hit the submit key too fast ....]

    The keywords would be different for each person.

    But I suppose you could discover a select set of keywords for specific demographics, if you defined them very precisely. This would move spam out of the usual "spew it everywhere" phase and into one where spammers would have to pay for real marketing data.

    Which sort of misses the point of free advertising in the first place, at least for the small guy. Of course, the big boys can pay for this sort of thing.

    --
    "It is a greater offense to steal men's labor, than their clothes"
  7. Sigh. It's depressingly predictable by heironymouscoward · · Score: 3, Interesting

    Why is everyone surprised that every technique designed to eliminate spam can be fought? It's obvious that this is going to happen.

    The question should be: how do we live in a world where 99.9(n)% of email is spam? What happens when the virus writers, zombie masters and spysters start using their communications infrastructure for its intended goal of delivering advertising?

    It's inevitable, and no amount of spam filtering will avoid it.

    Here's a prediction I made maybe 6 months ago on Slashdot: we're going to start seeing viruses that modify real outgoing emails to include their advertising messages. (And no Outlook jokes, thanks...) How does one filter spam when real emails are also infected?

    --
    Ceci n'est pas une signature
  8. Nowhere near as effective as my attack by Jerf · · Score: 3, Interesting

    Well, I may not have made it into the BBC but my attack is much more effective and much, much harder to defend against: Bayes Attack Report.

    It even counters the "personalization" quality of Bayes filters by finding the "common core" of personalization that we all share.

    Fortunately, spammers continue to be too stupid to understand this attack. Last time I posted this on Slashdot I got joe jobbed, because apparently it's easier to do that than to actually figure out what I was talking about.

    In summary, I wouldn't worry about your Bayes filters for a while: while they are attackable, spammers are too stupid to understand the attacks. (My article has been posted for over a year.) Thank goodness, sort of. (This situation will eventually turn out to be temporary... but I see no particular evidence that the breakthrough will happen anytime soon.)

  9. Re:That's dedication... :( by kent_eh · · Score: 3, Interesting

    One thing we can do is to make the spammers==virus_writers connection every time anyone asks us about (or even mentions) viruses.

    Aren't we the ones our friend(s) and co-workers ask about computer stuff?

    I have taken this a step further and contacted a few "computer journalists" locally and suggested that they make the spam/virus connection the next time they write about the latest virus. It's natural to answer the question "where do these viruses come from?" when talking about the latest scourge of the Internet.

    --

    ---
    "I can't complain, but sometimes still do..." Joe Walsh
  10. Re:Ok fuck it by Ineffable+27 · · Score: 3, Interesting

    No true jury of his peers would convict him, since chances are they're sick of spam too! :)

    --
    "He'd be a broader guy if he had dropped acid once." - Steve Jobs on Bill Gates
  11. I am building my own by Tablizer · · Score: 3, Interesting

    Any spam filter used by more than a few thousand people will be dissected and used to make filter-proof spam by the spammers. I am sure Bayesian has lots of holes if you work hard enough to find them. Bayesian filtering depends on consistency in patterns. If spammers ruin that consistency, Bayesian filters won't work.

    Just the other day I found one spam that used a white font to put in legitimate-sounding text that would not visually show up on the screen. The spam text was a mix of graphics and pieces of real text. Thus, the word "penis" might start out with "pen" and end with a graphic for "is". Bayesian might start looking for the word "pen" after a while, but by that time the spammers will have a new trick up their sleeve. For example, if it looks for white fonts, then spammers might start using slightly off-white fonts, or black fonts on a black background. The combinations are probably endless.

    Thus, by making my own, my gizmo is not the target of spammers. They don't know about my filter nor care.

    The only alternative I can see is for filter vendors to change their algorithms every month or so, which would probably get expensive and risky. It is not like virus-checking software, which mostly just adds to its database and only tweaks its algorithm a bit once every few years; it is like having to completely rewrite the virus filtering algorithms, not just the data.

    Ultimately, I think some sort of monetary postage system is the only effective solution. ISPs and backbone operators will only have an incentive to track down spammers if they lose money on anonymous or forged spam. This will make mass spamming far less lucrative.

    Either that, or people will eventually find out the hard way that penis enlargers don't work and stop wanting to refinance their houses. (I wonder if I can refinance all those expensive penis enlargers that I bought?)

  12. Re:I agree, there is no problem. by WuphonsReach · · Score: 4, Interesting

    So, the spammers have to keep (and update) a word list for EVERY PERSON on their lists.

    That's one of the strengths of pushing Bayesian filtering as close to the final recipient as possible. Millions of customized Bayesian scoring databases are much more difficult to defeat than a single centralized database. Bayesian databases are pretty much maintenance-free, as long as the junk/not-junk/might-be user interface is intuitive and makes life as easy for the user as possible.

    There is some value in putting the Bayesian filtering at a workgroup level, where it helps that there's a bit of shared knowledge and everyone in the group pretty much agrees on their personal definition of what is/isn't spam. However, once you get past around 10-25 people, I'd say that Bayesian filtering will start to become ineffective due to either over-zealous users or overly broad ham/spam classifications.

    What I'd be interested in is a Bayesian filter that works at both the individual level and the workgroup level, with some sort of flag/switch/setting that tells the engine how much weight to give the workgroup database as opposed to my personal database. This would be useful when adding a new member to the group: initially they'd rely heavily on the group's opinion as to what is ham/spam, but as time goes on it would adapt to their choices (as well as the group database slowly adapting to everyone else's).

    --
    Wolde you bothe eate your cake, and have your cake?
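WuphonsReach's idea in comment 12, a setting that controls how much a user's filter leans on a shared workgroup database versus their own, can be sketched as a simple blend of two spam probabilities. This is a minimal sketch under assumptions: the function `blended_spam_probability`, the linear interpolation and the default 200-message `ramp` are hypothetical choices, and a real engine might blend per-word statistics rather than final scores.

```python
def blended_spam_probability(personal_p, group_p, messages_trained, ramp=200):
    """Hypothetical blend of a personal and a workgroup spam probability.

    `ramp` plays the role of the flag/setting described in the comment: a new
    member (0 trained messages) relies entirely on the workgroup database, and
    after `ramp` messages of their own training the personal database wins.
    """
    if ramp <= 0:
        raise ValueError("ramp must be positive")
    for p in (personal_p, group_p):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # Linear ramp of the personal weight from 0 to 1, clamped at both ends
    personal_weight = min(1.0, max(0, messages_trained) / ramp)
    return personal_weight * personal_p + (1.0 - personal_weight) * group_p
```

A linear ramp is only one option; a user who wants to keep more group influence permanently could instead cap `personal_weight` below 1.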
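Tablizer's point in comment 11, that a rule matching exactly "white text" is dodged by off-white text or black on black, can be illustrated by comparing the text colour with its background using a tolerance instead of an exact match. This is a hypothetical sketch: `parse_hex_color`, `is_probably_hidden` and the tolerance of 40 (Euclidean distance in RGB) are assumptions, only `#rgb`/`#rrggbb` colours are handled, and real HTML mail would need proper CSS and markup parsing.

```python
import math
import re

_HEX_COLOR = re.compile(r"#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})")


def parse_hex_color(value):
    """Return an (r, g, b) tuple for '#rgb' or '#rrggbb' strings, else None."""
    match = _HEX_COLOR.fullmatch(value.strip())
    if match is None:
        return None  # named colours, rgb(), etc. are not handled in this sketch
    digits = match.group(1)
    if len(digits) == 3:
        digits = "".join(ch * 2 for ch in digits)  # '#fff' -> 'ffffff'
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def is_probably_hidden(text_color, background_color, tolerance=40.0):
    """Flag text whose colour lies within `tolerance` (Euclidean RGB distance)
    of its background: white on white, off-white on white, black on black."""
    fg = parse_hex_color(text_color)
    bg = parse_hex_color(background_color)
    if fg is None or bg is None:
        return False
    return math.dist(fg, bg) <= tolerance
```

The tolerance catches near-matches such as `#f0f0f0` on `#ffffff`, but a spammer can still pick a colour just outside it, which is exactly the arms race the comment predicts.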