Slashdot Mirror


Armoring Spam Against Anti-Spam Filters

moggyf points to a BBC article about how spam can be successfully tweaked to slip past current filtering methods, excerpting "To find out how to beat the filters Mr Graham-Cumming sent himself the same message 10,000 times but to each one added a fixed number of random words. When a message got through he trained an 'evil' filter that helped to tune the perfect collection of additional words." iluvspam adds "It's an interview with POPFile author John Graham-Cumming that summarizes his talk at the recent MIT Spam Conference. You can still listen to the technical details here (choose the Afternoon 1 session, he starts about 75 minutes in)."

111 of 511 comments

  1. infinite monkeys by bluelip · · Score: 5, Funny

    So the ultimate spam protection mechanism would be an infinite number of monkeys typing my list of words to associate w/ spam. :)

    --

    Yep, I never spell check.
    More incorrect spellings can be found he
    1. Re:infinite monkeys by AllUsernamesAreGone · · Score: 4, Funny

      We better watch out for slashdot comments appearing in spam now.. ;)

    2. Re:infinite monkeys by Jonas+the+Bold · · Score: 5, Funny

      You kids and your monkeys

      In my day we didn't have monkeys. We had to filter spam by hand. And we liked it!

      You kids and your infinite monkeys... Shakespeare wouldn't have used monkeys were he alive today. He would have rolled up his sleeves and written Hamlet the right way!

      Damn kids..

      --
      Everything seemed to be going so nice
      'till the end of all beings punched right through the ice
    3. Re:infinite monkeys by TheDigitalRaven · · Score: 5, Funny

      Hands? Them're luxury! When I were a lad, hands were summat only posh people had. The rest of us had to make do with paws which hadn't evolved fully yet, and we had to filter all of our spam from each mailbox manually, but we had to go to the mailbox - across a river of lava, mind - to collect each message but couldn't filter it until we got back. We'd sort spam twenty six hours a day, getting up two hours before going to bed, and had to eat cold poison while we were doing it. And we had to pay for the privilege of being allowed to filter our own!

    4. Re:infinite monkeys by letxa2000 · · Score: 5, Insightful
      I'm not sure I understand why they think this is a problem with Bayesian filtering. Basically, they're saying that if a spammer sends you the same message thousands of times but inserts a few slightly different words each time, and if the thousands of messages get through the Bayesian filter to the user, and if the user doesn't disable HTML bugs on his email client, then we have a problem...?

      First, if the spammer sends thousands of copies of the same message and just changes the "extra words" that he is testing, it will take very little time for Bayesian to adapt to the rest of the message. Suddenly, the rest of the message that previously contained non-spammy words will be considered very spammy and will overwhelm the "extra words" that each message contains. Each time the message is caught as spam, the probability that any future tests get through--regardless of the "extra words"--will be reduced even further.

      Second, as the article said, it's a lot of work on the part of the spammer. They'd have to send out thousands of messages to each target to "sniff them out", and most of those wouldn't even be effective, since most would be caught by filters and, of the few that got through, very few would load the HTML bugs to identify themselves.

      Finally, it assumes that those that are using Bayesian filters are filtering their email but leaving their security (inasmuch as HTML bugs) wide open. While there may be some people that use Bayesian and leave HTML bugs active, it has to be a small minority.

      In short, it seems to me they've "found" a way to get around Bayesian that won't work, so to speak. I just don't see the problem.... ??
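
The adaptation argument in this comment can be sketched with a toy token counter. This is only an illustration under stated assumptions: the `TokenStats` class, its Laplace smoothing, and the sample messages are all made up here and are not taken from POPFile or any other real filter.

```python
from collections import Counter


class TokenStats:
    """Toy per-token spam statistics (hypothetical, for illustration only)."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam messages containing it
        self.ham = Counter()   # token -> number of ham messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        tokens = set(text.lower().split())
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def spam_prob(self, token):
        # Smoothed P(token | spam) and P(token | ham), combined with equal priors.
        s = (self.spam[token] + 1) / (self.n_spam + 2)
        h = (self.ham[token] + 1) / (self.n_ham + 2)
        return s / (s + h)


stats = TokenStats()
stats.train("quarterly mortgage report attached", is_spam=False)

body = "refinance your mortgage today"
before = stats.spam_prob("mortgage")

# The same body arrives repeatedly with different padding words;
# each copy that is caught gets fed back to the filter as spam.
for padding in ["walrus teapot", "gravel violin", "harbor quilt", "ember notch"]:
    stats.train(body + " " + padding, is_spam=True)

after = stats.spam_prob("mortgage")
print(f"mortgage: {before:.2f} -> {after:.2f}")
```

Each padding word shows up in only one training message, while the fixed body words show up in all of them, so the body is what the filter ends up learning.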

    5. Re:infinite monkeys by Sique · · Score: 4, Insightful

      Second, as the article said, it's a lot of work on the part of the spammer. They'd have to send out thousands of messages to each target to "sniff them out", and most of those wouldn't even be effective, since most would be caught by filters and, of the few that got through, very few would load the HTML bugs to identify themselves.

      This is exactly the point. Most of the spam examples will die out because they have an ineffective collection of non-spam words. But a few will survive, and you can now train your own Bayesian filter which collects the versions of spam that generated webbug hits. After a while some words will shine prominently in your Bayesian filter database for being very effective at slipping through Bayesian spam filters.

      Basically you are fighting the dote with itself. And yes, you can automate the process. Just take your everyday spam (penis enlargement, unsecured credit, Nigerian business opportunities...), take a dictionary and then randomly mix dictionary words into your spam messages and send them out to your email database. Create a website to get the webbug hits and associate every spam message with a hash of the random dictionary words to identify successful sets of anti-spam words.

      --
      .sig: Sique *sigh*
    6. Re:infinite monkeys by Theresa1 · · Score: 5, Funny
      cold poison ?! you lucky buggers.

      We were so poor we had to eat spam.

      --
      This is a manual signature virus. Copy to your signature file and help me spread.
    7. Re:infinite monkeys by Patik · · Score: 2, Funny

      You forgot to put quotes around Yorkshire and close the span tag

    8. Re:infinite monkeys by nate1138 · · Score: 2, Funny

      Shakespeare wouldn't have used monkeys were he alive today. He would have rolled up his sleeves and written Hamlet the right way!

      Yeah, he would have had Christopher Marlowe or Bacon write it for him!

      --
      Where's my lobbyist? Right here.
    9. Re:infinite monkeys by NanoGator · · Score: 3, Funny

      "We were so poor we had to eat spam."

      Ah we're such fun loving people. How come none of us have girlfriends?

      --
      "Derp de derp."
    10. Re:infinite monkeys by Anonymous Coward · · Score: 2, Funny

      And thus, in the ancient lineage of "COWBOY NEAL!!!", "In Soviet Russia..." and "???, Profit!!" comes Slashdot's newest guaranteed "Score: 5, Funny" genre of posts. The "Back in my day...".

    11. Re:infinite monkeys by joebok · · Score: 2, Interesting

      I think it's more than no problem - what I believe he is saying is that a Bayesian filter will evolve some "ham" words that will carry an email into an inbox. They are individual and hard to figure out, but there is no reason why a spammer can't append your ham words, my ham words, and everybody else's ham words to the same message and thus bypass all our filters. So instead of the random "word salad" that we would see, we'd be getting a non-random selection of known ham words.

      Even if the HTML business didn't work, spammers still have a mechanism for gauging effectiveness - money. They can assume a fairly even distribution of suckers and start sending out groups of messages with random words and, with some analysis, probably eventually come up with some statistically significant ham words.

      Perhaps in addition to trading email addresses, ham word lists will also start to be traded. The anti-spam/spam industry will evolve like insurance and re-insurance : whoever has the best actuary will win.

      Over time the ham words would also change - I wonder if the fight against spam will start having a noticeable effect on our use of language?

    12. Re:infinite monkeys by AaronW · · Score: 2, Funny

      In my day we didn't even have spam, and we liked it! You kids and your fancy smancy spam and filters and whatnot have no idea of the difficulties before spam. Hell, if we wanted to find out about penis enlargement pills we had to go out hiking through the snow uphill to search for them. And we liked it!

      --
      This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
    13. Re:infinite monkeys by FireBreathingDog · · Score: 3, Insightful
      It's much easier than that to defeat Bayesian filtering. Ever \/\/0|\|D3R why you're getting so much spam with obfuscated words? Or why you're getting so much spam where the text content is contained primarily in images rather than plaintext? Those things bypass Bayesian filters, that's why!

      Bayesian filters rely on words. That means they are dependent upon word breaks and certain spellings. Well, spammers have been avoiding word breaks (either by removing spaces or introducing unnecessary ones) and obvious "spam words" by mangling the word or introducing "1337"-type spelling.

      And Bayesian filters can't parse graphics, so a lot of spammers are careful to put words likely to trigger spam filters into graphics.

      BTW, this article explains why there will never be a filtering-based solution to solving spam until SMTP itself is made more secure.
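
One common response to the obfuscation described in this comment is to normalize tokens before they are scored. The sketch below assumes a small, hypothetical substitution table (`LEET`) and a hypothetical `normalize` helper; it does not handle multi-character tricks such as `\/\/` for "w", and some digits are ambiguous ("1" could stand for "l" or "i").

```python
import re

# Hypothetical substitution table; a real one would be larger and fuzzier.
LEET = str.maketrans({
    "0": "o", "1": "l", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})


def normalize(token):
    """Map a token to a canonical form before looking it up in the filter."""
    token = token.lower().translate(LEET)
    # Drop punctuation inserted inside words, e.g. "m.o.r.t.g.a.g.e".
    return re.sub(r"[^a-z]", "", token)


print(normalize("0bfu5c4t3d"))       # obfuscated
print(normalize("M.o.r.t.g.a.g.e"))  # mortgage
```

A filter could score both the raw and the normalized token, since the fact that a word was mangled at all is itself a useful signal.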

    14. Re:infinite monkeys by Tripster · · Score: 5, Funny

      Don't know about you but my wife won't let me have one!

    15. Re:infinite monkeys by CleverFox · · Score: 3, Funny

      Or I could just sell the spammer a list of the words from a 300,000 message Bayesian database that are 1% probability tokens.

      $50,000 gets you the whole 300,000 message Bayesian database.

      lindsayleeds _at_ comcast.net

      Pay up spammers.

    16. Re:infinite monkeys by Jeremi · · Score: 4, Informative
      Ever \/\/0|\|D3R why you're getting so much spam with obfuscated words?


      Nope, because my Bayesian filter works just as well for 0bfu5c4t3d words as it does for properly spelled ones. They are all just sequences of letters, and anything that is deliberately misspelled is going to become identified as spammy very quickly.


      Or why you're getting so much spam where the text content is contained primarily in images rather than plaintext?


      Nope, because I have images turned off by default in my mail viewer. If a stranger wants me to read his email, he'll need to send it as plain text, because (as you point out) HTML email with images is used as a spam vector and little else.


      BTW, this article explains why there will never be a filtering-based solution to solving spam until SMTP itself is made more secure.


      Funny, my Bayesian filter is working fine at this very moment. Who should I believe, your article or my own eyes?


      Jeremy

      --


      I don't care if it's 90,000 hectares. That lake was not my doing.
    17. Re:infinite monkeys by FireBreathingDog · · Score: 2, Informative
      Nope, because my Bayesian filter works just as well for 0bfu5c4t3d words as it does for properly spelled ones. They are all just sequences of letters, and anything that is deliberately misspelled is going to become identified as spammy very quickly.

      The problem with obfuscated words is that there is a pretty sizable set of permutations for any given word. If one obfuscated variant ends up in your spam word list, that doesn't take care of the thousands of other obfuscated versions of the exact same word.
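
As a rough back-of-the-envelope check on the "sizable set of permutations" claim, the count can be worked out by multiplying the options for each letter. The look-alike table below is an assumption made up for this example; inserted spaces or punctuation would multiply the count further.

```python
from math import prod

# Hypothetical look-alike characters per letter; unlisted letters have one form.
VARIANTS = {"a": "a4@", "e": "e3", "i": "i1!|", "o": "o0", "s": "s5$", "t": "t7"}


def variant_count(word):
    """Number of spellings obtained by independently swapping each letter."""
    return prod(len(VARIANTS.get(ch, ch)) for ch in word.lower())


print(variant_count("mortgage"))    # 24
print(variant_count("obfuscated"))  # 72
```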

      Nope, because I have images turned off by default in my mail viewer. If a stranger wants me to read his email, he'll need to send it as plain text, because (as you point out) HTML email with images is used as a spam vector and little else.

      Ahh..yes! I have them turned off, too! But isn't the whole point of Bayesian filtering to stop the spam before it reaches your inbox? Sure, you've got images turned off so you don't see the spam, but if Bayesian is so great, why is the spam in your inbox to begin with?

      Funny, my Bayesian filter is working fine at this very moment. Who should I believe, your article or my own eyes?

      You can believe your own eyes if you wish, but your misconception is assuming that if Bayesian is working for you it is also working for everyone else. Don't get me wrong...Bayesian filtering is a pretty nifty technology. But let's not pretend it's a universal solution that works for everyone.

      For whatever reason, the mix of spam I get isn't caught all that effectively by my Bayesian filter. So, believe your eyes if you wish, but don't claim that my eyes must see exactly what yours do.

    18. Re:infinite monkeys by Nyarly · · Score: 2, Funny
      The funniest thing about the parent is that "pedantic" is misspelled.

      The saddest thing is that quoting the values of html attributes isn't required by the standard.

      --
      IP is just rude.
      Is there any torture so subl
    19. Re:infinite monkeys by meeotch · · Score: 3, Funny
      Stop it, you fools! Slashcode was never designed to support jokes more than four levels deep - you'll cause a core breach!

      You maniacs! Goddamn you all to hell!

      mitch

    20. Re:infinite monkeys by elemental23 · · Score: 2, Informative

      Well, you know the great thing about standards is that you have so many to choose from!

      However, if you choose the current (dated 26 January 2000) W3C XHTML recommendations then yes, the quotes are required.

      --
      I like my women like my coffee... pale and bitter.
    21. Re:infinite monkeys by Mistshadow2k4 · · Score: 3, Funny

      Well, my husband seems to think that me having a girlfriend is a great idea, but I'm not so sure...

      --
      I dream of a better world... one in which chickens can cross roads without their motives being questioned.
  2. Ok fuck it by tomstdenis · · Score: 5, Funny

    I will pay 1000$ to anyone who seeks out and beats the living daylights out of a spammer. With as many pics on the web as possible for posterity.

    Screw these filters and shit. Start creaming spammers worldwide and they'll think twice about it.

    Tom

    --
    Someday, I'll have a real sig.
    1. Re:Ok fuck it by swb · · Score: 2, Informative

      You do realize you've just committed a pretty serious Federal crime, don't you? I know you're kidding or just emoting the same frustration many others, myself included, feel about the willful disregard spammers seem to have for many things.

      But you might've wanted to add a smiley...

    2. Re:Ok fuck it by cperciva · · Score: 2, Interesting

      You do realize you've just committed a pretty serious Federal crime, don't you?

      He hasn't, actually -- those laws don't apply extraterritorially, and Tom's in Canada.

    3. Re:Ok fuck it by nigelc · · Score: 5, Funny

      Ahh, an international terrorist proposing an attack. We should be invading Canada any day now...

      --


      Cthulhu Barata Nikto
    4. Re:Ok fuck it by Gaijin42 · · Score: 2, Interesting

      Well, since this is an international forum, he has an out. But if it could be shown that he was soliciting someone to do that crime in the US, even if he did the solicitation from Canada, it would still be a crime in the US.

      At a minimum, he would be arrested if he came to the states. However, if someone actually went through with the crime, I'm sure Canada would be willing to extradite him. Canada doesn't want maniacs running around free, anymore than the US does.

    5. Re:Ok fuck it by swb · · Score: 4, Insightful

      Another example of people assuming that EVERYBODY lives in the USA or is under US law...

      The solicitation was made on a server located in the US. I don't doubt that Ashcroft would consider that US jurisdiction, regardless of the physical location of the poster.

      There's a lot of guys in dog cages at Guantanamo Bay who've NEVER been to the US. I'm not so sure these days that when the US government is pissed off at you, where you are and where you did something matter a whole lot.

    6. Re:Ok fuck it by AdamD1 · · Score: 4, Funny

      Is that illegal? After all he's not 'threatening' the spammer, he's merely presenting an offer he was pretty sure this guy was asking to receive. And besides: He can certainly "opt-out" at any time by choosing not to spam... ;)

      --
      Because I can! [Brainrub.com]
    7. Re:Ok fuck it by Ineffable+27 · · Score: 3, Interesting

      No true jury of his peers would convict him, since chances are they're sick of spam too! :)

      --
      "He'd be a broader guy if he had dropped acid once." - Steve Jobs on Bill Gates
    8. Re:Ok fuck it by FreeUser · · Score: 4, Funny

      At a minimum, he would be arrested if he came to the states. However, if someone actually went through with the crime, I'm sure Canada would be willing to extradite him. Canada doesn't want maniacs running around free, anymore than the US does.

      That assumes that beating the shit out of a SPAMmer is a "maniacal" act. I would argue that it is a perfectly rational course of action, and indeed a public service.

      Canada's Finlandization by the US might compel it to hand the guy over anyway, but certainly not for fear of having maniacs run loose (unless you count our troops poised on their border to enforce US Political Correctness Bush Style abroad). :-)

      [ Disclaimer required by Our Surveillance State: the preceding was a joke (c.f. humor). ]

      --
      The Future of Human Evolution: Autonomy
    9. Re:Ok fuck it by Anonymous Coward · · Score: 3, Funny

      I will pay 1000$ to anyone who seeks out and beats the living daylights out of a spammer.

      Dear Slashdot,

      I am seeking volunteers to join me in a business opportunity which has recently come to my attention. Please volunteer if you meet the following three qualifications:

      1) Willing to send 1 spam email.
      2) Willing to have ass beaten.
      3) Want $250.

      If you said yes to all three of the above, please contact me. :D

      P.S. For those who consider #1 to be unethical, consider #2 your punishment.

    10. Re:Ok fuck it by theLOUDroom · · Score: 2, Interesting

      yeah lets just go around beating up spammers. no trial, just vigilante justice. why stop there? lets go around beating up anyone we dont like. screw the court system. i dont like evil conservatives, lets just kill them. no trial, no evidence necessary.

      [sarcasm]Yeah, let's just trust the government to take care of every aspect of our lives and never go against anything it says.[/sarcasm]

      Saying something's "vigilante justice" doesn't automatically make it bad. In order to make that conclusion, you have to start with the assumption that the gov't will always do the right thing.
      Since that's not the case, one must realize that sometimes the rules need to be broken and other solutions applied to the problem.

      Look at it this way:
      You live in a country named dystopia. In this country rape is legal. Every day on the way to school, your daughter gets raped by the same guy. You go to the police, but they do nothing about it because it's not illegal. You try to get a law passed but it gets knocked down. This rape is causing your family real harm every day. How long are you going to wait before you resort to vigilante justice?.....and more importantly is it a bad thing when you do?

      Now back to the spam problem:
      Spam is pretty much legal (the CAN-SPAM Act was a joke...it made things worse). The gov't is doing basically nothing to stop it. It is causing real harm to internet users around the world. Now I'm not necessarily saying that vigilantism is the answer, but what I am saying is that your response is an extremely oversimplified view of the world.


      The law is not always right, nor is it carved in stone. Sure, society is supposed to follow the law, but the law is also supposed to follow society. The law is not this thing a guy came down from a mountain and handed us. It is a constant tug-of-war.

      --
      Life is too short to proofread.
    11. Re:Ok fuck it by swb · · Score: 2, Interesting

      They may be able to do that, but having JUST finished serving as a juror in a Federal criminal trial, I can tell you it wouldn't go over very well in most cases where there is strong evidence.

      In all likelihood the judge would declare a mistrial. I'm not familiar (we weren't told) with the judge's powers over a jury and what laws apply to jury conduct. It might be possible for the judge to declare the jury in contempt for disregarding the judge's instructions on how the law(s) are to be applied.

      It's not like you go to court and do whatever you want and interpret the law any way you want. The judge has total control of the rules of interpretation used by the jury. The court and the trial are kind of the judge's own little kingdom, and you mess with a federal judge at your own peril.

  3. Obligatory POPFile Link by rmohr02 · · Score: 5, Interesting

    POPFile, maintained by John Graham-Cumming, is the best spam filter I've used. There may be small flaws with the fundamental concept of Bayesian filters, but POPFile still blocks all my spam.

    1. Re:Obligatory POPFile Link by Tassach · · Score: 2, Informative

      Would that be the same John Graham-Cumming referenced in the article who figured out how to defeat said filter?

      --
      Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?
    2. Re:Obligatory POPFile Link by rmohr02 · · Score: 3, Informative

      Yes. He says there's ways to beat it, but that they're complicated to do.

    3. Re:Obligatory POPFile Link by joebok · · Score: 2, Insightful

      Yes - POPFile is fantastic! Since April 4th, my filter is 99.47% accurate at sorting my mail into 6 buckets. Over 18,000 spams have disappeared without me seeing them.

      While it is true that I still have to waste bandwidth and CPU cycles to get rid of this unwanted mail, I no longer have to waste time. I've got my parents, friends, and neighbors all hooked up with POPFile - I believe this is realistically the only way to fight spam - move the decimal place on their success ratio over a couple notches; dig into their bottom line.

    4. Re:Obligatory POPFile Link by rmohr02 · · Score: 2, Informative

      I choose to view all headers, but then I click the [-] in the top left corner of the headers and then see a single line with Subject:, From:, and time. Then when I want to reclassify something, I click the [+] (same place as the [-]) and copy the X-POPFile-Link header to Firebird or whatever browser you use. <http://bugzilla.mozilla.org/show_bug.cgi?id=23114 > is probably what they were referring to when they said this is an email client issue. If that bug is fixed, POPFile will be perfect for me. (Remember that Bugzilla doesn't take /. referrals--you'll have to copy and paste the link location.)

  4. That's dedication... :( by bc90021 · · Score: 2, Insightful

    It's unfortunate that spam must be lucrative enough that one man will send himself the same message 10,000 times and train an evil filter! We need to get people to stop buying products advertised through spam (granted, easier said than done), as in the end, it's the financial incentive that makes a spammer spam. :(

  5. Tch tch... by supersam · · Score: 5, Insightful

    Didn't they know something as simple as...

    "Make it idiot-proof, and someone will make a better idiot"

    1. Re:Tch tch... by interiot · · Score: 2, Interesting
      Well, that's not necessarily ALWAYS true... for instance, most crypto is heavily mathematics-based, so it is much easier to analyze, from a purely theoretical standpoint, how much CPU is required to break it. And in some cases (e.g. DES) a lot of theoretical work HAS gone into identifying weaknesses and analyzing exactly how much CPU is required to break a given key length.

      It's just that certain technical protections are not of the "I try some random protection, the idiots and/or hackers try random ways to break in, with various techniques being better than others, but we really only know by testing them out in the real world" kind.

      But spam unfortunately doesn't fall into that area unless we completely remove anonymity from email, which isn't necessarily the greatest idea. Though I know there are academic proposals for ways to anonymously vote and anonymously send cash in ways that satisfy certain very important criteria (eg. one person can't vote more than once, the receiver of anonymous cash can't retrieve the cash twice from the sender's bank account, the sender can't send a given transaction twice, etc). Do any of these techniques apply to allowing anonymous individual mail and bulk solicited email using a technically verifiable method?

  6. The only way by GuyinVA · · Score: 4, Informative

    As technology gets more complicated, so does the spam. The only way to protect yourself is to not give out your address. Period. Heck, I don't even give my work e-mail address to my parents.

    1. Re:The only way by junkymailbox · · Score: 4, Funny

      I dont give out my work address to anyone .. and it's not because i fear spam.. :)

    2. Re:The only way by Quill_28 · · Score: 4, Funny

      >The only way to protect yourself is to not give out your address. Period.

      Ummm.... then what good is it?
      Do you just e-mail yourself? :-)

  7. Great by Polkyb · · Score: 3, Interesting

    I don't mind him trying to defeat the filters, if it comes up with a method of improving them, but the BBC should be shot for including the words that made it through

    Guess which words all tomorrow's SPAM will contain...

    --
    I've never shoed a horse, but I once told a donkey to piss off!
    1. Re:Great by stevesliva · · Score: 5, Funny
      Guess which words all tomorrow's SPAM will contain...
      Touch my wireless Berkshire Marriot?
      --
      Who do you get to be an expert to tell you something's not obvious? The least insightful person you can find? -J Roberts
  8. Here's a sneaky one... by Channard · · Score: 4, Interesting

    Mozilla's filtering catches most spam for me, but some gets through. However, the only one that actually fooled me was quite a sneaky one - headed RE: Question from E-Bayer or whatever the actual subject is where you E-Bay something. Given that I sell on E-Bay, the spammers must have taken a gamble that enough people would read the subject and deem it worth looking at.

    1. Re:Here's a sneaky one... by aussersterne · · Score: 2, Interesting

      I have received piles of these recently. The names, item, item number, and amount change randomly, but it is always structured like a legitimate eBay message. I'm nervous about adding them to my bayesian filtering because I don't want to miss any eBay messages. I, too, sell a lot on eBay...

      --
      STOP . AMERICA . NOW
    2. Re:Here's a sneaky one... by Threni · · Score: 2, Interesting

      What, exactly, is wrong with the `make it computationally expensive to send email` solution Microsoft and others have proposed?
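
For readers unfamiliar with the idea, "computationally expensive to send" usually means a proof-of-work stamp: the sender burns CPU to find a value that the recipient can check with a single hash. The sketch below is written in that spirit only. The stamp format, the use of SHA-256, and the `mint`/`verify` names are assumptions for illustration, not Microsoft's or anyone else's actual proposal.

```python
import hashlib
from itertools import count


def _stamp_value(recipient, nonce):
    digest = hashlib.sha256(f"{recipient}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big")


def mint(recipient, bits=16):
    """Search for a nonce whose hash starts with `bits` zero bits.

    Expected cost is about 2**bits hash evaluations per recipient.
    """
    target = 1 << (256 - bits)
    for nonce in count():
        if _stamp_value(recipient, nonce) < target:
            return nonce


def verify(recipient, nonce, bits=16):
    """Checking a stamp costs one hash, however long minting took."""
    return _stamp_value(recipient, nonce) < (1 << (256 - bits))


stamp = mint("alice@example.com", bits=12)
print(verify("alice@example.com", stamp, bits=12))  # True
```

The asymmetry is the point: a legitimate sender pays the cost once per message, while a bulk sender pays it millions of times. Whether that holds up against spammers using hijacked machines is a separate question.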

    3. Re:Here's a sneaky one... by Scodiddly · · Score: 2, Interesting

      "the spammers must have taken a gamble that enough people would read the subject and deem it worth looking at"

      A lot of spam works that way. I get stuff headed "Re: your account", "Credit Card Overdue", etc. Spammers accept incredibly low response rates, because sending is so cheap. So the chances are that they're going to have some header you really don't want to filter.

      The odds are almost good enough that perhaps someday they'll randomly send me (and many other people) a header with my own credit card number, just by blind chance.

    4. Re:Here's a sneaky one... by pclminion · · Score: 5, Informative
      People just have to realise that filtering based on content doesn't work, and will never work, until perhaps we have strong AI.

      That's an overly strong statement to make, and even a little bit irritating to people like myself who actually implement statistical content filters, natural language systems, etc.

      If you are equating "content based filtering" to "Bayesian filtering" then you really only understand 1% of the current state of document classification. Bayesian filtering is all the rage right now because it's a linear time algorithm (i.e., implementable on PC hardware). There are document classification schemes that will eat Bayesian for lunch, which are not appropriate for email filtering at this time because of their computational cost. But with continual progress on the algorithms, new methods for reducing search spaces via extremely clever sense-similarity heuristics, and with computers doubling in speed every 18 months, it's closer than you think.

      The spam/ham problem is what data mining researchers would call a "toy problem." You want us to classify documents into only two classifications? Only two? Piece of cake. The problem is, you want us to do it on PC hardware where it isn't feasible to run O(n^2) or O(n^3) machine learning algorithms.

      Let the researchers continue what they're doing. People are just now starting to apply SVMs and other cool techniques to the problem of spam filtering. You'd be amazed at how many of the well-known data mining and statistical NLP researchers have not even thought of using their arsenal against spam.

      It's coming, please be patient.

    5. Re:Here's a sneaky one... by pclminion · · Score: 3, Insightful
      The stuff you're talking about is all fine, but it will fail because the spammers will evolve to defeat it.

      I think you overestimate the intelligence of these creeps. The fact that spammers are using more and more of these garbage terms, randomizers, and other hacks to get around the filters actually encourages me -- it demonstrates that they really don't have the slightest clue how statistical content based filtering actually works. Currently, they are taking advantage of the extremely bad decision to assign a 0.4 score to unknown words. The spammers are exploiting a crack in the armor, which means the armor needs to be fixed.

      A human can filter spam. A spammer can't weasel his way around human intelligence, so this sets an upper bound on how advanced the spammer techniques can get. All we have to do is get document classification up to the point of competitiveness with human performance, and the problem is solved. And research into these directions isn't wasted, because the motivation for the research is for actual important document organization tasks. The effect of stomping out spam will be a cool side effect.

      If a spammer was ever actually intelligent enough to get around serious, well-constructed classifiers, I highly doubt he would be in the business of spamming. To suggest that spammers could intellectually compete with people who have spent years specializing in statistical language processing is a tad bit ridiculous.

      At some point, to sell something, the spammer has to say something intelligible which is an advertisement. They can't hide this. Techniques which are foiled by bogus terms at the bottom of the email are broken. It's not a valid reason to believe that spammers are actually getting smart.
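
The complaint about scoring unknown words at 0.4 can be made concrete with a small calculation. The sketch below assumes the combining rule commonly described for Graham-style filters (use only the tokens farthest from 0.5, then combine their probabilities); the `combined_score` function, the 15-token cutoff, and the example numbers are assumptions for illustration and do not reproduce any specific filter.

```python
from math import prod


def combined_score(token_probs, unknown_count, unknown_prob, top_n=15):
    """Combine per-token spam probabilities, using only the `top_n`
    tokens whose probability is farthest from the neutral 0.5."""
    probs = list(token_probs) + [unknown_prob] * unknown_count
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = probs[:top_n]
    spammy = prod(top)
    hammy = prod(1 - p for p in top)
    return spammy / (spammy + hammy)


# A mostly-image spam: one known spammy token plus 20 never-seen padding words.
padded_at_04 = combined_score([0.9], 20, unknown_prob=0.4)
padded_at_05 = combined_score([0.9], 20, unknown_prob=0.5)
print(f"unknown=0.4: {padded_at_04:.3f}")
print(f"unknown=0.5: {padded_at_05:.3f}")
```

When the message has few known tokens, the slightly "innocent" unknown words fill the remaining slots and drag the score toward ham; with a neutral 0.5 they contribute nothing. That is the crack in the armor, and it is a property of the default, not evidence of cleverness on the sender's side.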

  9. Re:Hmmm... by somethinghollow · · Score: 5, Insightful

    Like many other academic studies, such as skinning humans alive to see how long they can live, I think this one should only be placed into the right hands.

    It's a pisser that spammers now have another tool to circumvent filters; on the other hand, the people who write the filters know exactly what a spammer would do to make "better" spam.

    The question is: who will implement first?

  10. Mainstream Media Coverage by Anonymous Coward · · Score: 3, Interesting

    I hate to see mainstream media coverage of this practice. I have started to get a lot of these spams lately.

    Typically they include a large image at the top which is the entire intended content of the message and then a bunch of dictionary words at the bottom. It's basically impossible to filter these out unless you filter out ALL HTML e-mail because they don't contain any typical spam text.

  11. my spam filter by SkArcher · · Score: 4, Insightful

    if Message header = "type = text/html" then send to "Spam"

    It works a treat :)

    The other trick I have found useful is the CamelCase nature of my name - spammers tend to mail me either as skarcher or SKARCHER, and both trip filters on my mailbox.
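
The two rules above could be sketched roughly as follows with Python's standard `email` package. The function name and the `my_name` parameter are hypothetical, and the check only looks at the top-level content type and the To header, which is an assumption about how the original rule is applied.

```python
from email import message_from_string

def looks_like_spam(raw_message, my_name="SkArcher"):
    """Apply the two rules from the comment above (illustrative sketch only)."""
    msg = message_from_string(raw_message)

    # Rule 1: treat any message whose top-level type is text/html as spam.
    if msg.get_content_type() == "text/html":
        return True

    # Rule 2: the name appears in the To header, but never in its
    # correct CamelCase spelling (e.g. "skarcher" or "SKARCHER").
    to_header = msg.get("To", "")
    if my_name.lower() in to_header.lower() and my_name not in to_header:
        return True

    return False
```

A multipart message with an HTML alternative would pass rule 1 in this sketch, since only the outermost content type is inspected.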

    --

    An infinite number of monkeys will eventually come up with the complete works of /.
  12. Outlook 2003's non-Bayesian junk filter by Anonymous Coward · · Score: 2, Informative

    All spammers have to do is read this analysis of the filter, then include the weighted non-spam strings, while avoiding the spam weighted strings. Pretty simple to blow past their filter.

  13. He'd have an easier time avoiding filters... by shrubya · · Score: 3, Funny

    ...if his surname weren't Cumming. At least his first name isn't Richard.

  14. One word: WHITELIST. by jamehec · · Score: 2, Informative

    If you've whitelisted your email, that crap won't get through if you're not on the whitelist. That goes regardless of your Subject line. Same story if you do challenge/response, for that matter. Or you can munge, as I do.

    I still say spamming needs to be a felony, though.

    --
    This post made with the Dvorak layout.
    "Friends don't let friends use QWERTY"
  15. Re:nice name by JohnGrahamCumming · · Score: 3, Interesting

    Yes, that's a constant problem for me (and anyone else named Cumming or Cummings in the world). For example I can't get a Hotmail email account because of my name, but I did manage to sign up an account using the name Ivana Watch-Teens-Give-Head :-)

    John.

  16. Headline tone by Faust7 · · Score: 4, Funny

    Armoring Spam Against Anti-Spam Filters

    That description sounds too noble for an activity like this. More appropriate headlines would be Making Spam Slick as Owlshit or Infusing Spam with Satanic Strength.

  17. Educate the people by Theresa1 · · Score: 2, Interesting

    When I was on holiday in Tunisia, we were bothered quite a lot by trinket salesmen, who would not take no for an answer. Initially we had a lot of difficulty getting rid of them because my kids kept wanting me to buy the trinkets: "plleeeese!!!!!!!! can we have one?" Eventually even my kids got fed up with them, and a united front defeated them. Anyway, my point is, eventually the whole world will wise up and just ignore spam. There will be no incentive for companies to pay the spammers, and they'll just go away. It might take a while though.

    --
    This is a manual signature virus. Copy to your signature file and help me spread.
  18. Nothing to worry about. by Kidbro · · Score: 3, Informative

    This would, for most slashdotters, be nothing to worry about. For those of you who didn't RTFA, the entire attack is limited by this particular little gem of info:

    He had to send himself thousands of copies of the same message each one holding an encoded chunk of HTML that reported back to him when it got past the filter.

    The concept is that the spammer has to find words that are so common in a person's ham that including them in spam would fool the filter. However, as those words are unique to each person, a lot (thousands or more) of spam must be sent to test the filter. The problem for the spammer is to figure out which spam actually got through (in order to identify the important words) - something s/he's not able to do for users with a decent email client...

    I still feel quite confident that SpamBayes will keep my inbox free from spam.

  19. Re:combat the flaw? how? by RHS+Bomber · · Score: 2, Insightful

    How about going after the people who own the links in the body of the spam?
    Although it may be difficult to discover where the spam originated, it's pretty clear where it wants you to go (probably the person who commissioned the spam in the first place).

  20. Why bother? by nakedbonzai · · Score: 2, Interesting

    I am still perplexed as to why a spammer wants to bypass someone's spam filter. Obviously, the person will simply delete any spam that gets through. They won't read it, they won't buy the product in question! Well, that's the case for me at least. I'd imagine the .001% of people who do respond to spam have no intention of ever using a spam filter.

    1. Re:Why bother? by the+real+darkskye · · Score: 2, Insightful

      The answer is simple: the spammers (the ones doing the spammage, not the ones selling the products) are probably making money from every e-mail sent. As such, if they dropped the 1,000,000's of e-mail addresses they knew were being blocked from their lists, they'd lose 1,000,000 * [profit per e-mail].

      Just my 0.03c (adjusted for inflation)

      --
      Music is everybody's possession.
      It's only publishers who think that people own it.
      Fuck Beta
      ~John Lenno
  21. Re:Hmmm... by JohnGrahamCumming · · Score: 5, Informative

    If people working in anti-spam don't try to break their own filters the spammers will do it for them and we'll be worse off.

    There's a direct analogy with cryptographic techniques where breaking them is most of the work... that way we know that they are secure.

    John.

  22. Re:That's dedication... :( by andih8u · · Score: 3, Insightful

    We need to get people to stop buying products advertised through spam

    As you alluded to, it'd be easier to teach fish to fly. The internet essentially carries with it a stupid-user tax. Worms, virii, spam, et al are the by-products of stupidity, but as with most taxes, it's just something that you have to deal with.

    --


    slashdot, news for crazed liberal socialist zealots
  23. how NOT to get SPAM 101 by musikit · · Score: 3, Insightful

    1. don't sign up on any page that requires your email address to verify *cough*like this one *cough*

    2. don't use free email services hotmail etc.
    3. don't use AOL
    4. don't let anyone have your address that forwards messages like "cute bunny pic" or "funny anti-geek joke" etc.
    5. don't post your email anywhere.
    6. don't sign up for majordomo lists.

    1. Re:how NOT to get SPAM 101 by grandmofftarkin · · Score: 2, Insightful
      1. don't sign up on any page that requires your email address to verify *cough*like this one *cough*
      2. don't use free email services hotmail etc.
      3. don't use AOL
      4. don't let anyone have your address that forwards messages like "cute bunny pic" or "funny anti-geek joke" etc.
      5. don't post your email anywhere.
      6. don't sign up for majordomo lists.

      Yeah, great, and I'm sure it works a treat, BUT: 1 and 6 are not practical for many people. As for 2 and 3, these services may suit some people for whatever reason (money constraints, location). Some people have friends or relatives who do 4; should they just start ignoring them? What if they want to converse with those people [are these playboy bunny pics by the way? ;-)]? As for 5, one simple mistake and you are done for anyway.

      Also, why should spammers be allowed to prevent people from using the internet as they see fit? No, I'm sorry, but there are better solutions than trying to follow all your advice. I mean, whilst your points are valid you might as well say:

      7. Don't use the internet

      I guarantee that last one will work perfectly!

  24. Line Noise by 4of12 · · Score: 4, Informative

    A previous story talked about the noise level of spam increasing.

    And a very entertaining NYT article that is in the process of expiring.

    The upshot is that spam is being forced to look more and more like line noise. It will probably become less and less effective as the message has to submerge to the point where people can't recognize it.

    --
    "Provided by the management for your protection."
  25. Only if you're the author. by Eevee · · Score: 3, Insightful

    In the article, it points out those words listed are good for getting past his filter. If you don't normally have mail that uses those words, then your filter will still catch it as spam.

    Now, if you do deal with the Berkshire Marriott frequently, asking them for comments on your wireless setup, then yes you're up the creek.

  26. Re:Discovering Keyword Demographics by Alien54 · · Score: 3, Interesting
    [hit the submit key too fast ....]

    The keywords would be different for each person.

    But I suppose you could discover a select set of keywords for specific demographics, if you defined them very precisely. This would move spam out of the normal "spew it everywhere" phase, where they would have to pay for real marketing data.

    Which sort of misses the point of free advertising in the first place, at least for the small guy. Of course, the big boys can pay for this sort of thing.

    --
    "It is a greater offense to steal men's labor, than their clothes"
  27. Duh by Ricin · · Score: 4, Informative

    Of course I can break my own Bayesian filtering.

    What matters is that while one person's spam might be very similar to another person's spam, their ham isn't. At best, it would require a semi-personal approach to sneak in spam. That's why you need to continually train your filter in the first place. Rinse and repeat, that's what it's all about.

    What's being described is not really a flaw, but rather a saturation point at which it's time to retrain your filter and perhaps even start over with a new database. The old one gets too much 'noise' after some time.

    They do point out one thing, be it from the spammer's POV: Bayesian filtering is a continuous process and not an end-all solution. It requires fresh input and gets less effective if you keep old crud around for too long and if you train it too much on virtually the same spam/ham.

    It's still a much better solution than blacklists.

  28. Sigh. It's depressingly predictable by heironymouscoward · · Score: 3, Interesting

    Why is everyone surprised that every technique designed to eliminate spam can be fought? It's obvious that this is going to happen.

    The question should be: how do we live in a world where 99.9(n)% of email is spam? When the virus writers and zombie masters and spysters start using their communications infrastructure for its intended goal of delivering advertising?

    It's inevitable, and no amount of spam filtering will avoid it.

    Here's a prediction I made maybe 6 months ago on Slashdot: we're going to start seeing viruses that modify real outgoing emails to include their advertising messages. (And no Outlook jokes, thanks...) How does one filter spam when real emails are also infected?

    --
    Ceci n'est pas une signature
  29. Let them do so and beat them where it hurts... by DocSnyder · · Score: 2, Interesting
    What they can't hide is the spamvertised target, as they want their victims to click onto a link and order something. Now you can resolve a link's IP address and check it against some common DNSBL blacklists (most spamvertised hosts are listed on SBL, SPEWS or chinanet.blackholes.us), or extract its domain and test it against some RHSBL or manual lists.

    What is more, if you multiply Bayesian or "word list" spam scores with results obtained with other methods, spammers may put "non-spammy" words into their spams as they like, but they only score their crap up instead of down.
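
A rough sketch of the link check described above: pull the hostnames out of a message body, then look each one up in a DNS blacklist by reversing the octets of its IPv4 address and prepending them to the blacklist zone. The helper names are hypothetical and `dnsbl.example` is a placeholder zone, not a real list; which zones to query is left as an assumption.

```python
import re
import socket
from urllib.parse import urlparse

# Deliberately simple pattern: http(s) links up to whitespace, quotes or brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

def linked_hosts(body):
    """Return the set of hostnames a message body links to."""
    hosts = set()
    for url in URL_RE.findall(body):
        host = urlparse(url).hostname
        if host:
            hosts.add(host)
    return hosts

def dnsbl_query_name(ipv4, zone):
    """Build the DNSBL lookup name: reversed octets followed by the zone."""
    return ".".join(reversed(ipv4.split("."))) + "." + zone

def is_listed(host, zone):
    """True if the host's address resolves inside the blacklist zone.

    Performs live DNS lookups; any resolution failure counts as "not listed".
    """
    try:
        ip = socket.gethostbyname(host)
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        return False
```

For example, `dnsbl_query_name("192.0.2.7", "dnsbl.example")` gives `7.2.0.192.dnsbl.example`. The result of `is_listed` could then be folded into the overall score alongside the Bayesian or word-list result, as the comment suggests.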

  30. Re:nice name by joostje · · Score: 2, Funny
    For example I can't get a Hotmail email account because of my name

    That's OK, 'cause any mail you would have sent using that From: Graham-Cumming@hotmail.com header would have been filtered away anyway by the recipient's SPAM filters.

  31. Nowhere near as effective as my attack by Jerf · · Score: 3, Interesting

    Well, I may not have made it into the BBC but my attack is much more effective and much, much harder to defend against: Bayes Attack Report.

    It even counters the "personalization" quality of Bayes filters by finding the "common core" of personalization that we all share.

    Fortunately, spammers continue to be too stupid to understand this attack. Last time I posted this on Slashdot I got joe jobbed, because apparently it's easier to do that than to actually figure out what I was talking about.

    In summary, I wouldn't worry about your Bayes filters for a while: While they are attackable, spammers are too stupid to understand the attacks. (My article has been posted for over a year.) Thank goodness, sort of. (This will eventually be a temporary situation... but I see no particular evidence that the breakthrough will happen anytime soon.)

  32. Re:One word: WHITELIST. by andih8u · · Score: 2, Interesting

    I think whitelists end up discouraging quite a few legitimate users as well as spammers. I've received emails from people asking questions about this or that, I hit reply, and get shot back a message saying that I have to ask their permission to send them an email, even though I'm replying to them. I dunno if they're not setting up their whitelist properly to automatically add any address they send mail to, but I'm not going to hassle with writing out a reply to them, then having to go back a few minutes later and ask their permission to respond to the message they sent me in the first place.

    --


    slashdot, news for crazed liberal socialist zealots
  33. Really don't understand it. by The+I+Shing · · Score: 4, Insightful

    I've said this before, but I'll say it again. I really don't understand why all this even happens.

    When I'm going through the webmail access to my spam-bait accounts (the ones that are listed on my websites that I don't bother retrieving with my POP email client anymore because of hundreds of spams a day to each), if I'm fooled into opening one up, most likely because of it having a subject header that might be someone legitimate, the moment I see that the message body says anything spammy I immediately click the Delete button. I imagine everyone else in the world is doing the same thing.

    It's gotten to the point where the preoccupation of spamming is just to get past filters, the result of which is that the message is grumblingly deleted by the irritated recipient. Who out there is saying, "Oh, look, this message got past all my spam filters and contains a lot of jumbled, garbled nonsense text alongside a plug for herbal penis enlarging pills. This must be legitimate. Now, where's my credit card,"? Do the spammers think that we're all clones of Dilbert's pointy-haired manager?

    Spamming is not only irritating, it's pointless. Who is paying these people to spam us? Are people actually buying penis enlarging pills and patches, herbal viagra, mortgage refinancing, credit repair kits, or any of that stuff? Enough to put millions of dollars a month into the hands of career spammers?

    I'm hopelessly at sea in this matter.

    --
    You are in error. No-one is screaming. Thank you for your cooperation.
    1. Re:Really don't understand it. by One+Louder · · Score: 2, Insightful
      It all depends upon where the blocking is taking place. Clearly some people are responding to spams, so there appears to be some incentive for the spammers to get their message through.

      Obviously, if an individual has gone to some trouble to set up spam filters, then she doesn't want to be bothered and the spam is pointless. However, the vast bulk of these filters are set up by the ISPs, and there's some value to the spammer to get through them to the idiot on the other side who apparently might actually respond to the spam.

    2. Re:Really don't understand it. by andih8u · · Score: 3, Funny

      Here's the simple solution. Simply have your friends send you mail with "hot viagra teen sex mortgage" in the subject. Since all the spam is getting past the filters into the inbox, all of your real mail will be waiting for you in your junk mail folder

      --


      slashdot, news for crazed liberal socialist zealots
    3. Re:Really don't understand it. by argStyopa · · Score: 2, Funny

      Spamming is not only irritating, it's pointless. Who is paying these people to spam us? Are people actually buying penis enlarging pills and patches, herbal viagra, mortgage refinancing, credit repair kits, or any of that stuff? Enough to put millions of dollars a month into the hands of career spammers?


      SHH!! If people paying for these things start looking carefully to see if they actually get a return on their investment, all sort of lunacy may follow:
      - Companies may start asking: Let's see, I spend $1 million on making the ad, and another $1 million for a 30-second spot on the superbowl - did I really get $2 million more PROFIT (not sales) that I wouldn't have gotten anyway without it?
      - Producers might realize that there are hundreds and thousands of extremely talented actors willing to work for salaries many orders of magnitude less than big Hollywood stars, are we really getting that many more people walking into a movie BECAUSE it's starring the Governator or Julia Roberts?
      - Sports franchises might wonder why they are paying $40 million in salaries for 5 guys to play basketball to (if you take out the advertising revenue, above) sell 15,000 seats that are probably worth about $15 each in net profit - that's a measly $225,000 per soldout game. 100 games later, they've paid for about half the team.
      - People might start wondering why they are paying $8 to go to a movie, or $100 for an event (concert/sport) ticket, when there are about 10,000 other things better that they could do with their lives.

      That's crazy talk, man.

      --
      -Styopa
    4. Re:Really don't understand it. by tbmaddux · · Score: 4, Funny
      Are people actually buying penis enlarging pills and patches, herbal viagra, mortgage refinancing, credit repair kits, or any of that stuff?
      Let me take a moment to tell you my sad story. I was in desperate need of penis enlargement, and so I did start ordering those pills. But they proved hard to swallow, and the patches were itchy, and I had an allergic reaction to the herbs in the herbal viagra. Unfortunately, I bought so much of this stuff that I had to refinance my home, and the bank wouldn't approve my loan because of all the penis purchases on my credit cards. So as a desperate last measure, I ordered some credit repair kits, but that didn't work either!

      Fortunately, this story has a happy ending! As I wrote this message, some polite people in West Africa contacted me and I think they are going to get me out of this financial mess.

      --
      Can't you see that everyone is buying station wagons?
  34. Re:nice name by jamehec · · Score: 2, Funny

    Naw, his name would have to be Cumm1ng or C.u.m.m.i.n.g to be filtered. ;)

    --
    This post made with the Dvorak layout.
    "Friends don't let friends use QWERTY"
  35. Re:That's dedication... :( by kent_eh · · Score: 3, Interesting

    One thing we can do is to make the spammers==virus_writers connection every time anyone asks us about (or even mentions) viruses.

    Aren't we the ones our friend(s) and co-workers ask about computer stuff?

    I have taken this a step further and contacted a few "computer journalists" locally and suggested that they make the spam/virus connection the next time they are writing about the latest virus. It's natural to answer the question 'where do these viruses come from' when talking about the latest scourge of the internet.

    --

    ---
    "I can't complain, but sometimes still do..." Joe Walsh
  36. Re:Hmmm... by BigBadBri · · Score: 2, Interesting
    Have you tried reducing the significance of your 'ham' list, to see if the spammer's analysis is made more difficult?

    Granted, it may increase the number of false positives, but a relatively small change in the values assigned to 'ham' words might make a big difference to the amount of work required by the spammer.

    I'm not an expert on Bayesian filtering, but I seem to remember that there were a few tweakable parameters.

    --
    oh brave new world, that has such people in it!
  37. Re:Discovering Keyword Demographics by tbannist · · Score: 2, Insightful

    You're not thinking like a spammer; it won't change things very much. If a spammer discovers different keywords that reach different demographics, what do you think he'll do? I'm betting he'll just send the spam to every address once for each of the sets of keywords. So instead of half of all e-mail being spam, we'll see a huge jump where half of delivered e-mail is spam and 90% (or more) of all e-mail is spam.

    --
    Fanatically anti-fanatical
  38. Re:That's dedication... :( by kris_lang · · Score: 5, Informative

    Yes, it's dedication to research. He sent himself the 10k messages to see if he could outwit his own Bayesian filtering of spam messages. He effectively deduced that if the incoming message can be similar enough to items that have been specifically marked non-spam by the end-user of the Bayesian-spam-filter, it will not be marked as spam.

    There's a cunning recursiveness to this which is at that fine line between clever and stupid. The difficulty is, as he also deduces, that each person's Bayesian rules for spam vs. nonspam are unique and will require many attempts in order to infer the pass-through words that will create a false negative and allow the spam to come through. The one step that people are missing is that if the evil spammer wishes to work on spamming a domain (both in the internet sense and in the "domain of expertise/specialization" sense) she can tailor the pass-through words to the market. If she's sending spam to Intel or AMD corporate addresses, then lithography might be the magic word; if she's spamming Xilinx, then fpga will route through the Bayesian filter; if she's spamming Dave Barry, then debenture and fish falling from the sky might help spam make it through. Natalie may or may not make it through a /.'er's filter; actually, including slashdot in the subject or as the name usually will make it through a slashdotter's filter. And the ease of this lies in that tailoring the open sesame words to a market will probably open the doors to all of the e-mail recipients at a domain, particularly if the spam filtering is done at the mail-server level and not at the end-user level. Thus rather than having to send 10k messages to a single user to crack open the spam doors, sending those 10k messages to multiple users at a domain and analysing which ones get through will effectively open the floodgates for all of the users at that internet domain. And using the concept of a priori probability distributions makes the hunt for these sesame words {[tm] /me :) } easier by limiting the dictionary to be searched to the keywords of the field/domain about to be spammed. That is what makes this dangerous.

    The counterattack from the corporate mail-server will be to look for these similarly unique messages being sent to multiple users.

  39. Spam - CounterSpam by Aumaden · · Score: 2, Interesting
    I have opted to wage a personal war against spammers. Here's my battle plan:

    Roughly once each week, I go fishing through the spam that has been filtered out of my various accounts for URLs. (Sometimes this involves a little digging to get to the final site.) I extract the host names from the URLs and for each hostname, I create 10 fake email addresses.

    I pack these emails into messages that I post to Usenet in groups likely to be trolled by Spammers. The spammers scrape these addresses from Usenet and add them to their database. Thus, future mailings will also spam the spammer's clients.

    If enough people do this, the generated traffic will begin to overload the client's mail server. After a while the spammer's clients will figure out that every time they employ a spammer, they themselves get spammed.

    Even if nothing comes of this, I get the satisfaction of knowing the real perpetrator (the spammer's client) gets to share some of my pain.

  40. I don't see how this is necessarily a problem by PixelCat · · Score: 3, Insightful

    What he's doing is a brute-force attempt to find words with--for himself--a high ham probability. I don't see how this is necessarily going to be an effective general-purpose technique. If you need to start bombarding people with thousands of messages to find the good words you're just going to drive more people into using filters--and this will almost certainly coerce ISPs into doing more filtering as well. Plus, you've got to deal with the issue of keeping data on all those users to find out which words are good for them. This would require you to tailor your spam to each individual user, which probably is going to increase the cost to the spammer (at least in terms of disk storage and time, anyway) and, as Graham-Cumming implemented it, is going to fail utterly for anyone who isn't viewing mail as HTML, anyway.

  41. Re:"and can be combated." by GMontag · · Score: 5, Funny

    but how do you combat the spammer?

    1. Find spammer

    2. Kill spammer

    3. Become hero of the interweb

    4. Write book from prison

    5. ???

    6. Profit!

    Your question is exactly why the death penalty belongs on the street, not in prison.

  42. Re:Hmmm... by ichimunki · · Score: 2, Interesting

    (sorry for the dupe, didn't intend to post as AC the first time)

    It's not rocket science. The statistical filter I've been writing doesn't ignore random words in general (during scoring they just get counted like any other token), but it will ignore them on incoming mail.

    I think trying to classify email as spam/not-spam based on characteristics (which you seem to be suggesting) is a big waste of time. Have you ever tried to wade through Spam::Assassin to see what it actually does? It's painful... and not just because it's written in Perl. Trying to classify based on rules is an arms race with the spammers.

    I'm in the process of replacing S::A with about 100 lines of Ruby code. I stopped using S::A immediately after I realized it had trashed emails from my daughter based on some broken-ness in her email client (the default client on a new Windows XP computer). Obviously the fault was mine for sending spam to the trash folder where it got deleted when I closed KMail, but I don't like that a default S::A called those mails spam in the first place. But it just points up the problems with rules-based filtering approaches.

    The hardest part of a statistical spam filter is not the math, but writing a good "tokenizer" routine. I think mine works well because I push HTML tags to the end and discriminate against header-tokens uniquely (as suggested by Paul Graham). By pushing HTML tags to the end I defeat the attempts by spammers to break up obvious spam words by infixing them with nonsense (i.e. non-displayed) HTML tags.
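
The tag-pushing idea above can be sketched as follows. This is not the author's Ruby code, which is not shown in the thread; it is a Python illustration with hypothetical names and deliberately simple regular expressions, and it leaves out the separate handling of header tokens.

```python
import re

TAG_RE = re.compile(r"<[^>]*>")            # anything that looks like an HTML tag or comment
WORD_RE = re.compile(r"[A-Za-z0-9$'-]+")   # crude notion of a "word"

def tokenize(body):
    """Return word tokens first, then HTML tags as separate tokens at the end."""
    tags = TAG_RE.findall(body)
    # Deleting tags in place rejoins words that were split by invisible
    # markup, e.g. "Via<!-- x -->gra" becomes "Viagra".
    text = TAG_RE.sub("", body)
    words = [w.lower() for w in WORD_RE.findall(text)]
    tag_tokens = ["tag:" + re.sub(r"\s+", " ", t.lower()) for t in tags]
    return words + tag_tokens
```

With this sketch, `tokenize("Buy Via<!-- x -->gra <b>now</b>")` yields the words `buy`, `viagra`, `now` followed by the three tag tokens, so the split word is scored as a whole while the tags themselves still count as evidence.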

    --
    I do not have a signature
  43. Re:combat the flaw? how? by Winkhorst · · Score: 3, Insightful

    The best solution I have found so far is to have your own domain and generate specific email addresses for specific types of communications. You keep your actual ISP email address totally secret and don't give it to anybody except your domain registrar. You then generate an address for your best friends and acquaintances you can trust and keep it separate from everything else so you don't have to change it but once every few years if that. You have a specific Shopping and Registration address you kill and replace after it becomes spammy. And you have an address for things like newsletters and email groups you can also change and reregister if they leak out to the spam boobs. There are all kinds of variations on this theme, but that's the basic gist of the matter: Secrecy and flexibility.

    --
    "Is this Winkhorst a nova criminal?" "No just a technical sergeant wanted for interrogation."
  44. Go after the Sellers not just the spammers by OlivierB · · Score: 2, Informative

    I don't know about you but here in France we have rules to deal with illicit poster ads. It's a 100 year old law that people/companies put up on their walls stating that posters will be prosecuted as well as those for whom they are advertising. This takes care of that. If spam laws also targeted the retail stores advertised by said spams, then far fewer Viagra/Nigerian etc. stores would be paying spammers to do this. It's as simple as that, why can't it be done? Don't tell me these stores are abroad, there are international laws for that. Also most of these spam advertised companies are US based.

    --
    Artificial intelligence is no match for natural stupidity
  45. Obligatory Rich Cook Quote by FreemanPatrickHenry · · Score: 2, Funny

    "Programming today is a race between software engineers, trying to build bigger and better idiot proof programs, and the universe, trying to build bigger and better idiots."

    --
    I have discovered a truly marvelous .sig which, unfortunately, this space is too small to contain.
  46. Re: Phase matched noise - invert and cancel by Technician · · Score: 2, Interesting

    In the analog world many times if noise in a system is a repeating wave (hum in an audio line), it can be duplicated, inverted and added to the original to eliminate the noise and leave the signal.

    Apply this to a mail server. Hold all mail for about 5 minutes (from outside only). Compare them all. Look for matches of more than 50%. Cancel the matches out and filter the incoming for the same. This nails lots of the worms and spam by rejecting the common mode noise. Most spammers create a message and mass mail the same message, not create new messages for each recipient (except some boilerplate name use).
    Hotmail could catch a lot of spam this way and yank it out of mailboxes before they are retrieved and halt the remaining incoming very efficiently. Only the first few would make it past the filter, but then be recalled back out of mailboxes if the user hasn't retrieved them yet.

    Sending the same mail from dozens of relays would have no effect on the filter. Where it comes from simply doesn't matter. If it has a large portion that is a match, it's dead. Newsgroup mail lists would have to be whitelisted on a case by case basis.
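
The hold-and-compare step might look something like this, using `difflib` from the Python standard library to measure how much two message bodies have in common. The function names are hypothetical, the 0.5 threshold stands in for the "more than 50%" figure above, and comparing every pair of held messages is an assumption that would only suit small batches.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Fraction of matching words between two message bodies, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()

def bulk_suspects(held, threshold=0.5):
    """Return indices of held messages that closely match another held message."""
    suspects = set()
    for i in range(len(held)):
        for j in range(i + 1, len(held)):
            if similarity(held[i], held[j]) > threshold:
                suspects.update((i, j))
    return sorted(suspects)
```

Two copies of the same pitch that differ only in the greeting name score well above the threshold, while unrelated personal mail scores near zero. A real mail server would need something cheaper than pairwise comparison, such as hashing chunks of each body.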

    --
    The truth shall set you free!
  47. How NOT to get SPAM 201 - a more practical guide by djrogers · · Score: 4, Insightful
    • 1) Register a domain (come on, they're cheap now)
    • 2) Get an email address from your ISP or other provider (yahoo, fastmail.fm etc) that is complex and convoluted - no names or words
    • 3) set up mail redirection with Zoneedit, redirection.net etc. with a catchall to your new mailbox.
    • 4) Use a different email address every time you must sign up for anything (ie amazon.com@newdomain.com)
    • 5) Filter on sent to headers at first sign of compromised id, or if the volume for a particular id gets too heavy and you're tired of client side filtering, set a specific redirection for it to sample@sample.com (do a whois on sample.com if you're curious).
    • 6) Enjoy the same spam free mailbox I've had for 2 years...
    Also helpful is to change your reply-to address every few months and give your friends different addresses based on how clueful they are
    --
    Think outside the... Hey, where'd the friggin' box go?
  48. I agree, there is no problem. by khasim · · Score: 4, Informative

    He managed to, randomly, find words that were high in _HIS_ "ham" list.

    He could have saved himself a lot of time and trouble and just looked in that file.

    And that file will be different for EVERY installation. So the words he found ("Berkshire", "Marriott", "wireless", "touch" and "comment") would NOT get spam past MY filter.

    So, the spammers have to keep (and update) a word list for EVERY PERSON on their lists.

    Which means that, with an incredible amount of effort, the spammers will be able to get spam to the people least likely to purchase a product from a spammer.

    There is no problem.
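
    A toy illustration of why the padding words only work against one installation. The two users' word counts are invented, and the scoring is a simplified naive-Bayes combination, not POPFile's actual code.

```python
from math import prod

def token_prob(token, ham, spam):
    """Per-token spam probability from one user's counts; unseen tokens are neutral."""
    h, s = ham.get(token, 0), spam.get(token, 0)
    if h + s == 0:
        return 0.5
    # Clamp so that a single token can never force the result to exactly 0 or 1.
    return min(0.99, max(0.01, s / (h + s)))

def spam_score(text, ham, spam):
    """Combine token probabilities naive-Bayes style into one 0..1 score."""
    probs = [token_prob(t, ham, spam) for t in text.lower().split()]
    spam_like = prod(probs)
    ham_like = prod(1 - p for p in probs)
    return spam_like / (spam_like + ham_like)

# Invented training data: both users see the same spam, but their ham differs.
shared_spam = {"viagra": 50, "cheap": 30}
user_a_ham = {"berkshire": 40, "marriott": 25}
user_b_ham = {"kernel": 40, "patch": 25}

padded = "cheap viagra berkshire marriott"
score_a = spam_score(padded, user_a_ham, shared_spam)
score_b = spam_score(padded, user_b_ham, shared_spam)
print(score_a, score_b)
```

    With these numbers the padded message falls to about 0.5 for user A, whose ham list contains the padding words, and stays above 0.99 for user B, who has never seen them.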

    1. Re:I agree, there is no problem. by WuphonsReach · · Score: 4, Interesting

      So, the spammers have to keep (and update) a word list for EVERY PERSON on their lists.

      That's one of the strengths of pushing Bayesian filtering as close to the final recipient as possible. Millions of customized Bayesian scoring databases are much more difficult to defeat than a single centralized database. Bayesian databases are pretty much maintenance-free, as long as the junk/not-junk/might-be user interface is intuitive and makes life as easy for the user as possible.

      There is some value in putting the Bayesian filtering at a workgroup level, where it helps that there's a bit of shared knowledge and everyone in the group pretty much agrees on what is and isn't spam. However, once you get past around 10-25 people, I'd say that Bayesian filtering is going to start becoming ineffective due to either over-zealous users or overly broad ham/spam classifications.

      What I'd be interested in is a Bayesian filter that works on both the individual level and the workgroup level, with some sort of flag/switch/setting that tells the engine how much to consider the workgroup database as opposed to my personal database. This would be useful when adding a new member to the group: initially they'd rely heavily on the group's opinion as to what is ham/spam, but as time goes on the filter would adapt to their choices (while the group database slowly adapts to everyone else's).
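
      One way such a switch might look, as a sketch only: each database is assumed to be a dict of per-token spam probabilities, and the blending rule and the half_life schedule are assumptions made for illustration, not features of any existing filter.

```python
def blended_prob(token, personal, group, group_weight):
    """Mix per-token spam probabilities from two databases.

    group_weight = 1.0 trusts only the workgroup; 0.0 trusts only the user.
    Tokens missing from a database count as neutral (0.5).
    """
    p_personal = personal.get(token, 0.5)
    p_group = group.get(token, 0.5)
    return group_weight * p_group + (1.0 - group_weight) * p_personal

def group_weight_for(trained_messages, half_life=200):
    """Hypothetical schedule: rely on the group less as the user trains more."""
    return half_life / (half_life + trained_messages)

# Invented example: the group thinks "mortgage" is spammy, this user does not.
group_db = {"mortgage": 0.95}
personal_db = {"mortgage": 0.10}

for trained in (0, 200, 1800):
    weight = group_weight_for(trained)
    print(trained, round(blended_prob("mortgage", personal_db, group_db, weight), 3))
```

      A brand-new member starts at the group's opinion, and the blended value drifts toward the personal one as more of that member's own mail is classified.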

      --
      Wolde you bothe eate your cake, and have your cake?
  49. Re:Lets Help Him Out by JohnGrahamCumming · · Score: 3, Informative

    How exactly is attacking me going to help? Unless you yourself are a spammer? Since I make a living working on anti-spam and released POPFile for free I can't see how attacking me is going to make the spam problem any better.

    Perhaps you didn't read the article: I am not a spammer, I work for a company that makes anti-spam software.

    John.

  50. Personally I prefer SpamBayes by Julian+Morrison · · Score: 2, Interesting

    http://spambayes.sourceforge.net/

    In particular, I like their "unsure" categorization. All the "false positives" go in there, and cleaning that one folder out regularly is easy.
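
    The three-way split can be sketched as two cutoffs on the spam score. The function below is an illustration, not SpamBayes code, and the cutoff values are assumed for the example.

```python
def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Sort a 0..1 spam score into ham, unsure, or spam using two cutoffs."""
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"

for score in (0.05, 0.50, 0.95):
    print(score, classify(score))
```

    Widening the gap between the cutoffs sends more mail to the unsure folder but makes outright misfiles less likely.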

  51. Re:combat the flaw? how? by Nightlight3 · · Score: 2, Insightful

    How about going after the people who own the links in the body of the spam?

    You are starting with the heretical premise that government, or rather the large corporations which pull the strings, has the same objective as the end user (the end of spam). Of course, it could be stopped (by cracking down hard on those contracting the spammers). But it is much more useful for them if the "war on spam" goes on and on, while measures with side effects (on your wallet, your freedom and your privacy) are gradually introduced to "combat" the spam. Just recall other such "wars": the "war on drugs", the "war on poverty/racism", the "war on smoking", the "war on guns" or, most recently, the "war on terrorism". This is an ancient recipe for control and enslavement, perfected by churches and priesthoods over millennia (war on sin/devil, war on death), merely translated into modern jargon and current circumstances.

  52. Perhaps not! by The_DOD_player · · Score: 2, Insightful

    I don't feel that would be an effective spamming technique. A person's outgoing e-mail is such low-volume that a spammer isn't really spreading the word.

    It doesn't take very much volume to defeat the function of spam-blocking.
    I have a very effective spam filter on my server (customised SpamAssassin plus some procmail scripts): 95-98% catch rate, virtually no false positives. The remaining spam is just nonsense; the mails make no sense, and the spammers are unable to sell anything with them. Their primary purpose seems to be to defeat the filter, so if I set up the filter to block them, it will also generate false positives.

    Not to mention that it'd have to include a mechanism for the spammer to get paid for the victim sending the message.

    They don't need to get paid for the "contaminated" emails. The purpose would be to create false positives, thereby forcing the operator to loosen the filter, and THEN get the real spam through.

    I'd lose my patience quickly if someone I knew sent me spam a second time after I alerted them to their problem. Fortunately, I don't know that many clueless people.

    I don't see how that will stop spammers from trying to contaminate legit emails. A few clueless users is all it takes.

  53. I am building my own by Tablizer · · Score: 3, Interesting

    Any spam filter used by more than a few thousand people will be dissected by the spammers and used to make filter-proof spam. I am sure Bayesian filtering has lots of holes if you work hard enough to find them. It depends on consistency in patterns. If spammers ruin that consistency, it won't work.

    Just the other day I found one spam that used a white font to put in legitimate-sounding text that would not visually show up on the screen. The spam text was a mix of graphics and pieces of real text. Thus, the word "penis" might start out with "pen" and end with a graphic for "is". A Bayesian filter might start looking for the word "pen" after a while, but by that time the spammers will have a new trick up their sleeve. For example, if it looks for white fonts, then spammers might start using slightly off-white fonts, or black fonts on a black background. The combinations are probably endless.
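
    A naive check for the white-font trick might look like the sketch below. It assumes a white page background and old-style <font color=...> markup; the HIDDEN_COLORS set and the class name are invented for the example.

```python
from html.parser import HTMLParser

# Assumed spellings of "white"; off-white values are deliberately not covered.
HIDDEN_COLORS = {"white", "#fff", "#ffffff"}

class HiddenTextFinder(HTMLParser):
    """Collect text that sits inside a <font> tag whose color is white."""

    def __init__(self):
        super().__init__()
        self.stack = []    # one flag per open <font>: is its color hidden?
        self.hidden = []   # text fragments found inside hidden fonts

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            color = (dict(attrs).get("color") or "").strip().lower()
            self.stack.append(color in HIDDEN_COLORS)

    def handle_endtag(self, tag):
        if tag == "font" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if any(self.stack) and data.strip():
            self.hidden.append(data.strip())

finder = HiddenTextFinder()
finder.feed('<p>Buy now <font color="#FFFFFF">meeting agenda quarterly report</font></p>')
print(finder.hidden)  # prints ['meeting agenda quarterly report']
```

    Changing the color to something like #FEFEFE slips straight past this check, which is exactly the arms race described above.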

    Thus, by making my own, my gizmo is not the target of spammers. They don't know about my filter nor care.

    The only alternative I can see is filter vendors changing their algorithms every month or so, which would probably get expensive and risky. It is not like virus-checking software, which mostly just adds to its database and only tweaks the algorithm a bit once every few years; it is like having to completely rewrite the virus-filtering algorithms, not just the data.

    Ultimately, I think some sort of monetary postage system is the only effective solution. ISPs and backbone providers will only have an incentive to track down spammers if they lose money on anonymous or forged spam. This will make mass spamming far less lucrative.

    Either that, or people will eventually find out the hard way that penis enlargers don't work and stop wanting to refinance their house. (I wonder if I can refinance all those expensive penis enlargers that I bought?)

  54. How about something a little more legal? by Gzip+Christ · · Score: 2, Interesting
    I will pay 1000$ to anyone who seeks out and beats the living daylights out of a spammer. With as many pics on the web as possible for posterity.
    How about putting that $1K towards a legal use and offering it as a bounty to anybody who tracks down a spammer, sues him, and gets him thrown in jail and/or bankrupts him (via court-imposed fines)? It may not have the same immediate satisfaction that you were originally seeking, but it's far more legal, and I think you could find plenty of people here on Slashdot to chip in some extra $ to raise the pot even higher.
  55. Y'all are going to hate this, but... by duck_prime · · Score: 3, Insightful
    ... The internet essentially carries with it a stupid-user tax. Worms, virii [sic, heh], spam, et al are the by-products of stupidity, but as with most taxes, it is just something that you have to deal with.
    With respect to spam, let's take a step back. Obviously somebody out there is gleefully munching handfuls of Viagra and (ahem) "enhancement" pills to psych himself up to (ahem) r0x0r his wife until her weight-loss pills kick in.

    It is silly to assume that all these people are just morons. After all, Viagra is proven to work, it is a legitimate product of sorts. The internet is there for hefty short limp (ahem ahem) non-digerati as well as for propeller heads, God bless 'em.

    It seems to me that spam is the runaway bastard-child of something which actually is good and useful -- that is, targeted marketing to the willing. Don't throw out the baby with the bathwater. There is a huge legitimate market out there, just begging to be flee^wmarketed.

    The anti-spam people are fighting against the Invisible Hand. Good luck.
  56. Re:Hmmm... by dingbatdr · · Score: 2, Funny

    You mean that we should actually test our code?
    Against real data?
    Aren't you worried that could start some kind of
    scary precedent?

    dtg

    --
    The truth is an offense, but not a sin.------R. N. Marley
  57. Meanwhile, this guy is screwed. by Dukael_Mikakis · · Score: 3, Funny

    He posted his "free-pass" words on the net.

    Never mind that his last name is "Cumming".

  58. Now spam can be targeted advertising ... by dsojourner · · Score: 2, Interesting

    The idea is to find words that someone needs to let through, and add them to your spam.

    Exactly which words will be a function of job, life style, income level ...

    So when I use my anti-anti-spam filter, I can generate lists of words that will target specific populations, w/o having to figure out who on my (huge) list of recipients is in which population.

    Big news ...

  59. easily combatable by CAIMLAS · · Score: 2, Insightful

    This is easily defeated by an intelligent spellcheck built into antispam filters. It'd be able to recognize things such as commonly misspelled words, PGP/GPG keys, and file signatures, but would then create a rating based on the number or percentage of non-words.

    It could then mark the message with a spam rating and be combined with SpamAssassin or such.

    Plus, wouldn't the SpamAssassin logic be able to say, "hey, we're getting a lot of non-word stuff - our filters tell us it's spam" and defeat this spam already?
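
    The non-word rating could be sketched like this. The tiny dictionary stands in for a real word list, and exemptions for misspellings, PGP blocks and signatures are left out. Note that it only catches gibberish; padding made of real words would score as clean.

```python
import re

def non_word_ratio(text, dictionary):
    """Fraction of alphabetic tokens in text that are not in the dictionary."""
    tokens = re.findall(r"[A-Za-z]+", text.lower())
    if not tokens:
        return 0.0
    unknown = sum(1 for token in tokens if token not in dictionary)
    return unknown / len(tokens)

# Stand-in for a real word list, which is assumed rather than supplied here.
dictionary = {"buy", "cheap", "pills", "now"}

print(non_word_ratio("Buy cheap pills now xqzt blorf wug", dictionary))
```

    The ratio would then be fed to SpamAssassin or a similar scorer as one more signal rather than used as a verdict on its own.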

    --
    ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers