Slashdot Mirror


Response to Gordon Cormack's Study of Spam Detection

Nuclear Elephant writes "In light of Gordon Cormack's Study of Spam Detection recently posted on Slashdot, I felt compelled to architect an appropriate response to Cormack's technical errors in testing which ultimately explain why one of the world's most accurate spam filters (CRM114) could possibly end up at the bottom of the list, underneath SpamAssassin. I spend some time explaining what is a correct test process and keep my grievances simplified about the shortcomings of Cormack's research."

28 of 229 comments

  1. Studies create discussion by Timesprout · · Score: 5, Insightful

    I usually frown when I see many of these so-called studies offering conclusions, several of which differ radically from my own experience. The recent Java/C++ performance one was a classic example. It gets annoying when a pro-MS result is immediately decried as marketing FUD because it just can't be better, and a pro-Linux result is taken as gospel truth here on /. Usually I tend to take all results with a grain of salt or just plain ignore them and focus on the debate around them.

    The benefit of these studies, though, is that, fanatical crap aside, informed people will usually take the time to interpret results or suggest corrections/improvements that actually benefit developers and improve their knowledge base more than any information provided by the actual study.

    --
    Do not try to read the dupe, that's impossible. Instead, only try to realize the truth
    What truth?
    There is no dupe
    1. Re:Studies create discussion by dasmegabyte · · Score: 1, Insightful

      But the purpose of studies is to offer insight into the best tools for a specific set of dependencies. Decry the dependencies, and you're essentially eliminating the purpose of the study.

      For example, I am working on a GIS application. I looked at offerings from ArcView and MapInfo and found that while they do what I need to do out of the box, they are quite expensive and required a license for every seat of my application. So I looked to Open Source. There I found hundreds of tools, none of which did what I needed to do. I could adapt a bunch of them to accomplish my goals, but the time to do that, as well as port them all to Windows, would cost at least ten times more than either of these applications. What's worse is that I am not a GIS expert...I can load and search maps using the interface, but I couldn't write my own algorithms to do so...and to commission the work would cost more than if I did it myself.

      So, for my particular use, Open Source Software is insufficient. If I just had to display a raster graph, it would be perfect. But for my use, it doesn't work.

      Now, if I had released my findings as a study -- "Developer finds Open Source No Good in Specific Area" -- slashdrones would first attack the specific area, attack me for not being bright enough to know some obscure, undocumented toolset that could have solved my problem, and then proceed to talk about how great it was that the computer in their car runs Linux, thus making it the ultimate operating system.

      This is defeatist bullshit. Ignoring your problems doesn't make them go away. This is like blaming McDonalds for your big, fat ass, or blaming Microsoft because you got a virus when you didn't run the patch they released to prevent it.

      --
      Hey freaks: now you're ju
    2. Re:Studies create discussion by killjoe · · Score: 4, Insightful

      "This is defeatist bullshit. Ignoring your problems doesn't make them go away. "

      You miss an important point. This is not "our" problem, it's YOUR problem. I don't need a GIS program and neither do millions of other people. YOU need one, and too bad for you that they cost tens of thousands of dollars. You have no right to complain that somebody else hasn't taken the time and effort required to give you a free equivalent.

      What you need to understand is that open source is nothing but scratching an itch. This is your itch and you need to scratch it.

      OPEN SOURCE ONLY WORKS IF PEOPLE CONTRIBUTE. This very simple and obvious point seems to be lost on most people. You are not supposed to sit around till somebody else does the work and give you something for nothing. You need to contribute.

      You need to start an organization and start raising money to fund an open source development effort or to accelerate an existing one. You need to get involved and contribute. BTW bitching on slashdot does not count as contributing.

      "This is like blaming McDonalds for your big, fat ass, or blaming Microsoft because you got a virus when you didn't run the patch they released to prevent it."

      Or blaming the open source community because they didn't give you something for free.

      --
      evil is as evil does
  2. You don't like my software so I'll flame you by ifreakshow · · Score: 2, Insightful

    This guy seems a little harsh and just a bit jealous of the success of Gordon Cormack's article. I'd like to know what makes his opinion any more valid than Gordon's.

    Information on his professional career was very hard to find on the site.

    This just seems like a flame because his software(dspam) didn't perform well in the test.

    1. Re:You don't like my software so I'll flame you by Threni · · Score: 3, Insightful

      > This guy seems a little harsh and just a bit jealous of the success of Gordon
      > Cormack's article.

      Articles aren't 'successful' - they're either useful, or they're just fun to read. Perhaps his is the latter.

      From the response:
      ---
      It turned out that Cormack was using the wrong flags, didn't understand how to train correctly, and seemed very reluctant to fully read the documentation. I don't mean to ride on Cormack, but proper testing requires a significant amount of research, and research seems to be the one thing lacking from this research paper.
      ---

      One thing I've noticed is that more and more people seem to want an answer NOW - even if it's not the correct answer, or even if the original question asked wasn't the correct one.

      > I'd like to know what makes his opinion any more valid than Gordon's.

      Everyone's opinion is as valid as you - the observer - decide it to be.

      But in terms of which filter is the best - what does anyone's opinion have to do with it? If you're bothered about this issue, why not read both articles, think about it, and then perform the tests yourself? Or wait for an impartial third party to perform the relevant tests. There doesn't appear to be any alternative.

    2. Re:You don't like my software so I'll flame you by Otter · · Score: 5, Insightful
      There are some technical objections in there (old versions of software, the fact that Spam Assassin was tested with a spam collection generated by spam assassin). But honestly, after wading through all the whining and sneering, I didn't have the energy to pick the points out of the overall flow.

      Jonathan, next time:

      • Start by summarizing your technical objections.
      • Continue by detailing your technical objections.
      • Leave the nasty rants to the end, or better yet, leave them out entirely.
      • Stop talking about "geeks" in every paragraph.
      • Please stop referring to spam filter comparisons as "science".
    3. Re:You don't like my software so I'll flame you by pclminion · · Score: 4, Insightful
      This guy seems a little harsh and just a bit jealous of the success of Gordon Cormack's article.

      Let me explain why he's irritated, as somebody who has conducted spam filter statistical tests and made publications on the topic.

      Yes, it is irritating when somebody demonstrates that his method is better than yours. However, most researchers are able to accept this, and continue improving their own work.

      However, what is far more irritating (by an order of magnitude at least) is when somebody "demonstrates" the inferiority of your work, and they do so in a completely scientifically bogus way.

      Let me give a concrete example. Suppose you were Galileo. You have just put forth the postulate that all objects fall at the same speed regardless of mass. A "debunker" attempts to demonstrate that this isn't true by dropping an iron ball and a feather. Obviously, the feather falls much more slowly.

      "Ha ha, neener, neener!" cries the debunker. Of course, Galileo knows his method is flawed. If people actually listen to this supposed debunker, Galileo might become very, very irritated indeed.

    4. Re:You don't like my software so I'll flame you by ComputerSlicer23 · · Score: 2, Insightful
      Please stop referring to spam filter comparisons as "science".

      I believe the author of the article would have two issues with that assertion.

      First off, you can have science about how fast grass grows. You have science about how many sexual partners a person has. You have science about how to manipulate people with irrational arguments. Science can be applied to anything that you apply scientific principles to. Science, in a lot of ways, is merely a matter of measuring in a controlled manner and then commenting on such measuring. The usefulness of science is when those measurements are useful and applicable to common everyday situations. Like, say, you're twice as likely to die in a car accident at 50 MPH than at 40 MPH.

      Second, the author sounds like a mathematician, and somewhat of a scientist, and he has a mathematical interest in the filtering of SPAM. It's just as mathematical as using markov chains to model queuing problems to measure how long you'll have to stand in line at the checkout counter. To him, it's an interesting mathematical problem, which in a lot of ways, means that for him personally SPAM classification and the comparison of SPAM classification techniques IS science.

      Finally, the results the author is referring to, are due to be published in a peer reviewed journal if I understood it correctly. So in a very technical sense, it is in fact being published for scientific review.

      I think a lot of his issue is that you can't use the results of that paper to draw any useful conclusions for yourself if you aren't in a similar situation. As an example, I can get about 18 miles to the gallon in my F150, even though it's only rated for 13/15 city/highway. I manage that by setting the cruise control at the speed right after I switch into 5th gear, turning off the A/C, driving on predominantly flat roads, buying the highest rated fuel, and not stopping for any reason other than purchasing gas. So I could publish a paper saying that a F150 can easily get 18 miles to the gallon. However, that's incredibly useless to anyone who doesn't realize the conditions they have to drive in. His argument is that the paper doesn't represent the results anyone else would get.

      Kirby

  3. Re:Architect is not a verb. by pclminion · · Score: 2, Insightful
    I hope you're proud of your anal retentiveness.

    Haven't you ever Googled something? Haven't you ever input data into a computer? (The use of the word input as a verb is, of course, the result of verbing, and it's now considered acceptable usage.) In recent years it has become common in English to "verb" nouns. In fact, I just did it. English, like any other language, evolves over time.

    To deny this fact makes you just another prescriptivist language maven, completely disconnected from reality and any sense of the advancement of human language.

    Folks, don't listen to this dinosaur. He's not insightful, he's simply living in the past.

  4. Special Pleading by Lulu of the Lotus-Ea · · Score: 1, Insightful

    There's really very little to be said in favor of Jonathan A. Zdziarski's "defense". I guess it just amounts to him wanting to sell his product. Of course, I remember when CRM114 first came out, it was subject to some very dubious--or often simply incoherent--claims. It's pretty clear Zdziarski is in quite a bit over his head... not quite as bad as the amateurs who discover their own "breakthrough" encryption techniques, but tending in the same direction.

    As near as I can tell (I skimmed, admittedly, I didn't read every word carefully), his defense amounts to "please don't test the different filters because..." Fill in what feature of the test MUST not be the same as the CRM114 users who get 99.95% accuracy. This is precisely the meaning of "special pleading" in rhetoric. Also the same argument about "if only he had tried the latest-and-greatest (even though we made our wild claims before that version came out, too)."

    Cormack &alia make a reasonable best effort to test several tools; and as with any test, they make certain assumptions, and choose certain methodologies. Frankly, I find that a lot more useful than "just trust us, ours works best...but we can't quantify what 'works' means."

    FWIW, I wrote an empirical study of different spam filters, way back shortly after the Paul Graham buzz:

    Spam Filtering Techniques: Six approaches to eliminating unwanted e-mail.

    I know my study is based on quite old tool versions by now. But AFAIK, it's one of the few that actually came at the comparisons from an unbiased viewpoint. Most figures are based on the "experiences" of the strongest proponents of a given tool (or occasionally from a strong detractor). I had/have no agenda for or against any particular tool, I was just curious.

    1. Re:Special Pleading by Anonymous Coward · · Score: 1, Insightful

      There's really very little to be said in favor of Jonathan A. Zdziarski's "defense". I guess it just amounts to him wanting to sell his product. Of course, I remember when CRM114 first came out, it was subject to some very dubious--or often simply incoherent--claims. It's pretty clear Zdziarski is in quite a bit over his head... not quite as bad as the amateurs who discover their own "breakthrough" encryption techniques, but tending in the same direction.

      Well, his personal attacks were out of place, but his paper still has merit.

      As near as I can tell (I skimmed, admittedly, I didn't read every word carefully), his defense amounts to "please don't test the different filters because..." Fill in what feature of the test MUST not be the same as the CRM114 users who get 99.95% accuracy. This is precisely the meaning of "special pleading" in rhetoric. Also the same argument about "if only he had tried the latest-and-greatest (even though we made our wild claims before that version came out, too)."

      That he got results which are lower than hackers who tweak their filters is not surprising. But what is surprising is that he got results which are not characteristic of the filters, e.g. biased false positives in CRM114. This is something that basically nobody gets, and indicates that he may have used it wrong, e.g. by flooding the .css files with too many messages (as the documentation specifically tells you not to do).

      Zdziarski also points out false claims ("DSPAM doesn't support train everything" when it is in fact the default, etc) which indicate that Cormack didn't RTFM. As for the "latest and greatest," he's comparing wild claims about DSPAM 3.0 to results on 2.8... certainly that's not fair.

      The most damning point was the use of SpamAssassin: Cormack didn't classify the messages by hand (there were 49,000 after all), but instead used SpamAssassin to set up his test. When SpamAssassin is acting as a judge, is it surprising that it should win? Surely errors that the two versions made would tend to overlap, thus counting in favor of SA and against filters which had classified the mail correctly. This could explain CRM114's apparent bias towards false positives, if many of those were spams that SA did not detect.

  5. Re:Architect is not a verb. by corporatemutantninja · · Score: 2, Insightful

    Well said. HOWEVER, I have to agree with the poster who pointed out that using "architect" as a verb in the context of writing is a little out of place. If we're going to help the language grow, let's at least do so in useful ways. "Architect a solution to an engineering problem", sure, "architect a whiny, defensive rebuttal", no. If we're going to make it a verb let's at least have it relate somewhat to the noun.

    --
    Actually, I was trying to be Insightful, not Funny.
  6. Re:????? Did you even... by calebb · · Score: 2, Insightful

    RTA?

    Read the article, then post!

    There's really very little to be said in favor of Jonathan A. Zdziarski's "defence?"
    Now, I could start posting how ignorant that statement is, but then I'd just be rewriting Zdziarski's article. Cormack's entire test was flawed - He used SpamAssassin (95% accuracy) to create his 'ham' corpus. He used software versions that were 6+ months old. Even the email address he used for testing is incredibly unique and atypical! (He uses an address that he's had for 20+ years; One that has been posted all over the WWW numerous times. An address that has many forwarders pointing to it. How is that typical in any way??)

    Ok, go read the article (don't just 'skim' it, as you mentioned), then come back and tell me why you believe he is only trying to 'sell' his product.

    Please back up your claims with some evidence this time ;-)

  7. What is typical by Anonymous Coward · · Score: 4, Insightful
    Due to X's extremely high volume of traffic and the fact that X's email addresses were available to harvest bots on the Web and in newsgroups for 20 years, it is no surprise that X has an abnormally high spam ratio, 81.6%.


    I'm not happy about this. First he says that this account has an abnormally high spam ratio, and then says that a normal user can have 60%. Where do we get these figures from, I would like to know, as my average is pushing up against 100%. I don't think that there is such a thing as an average user; some people seem to get nearly no spam and the rest of us get almost complete spam.

    Reviewing today's inbox reveals around 200 emails, of which 8 were legit. You do the maths, I would be making progress if it was only 81%.
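
    Doing the maths on the figures in the comment above (roughly 200 messages, 8 of them legitimate) gives a spam ratio of about 96%, against the 81.6% the article calls abnormally high. A minimal sketch; the function name is purely illustrative:

```python
def spam_ratio(total_messages: int, legit_messages: int) -> float:
    """Return the fraction of an inbox that is spam."""
    if total_messages <= 0:
        raise ValueError("total_messages must be positive")
    return (total_messages - legit_messages) / total_messages


# Figures from the comment: about 200 emails, 8 of them legitimate.
ratio = spam_ratio(200, 8)
print(f"spam ratio: {ratio:.1%}")
```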
  8. To cut through the spam by NigelJohnstone · · Score: 4, Insightful

    Oh boy he goes on and on, if ever you wanted to cut out the spam in an article...

    His main points (at least the ones I agreed with):

    1. No training period, many features only turn on after lots of real emails have been processed. Fair enough.

    2. No purge window, stale emails get purged over time (e.g. 4 months), but in a test everything is shoved through at once (in minutes) and so nothing gets purged. Again fair.
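
    The purge-window point can be sketched as follows. This is a hypothetical illustration, not the purge logic of DSPAM, CRM114, or any other real filter: the 120-day window, the token names, and the dates are all assumptions. It shows why a purge keyed on when a token was last seen never fires if months of mail are replayed within minutes.

```python
from datetime import datetime, timedelta

# Hypothetical purge window, standing in for the "4 months" mentioned above.
PURGE_AFTER = timedelta(days=120)


def purge_stale(last_seen: dict, now: datetime) -> dict:
    """Keep only tokens seen within PURGE_AFTER of `now`."""
    return {tok: ts for tok, ts in last_seen.items() if now - ts <= PURGE_AFTER}


# Live use: tokens carry the dates on which the mail actually arrived.
live = {"viagra": datetime(2004, 1, 5), "mortgage": datetime(2004, 6, 1)}
live_after = purge_stale(live, now=datetime(2004, 6, 20))

# Batch test: the same mail is replayed in minutes, so every token
# looks fresh and nothing is ever purged.
run_start = datetime(2004, 6, 20, 12, 0)
batch = {"viagra": run_start, "mortgage": run_start + timedelta(minutes=3)}
batch_after = purge_stale(batch, now=run_start + timedelta(minutes=5))

print(sorted(live_after))
print(sorted(batch_after))
```

    In the live case the stale token is dropped; in the batch case both survive, so the filter under test ends up carrying data it would normally have discarded.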

    The rest of it complains about the tester, or complains that it was less than ideal conditions & settings for the particular filter.
    We call that 'the real world' here.

    Sys admins are not experts in configuring filters.

    Also he should realise that any new filter gets a better rating than the dominant filter. Spammers try to defeat the most popular filter of the day. So sure, a new filter might perform better than an existing one *initially*, simply because the spammers are targeting the dominant filter rather than the newcomer. Once it becomes dominant, the spammers adjust the spam to defeat the new dominant filter.

    So in the real world the data set will always be unusual because the spammers make it that way.

  9. Re:Excellent review by mev · · Score: 2, Insightful

    Unfortunately it seems like the author is too intent on slamming Cormack for his review to fit my description of an "Excellent Review". I wish he had toned this down as he could still have delivered the same technical message in a more credible fashion.

    "Excellent counterattack" might be more fitting.

  10. Re:And to that... by calebb · · Score: 3, Insightful

    "You mean like any other normal person who might be wanting to use such a product?"

    And to that, I would say... Someone writing an article for publication in a peer-reviewed journal should become experienced in their area of research before attempting to publish their results!

    For example, I'm sure you don't have much experience with Nuclear Magnetic Resonance imaging - And you might or might not have experience with X11 forwarding. But unless you are fluent with both of those topics, I would not expect you to attempt to publish a paper in a peer-reviewed journal discussing those topics!
    (Like I did, last December)

    However, for the sake of presenting some evidence to back up what I'm saying here, I'll take your example of Consumer Reports.

    From their site: CR has the most comprehensive auto-test program and reliability survey data of any U.S. publication; its auto experts have decades of experience in driving, testing, and reporting on cars.

    ...nevermind, I don't need to say anything else.

  11. Re:architect by psykocrime · · Score: 2, Insightful

    For the love of Cthulhu, people, "architect" is a noun, not a verb.

    Languages are dynamic, not static. If enough people begin to use 'architect' as a verb, then it is a verb. I have a strong hunch that 20 years from now, the verb form of architect will appear in Merriam-Webster...

    --
    // TODO: Insert Cool Sig
  12. Re:False positives. by Donny Smith · · Score: 2, Insightful

    Exactly - what's the point if you have to re-check it anyway?
    That is the main reason I don't use any spam filters.

    Without a filter I can check emails as they come rather than create myself a "homework" of having to check 50 messages at once...

  13. why can't we all just get along? by Anonymous Coward · · Score: 1, Insightful

    Irritation is a perfectly reasonable reaction. It is not, however, constructive to vent the irritation in response.

    It is not my desire to flame the test or the tester, but...
    Somehow came not long after this:
    Many misled CS students, Ph.Ds, and professionals have jumped on the spam filtering bandwagon with the uncontrollable urge to perform misguided tests in order to grab a piece of the interest surrounding this area of technology.


    Something I learned from girlfriend #4: validate feelings. Yes, the Nuclear Elephant was hurt. He's right to be hurt. But no, lashing out is not adult, it is not constructive.

    To characterize other researchers as ignorant, wagon-jumping glory hounds with poor self-control does not encourage cooperation.
  14. Re:Constructing arguments by bourne · · Score: 2, Insightful

    The ultimate point where I lost patience was where he claimed that the results were invalid because they didn't conform to accepted, real world knowledge. The study was empirical; it shows something, based on how it was set up; and what it shows is valuable.

    But without knowing how the test was set up, how can you trust the test's so-called empirical results?

    In medicine, research results aren't generally trusted unless 1) the study was sound, e.g., double-blind and 2) a separate team has recreated equivalent results using the published methodology. If, as Zdziarski says, Cormack is not making his config files available, then that alone should be a reason not to blindly accept the study's results. The methodology is unknown.

    I can see not publishing the mail messages - in medicine, for example, you don't want to re-use the same test subjects from the first test, so there's no point to it as well as the privacy issue - but the config files? What possible reason could there be for not making them available?

  15. Crap writing by fuzzy12345 · · Score: 2, Insightful
    I was turned off as soon as I hit that word "architect" being used as a verb. After our hero "architected" his response, did he assign the task of actually writing it to someone else? Nooo.

    English does evolve, and good writers sometimes repurpose words to great effect. Alas, judging by the rest of the reviews here, our hero is NOT a good writer -- having built a shoddy and ramshackle outhouse, he proudly crowns himself the architect of it.

    As for all those people who shout "prescriptive grammarian!", I often suspect they're just too lazy to learn to write well, and have decided that claiming that rules are passé is an effective workaround.

    --

    Everybody's a libertarian 'till their neighbour's becomes a crack house.
  16. CRM114 is impossible to get installed by Anonymous Coward · · Score: 3, Insightful

    I remember going through the CRM114 installation docs, and vividly remember the 20 or so steps that I had to go through, and after about 3 or 4 hours of trying to get it installed, I finally gave up. I think part of the goal of software design is to make your software so that people will be able to quickly install and use it. The author of this program lost sight of this important point. I'm not going to sit there and reverse engineer some esoteric codebase just to get it working, and I'm sure a lot of other people feel the same way. Therefore, I use SpamAssassin among other things, and it works really well and was quick and relatively painless to get working. I didn't have to go through their source code to figure out how to get it installed.

  17. Cormack got Pwnt. by Anonymous Coward · · Score: 1, Insightful

    The Article was necessary. It comes down to this glaring fact:
    ".... If you use a tool that is only 95% accurate to prepare a test for tools that are 99.5% accurate, then the lesser tool will appear to outperform the better tools whenever the better tools are correct. ...."
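
    The quoted claim can be checked with a little arithmetic. The sketch below is a simplified model, not Cormack's actual procedure: it assumes a two-class problem (spam/ham), that the imperfect tool's verdicts are used as the answer key without correction, and that the two tools' errors are independent. The function name is illustrative.

```python
def apparent_accuracy(true_accuracy: float, judge_accuracy: float) -> float:
    """Fraction of messages on which a filter agrees with an imperfect judge.

    Assumes two classes and statistically independent errors.
    """
    # The filter is scored as "correct" whenever it agrees with the judge:
    # either both are right, or both are wrong (and, with two classes,
    # wrong in the same direction).
    both_right = true_accuracy * judge_accuracy
    both_wrong = (1 - true_accuracy) * (1 - judge_accuracy)
    return both_right + both_wrong


# Figures from the quote: a 95% tool grading a 99.5% tool.
measured = apparent_accuracy(0.995, 0.95)
print(f"apparent accuracy of the 99.5% filter: {measured:.4f}")
```

    Under these assumptions the 99.5% filter is scored at about 94.55%, because nearly every disagreement with the judge is counted against it, while a tool that shares the judge's mistakes would be scored as near perfect.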

  18. Re:Hello? by killjoe · · Score: 2, Insightful

    "He cannot scratch his itch because he cannot reach it."

    You don't have to be a developer. As I said you can start a campaign to ask for donations, you can write letters to companies asking for sponsorship, you can donate some of your own money, you can try to get like minded individuals together to solve the problem.

    OPEN SOURCE DOES NOT WORK UNLESS YOU CONTRIBUTE.

    " Rather, you should acknowledge that the area is weak and that more focus needs to be given there in the future."

    More focus needs to be given by who? Are you saying I should grab random programmers off the street and yell at them until they write a GIS program for me?

    --
    evil is as evil does
  19. RBL (black lists) do not help with zombie systems by wintermute42 · · Score: 2, Insightful

    I have noticed that black lists are indeed effective. Many spammers now use "bullet proof" spam hosts, so they use static domain names. However, there has been a marked rise in zombie systems sending spams. These are systems that are infected by viruses and then used as spam hosts. Since these systems come on line rapidly (when they are infected) and then drop out (when they are cleared of the virus or booted off their ISP) it seems unlikely that black lists will help.

    At least in the spam stream I see, there is more than 1-2 percent of the spam flow from zombies. The best technique seems to be to use a black list first and then content filter.

    On a related topic in the parent post:

    In a previous post, in another discussion, I also suggested that the sophistication of spam filters like SpamAssassin, which use several algorithms to filter spam, would consume lots of system resources. Another poster wrote that these tools do not consume much in the way of processor and memory resources. This seems counterintuitive, but I don't have any contrary evidence.

  20. Re:False positives. by SpaceLifeForm · · Score: 2, Insightful

    No, you can scan your spam folder in seconds, because you will recognise the subject lines. The duration is not comparable. When you have a folder for spam, any non-spam sticks out, but if you need to think looking at alternating spam and non-spam messages, you spend more time thinking.

    --
    You are being MICROattacked, from various angles, in a SOFT manner.
  21. Re:Cormack and Lynam re Zdziarski's factual errors by EatAtJoes · · Score: 2, Insightful

    While obviously Cormack and Lynam are central to this discussion, it's depressing that this is +4, Informative when instead they obviously resent any serious questioning of their work. Is there a '-1, Wussy' moderation?

    "We shall not respond" -- huh? Pull the log out of your ass guys. Like it or not, he's got legitimate beefs with your study. What's more, he's got cred: dude puts SERIOUS effort into GPL'd software that helps people, so his input is relevant and valid. Get over it.

      Besides, his questioning of your credibility is neither 'ad-hominem' nor irrelevant. Claiming that it is betrays a decidedly unprofessional sensitivity to criticism. as he points out, it is more than legitimate to question the credentials of the tester when interpreting results -- UNLESS the test has been repeated. 'Ad-hominem' attacks means irrelevant insults, whereas he's merely questioning your approach and relevant experience. don't go public with your stuff if you don't like the heat.

    How about instead, you address his most damaging points:

    - put all of your configuration data and any other information required to re-run the test online, immediately. there is absolutely no reason to resist this. you might want to explain why you haven't already.

    - your errata is so far entirely due to his corrections. professional class would merit gratitude for his attention. try it on for size. after all this is supposed to be a *review* period yes?

    - he directly questions the use of human error-checking. is he right? wrong? i don't know but it's a damn interesting question, and one your response does not address.

    - finally, what's up with saying you won't respond ... and then RESPONDING, and using his work in your errata?

    there are more problems here but you get the gist. you guys get paid to do this so do it right.