Slashdot Mirror


MIT Spam Conference Conclusions

RT Alec writes "The 2003 Spam Conference has concluded, reports InfoWorld. (related read: abstracts of the conference discussions). I was unable to attend the conference, but it appears all that was discussed was filters (client and server). I think the key problem is ISPs that do not block egress traffic on port 25. If you need to send mail through a different SMTP server than the one provided by your ISP, the admin of that server ought to provide you with a means of using it with authentication on a port other than 25 (you do have permission to use that SMTP server, don't you?). It is not too tough to set up an SMTP server to require authentication, or at a minimum to run off a different port. I am surprised that this is never mentioned as a cure for spam. If just AOL blocked port 25, this could reduce spam by 50% (I base this figure on close examination of the headers of the spam I receive). I was pleased to see that Barry Shein, president of The World (a Boston-based ISP) was included in the talks. I am not sure by the abstract (see link above) posted if he mentioned blocking port 25. In a recent interview he did not mention it."

16 of 373 comments

  1. Spamming vs. sending legit mail. by autopr0n · · Score: 4, Interesting

    But what if people want to run their own mail servers? For their own domains?

    Are you saying that if I want to run my own mail server, I should get in touch with the mail admins of every single mail server of everyone I might ever want to send an email to so that I can send it on another port?

    That's ridiculous. I shouldn't need to subsidize MX providers.

    Otoh, a good solution might be traffic shaping, or even a sort of intelligent traffic shaper that limits the number of actual emails per day.

    Personally, I think SMTP is just obsolete. Schlepping anti-spam measures onto it is like trying to put copy protection on CDs. It's just not going to work. What we need to do is move to new protocols. Ideally two separate ones: one for personal mail, and one for commercial/bulk mail. The personal system would make it difficult to send out tons of mail, but easy to get into people's boxes, while the commercial system would make it hard to get into the box (i.e. you need to be pre-authorized) but, by definition, you could send out as much as you want.

    Digital certificates and encryption would be helpful, for one thing.

    --
    autopr0n is like, down and stuff.
  2. Oh please don't do that. by rknop · · Score: 4, Interesting

    Please don't promote blocking port 25, whatever happens. That would be very annoying.

    I'm already annoyed at being collateral damage in the war against SPAM. I use mutt as my e-mail MUA, which is not an MTA and doesn't support use of an SMTP server. No problem; use sendmail or exim on my machine to actually *send* the mail. Except that I find out that some of my mail is bouncing, because my cable modem is in a blacklisted range (the range that includes "all cable modems"), and therefore being rejected by some SPAM filters. I don't run an open relay, I'm just using a program to send mail from my computer in the way that it is designed.

    Very annoying.

    So I have to configure my MTA to forward to a gateway SMTP server which won't be on the various RBL lists. A pain, but fine, I can do that. I've managed to get that set up... but I'm not using Comcast's SMTP server. Maybe I should, but after briefly using @Home's mail services, I've learned simply not to trust the cable modem ISP services for anything. I've got web hosting outfits I pay for, so I can use those SMTP servers, configuring my exim to forward to them and use SMTP AUTH. But if Comcast starts blocking port 25, then *that* won't work, and I'll be stuck again. (And, of course, "getting another ISP" isn't an option, because where I live, the cable company's got a monopoly as far as broadband access goes. I *do* have another ISP I pay for for things like news and mail, on top of the cable modem. But, unlike where I used to live, I don't have the option of going with DSL and choosing the ISP to use with it.)

    Let's please not put forward this idea. There's enough collateral damage as it is. And it won't really cut back on the spam, either. It's very very fuzzy logic to assert that since 50% of the spam now comes from AOL customers, shutting that down would cut spam by 50%. The spammers out there will just find other places to spam. Going after the spammers themselves, and not just some of the tools they use, is the only way to stop spamming. Anything else only temporarily inconveniences them, and meanwhile greatly inconveniences innocents.

    -Rob

  3. Tarpit! by Checkered+Daemon · · Score: 5, Interesting

    Theo deRaadt of OpenBSD fame has put together a nasty little spamd, a daemon that attempts to tie up a spammer's resources. Basically, it slows down connection attempts and then sends a temporary error code back, sticking the spam in the mailqueue and letting the spammer try again, and again, and again. Designed to use up as few of your resources and as many of the spammer's as possible.

    Excellent description of how to use it with your own self generated blacklist at http://www.benzedrine.cx/relaydb.html.

    Unfortunately, it's only on OpenBSD so far. Can someone please port this to Linux by tomorrow?

  4. Barry Shein's modest proposal. by Xthlc · · Score: 4, Interesting
    Barry gave a tremendously entertaining (if disorganized) talk. His main points were:

    1. Spam is a stupid, boring problem that smart people shouldn't have to think about. "Why should some of the best minds in computing be forced to have a conference about this stuff?"
    2. The arms race between spammers and anti-spammers is going to get much worse before it gets better. We can come up with all kinds of cool technology to block spam, but spammers have a very direct financial incentive to dodge that technology in increasingly innovative ways.
    3. The only feasible, permanent solution will be a fix at the social and economic level, not technological.

    Barry's proposal for that last point was a fundamental change in the economics of spam, as follows:
    1. Create a coalition of ISPs with the will to implement and enforce these changes.
    2. Legitimize spam by selling "spam accounts" (with unlimited email quotas, etc) as a premium service.
    3. Create a system where ISP A can invoice ISP B for excessive load on the ISP A's system due to spam sent from ISP B.
    4. ISP B passes the cost on to their customer (if he's a legit spammer) or sics the law on him for theft of services (if he's not).

    Basically, it boiled down to "Spam is currently in a gray area legally, so let's legitimize spam in order to divide the spammers into legal spammers (who pay handsomely for the privilege) and illegal spammers (who do hard time, just like people who cheat a utility company)."

    Challenging proposal, and great fun to hear him speak.
  5. I don't. by Mustang+Matt · · Score: 2, Interesting

    There are potential customers using AOL. A significant percentage of my existing client base either is using or have used AOL since before they became a client.

    I really don't like the idea of ISPs blocking ports. That should be the responsibility of the end user.

    Instead of blocking ports, why don't they force users to sign an agreement that they won't send spam, and that if they do they'll pay each recipient $50/incident?

    Then if a bonehead sends spam they can go after them and enforce their TOS. I believe AOL requires a valid credit card number to even do the free trials, but I'm just guessing.

    --
    The man who trades freedom for security does not deserve nor will he ever receive either. - Benjamin Franklin
  6. Active Spam Killer / TMDA not mentioned by hazzzard · · Score: 3, Interesting

    It's interesting to see that the talks focused on heuristics exclusively. The main problem with all of these techniques is that they may classify legitimate email as spam as well.

    For two months now, I've been using the Active Spam Killer (ASK), and this has been mostly successful. In short: if a person writes me an email, they will have to confirm the mail, unless they are on my whitelist or the email contains a magic key (which is included in my sig and will thus be included in a reply). Confirmation also places a person on the whitelist, automatically. Since most spammers forge the From: address, they are not able to confirm their mail, even if they wanted to... -> Pretty much no spam (dropped from approx. 20-30 spam messages per day to 1-3 per week). Sure, if you order a book at Amazon, their computer might not confirm. Thus I look into the confirmation queue from time to time to see whether anything in there is legitimate. Thus far it has not yet occurred that a person would not confirm his/her email, by the way. ASK is well documented, written in Python and easy to set up.
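
    A rough sketch of the confirm-or-whitelist decision described above might look like the following. This is not ASK's actual code: the function names, the `MAGIC_KEY` value and the in-memory whitelist and pending queue are all assumptions made for illustration.

```python
# Hypothetical sketch of a challenge/response filter; not ASK's real implementation.
MAGIC_KEY = "k3y-in-my-sig"  # assumed token that the owner puts in their signature


def classify(sender, body, whitelist, pending):
    """Return 'deliver' or 'challenge' for one incoming message."""
    sender = sender.strip().lower()
    if sender in whitelist or MAGIC_KEY in body:
        return "deliver"
    # Unknown sender: hold the message and ask for a confirmation.
    pending.setdefault(sender, []).append(body)
    return "challenge"


def confirm(sender, whitelist, pending):
    """The sender answered the challenge: whitelist them and release held mail."""
    sender = sender.strip().lower()
    whitelist.add(sender)
    return pending.pop(sender, [])
```

    A spammer who forges the From: address never sees the challenge, so the held message simply stays in the pending queue until the owner looks at it.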

    There is another similar system (which I haven't checked out): TMDA.

    I am wondering why big corporations, universities, ISPs are not providing such a (preconfigured) system as an option in their email packages ...

    1. Re:Active Spam Killer / TMDA not mentioned by rkent · · Score: 2, Interesting

      It's interesting to see that the talks focused on heuristics exclusively.

      Most of them focused on statistical methods, primarily Bayesian ones, actually. And yes, even a well-trained Bayesian filter will sometimes result in a false positive.

      One presenter made an excellent point, though: you can easily say "I've never had a false positive" if you just don't filter very much. So, I'm glad your system hasn't been tagging your good messages as bad; how effective is it at getting rid of the bad ones, though?

      Paul Graham's presentation revolved around a Bayesian algorithm he'd devised which put more weight on features in the headers, as opposed to the bodies, of email; he claimed something like 99.5% effectiveness with only something like 5 false positives in 4000 emails sorted.

      The really interesting part was the nature of the 3 false positives that he showed. Two of them were mailing lists that he "didn't care much about anymore," and the other was a note in all caps from a person in egypt requesting some info on one of Graham's academic projects. In other words, they all *did* resemble unsolicited mail.

  7. Anti-spam by DaveOnNet · · Score: 4, Interesting
    Has anyone heard of a system like this:
    Your email provider delivers an email to you only if

    it has a "Reply-To" field in the header AND

    the Reply-To value has been accepted as a valid email address by another customer.

    So in order for a person that just created an email address to email you, they would have to get their new address validated first and would receive a message to that effect the first time they tried to email you. They would have to get in touch with you or someone else under your email provider to get validated.

    If you get some spam, you report it to your email provider and the ISP deals with the customer who validated the "Reply-To" address.

    Email providers would set up peering relationships wherein they can share validated email addresses.

    If the Reply-To value is faked, it would have to point to a validated email address and would probably bring severe damage to that email account. This method would push spammers into using this strategy, but it would certainly get them into more trouble than they currently get into.

    I'm sure there are holes in my idea, so shoot away and educate me.

    --
    Rank comments and posts against each other at We-Rank.com
  8. Naive Bayesians probably don't work in long run by WolfWithoutAClause · · Score: 2, Interesting
    I've been running one for a while; I'm getting about 90% successful blocking, and I've practically never seen a mail item I seriously wanted be flagged in a few thousand messages perhaps. But there are some limitations:

    a) short messages don't get caught- no words that are going to be blocked, just a URL. The URL doesn't match because it's several words stuck together without spaces.

    b) misspelt words don't get caught. If the spammer deliberately misspells the key words, then it goes through.

    c) common words- if the spammer only uses common words, it is unlikely that the spam can get caught; the spammer can check all the words he uses for being common before he sends it.

    d) pictures- if the spammer sends his advert in a GIF, the Naive Bayesian can do nothing.

    Overall, I am pessimistic about whether filtering will work in the long run, but in the short run it works pretty good.

    --

    -WolfWithoutAClause

    "Gravity is only a theory, not a fact!"
  9. Something *slightly* different by nsayer · · Score: 2, Interesting

    I used to run a tiny ISP. What I did was *redirect* traffic outbound to port 25 to a local mail server. The mail would still be delivered, and that server was (obviously) set up to allow 3rd party relay from the correct set of addresses. I had a small customer base, but I never once had any complaints about this policy. The users could forge the From: header all they wanted, but the outgoing mail would always have a proper Received: header, at least.

    As long as the mail server doesn't do anything more egregious to the mail than add a Received: header, I find it unlikely that any legitimate complaints could be made about this practice. It's certainly a much more gentle answer than simply blocking port 25 egress completely. At least this way it's more or less invisible to the end-user.

  10. You can't fix all ISPs, but their users can. by The+Panther! · · Score: 2, Interesting

    The problem with changing SMTP is that it's well-established and generally a good protocol. The problem with changing the default configuration for installation is it only affects new installations. Basically anything you propose which requires changes on the server, requires operators to agree. No strategy as such will work, unless operators are not given a choice, because their customers demand the upgrade.

    I'd propose a slight change to SMTP servers so that they automatically block incoming mail from other servers that act as an open relay. It would not discriminate against open relays when sending mail, however.

    What this does is effectively drops all users of open relays off the map. Once enough servers out there start doing this, all the open relays start getting fixed, because their users demand mail to stop bouncing. Open relay spam ceases to annoy everybody behind a protected server immediately, however, and you don't really care when or if those servers get fixed.

    This isn't going to fix the general spam problem, where valid addresses are used for spam, but at least you can block domains that annoy you.

    But the truth is, spam will never calm down until every unsolicited/untrusted message costs a nominal sum, which courteous people return in the form of a reply from valid messages.

    --
    Any connection between your reality and mine is purely coincidental.
  11. New mail protocols needed by Fastball · · Score: 2, Interesting
    I've avoided the spam debates until now, because I haven't had a solution for the problem. But nobody else has offered much of substance either. So here's my humble opinion...

    Legislation is not the answer. We know how tech-savvy politicians are. Do laws stop corrupt CEOs from plundering corporate pensions or cooking the books? Do laws solve problems?

    Terrorizing spammers is not the answer. Again, this is not solving the problem. Pestering less than intelligent people who exploit less than intelligent methods of mass communication does not solve the problem. It might be a thrill short term, but there are too many people who will spam if the current mail protocols persist.

    So what is the problem? Strangers send me e-mail I don't want. What is the solution?

    I won't pretend to be an expert. I'm not. However, I'm surprised better men and women have not come up with something, ANYTHING, to solve the spam problem. I am NOT surprised to see 90-100 unsolicited e-mails (from strangers) in my inbox every day. Somebody needs to come up with something. So here goes...

    First, classify e-mail accounts. Home/personal accounts should be bulletproof. You only receive messages from people you have on your list of acceptable senders, your "inner circle." Shopping/e-commerce accounts: you can receive messages from merchants who register with some central agency/server. Business/work accounts: I dunno. Ideas? How should we handle mailing list type accounts? Second, every e-mail sent has something solid identifying it with a sender included. The identification is sent to the recipient. If the recipient has this identification in his list and it matches 100%, then the recipient fetches the message from the sender. So instead of the sender wielding the power, the potential recipient makes the call. Why allow just anybody to send an entire friggin' message to scores of people? Messages go nowhere until the recipient says so.

    Finally, and this is where the law comes into play, if someone manages to fake out your list by saying he is someone he is not, sic the prosecutors on him. That's identity theft, pal. As it is now, e-mail headers are raw schizophrenia.

    So step one, classify e-mail accounts. Different classifications have different lists of people you are willing to accept mail from. Step two, the sender sends his identification and maybe a subject header to the recipient. Step three, the recipient accepts the sender's request and fetches the message himself, rejects it outright, or adds the sender to his list and fetches the message.

    I don't know 90 people whose mugs I'd piss on if they set themselves on fire. Why should any of these rat bastards be able to dump a second or third bit in my inbox?

  12. Web of Trust by dracocat · · Score: 2, Interesting

    My guess is one day we'll see a web of trust used by our e-mail client to determine whether our e-mail gets delivered to our inbox or junk-mail folder.

    Someone using a signature for spam would see himself removed from the web of trust, along with those that verified the person as a non-spammer.

    Just don't ask me how somebody that doesn't know anybody else with an e-mail account gets somebody else to vouch for him. (Maybe your ISP will vouch for you if you verify yourself with a CC or something?). Any thoughts?

  13. Temporary rejection - but only temporarily by waynemcdougall · · Score: 5, Interesting

    Somewhat related is this approach I've been trialing quite successfully for the last month. I haven't been able to find any reference to anyone else doing this, and would welcome any comments.

    If it's a 'new site' (not dealt with regularly and not seen recently) and it shows up clean on the various DNSBLs I use, then I send a temporary error code back. If they return (after a suitable time delay - I use 15 minutes) and still come up clean, then I let it through.

    Advantages:

    * Many spammers don't retry - ever (perhaps they get shut down, or someone closes their open relay, or they concentrate on more receptive targets).
    * Those that do retry (often many hours later - average is 7.6 hours for spammers) are usually listed on the DNSBLs by then.
    * I get to collect the list of mail addresses they are trying to send to, and if they hit one of my spam traps (and there are many obvious dictionary attacks) then they immediately get marked bad even if they are not DNSBL'd.
    * It doesn't waste bandwidth (or the hijacked resources of an open relay 'victim'), which continually using a tar pit does.

    Disadvantage:

    * Genuine email from a new/infrequent source gets delayed by 15+ minutes (until their servers retry). Most genuine ISPs retry at reasonable intervals - though some wait an hour. I'm willing to wait an hour for mail from someone new, who's not on my whitelist, given the amount of spam this simple technique filters.

    Obviously if everyone adopts this approach then spammers would deliberately work around it - but it would complicate matters for them - the time delay and repetitive nature of their attempts would make them even more obvious as spammers, and easier to shut down. And they can't avoid the spam traps.

    Forgive me if this is obvious and well known - I'd appreciate any pointers to where this has been applied and any comments.
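
    The accept/defer logic described above can be sketched roughly as follows. This is only an illustration under assumptions: `smtp_decision`, the dictionaries and the `on_dnsbl` callback are made-up names, the DNSBL lookup is a stand-in for real queries, and spam-trap handling is left out.

```python
RETRY_DELAY = 15 * 60  # seconds; the 15-minute delay mentioned above


def smtp_decision(ip, now, first_seen, known_good, on_dnsbl):
    """Return 'accept', 'tempfail' or 'reject' for a connecting host.

    first_seen : dict mapping ip -> time of the first deferred attempt
    known_good : set of hosts that have already been let through
    on_dnsbl   : callable(ip) -> bool, a stand-in for real DNSBL lookups
    """
    if ip in known_good:
        return "accept"
    if on_dnsbl(ip):
        return "reject"        # listed: permanent failure (5xx)
    if ip not in first_seen:
        first_seen[ip] = now
        return "tempfail"      # new site: temporary failure (4xx)
    if now - first_seen[ip] < RETRY_DELAY:
        return "tempfail"      # retried before the delay has passed
    known_good.add(ip)         # came back later and is still clean
    return "accept"
```

    A host that gets listed on a DNSBL between its first attempt and its retry falls into the "reject" branch, which is the case the second advantage above relies on.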

    --
    Recycle PCs and build a wireless community network www.hillsborough.org.nz
  14. My notes for the proceedings (very long post!) by babbage · · Score: 5, Interesting
    I was waiting for the review to show up on Slashdot, as the conference was really good. The audio proceedings have been put online, but I'm not sure if they can take a Slashdotting, so please be gentle :) If you have 8 hours to spare, the whole day was pretty good & worth listening to, but the schedule as planned isn't exactly the sequence people spoke in, so you may have to jump around the RealAudio stream a little bit.

    Turning my notes for the day into something vaguely coherent, here are some highlights from the proceedings. There are a couple of speakers that I didn't write anything down for, but from mid-morning on this should be pretty comprehensive. Apologies in advance if my notes lead me to attribute certain comments to the wrong speaker -- if anyone notices any mistakes please feel free to add corrections:

    • Bill Yerazunis - CRM114 & MailFilter

      Because Perl "freaks him out", Yerazunis came up with the CRM114 minilanguage (points for anyone that gets the joke in the name without googling for it :), then wrote MailFilter in CRM114 as an implementation of a filter that can be used with Procmail or SpamAssassin or what have you. The basic idea is to decompose a message into a set of "features" composed of various permutations of single words, consecutive words, words appearing within a certain distance of one another, etc, such that the set of features N is very much bigger than the set of words X. You then analyze the features in various ways and if you get above a certain arbitrary threshold, you flag the message as spam & handle it accordingly.

      He claimed that with this software he could get better than 99.9% accuracy in nailing spam, and a similar percentage in correctly passing "ham" (the term everyone was using for legit mail; a false positive is ham that gets falsely identified as spam). One of Yerazunis' observations is that the best way to defeat the spam problem is to disrupt the economics: if a 99.9% or better filter rate were to become the norm, then the cost of delivering spam can be pushed higher than the cost of traditional mail and the problem will naturally go away without requiring legislation (which would be nice anyway, but we can't count on it).

      The drawback of CRM114/MailFilter is that it can only handle about 20k of text per second, so it's not appropriate for large scale use yet. Still an interesting project to watch though: crm114.sourceforge.net
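
      To make the "features, not words" idea concrete, here is a much simplified sketch of turning a message into single-word and word-pair features within a sliding window. It is not how CRM114 itself does it; the function name, the tuple encoding and the window size are assumptions for illustration only.

```python
def window_features(text, window=4):
    """Emit single words plus ordered word pairs up to `window - 1` words apart."""
    words = text.lower().split()
    features = []
    for i, word in enumerate(words):
        features.append((word,))  # the single word on its own
        for gap in range(1, window):
            if i + gap < len(words):
                # Tag the pair with its gap so adjacent and spaced pairs differ.
                features.append((word, gap, words[i + gap]))
    return features
```

      Even for a four-word message this yields more features than words, which is the property the talk emphasized.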

    • John Graham-Cumming - POPfile

      Most of his very entertaining talk was about the ingenious tricks that spammers resort to to obfuscate spam against filters, including most diabolically one example that placed each column of monospace text in the message into an HTML column, so that the average HTML-capable mail client would render the message properly, but it would be absolute gibberish to most mail filters. The ultimate lesson was that any good filter has to focus not on "ascii-space" (the literal bytes as transmitted) but the "eye space" (the rendered text as seen by the user), which by extension may mean that any full scale spam parser/filter could also have to include a full-scale HTML & Javascript engine. Yikes!

      As for Graham-Cumming's software, it's a Perl application, available for all platforms (Windows, Mac, & of course Linux) that allows users to filter POP3 mail. Interesting stuff if you're a POP user: popfile.sourceforge.net
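
      As a small illustration of "ascii-space" versus "eye space", the sketch below strips tags and comments with Python's standard `html.parser` so that a filter sees roughly what the reader sees. It is not POPfile's code, and it is far from a full renderer (no tables, CSS or Javascript); `TextOnly` and `eye_space` are made-up names.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only character data, dropping tags and comments."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def eye_space(raw_html):
    """Return the text content of raw_html with all markup removed."""
    parser = TextOnly()
    parser.feed(raw_html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)
```

      For example, the raw bytes of `Buy <b>che</b><!-- zq -->ap <i>pi</i>lls` never contain the word "cheap", but the extracted text does. The column-per-table-cell trick mentioned above would defeat even this, which is why a real renderer may be needed.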

    • John Draper - ShopIP

      Most of Draper's work seemed to be focused on profiling spammers, as opposed to profiling spam itself, by throwing out a series of honeypot addresses & using data collected to hunt down spammers. spambayes.sourceforge.net

    • Paul Judge, CipherTrust

      Judge's big argument, which no one really disagrees with, is that spam has become not just a nuisance, but an actual information security issue. To that end, he is advocating much more collaborative effort to address the problem than we have seen to date: conferences like this, mailing list discussions, better tools, and public data repositories of known spam [and ham]. To that last point, one of his observations (which others made as well) was that there are no universally agreed on standards for what qualifies as spam, so repositories for spam will not be accurate for all users (spam for your programmers will be the bread & butter of your marketing department, etc). Plus, there are obvious privacy issues in publishing your spam & ham for public scrutiny. And to add another wrinkle, one danger of public spam/ham databases is that spammers can poison them with false data, screwing things up for everyone. That said, he encouraged users to help out with building spamarchive.org.

    • Paul Graham

      The man who organized the conference and kicked everything this week off with his landmark paper from last fall, A Plan for Spam. Graham's spam filtering technique famously makes use of Bayesian statistics, a technique popular with nearly all of the speakers. The nice thing about a statistical approach, as opposed to heuristics, simple phrase matching, RBLs, etc, is that it can be very robust & accurate; the downsides are that it has to be trained against a sufficiently large "corpus" of spam (most techniques have this property though) and it has to be continually retrained over time (again, this is common). Graham was too modest to produce numbers, but subjectively his results seemed to be even better than what Yerazunis gets with MailFilter, by an order of magnitude or more.

      Like other speakers, he predicted that spammers are going to make their messages appear more & more like "normal" mail, so we're always going to have to be persistent about this -- as one example, he showed us an email he received IN ALL CAPS from a non-English speaker asking for programming help, and although it was legit, the filters insisted otherwise. "That message is the one that keeps me up at night."

      Everyone interested in the spam issue should go read Graham's paper immediately.
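
      For a flavor of the statistics involved, here is a minimal sketch of the usual naive-Bayes way of combining per-token spam probabilities into one score, which as far as I recall is the form the paper uses. How the per-token probabilities are estimated and which tokens get picked are left out, and the function name is my own.

```python
from math import prod


def combined_spam_probability(token_probs):
    """Combine per-token spam probabilities, treating tokens as independent.

    Each probability is assumed to be strictly between 0 and 1 (clamped
    beforehand), so the denominator below cannot be zero.
    """
    p_spam = prod(token_probs)
    p_ham = prod(1.0 - p for p in token_probs)
    return p_spam / (p_spam + p_ham)
```

      A couple of very spammy tokens dominate the result even when a neutral or innocent-looking token is mixed in, which is part of why this kind of filter is hard to dilute with a few "normal" words.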

    • Robert Rothe, eXpurgate

      Rothe works for Eleven, an ASP company from Berlin selling a spam management service/application called eXpurgate. His talk was short on details about how the tool worked (mainly that it searches for bulk mail), focusing instead on the high level functionality it provides to users -- basically, they classify mail as safe, questionable, or dangerous, and let the users handle them accordingly. Another speaker that sees spam as a network security issue, so they built their system accordingly, with privacy of the client's mail content in mind etc.

      Like many speakers, he warned about the dangers of an anti-spam "monoculture": that Bayesian techniques might be great, but if that's all anyone uses then spammers will catch on and adjust their messages to look more like normal mail, to the point that Bayesian filters won't work anymore. As a result, we're going to need to attack the problem from several angles, using different techniques, to keep the spammers off balance as much as possible.

    • Matt Sergeant, SpamAssassin

      SA is a well-known Perl application for heuristically profiling messages as spam, adding headers to the message saying for example "I am 72% sure this is spam because it has X Y Z", and passing off the message to procmail or whatever to be handled accordingly. SpamAssassin can handle a message throughput great enough that it can be deployed at the network level (whereas some of the others, which might have somewhat better hit rates, are still too inefficient at this point). Deployed this way, the differences in effectiveness for single vs. multiple users become very apparent, as 99% effective rates fall down into the 95-80% range. This happens because, again, different users define different things as spam, so mapping one fingerprint to all users can never work quite right. For an example of a tool that your company can deploy right now & get fast, decent results, SA looks like a good choice; but for the long run it looks like a Bayesian technique is going to get better performance, and SA is adding a statistical component to its toolkit. Good talk.
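
      The general shape of heuristic scoring (a list of weighted tests, a total, a threshold, and a record of which tests fired) can be sketched like this. The rule names, weights and threshold are invented for illustration and are not SpamAssassin's real rule set or scoring.

```python
import re

# Hypothetical rules and weights; not SpamAssassin's actual rules.
RULES = [
    ("ALL_CAPS_SUBJECT", 1.5, lambda m: m["subject"].isupper()),
    ("MENTIONS_FREE", 1.0, lambda m: re.search(r"\bfree\b", m["body"], re.I) is not None),
    ("NO_MESSAGE_ID", 2.0, lambda m: not m.get("message_id")),
]
THRESHOLD = 3.0  # assumed cutoff for flagging a message


def score(message):
    """Return (total score, names of the rules that fired, is_spam flag)."""
    hits = [(name, weight) for name, weight, test in RULES if test(message)]
    total = sum(weight for _, weight in hits)
    return total, [name for name, _ in hits], total >= THRESHOLD
```

      The list of rules that fired is what would end up in the added header, so that procmail (or the user) can see why a message was flagged.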

    • Barry Warsaw, Python Labs

      This was another example of the "monocultures are dangerous" philosophy, as Warsaw explained how he is helping to use a variety of anti-spam techniques -- from clever Exim MTA configuration to good use of Spam Assassin & Procmail to fine tuning of the MailMan mailing list engine -- to work together to manage the spam problem for all things Python (Python.org, Zope, many mailing lists, a few employees, etc).

      He pointed out that some very simple filters can be surprisingly effective: run a sanity check on the message's date; look for obviously forged headers; make sure the recipients are legit; scan for missing Message-Id headers; etc. In response to the person that originally posted the article, yes, he did mention blocking outgoing SMTP as an effective element of a many tiered spam management approach.

      Among other tricks for getting the different filtering tiers to play nice together, they make heavy use of the X-Warning header so that if an alarm goes off in one tier of their mail architecture, other components can respond appropriately. Cited projects included ElSpy and SpamBayes.
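
      Two of the simple checks mentioned above (a sane Date and a present Message-Id) could look roughly like this with Python's standard `email` package. This is a sketch and not the python.org setup; the function name, the two-day tolerance and the warning strings are assumptions.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def header_warnings(raw, now, max_skew=timedelta(days=2)):
    """Return a list of warnings for a raw RFC 2822 message string."""
    msg = message_from_string(raw)
    warnings = []
    if not msg.get("Message-Id"):
        warnings.append("missing Message-Id")
    try:
        sent = parsedate_to_datetime(msg.get("Date", ""))
        if sent.tzinfo is None:
            # Dates with a -0000 zone parse as naive; treat them as UTC.
            sent = sent.replace(tzinfo=timezone.utc)
        if abs(now - sent) > max_skew:
            warnings.append("implausible Date")
    except (TypeError, ValueError):
        warnings.append("unparseable Date")
    return warnings
```

      The returned warnings are the kind of thing one tier could pass along in an X-Warning header for later tiers to act on.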

    • Barry Shein, founder & CEO of The World -- or as he laughingly put it, "President of the World". Har har har

      This talk was mostly a let down for me -- Shein has made his views very well known, and his ranting, rambling talk didn't really introduce any new ideas for anyone that had read that interview (some good jokes & quotes though).

      His core argument is that spam is "the rise of organized crime on the internet", that filters are nice but that the mail architecture itself is fundamentally flawed, and that ISPs like his -- in 1989, The World was the world's first dialup ISP -- are being killed by the problem. Shein was very annoyed that all these talented people are having to clean up a mess like this when we should be out working on more interesting stuff, and not having to worry about this issue. His big hope seemed to be that legislation will someday come to the rescue, but he sounded very pessimistic. (Others in the room seemed to feel that this was a very interesting machine learning problem, and weren't really fazed by his pessimism -- but then most of the people in the room don't run ISPs.)

      He also suggested that we need to find a way to make spammers pay for the bandwidth they are consuming (rather than having users & ISPs shoulder the burden) but didn't seem to know how we might go about implementing this. At all.

      Fun rant to cheer along to, but for me it wasn't very constructive in the end.

    • Jean-David Ruvini, eLabs SmartLook

      This was an interesting product. Ruvini's company is developing an extension to Outlook 2000 & XP that will watch the way users categorize messages into folders, come up with a profile for what kinds of messages end up in which folders, and then try to offer similar categorization on an automatic basis. Think of it as Procmail for Outlook, without having to mess with (or even be aware of!) all the nasty recipes.

      Obviously if you have a spam folder, then spam will be one of the categories it looks for, but more broadly it will try to categorize all your mail as you would ordinarily categorize it. This makes SmartLook a broader tool than "just" a spam manager.

      SmartLook is another statistical filter, though it uses non-Bayesian algorithms to get results. eLabs' tests suggest that the product is able to properly categorize messages about 96% of the time, with no false positives, and (for their tests, mind you) that it performed better than Bayes filters over three months of usage.

      One nice property of this tool was that it works well with different [human] languages -- some strategies fall apart &/or need retraining when you switch from English to some other language. For certain markets (eLabs seems to be a European company, perhaps French?) this is a crucial feature, and having a tool that works with one of the biggest mail clients out there (most people don't use Mutt or Pine, sadly enough) can be very valuable. Very clever -- watch for the inevitable embrace & extend three years from now.

    • Eric Raymond

      He didn't say anything about guns, but he did try to correct one of the other speakers for misusing the term "hacker."

      Like Graham, ESR is a Lisp fan, but he knows that the vast majority of people aren't, and he also knows that the vast majority of people need to be using something like Graham's spam software. So on a lark, he came up with a clean version in C, named it BogoFilter, and put it on Sourceforge, where a community sprang up to, well, embrace & extend it.

      As good as Graham's Bayesian algorithm is, ESR felt -- as did many of the other speakers -- that the nature of your spam/ham corpus is much more significant than the relative difference among any handful of reasonably good algorithms. (Back to the often repeated point about how corpus effectiveness falls apart when used for a group of users, as opposed to individuals.) To that end, he strongly feels that the best way to deal with the spam problem is to get good tools into the hands of as many people as possible, and to make them as easy to use as possible (ahh, the old "open source UIs always suck" argument :). As an example, one of the first things he did was to patch the Mutt mail agent so that it had two delete keys: one for general deletion, one for "get rid of this because it's spam." That second key, and interface touches like it, seem like the way to get average people to start using filters on a regular basis.

    • Joshua Goodman, Microsoft Research

      Unlike ESR, Goodman felt that algorithm selection does make a big difference, but this being Microsoft he refused to disclose what algorithms his team is working with -- except to say that, when delivered, they will be more accessible for average users than SpamAssassin, Procmail recipes, or Mutt :)

      Microsoft has been working on the spam problem since 1997, but because of how big they are they've had unique problems in bringing solutions to market. As a case in point, they tried to introduce spam filters to a 1999 Outlook Express release, but were immediately sued by email greeting card company Blue Mountain because their messages were being inaccurately categorized as spam. With that in mind, they have been very reluctant to bring new anti-spam software out since then because they would like to see legislation protecting "good faith spam prevention efforts."

      As a very large player, Microsoft faced certain difficulties in developing useful filters -- it may make sense for you as an individual to filter all mail from Korea, but this doesn't work so well if you are trying to attract customers *from* Korea :). This has forced them to put a lot of work into thoroughly testing different strategies before offering them to the public.

      In spite of what millions of webmail users may have expected, Hotmail & MSN are currently being filtered by Brightmail's service, and plans are underway to reintroduce spam management features to client side software again. (Just imagine how bad it would be if they weren't paying someone to filter for them! Unfortunately, no hecklers piped up to ask if they are really selling Hotmail's user database to spammers, and if that is a source of annoyance for his team.)

      An interesting barrier his group has had to grapple with was what he called the "Chinese menu" or "madlibs" spam generation strategy: that it's easy to come up with a template for spam -- "[a very special offer] [to make your penis bigger] [and please your special lady friend all night!]" vs. "[an exclusive deal] [for genital enlargement] [that will boost your sex life!]" etc -- and have a small handful of options for each 'bucket' multiplying into a huge variety of individual messages that are easy for a human to group together but almost impossible for software to identify.
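      To get a feel for how fast the "Chinese menu" trick multiplies, here is a minimal sketch of the combinatorics. The buckets and phrases below are made up for illustration (and tamer than the examples from the talk); nothing here reflects how any actual spam tool works.

```python
from itertools import product

# Hypothetical template: each inner list is one "bucket" of interchangeable phrases.
buckets = [
    ["a very special offer", "an exclusive deal", "a limited-time bargain"],
    ["on discount software", "on cheap printer ink", "on low-rate mortgages"],
    ["that expires tonight!", "just for you!", "while supplies last!"],
]

def generate_variants(buckets):
    """Yield every message formed by picking one phrase from each bucket."""
    for choice in product(*buckets):
        yield " ".join(choice)

variants = list(generate_variants(buckets))
print(len(variants))  # 3 * 3 * 3 = 27 distinct messages
print(variants[0])
```

      Three buckets of three phrases already give 27 messages with no two identical; ten phrases in each of five buckets would give 100,000, which is why matching on exact message text gets nowhere.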

    • Michael Salib, extremely funny MIT student

      Unlike nearly all other filter writers of the day, Salib's approach was heuristic: find a handful of reasonable spam discriminators, throw them all against his mail, and see how much he can identify that way. "It's sketchy, but this is a class project. I don't have to be realistic. [...] These results may be completely wrong."

      Much to his surprise, he's trapping a lot of spam. He pulls in a little bit of RBL data ("the first two or three links from Google, whatever"), looks for some patterns and so on, and then churns it through LMMSE (linear minimum mean square error), an electrical engineering technique that as far as he can tell doesn't seem to be known in other fields. Basically this involves running the messages through a series of scary-but-fast-to-calculate linear equations. It turns out that he can process this much faster than a Bayes filter, to the point that customizing his approach for each user in a network would actually be feasible.

      For a small spam corpus, he got results better than SpamAssassin did, though for a large corpus his results were worse; he couldn't really account for why this would be the case, or predict how things would scale as the corpus continued to grow.

      When a member of the audience [who was apparently familiar to Salib -- I don't know who it was] questioned the RBL tactic and asked whether authenticating remote users might be the answer, Salib's response was "yes, I agree, but then you *do* work for Verisign, who is in the verification business, so you would say that."

      Right on, Salib -- his talk was easily the funniest & breeziest of the day :)
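      Salib's actual features and equations aren't in my notes, so what follows is only a sketch of the general idea: fit linear weights to a handful of yes/no discriminators by least squares, which is the sample-based form of a linear minimum mean square error estimator. The three features, the toy training data, the tiny ridge term (added just to keep the equations solvable), and the 0.5 cutoff are all my assumptions.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_linear(features, labels, ridge=1e-6):
    """Least-squares weights (plus a bias) minimising mean squared error."""
    rows = [f + [1.0] for f in features]  # constant 1.0 acts as the bias input
    n = len(rows[0])
    # Normal equations: (X^T X + ridge * I) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, labels)) for i in range(n)]
    return solve(xtx, xty)

def score(weights, feature):
    """One dot product per message, which is why scoring is cheap."""
    return sum(w * x for w, x in zip(weights, feature + [1.0]))

# Hypothetical features: [sender on a blocklist, spammy phrase, ALL-CAPS subject]
train_x = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
           [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = spam, 0 = ham

weights = fit_linear([[float(v) for v in row] for row in train_x], train_y)
is_spam = score(weights, [1.0, 1.0, 0.0]) > 0.5
is_ham = score(weights, [0.0, 0.0, 1.0]) < 0.5
print(is_spam, is_ham)
```

      Training is one small linear solve and scoring is one dot product, which fits with the claim that per-user customization would be feasible.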

    • David Lewis, general researcher

      The core of Lewis' argument, as ESR said earlier in the day, is that for any machine learning technique the quality of the learning corpus is much more important than the algorithm used. Bayes is one such algorithm, but there are many other good ones in the literature. In a dig at Goodman's refusal to disclose algorithms, Lewis pointed out that all of this has been publicly discussed since the first machine learning paper was published in 1961.

      Observations: "lots of task inspecific stuff works badly, but task specific stuff helps a lot." It is important to use different corpuses [corpi?] for training and for general use, so that you don't train your machine to focus too much on certain types of input (this is a point that Microsoft's Goodman made as well).

      As Graham did, Lewis emphasized that spam is going to slowly start looking more like natural text, and we're going to have to deal with this as time goes on. www.daviddlewis.com/events/
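      The point about keeping training and everyday-use corpora separate can be sketched as a simple hold-out split. This is my own illustration, not something Lewis showed; the message list, the 25% hold-out fraction, and the fixed seed are assumptions.

```python
import random

def split_corpus(messages, holdout_fraction=0.25, seed=0):
    """Shuffle a copy of the corpus and split it into (training, held_out)."""
    shuffled = messages[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical corpus of (text, is_spam) pairs.
corpus = [("message %d" % i, i % 2 == 0) for i in range(20)]
training, held_out = split_corpus(corpus)
print(len(training), len(held_out))  # 15 5
```

      The filter only ever learns from the training part; the held-out part is what tells you whether it has learned spam in general or just memorized its own inputs.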

    • Jon Praed, Internet Law Group

      To a burst of tremendous applause, this talk began with the sentence "my name is Jon Praed, and I sue spammers."

      He brought a legal take on the "not everything is spam to everybody" angle, emphasizing that we need a precise definition of what qualifies as Unsolicited Commercial Email (UCE). In particular, it has been difficult trying to pin down if the mail was really unsolicited, as this is where the spammers have the most wiggle room. However, if you can track down the spammer, they have to date rarely been able to verify that the user asked for mail, and so Praed has been able to successfully prosecute several spammers on this angle. He doesn't expect this to work forever though.

      According to Praed, "laws against spam exist in every state, and more are pending", but he doubts that a legal solution will ever be completely effective as long as spam is lucrative. By analogy, he pointed out that people still rob banks and that has never been legal.

      Praed informed the audience that there are several ways to get back at spammers, including injunctions, bankruptcy, and contempt, and all of these can be very effective. He pointed out that, to be blunt, a lot of these people are desperate low-lifes, and spam has been their biggest success in life. After these legal responses, their lives all get much worse. It hadn't occurred to me to see spammers as pitiful before, but I can now. Most importantly, Praed stressed that these legal remedies can be very effective, and he strongly warned against taking vigilante action. This is almost always worse than the spam itself, and it only serves to get you in even deeper trouble than the spammer.

      As for the sources of spam, most comes from offshore spam houses, abuse of free mail accounts (Hotmail & Yahoo, free signups at ISPs, etc) and bulk software (which may apparently soon become illegal in certain areas, provided that a law can be found to ban spam software while allowing things like MailMan or MajorDomo). Interestingly, he questioned the idea that header spoofing is a big problem, and claimed that in every case he has dealt with he has been able to track down the messages to a legit source sooner or later.

      Suggestion: if you get a spam citing a trademarked product [e.g. Viagra], forward it to the trademark holder and they will almost always follow up on it. Suggestion: be fast in trying to track down spammers, as some of them have gotten in the habit of leaving sites up long enough for mail recipients to visit, but taking them down before investigators get a chance to take a look. Legal observation: spam is almost always fraud, and can be prosecuted accordingly.

      Praed wrapped up his talk by citing the encouraging precedent that the famous Verizon Online vs. Ralsky case set: [a] that the court is interested in where the harm occurs, not where the person doing harm was when causing it (so if you send spam to someone in Alaska and spam is a capital offence in Alaska, you can be tried as a citizen of that state even if you caused the harm from somewhere else), and [b] it is assumed that you have to be familiar with a remote ISP's acceptable usage policies, and ignorance is no defence (just as you can't say "I didn't know it was illegal to shoot someone", Ralsky couldn't say that he didn't know Verizon prohibits spam; he had to have known that the AUP wouldn't allow what he was doing, so he deliberately didn't read it). That precedent makes the prospects for future prosecution of spammers much more encouraging. While, again, legal solutions may never eliminate the spam problem, a precedent like this can be an important supplement to filtering efforts (the stick to the filter's carrot, or something -- my lousy analogy, not Praed's).

    • David Berlind, ZDNet executive editor

      His talk was primarily about how he receives a huge quantity of email from ZDNet readers, and he can't afford to use any spam filtering strategy that would allow *any* false positives. As one of the speakers said -- sorry, I forget who (Microsoft's Goodman?) -- getting a 0% false positive rate is easy: just classify nothing as spam. Getting a 100% hit rate is also easy: just classify everything as spam. Any solution besides those two is always going to have some degree of error either way, and determining how much of what kind of error you want to accept is up to you. Most users will tolerate a moderate false negative rate (some spam gets through) if it means that the false positive rate (legit mail is deleted) is very low. In Berlind's case, the false positive rate has to be vanishingly small, because reading all customer mail is a critical sign of respect for him.

      Further, his business is also a legitimate mass emailer, sending out millions of free newsletters to users every day, and if Shein's proposal to bill bulk mailers were to catch on then even a very low rate would quickly put his company in the red. One obvious solution, which wasn't mentioned: start charging a subscription for these mailings, and make them profitable. I don't want to see this happen but if it did then the economics would tilt back toward making things feasible again.

      Berlind is appreciative of the anti-spam work that is being done, but at the same time is skeptical of how pragmatic most of what is being proposed can really be. He feels we need a massive effort to rework the way mail is handled [Y2K anyone? It could get IT people back to work...], and to that end hopes ZDNet can help promote such a cooperative effort between the parties working on this. They don't want to be involved -- they are journalists & publishers, not standards developers -- but they are eager to get things going & want to cover the story as it progresses.

      Like Shein said, he feels it's a waste for all these talented people to be working on combating penis enlargement offers, and hopes that we can find a way to get past this and work on real problems, "like world peace." This comment got a chuckle from the audience, but he seemed like the kind of guy that really meant that, and more importantly, he was right. A smart guy like Paul Graham or Bill Yerazunis shouldn't have to waste time tinkering with how many Viagra offers he can automagically delete when there are more fun things to be doing.
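      Going back to the false positive vs. false negative trade-off in Berlind's talk, the two "easy" extremes can be made concrete with a few lines. This is just a sketch on a made-up mailbox of three spam and five legit messages; the function name is mine.

```python
def error_rates(labels, predictions):
    """Return (false_positive_rate, false_negative_rate); True means spam."""
    ham_preds = [p for actual, p in zip(labels, predictions) if not actual]
    spam_preds = [p for actual, p in zip(labels, predictions) if actual]
    fp_rate = sum(ham_preds) / len(ham_preds)  # legit mail flagged as spam
    fn_rate = sum(1 for p in spam_preds if not p) / len(spam_preds)  # spam let through
    return fp_rate, fn_rate

# Hypothetical mailbox: 3 spam, 5 legit.
labels = [True, True, True, False, False, False, False, False]

flag_nothing = [False] * len(labels)
flag_everything = [True] * len(labels)

print(error_rates(labels, flag_nothing))     # (0.0, 1.0)
print(error_rates(labels, flag_everything))  # (1.0, 0.0)
```

      Every real filter sits somewhere between those two corners, and someone in Berlind's position has to pick a point that keeps the first number as close to zero as possible, accepting a higher second number in exchange.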

    • Ken Schneider, Brightmail

      As mentioned earlier, Brightmail provides an ASP service for real time filtering of both incoming & outgoing mail. As would perhaps be expected, bigger ISPs and networks attract larger amounts of spam: 50% of mail coming into big ISPs and 40% coming into big companies is now spam. Brightmail offers the Probe Network, a <slashdot-killfile-term>patented</slashdot-killfile-term> system of decoy honeypot addresses that gather data for analysis at their logistics center, which in turn distributes spam filtering rules to their clients where a plugin for $MTA (using the open source or proprietary MTA of the client's choice) can act on the database.

      An interesting property of their system is that they have a mechanism for both aging out dormant rules as well as for reactivating retired ones, so that the currently active ruleset can be kept as lean & efficient as possible. A big source of difficulty for them is legitimate commercial opt-in lists, because things have gotten more shady & blurry over time and it's now hard to tell this mail from much of the spam out there. Whitelists help here, but the problem is still difficult.
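      Brightmail didn't explain how the aging and reactivation actually work, so treat this as a guess at the general shape rather than their design. The class name, the 30-day idle limit, and the idea that hits get reported from the honeypot analysis are all assumptions of mine.

```python
class RuleSet:
    """Toy rule store: idle rules are retired, and retired rules can come back."""

    def __init__(self, max_idle_days=30):
        self.max_idle_days = max_idle_days
        self.active = {}   # rule name -> day number it last matched
        self.retired = {}  # rule name -> day number it last matched

    def record_hit(self, rule, today):
        """Note a match (e.g. seen in honeypot mail); reactivates a retired rule."""
        self.retired.pop(rule, None)
        self.active[rule] = today

    def age_out(self, today):
        """Move rules that have been idle too long out of the active set."""
        for rule, last_hit in list(self.active.items()):
            if today - last_hit > self.max_idle_days:
                self.retired[rule] = self.active.pop(rule)

rules = RuleSet(max_idle_days=30)
rules.record_hit("old-campaign", today=0)
rules.record_hit("fresh-campaign", today=20)
rules.age_out(today=40)                     # old-campaign has been idle 40 days
print(sorted(rules.active), sorted(rules.retired))
rules.record_hit("old-campaign", today=41)  # the campaign resurfaces
print(sorted(rules.active), sorted(rules.retired))
```

      Only the active dictionary would need to be shipped to the MTA plugins, which is presumably the point of keeping it lean.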

    After each speaker had his turn, there was a panel discussion, but not much really happened there, and the moderator cut things short after only a couple of minutes. The original plan was for everyone to go out for Chinese food afterwards and continue the discussions over dinner, but when 580 people signed up that plan obviously fell apart. :) And so, here end the notes...

  15. Oh sure, by sanermind · · Score: 2, Interesting

    let's encourage ISPs to destroy accessibility to an essential service on the internet, in a misbegotten attempt to lessen illegitimate access. I don't want my connection censored! I enjoy having home broadband and running my own little server on it. My sendmail is set up to disable relaying, it's not like it's hard, and that is the true solution to spam. Spammers will always find a service that allows them the access they need, but this idiotic talk of blocking/censoring vital services/protocols doesn't help the rest of us.

    BTW: Cause I run my own port 25 and have a static IP and a domain name, I get hardly any spam, personally. Why? Because I give out a different novel separate address to everyone, and keep them all aliased to forward to my main account. If one becomes contaminated by spam, I simply delete it. If it actually was an address I gave to a correspondent [and not to some website, which it almost universally is] I only have to inform one person of a new address... come to think of it, that's only happened once...
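    The one-address-per-contact trick boils down to a lookup table. Here's a minimal sketch of the bookkeeping; the class, the example.org domain, and the addresses are made up for illustration, and a real setup would live in the mail server's alias configuration rather than in a script.

```python
class AliasTable:
    """Toy alias table: one throwaway address per contact, all forwarding home."""

    def __init__(self, main_account):
        self.main_account = main_account
        self.aliases = {}  # alias address -> who it was handed out to

    def issue(self, local_part, given_to, domain="example.org"):
        """Create a new alias and remember who received it."""
        address = "%s@%s" % (local_part, domain)
        self.aliases[address] = given_to
        return address

    def resolve(self, address):
        """Return the delivery target, or None if the alias is unknown or revoked."""
        return self.main_account if address in self.aliases else None

    def revoke(self, address):
        """Drop a contaminated alias and report who it had been given to."""
        return self.aliases.pop(address, None)

table = AliasTable("me@example.org")
shop = table.issue("shop-1", given_to="some web shop")
friend = table.issue("friend-1", given_to="a correspondent")

leaked_by = table.revoke(shop)  # spam started arriving at the shop alias
print(leaked_by, table.resolve(shop), table.resolve(friend))
```

    As a side effect, deleting an alias also tells you which party leaked or sold the address.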

    --

    ---
    the pen is mightier than the sword, the sword is mightier than the court, the court is mightier than the pen.