Slashdot Mirror


User: babbage

babbage's activity in the archive.

Stories: 0 · Comments: 1,446

Comments · 1,446

  1. Addenda on MIT Spam Conference Conclusions · · Score: 1
    The material above was originally posted as a comment on Slashdot, before being pasted into journal entries on Slashdot and use.perl.org. Each version of the writeup has attracted comments & emails, for which I thank you. A couple of corrections have come up, and I don't want the eventual archived versions of this not to reflect those contributions (hello, future Google spelunkers!), so here's a general cross-linked addendum:
    • http://use.perl.org/~babbage/journal/10069:

      Chrysflame posted detailed minutes for the proceedings, as pasted from Oliver Schmelzle's TechBlog. Readers may find it useful to cross-check my notes against his times when looking for talks they would like to listen to.

      Matt Sergeant politely replied as well, noting that the impressive claims about CRM114's accuracy were yet to be thoroughly tested, that in other tests CRM114 had not been significantly more accurate than other Bayesian strategies, and that the current performance of CRM114 is so much slower than many of the alternatives that any gains it may have to offer are more than offset by the low volume it can currently handle. Grain of salt taken :)

    • http://slashdot.org/~babbage/journal/21771/:

      No comments as of this writing.

    • http://slashdot.org/comments.pl?sid=51208&cid=5112383:

      An anonymous coward added a couple of corrections which are worth noting:

      • Jon Praed was questioning IP spoofing, not message header spoofing. It is relatively easy to fake at least some of the headers on an email, but when tracked down & brought before a judge, no spammer has ever been able to explain a credible technique for spoofing IP data in any trial Praed was aware of. When this comment was made to the audience, ESR spoke up saying that he could show Praed how to do it, but I don't know what if anything came of any conversation they had after the talk.

      • The AC also expanded on Michael Salib's talk & how much mileage Salib was seeing out of a comically non-buzzword compliant filtering strategy, but came back to the point that his results were "probably unrepeatable and it would probably be best if we all just treated them as outright lies." As the AC noted, Salib seems to have played a big role in organizing the conference -- I think I read somewhere that when the attendee list swelled to 500+ people, he helped to find a last minute venue big enough to accommodate everyone. So not only do we have to thank Salib for an entertaining spiel of quackery, but also for bringing everyone together in the first place. :)

      I never said my notes were perfect :)

    • Emails sent to me directly:
      • Brad Spencer wrote to me asking if anyone had mentioned relay spam honeypots, citing http://jackpot.uk.net/ as an example, and claiming that they are "100% accurate and can be devastating." Respectfully, Brad, I'm not sure that the speakers gathered together last week would agree that any approach is "100% accurate" unless you have a very generous definition of "accurate" (as in, "delete everything as spam" is 100% accurate, but 100% useless :). More fairly though, Brad claims that "if you deal with spam at the relay level you can be dumb -- it is the spammers who are forced to be smart. If they make an incremental move towards being smart you move beyond them." I won't argue with that, it sounds like a fine idea. I suggest taking ideas like this to Barry Shein et al, who would probably love to discuss these ideas & implement anything that works well.

        In his email, Spencer went on to expand on the value of honeypots, and how they seem like a very promising tactic for handling the spam problem. I agree, and maybe my writeup didn't give this enough attention, but I think many or all of the conference speakers would have agreed as well. Ken Schneider made it clear that Brightmail in particular seems to make heavy use of honeypot addresses: it sounded like when they set up service for an organization, they plant one or more dummy addresses at that organization as data points for spam collection efforts, and have mechanisms in place to gather & analyze this data in real time. Spencer suggests that honeypot addresses would be very hard for spammers to detect if they resemble legit MTAs as much as possible, and I have the impression that this is exactly what Brightmail is doing. I'm sure that others are using tactics like this as well, but Schneider was the most vocal user of the tactic that I noticed.

      • John Hanna wrote to me saying that he runs an anti-spam project at http://assp.sf.net, and noticed a surge in traffic after the conference. To answer John's question, I did not notice anyone mentioning ASSP [caps?] during any of the talks, but it could well be that people were discussing it amongst themselves off stage. *shrug*

      • Ashley Pomeroy wrote to a mailing list where I posted my notes, asking: It may have been raised before, but does the specific use of 'ham' to mean 'good' and 'spam' to mean 'bad' leave all these good people open to abuse from the people who make Spam, the nutritious meat-based food?

        I assume that Spam(r) is cool about the use of the term 'spam' to mean junk e-mail, but adding a converse makes it explicitly clear that 'spam=bad'.

        And what do the pigs think about all this? Its their flesh we're talking about. The ultimate expression of love is to consume the flesh of another being; we are sending out a mixed message as to whether we love pigs or not, which will surely effect the quality of the eggs they lay.

        By this token eating one's fingernails/bogies/earwax is a form of self-love, which is perfectly natural.

        To which I have no comment :)

    If I get any other material relevant to the conference, I may add it to the Slashdot or use.perl journals, but in any case I wanted to get this up while the pages are still getting traffic, so readers of one variation of the page are not missing out on what may be added to other variations. Thanks all for the feedback! :)

  2. Re:In other news on Judge Decides X-Men Aren't Human · · Score: 1
    ha * ha * ha :-)

    (I briefly considered trying to make a legitimate algebraic expression that ends in a smiley, but it seems like it would have to end in punctuation or an operator, which probably wouldn't make sense. Clever "solutions" welcome :)

  3. Re:In other news on Judge Decides X-Men Aren't Human · · Score: 4, Funny
    Wasn't that an episode of "The Algebraic Prisoner" back in the sixties? "I am not a variable! I am an X-Man!"

    No? By hook or by crook, it is!

  4. Re:How is it possible to be so fast? on An Even Faster Browser? · · Score: 2, Interesting
    Come on, be nice :)

    Let's profile this rather than just flame. The article claims a 100% or greater speedup, which is of course twice as fast. If the download time is half of that, and as you say it cannot be shrunk further, then you can still realize a 100% gain by getting the other half of the work to approach zero time.

    Very fast rendering (but very broken? I don't see anything saying this browser is actually usable or at all standards compliant...) is pretty much the main way to bring down that chunk of time. Good caching can minimize the amount of data transferred, and as another commenter noted, if the browser can take advantage of mod_gzip they'll get a significant download reduction on many sites.
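
    As a rough illustration of the compression point above, here is a minimal sketch using Python's standard gzip module. The sample markup is invented and deliberately repetitive, so the ratio it prints says nothing about real pages; it only shows that markup-heavy text tends to shrink when gzipped.

```python
import gzip

# Invented, highly repetitive markup standing in for a real page.
html = ("<tr><td class='comment'>Lorem ipsum dolor sit amet</td></tr>\n" * 200).encode("utf-8")
compressed = gzip.compress(html)

print(len(html), "bytes before,", len(compressed), "bytes after")
```

    How much a real site gains depends on its content and on whether the server and browser both negotiate compressed transfers.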

    Stopgaps? Sure, but no one is saying that things will be infinitely fast. Have you actually spent any time profiling what portion of the time is spent on which tasks in getting & displaying a web page? If the average downloading time is 50% or more then okay, your flame scorched the right target. But if other factors can cumulatively account for 50 or 80 percent of the work time, then your objections become just one of several relevant bottlenecks.
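
    To put rough numbers on the argument above, here is a small sketch of the arithmetic. The function name and the sample fractions are made up for illustration; no real measurements of any browser are implied.

```python
def overall_speedup(fixed_fraction, other_speedup):
    """Overall speedup when only the non-fixed part of the work gets faster.

    fixed_fraction: share of the original time that cannot shrink
                    (e.g. the download), between 0 and 1.
    other_speedup:  how many times faster the remaining work becomes.
    """
    remaining = 1.0 - fixed_fraction
    new_time = fixed_fraction + remaining / other_speedup
    return 1.0 / new_time


# If downloading is 50% of the time, the result stays just under 2x
# no matter how fast everything else gets.
print(overall_speedup(0.5, 1000))
# If downloading is only 20% of the time, the ceiling rises to 5x.
print(overall_speedup(0.2, 1000))
```

    So a claimed doubling is only possible when the fixed portion is half of the total or less, which is exactly why the profile matters.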

    The thing is, whether or not you have ever used software profiling tools, I'm sure that the developers of the major browsers all have. That's what makes me skeptical of this. If there are any major gains to be made in download compression, caching, rendering or other areas, I would think that optimizations from each area would have shown up in the mainstream browsers by now (and in fact, all of this does exist in some form). While there is still room for improvement -- as the release of Safari clearly shows to the MSIE team -- extravagant claims are unlikely to be true.

    I skipped this article the last time it was posted, so I may be asking what is a FAQ by now, but can anyone provide a better source of material on this browser than Rupert Murdoch's little puff piece? It would be interesting to hear how this browser supposedly works, against which browsers it supposedly does so much better, and whether those gains hold true as you adjust variables like bandwidth (does the gain wash out at DSL or T1 speeds?) and processing power (does the gain get even better on a very fast computer?). It would be interesting to see which browsers it was benchmarked against, and if there were any obvious problems with them when the tests, if any, were conducted ("what, I shouldn't be running against mod_inflate_data on that side?" :).

  5. My notes for the proceedings (very long post!) on MIT Spam Conference Conclusions · · Score: 5, Interesting
    I was waiting for the review to show up on Slashdot, as the conference was really good. The audio proceedings have been put online, but I'm not sure if they can take a Slashdotting, so please be gentle :) If you have 8 hours to spare, the whole day was pretty good & worth listening to, but the schedule as planned isn't exactly the sequence people spoke in, so you may have to jump around the RealAudio stream a little bit.

    Turning my notes for the day into something vaguely coherent, here are some highlights from the proceedings. There are a couple of speakers that I didn't write anything down for, but from mid-morning on this should be pretty comprehensive. Apologies in advance if my notes lead me to attribute certain comments to the wrong speaker -- if anyone notices any mistakes please feel free to add corrections:

    • Bill Yerazunis - CRM114 & MailFilter

      Because Perl "freaks him out", Yerazunis came up with the CRM114 minilanguage (points for anyone that gets the joke in the name without googling for it :), then wrote MailFilter in CRM114 as an implementation of a filter that can be used with Procmail or SpamAssassin or what have you. The basic idea is to decompose a message into a set of "features" composed of various permutations of single words, consecutive words, words appearing within a certain distance of one another, etc, such that the set of features N is very much bigger than the set of words X. You then analyze the features in various ways and if you get above a certain arbitrary threshold, you flag the message as spam & handle it accordingly.

      He claimed that with this software he could get better than 99.9% accuracy in nailing spam, and a similar percentage in avoiding false positives on "ham" (the term everyone was using for legit mail, so a false positive is ham that was falsely identified as spam). One of Yerazunis' observations is that the best way to defeat the spam problem is to disrupt the economics: if a 99.9% or better filter rate were to become the norm, then the cost of delivering spam can be pushed higher than the cost of traditional mail and the problem will naturally go away without requiring legislation (which would be nice anyway, but we can't count on it).

      The drawback of CRM114/MailFilter is that it can only handle about 20k of text per second, so it's not appropriate for large scale use yet. Still an interesting project to watch though: crm114.sourceforge.net
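
      The feature decomposition described at the start of this entry can be sketched roughly as follows. This is not CRM114's actual algorithm, just a simplified illustration of how single words plus word pairs within a short distance produce many more features than words; the function name, the window size, and the feature string format are all assumptions.

```python
def extract_features(text, window=4):
    """Return single-word features plus ordered word pairs within `window` words."""
    words = text.lower().split()
    features = set(words)
    for i, first in enumerate(words):
        # Pair each word with the next few words that follow it,
        # recording how far apart the two words are.
        for distance in range(1, window):
            j = i + distance
            if j >= len(words):
                break
            features.add(f"{first} <{distance}> {words[j]}")
    return features


message = "buy cheap pills online now"
features = extract_features(message)
print(len(message.split()), "words ->", len(features), "features")
```

      Even this toy version turns five words into fourteen features, and the gap widens quickly as the message and the window grow.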

    • John Graham-Cumming - POPfile

      Most of his very entertaining talk was about the ingenious tricks that spammers resort to to obfuscate spam against filters, including most diabolically one example that placed each column of monospace text in the message into an HTML column, so that the average HTML-capable mail client would render the message properly, but it would be absolute gibberish to most mail filters. The ultimate lesson was that any good filter has to focus not on "ascii-space" (the literal bytes as transmitted) but the "eye space" (the rendered text as seen by the user), which by extension may mean that any full scale spam parser/filter could also have to include a full-scale HTML & Javascript engine. Yikes!

      As for Graham-Cumming's software, it's a Perl application, available for all platforms (Windows, Mac, & of course Linux) that allows users to filter POP3 mail. Interesting stuff if you're a POP user: popfile.sourceforge.net
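
      A minimal sketch of the "ascii-space" versus "eye space" distinction, using Python's standard html.parser module. The obfuscated sample string is invented. This only strips tags and comments; it does not handle the table-column trick, CSS, or JavaScript, which is where the need for a full rendering engine comes in.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect only text nodes, dropping tags and comments."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def eye_space(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


raw = "Buy V<!-- xyz -->ia<b></b>gra n<span></span>ow"
print("viagra" in raw.lower())   # a naive substring match on the raw bytes
print(eye_space(raw))            # what a reader would actually see
```

      A filter that tokenizes the raw bytes never sees the word at all, while one that works on the extracted text does.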

    • John Draper - ShopIP

      Most of Draper's work seemed to be focused on profiling spammers, as opposed to profiling spam itself, by throwing out a series of honeypot addresses & using data collected to hunt down spammers. spambayes.sourceforge.net

    • Paul Judge, CipherTrust

      Judge's big argument, which no one really disagrees with, is that spam has become not just a nuisance, but an actual information security issue. To that end, he is advocating much more collaborative effort to address the problem than we have seen to date: conferences like this, mailing list discussions, better tools, and public data repositories of known spam [and ham]. To that last point, one of his observations (which others made as well) was that there are no universally agreed on standards for what qualifies as spam, so repositories for spam will not be accurate for all users (spam for your programmers will be the bread & butter of your marketing department, etc). Plus, there are obvious privacy issues in publishing your spam & ham for public scrutiny. And to add another wrinkle, one danger of public spam/ham databases is that spammers can poison them with false data, screwing things up for everyone. That said, he encouraged users to help out with building spamarchive.org.

    • Paul Graham

      The man who organized the conference and kicked everything this week off with his landmark paper from last fall, A Plan for Spam. Graham's spam filtering technique famously makes use of Bayesian statistics, a technique popular with nearly all of the speakers. The nice thing about a statistical approach, as opposed to heuristics, simple phrase matching, RBLs, etc, is that it can be very robust & accurate; the down sides are that it has to be trained against a sufficiently large "corpus" of spam (most techniques have this property though) and it has to be continually retrained over time (again, this is common). Graham was too modest to produce numbers, but subjectively his results seemed to be even better than what Yerazunis gets with MailFilter, by an order of magnitude or more.

      Like other speakers, he predicted that spammers are going to make their messages appear more & more like "normal" mail, so we're always going to have to be persistent about this -- as one example, he showed us an email he received IN ALL CAPS from a non-English speaker asking for programming help, and although it was legit, the filters insisted otherwise. "That message is the one that keeps me up at night."

      Everyone interested in the spam issue should go read Graham's paper immediately.
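
      For readers who have not seen a statistical filter before, here is a heavily simplified sketch of the general idea: learn per-word spam probabilities from a corpus, then combine them for a new message. It is not Graham's implementation (his paper weights tokens differently and only uses the most telling ones); the tiny corpora, the clamping values, and the function names are assumptions made for the example.

```python
from collections import Counter


def train(messages):
    """Count how many messages each word appears in."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def word_spam_probability(word, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | word), clamped away from 0 and 1."""
    spam_rate = spam_counts[word] / n_spam
    ham_rate = ham_counts[word] / n_ham
    if spam_rate + ham_rate == 0:
        return 0.5  # unseen word: no evidence either way (arbitrary choice)
    return min(0.99, max(0.01, spam_rate / (spam_rate + ham_rate)))


def spam_score(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-word probabilities, assuming the words are independent."""
    p_spam = p_ham = 1.0
    for word in set(text.lower().split()):
        p = word_spam_probability(word, spam_counts, ham_counts, n_spam, n_ham)
        p_spam *= p
        p_ham *= 1.0 - p
    return p_spam / (p_spam + p_ham)


spam = ["cheap pills online", "buy cheap watches", "cheap loans now"]
ham = ["meeting notes attached", "lunch now or later", "perl patch attached"]
spam_counts, ham_counts = train(spam), train(ham)

print(spam_score("cheap pills", spam_counts, ham_counts, len(spam), len(ham)))
print(spam_score("patch attached", spam_counts, ham_counts, len(spam), len(ham)))
```

      The first score comes out close to 1 and the second close to 0. With a corpus this small the numbers mean very little, which is the training requirement mentioned above.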

    • Robert Rothe, eXpurgate

      Rothe works for Eleven, an ASP company from Berlin selling a spam management service/application called eXpurgate. His talk was short on details about how the tool worked (mainly that it searches for bulk mail), focusing instead on the high level functionality it provides to users -- basically, they classify mail as safe, questionable, or dangerous, and let the users handle them accordingly. He was another speaker who sees spam as a network security issue, so they built their system accordingly, with privacy of the client's mail content in mind etc.

      Like many speakers, he warned about the dangers of an anti-spam "monoculture": that Bayesian techniques might be great, but if that's all anyone uses then spammers will catch on and adjust their messages to look more like normal mail, to the point that Bayesian filters won't work anymore. As a result, we're going to need to attack the problem from several angles, using different techniques, to keep the spammers off balance as much as possible.

    • Matt Sergeant, SpamAssassin

      SA is a well known Perl application for heuristically profiling messages as spam, adding headers to the message saying for example "I am 72% sure this is spam because it has X Y Z", and passing off the message to procmail or whatever to be handled accordingly. SpamAssassin can handle a message throughput great enough that it can be deployed at the network level (whereas some of the others, which might have somewhat better hit rates, are still too inefficient at this point). Deployed this way, the differences in effectiveness for single vs. multiple users become very apparent, as 99% effective rates fall down into the 95-80% range. This happens because, again, different users define different things as spam, so mapping one fingerprint to all users can never work quite right. For an example of a tool that your company can deploy right now & get fast, decent results, SA looks like a good choice; but for the long run it looks like a Bayesian technique is going to get better performance, and SA is adding a statistical component to its toolkit. Good talk.
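
      The general shape of heuristic scoring can be sketched like this. The rules, weights, threshold, and header name below are all invented for the example and are not SpamAssassin's; the point is only that each rule that fires adds to a score, and the result is written out for a later delivery step to act on.

```python
import re

# Hypothetical rules and weights, invented for this example.
RULES = [
    ("SUBJECT_ALL_CAPS", 1.5, lambda m: m["subject"].isupper()),
    ("MENTIONS_FREE_OFFER", 2.0, lambda m: re.search(r"\bfree\b", m["body"], re.I) is not None),
    ("MANY_EXCLAMATIONS", 1.0, lambda m: m["body"].count("!") >= 3),
]
THRESHOLD = 3.0  # arbitrary cutoff for this sketch


def score_message(message):
    """Return (total score, names of rules that fired, spam verdict)."""
    hits = [(name, weight) for name, weight, test in RULES if test(message)]
    total = sum(weight for _, weight in hits)
    return total, [name for name, _ in hits], total >= THRESHOLD


def report_header(message):
    """Build a one-line report that a later delivery step could act on."""
    total, names, is_spam = score_message(message)
    verdict = "Yes" if is_spam else "No"
    return f"X-Example-Spam-Report: {verdict}, score={total:.1f}, rules={','.join(names)}"


msg = {"subject": "FREE OFFER", "body": "Get your free gift now!!!"}
print(report_header(msg))
```

      Because the rules and weights are shared, one site-wide rule set applies the same definition of spam to every user, which is the weakness described above.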

    • Barry Warsaw, Python Labs

      This was another example of the "monocultures are dangerous" philosophy, as Warsaw explained how he is helping to use a variety of anti-spam techniques -- from clever Exim MTA configuration to good use of SpamAssassin & Procmail to fine tuning of the MailMan mailing list engine -- to work together to manage the spam problem for all things Python (Python.org, Zope, many mailing lists, a few employees, etc).

      He pointed out that some very simple filters can be surprisingly effective: run a sanity check on the message's date; look for obviously forged headers; make sure the recipients are legit; scan for missing Message-Id headers; etc. In response to the person that originally posted the article, yes, he did mention blocking outgoing SMTP as an effective element of a many tiered spam management approach.

      Among other tricks for getting the different filtering tiers to play nice together, they make heavy use of the X-Warning header so that if an alarm goes off in one tier of their mail architecture, other components can respond appropriately. Cited projects included ElSpy and SpamBayes.
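
      Two of the simple checks mentioned above (a sanity check on the date and a scan for a missing Message-Id) might look something like this with Python's standard email package. The two-day tolerance and the sample message are assumptions, and the recipient and forged-header checks are left out because they depend on local knowledge.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def header_problems(raw_message, now=None, max_skew=timedelta(days=2)):
    """Return a list of simple header problems found in a raw message."""
    now = now or datetime.now(timezone.utc)
    msg = message_from_string(raw_message)
    problems = []

    if not msg.get("Message-Id"):
        problems.append("missing Message-Id")

    try:
        sent = parsedate_to_datetime(msg.get("Date", ""))
        if sent.tzinfo is None:
            # No usable zone in the header; treat the time as UTC.
            sent = sent.replace(tzinfo=timezone.utc)
        if abs(now - sent) > max_skew:
            problems.append("implausible Date")
    except (TypeError, ValueError):
        problems.append("missing or unparseable Date")

    return problems


sample = (
    "From: someone@example.com\n"
    "To: list@example.org\n"
    "Date: Fri, 17 Jan 2003 10:00:00 -0500\n"
    "Subject: hello\n"
    "\n"
    "body text\n"
)
check_time = datetime(2003, 1, 17, 18, 0, tzinfo=timezone.utc)
print(header_problems(sample, now=check_time))
```

      Checks like these are cheap enough to run before any heavier filter sees the message.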

    • Barry Shein, founder & CEO of The World -- or as he laughingly put it, "President of the World". Har har har

      This talk was mostly a let down for me -- Shein has made his views very well known, and his ranting, rambling talk didn't really introduce any new ideas for anyone that had read that interview (some good jokes & quotes though).

      His core argument is that spam is "the rise of organized crime on the internet", that filters are nice but that the mail architecture itself is fundamentally flawed, and that ISPs like his -- in 1989, The World was the world's first dialup ISP -- are being killed by the problem. Shein was very annoyed that all these talented people are having to clean up a mess like this when we should be out working on more interesting stuff, and not having to worry about this issue. His big hope seemed to be that legislation will someday come to the rescue, but he sounded very pessimistic. (Others in the room seemed to feel that this was a very interesting machine learning problem, and weren't really fazed by his pessimism -- but then most of the people in the room don't run ISPs.)

      He also suggested that we need to find a way to make spammers pay for the bandwidth they are consuming (rather than having users & ISPs shoulder the burden) but didn't seem to know how we might go about implementing this. At all.

      Fun rant to cheer along to, but for me it wasn't very constructive in the end.

    • Jean-David Ruvini, eLabs SmartLook

      This was an interesting product. Ruvini's company is developing an extension to Outlook 2000 & XP that will watch the way users categorize messages into folders, come up with a profile for what kinds of messages end up in which folders, and then try to offer similar categorization on an automatic basis. Think of it as Procmail for Outlook, without having to mess with (or even be aware of!) all the nasty recipes.

      Obviously if you have a spam folder, then spam will be one of the categories it looks for, but more broadly it will try to categorize all your mail as you would ordinarily categorize it. This makes SmartLook a broader tool than "just" a spam manager.

      SmartLook is another statistical filter, though it uses non-Bayesian algorithms to get results. eLabs' tests suggest that the product is able to properly categorize messages about 96% of the time, with no false positives, and (for their tests, mind you) that it performed better than Bayes filters over three months of usage.

      One nice property of this tool was that it works well with different [human] languages -- some strategies fall apart &/or need retraining when you switch from English to some other language. For certain markets (eLabs seems to be a European company, perhaps French?) this is a crucial feature, and having a tool that works with one of the biggest mail clients out there (most people don't use Mutt or Pine, sadly enough) can be very valuable. Very clever -- watch for the inevitable embrace & extend three years from now.

    • Eric Raymond

      He didn't say anything about guns, but he did try to correct one of the other speakers for misusing the term "hacker."

      Like Graham, ESR is a Lisp fan, but he knows that the vast majority of people aren't, and he also knows that the vast majority of people need to be using something like Graham's spam software. So on a lark, he came up with a clean version in C, named it BogoFilter, and put it on Sourceforge, where a community sprang up to, well, embrace & extend it.

      As good as Graham's Bayesian algorithm is, ESR felt -- as did many of the other speakers -- that the nature of your spam/ham corpus is much more significant than the relative difference among any handful of reasonably good algorithms. (Back to the often repeated point about how corpus effectiveness falls apart when used for a group of users, as opposed to individuals.) To that end, he strongly feels that the best way to deal with the spam problem is to get good tools into the hands of as many people as possible, and to make them as easy to use as possible (ahh, the old "open source UIs always suck" argument :). As an example, one of the first things he did was to patch the Mutt mail agent so that it had two delete keys: one for general deletion, one for "get rid of this because it's spam." That second key, and interface touches like it, seem like the way to get average people to start using filters on a regular basis.

    • Joshua Goodman, Microsoft Research

      Unlike ESR, Goodman felt that algorithm selection does make a big difference, but this being Microsoft he refused to disclose what algorithms his team is working with -- except to say that, when delivered, they will be more accessible for average users than SpamAssassin, Procmail recipes, or Mutt :)

      Microsoft has been working on the spam problem since 1997, but because of how big they are they've had unique problems in bringing solutions to market. As a case in point, they tried to introduce spam filters to a 1999 Outlook Express release, but were immediately sued by email greeting card company Blue Mountain because their messages were being inaccurately categorized as spam. With that in mind, they have been very reluctant to bring new anti-spam software out since then because they would like to see legislation protecting "good faith spam prevention efforts."

      As a very large player, Microsoft faced certain difficulties in developing useful filters -- it may make sense for you as an individual to filter all mail from Korea, but this doesn't work so well if you are trying to attract customers *from* Korea :). This has forced them to put a lot of work into thoroughly testing different strategies before offering them to the public.

      In spite of what millions of webmail users may have expected, Hotmail & MSN are currently being filtered by Brightmail's service, and plans are underway to reintroduce spam management features to client side software again. (Just imagine how bad it would be if they weren't paying someone to filter for them! Unfortunately, no hecklers piped up to ask if they are really selling Hotmail's user database to spammers, and if that is a source of annoyance for his team.)

      An interesting barrier his group has had to grapple with was what he called the "Chinese menu" or "madlibs" spam generation strategy: that it's easy to come up with a template for spam -- "[a very special offer] [to make your penis bigger] [and please your special lady friend all night!]" vs. "[an exclusive deal] [for genital enlargement] [that will boost your sex life!]" etc -- and have a small handful of options for each 'bucket' multiplying into a huge variety of individual messages that are easy for a human to group together but almost impossible for software to identify.
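
      The multiplication is easy to see in a few lines. The buckets below are invented placeholders, and the three-by-three-by-three shape is just an assumption to keep the example small.

```python
from itertools import product

# Three hypothetical "buckets" with a few interchangeable phrases each.
buckets = [
    ["a very special offer", "an exclusive deal", "a limited-time bargain"],
    ["on replica watches", "on cheap software", "on discount loans"],
    ["just for you!", "ends tonight!", "while supplies last!"],
]

# Every way of picking one phrase per bucket is a distinct message.
messages = [" ".join(parts) for parts in product(*buckets)]
print(len(messages))   # 3 * 3 * 3 combinations
print(messages[0])
```

      Nine phrases already give 27 distinct messages; ten options in each of five buckets would give 100,000, none of them byte-for-byte identical.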

    • Michael Salib, extremely funny MIT student

      Unlike nearly all other filter writers of the day, Salib's approach was heuristic: find a handful of reasonable spam discriminators, throw them all against his mail, and see how much he can identify that way. "It's sketchy, but this is a class project. I don't have to be realistic. [...] These results may be completely wrong."

      Much to his surprise, he's trapping a lot of spam. He pulls in a little bit of RBL data ("the first two or three links from Google, whatever"), looks for some patterns and so on, and then churns it through LMMSE, an electrical engineering technique that as far as he can tell doesn't seem to be known in other fields (basically this involves running the messages through a series of scary-but-fast-to-calculate linear equations). It turns out that he can process this much faster than a Bayes filter, to the point that customizing his approach for each user in a network would actually be feasible.

      For a small spam corpus, he got results better than SpamAssassin did, though for a large corpus his results were worse; he couldn't really account for why this would be the case, or predict how things would scale as the corpus continued to grow.

      When a member of the audience [who was apparently familiar to Salib -- I don't know who it was] questioned the RBL tactic and asked whether authenticating remote users might be the answer, Salib's response was "yes, I agree, but then you *do* work for Verisign, who is in the verification business, so you would say that."

      Right on, Salib -- his talk was easily the funniest & breeziest of the day :)
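
      LMMSE stands for linear minimum mean square error estimation. These notes don't record which features or formulation Salib used, so the following is only a one-feature sketch of the general technique: pick the line that minimizes squared error between a numeric feature and a spam label (1 for spam, 0 for ham), then threshold the estimate. The feature values and the 0.5 cutoff are invented.

```python
def fit_linear_estimator(features, labels):
    """Fit x_hat = a * y + b minimising mean squared error (one feature only)."""
    n = len(features)
    mean_y = sum(features) / n
    mean_x = sum(labels) / n
    cov_xy = sum((y - mean_y) * (x - mean_x) for y, x in zip(features, labels)) / n
    var_y = sum((y - mean_y) ** 2 for y in features) / n
    a = cov_xy / var_y
    b = mean_x - a * mean_y
    return a, b


# Hypothetical feature: number of suspicious patterns found in each message.
pattern_counts = [0, 1, 4, 5]
is_spam = [0, 0, 1, 1]

a, b = fit_linear_estimator(pattern_counts, is_spam)
for count in (0, 5):
    estimate = a * count + b
    print(count, round(estimate, 3), "spam" if estimate > 0.5 else "ham")
```

      With many features the same idea requires solving a small linear system once, after which scoring a message is a single weighted sum. That is consistent with the speed Salib described.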

    • David Lewis, general researcher

      The core of Lewis' argument, as ESR said earlier in the day, is that for any machine learning technique the quality of the learning corpus is much more important than the algorithm used. Bayes is one such algorithm, but there are many other good ones in the literature. In a dig at Goodman's refusal to disclose algorithms, Lewis pointed out that all of this has been publicly discussed since the first machine learning paper was published in 1961.

      Observations: "lots of task inspecific stuff works badly, but task specific stuff helps a lot." It is important to use different corpora for training and for general use, so that you don't train your machine to focus too much on certain types of input (this is a point that Microsoft's Goodman made as well).

      As Graham did, Lewis emphasized that spam is going to slowly start looking more like natural text, and we're going to have to deal with this as time goes on. www.daviddlewis.com/events/
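
      Keeping training and evaluation data apart, as described in the observations above, can be as simple as the following sketch. The 75/25 split, the fixed seed, and the placeholder corpus are arbitrary choices for the example.

```python
import random


def split_corpus(messages, test_fraction=0.25, seed=0):
    """Shuffle a labelled corpus and split it into training and held-out parts."""
    shuffled = list(messages)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


# Placeholder corpus: (text, label) pairs.
corpus = [(f"message {i}", "spam" if i % 2 else "ham") for i in range(20)]
train_set, test_set = split_corpus(corpus)
print(len(train_set), len(test_set))
```

      The filter is trained only on the first part and measured only on the second, so the reported accuracy reflects mail it has not already seen.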

    • Jon Praed, Internet Law Group

      To a burst of tremendous applause, this talk began with the sentence "my name is Jon Praed, and I sue spammers."

      He brought a legal take on the "not everything is spam to everybody" angle, emphasizing that we need a precise definition of what qualifies as Unsolicited Commercial Email (UCE). In particular, it has been difficult trying to pin down if the mail was really unsolicited, as this is where the spammers have the most wiggle room. However, if you can track down the spammer, they have to date rarely been able to verify that the user asked for mail, and so Praed has been able to successfully prosecute several spammers on this angle. He doesn't expect this to work forever though.

      According to Praed, "laws against spam exist in every state, and more are pending", but he doubts that a legal solution will ever be completely effective as long as spam is lucrative. By analogy, he pointed out that people still rob banks and that has never been legal.

      Praed informed the audience that there are several ways to get back at spammers, including injunctions, bankruptcy, and contempt, and all of these can be very effective. He pointed out that, to be blunt, a lot of these people are desperate low-lifes, and spam has been their biggest success in life. After these legal responses, their lives all get much worse. It hadn't occurred to me to see spammers as pitiful before, but I can now. Most importantly, Praed stressed that these legal remedies can be very effective, and he strongly warned against taking vigilante action. This is almost always worse than the spam itself, and it only serves to get you in even deeper trouble than the spammer.

      As for the sources of spam, most comes from offshore spam houses, abuse of free mail accounts (Hotmail & Yahoo, free signups at ISPs, etc) and bulk software (which may apparently soon become illegal in certain areas, provided that a law can be found to ban spam software while allowing things like MailMan or MajorDomo). Interestingly, he questioned the idea that header spoofing is a big problem, and claimed that in every case he has dealt with he has been able to track down the messages to a legit source sooner or later.

      Suggestion: if you get a spam citing a trademarked product [e.g. Viagra], forward it to the trademark holder and they will almost always follow up on it. Suggestion: be fast in trying to track down spammers, as some of them have gotten in the habit of leaving sites up long enough for mail recipients to visit, but taking them down before investigators get a chance to take a look. Legal observation: spam is almost always fraud, and can be prosecuted accordingly.

      Praed wrapped up his talk by citing the encouraging precedent that the famous Verizon Online vs. Ralsky case set: [a] that the court is interested in where the harm occurs, not where the person doing harm was when causing it (so if you send spam to someone in Alaska and spam is a capital offence in Alaska, you can be tried as a citizen of that state even if you caused the harm from somewhere else), and [b] that it is assumed that you have to be familiar with a remote ISP's acceptable usage policies, and ignorance is no defence (just as you can't say "I didn't know it was illegal to shoot someone", Ralsky couldn't say that he didn't know Verizon prohibits spam: he had to have known that the AUP wouldn't allow what he was doing, so he deliberately didn't read it). That precedent makes future prosecution of spammers much more encouraging. While, again, legal solutions may never eliminate the spam problem, a precedent like this can be an important supplement to filtering efforts (the stick to the filter's carrot, or something -- my lousy analogy, not Praed's).

    • David Berlind, ZDNet executive editor

      His talk was primarily about how he receives a huge quantity of email from ZDNet readers, and he can't afford to use any spam filtering strategy that would allow *any* false positives. As one of the speakers said -- sorry, I forget who (Microsoft's Goodman?) -- getting a 0% false positive rate is easy: just classify nothing as spam. Getting a 100% hit rate is also easy: just classify everything as spam. Any solution besides those two is always going to have some degree of error either way, and determining how much of what kind of error you want to accept is up to you. Most users will tolerate a moderate false negative rate (some spam gets through) if it means that the false positive rate (legit mail is deleted) is very low. In Berlind's case, the false positive rate has to be vanishingly small, because reading all customer mail is a critical sign of respect for him.

      Further, his business is also a legitimate mass emailer, sending out millions of free newsletters to users every day, and if Shein's proposal to bill bulk mailers were to catch on then even a very low rate would quickly put his company in the red. One obvious solution, which wasn't mentioned: start charging a subscription for these mailings, and make them profitable. I don't want to see this happen but if it did then the economics would tilt back toward making things feasible again.

      Berlind is appreciative of the anti-spam work that is being done, but at the same time is skeptical of how pragmatic most of what is being proposed can really be. He feels we need a massive effort to rework the way mail is handled [Y2K anyone? It could get IT people back to work...], and to that end hopes ZDNet can help promote such a cooperative effort between the parties working on this. They don't want to be involved -- they are journalists & publishers, not standards developers -- but they are eager to get things going & want to cover the story as it progresses.

      Like Shein said, he feels it's a waste for all these talented people to be working on combating penis enlargement offers, and hopes that we can find a way to get past this and work on real problems, "like world peace." This comment got a chuckle from the audience, but he seemed like the kind of guy that really meant that, and more importantly, he was right. A smart guy like Paul Graham or Bill Yerazunis shouldn't have to waste time tinkering with how many Viagra offers he can automagically delete when there are more fun things to be doing.

    • Ken Schneider, Brightmail

      As mentioned earlier, Brightmail provides an ASP service for real time filtering of both incoming & outgoing mail. As would perhaps be expected, bigger ISPs and networks attract larger amounts of spam: 50% of mail coming into big ISPs and 40% coming into big companies is now spam. Brightmail offers the Probe Network, a <slashdot-killfile-term>patented</slashdot-killfile-term> system of decoy honeypot addresses that gather data for analysis at their logistics center, which in turn distributes spam filtering rules to their clients where a plugin for $MTA (using the open source or proprietary MTA of the client's choice) can act on the database.

      An interesting property of their system is that they have a mechanism both for aging out dormant rules and for reactivating retired ones, so that the currently active ruleset can be kept as lean & efficient as possible. A big source of difficulty for them is legitimate commercial opt-in lists, because things have gotten more shady & blurry over time and it's now hard to tell this mail from much of the spam out there. Whitelists help here, but the problem is still difficult.
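
    The point about trivial classifiers in the Berlind notes above (a 0% false positive rate by flagging nothing, a 100% hit rate by flagging everything) can be made concrete with a rough sketch. The six-message mailbox and the function name are made up for illustration; nothing like this was shown at the conference:

```python
def error_rates(labels, predictions):
    """Return (false_positive_rate, false_negative_rate).

    labels and predictions are lists of booleans, True meaning "spam".
    A false positive is legit mail flagged as spam; a false negative
    is spam that gets through.
    """
    pairs = list(zip(labels, predictions))
    legit = [flagged for is_spam, flagged in pairs if not is_spam]
    spam = [flagged for is_spam, flagged in pairs if is_spam]
    fp_rate = sum(legit) / len(legit) if legit else 0.0
    fn_rate = sum(not flagged for flagged in spam) / len(spam) if spam else 0.0
    return fp_rate, fn_rate


# Hypothetical mailbox: True = spam, False = legit.
labels = [True, True, True, False, False, False]

flag_nothing = [False] * len(labels)
flag_everything = [True] * len(labels)

print(error_rates(labels, flag_nothing))     # (0.0, 1.0)
print(error_rates(labels, flag_everything))  # (1.0, 0.0)
```

    Any real filter lands somewhere between those two extremes, and someone in Berlind's position would pick the operating point by pushing the first number toward zero and living with whatever the second number turns out to be.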

    After each speaker had his turn, there was a panel discussion, but not much really happened there, and the moderator cut things short after only a couple of minutes. The original plan was for everyone to go out for Chinese food afterwards and continue the discussions over dinner, but when 580 people signed up that plan obviously fell apart. :) And so, here end the notes...

  6. Re:One million dollars later on MIT Spam Conference Conclusions · · Score: 1
    Har har har :)

    That said, the conference was free & MIT is not a public institution, so the comment is a little misplaced. Funny, but misplaced :)

  7. Re:Unfortunately still no tabs on Safari Beta Updated · · Score: 2

    Yes. I only realized after posting that comment that I'm using the wrong term, but yes, I am referring to drawers.

  8. Re:iBook users may disagree... on Safari Beta Updated · · Score: 1
    Everyone's mileage varies :)

    As time has gone on, I've come to feel that, regardless of the screen resolution, my most "comfortable" browser window geometry is more tall than wide, like a sheet of notebook paper. Since nearly all computer monitors have the opposite geometry -- wider than they are tall -- this means that for nearly all common screen resolutions (anything bigger than 640x480) I tend to have one or more windows open, partially overlapping vertically. Arranged this way, I personally would feel comfortable giving some of that un-used (or at least, less-used) horizontal space to something like a tab / bookmark / history drawer; at the same time, on a low resolution display it would annoy me to have to sacrifice the little bit of vertical height I have available to a row of tabs, when a more rich interface could present the same information & more if moved to one side.

    In any case though, this doesn't have to be an either/or situation. For every application I've seen that uses them, drawers are toggleable & can be resized as needed, and some even let you move them to the left, right, or maybe even bottom of the window (though putting it on the bottom seems messy for this situation). So if Apple were to put this functionality into Safari, I don't see any reason that they couldn't also make it flexible as well...

  9. Re:Unfortunately still no tabs on Safari Beta Updated · · Score: 4, Informative
    Skimming your linked post (sorry, will read it in more detail after this), I don't think we're describing quite the same thing here. What I'm referring to as trays should more accurately have been referred to as drawers, as that's the term that the Apple documentation seems to use. Out of habit, I use the terms 'tray' and 'drawer' more or less interchangeably, but I'm realizing now that searching for 'tray' interface elements isn't turning up many hits, so maybe this usage isn't as standard or common as I thought.

    In any case, in the Aqua interface, trays are a specific & unambiguous interface style that for whatever reason hasn't been used very often so far. The best example I can think of from one of the "core" applications is Mail.app, for which there is a screenshot at Apple's site. The other big application I can think of right now is Omniweb, which uses a drawer to organize bookmarks. (I'm not an Omniweb user, so I wasn't aware of that until searching for this post :). Of freeware apps that I use regularly, the best example I can think of is (the very slick) MacJournal, which uses two trays -- one to present a list of journals, the other to present entries within a particular journal (for example).

    Now that I poke around a bit, the best critical reviews of the tray interface I can find so far are this MacEdition review and this Oreillynet tutorial. (John Siracusa also wrote some excellent OSX reviews for Ars Technica, but I can't find a section that focuses on drawers in particular.)

    But the authoritative reference -- which unfortunately doesn't seem to have screenshots to go along with the prose -- is the Apple MacOS X Human Interface Guidelines:

    Drawers are a special window type, found only in Mac OS X. They are child windows--which slide out from a parent window--that users can open or close (show or hide) while the parent window is open. These windows should be used for tools or controls that are closely associated with the parent window and frequently accessed, but do not need to be visible all the time. For example, Mail uses a drawer to provide access to the user's mailboxes.

    So while this isn't incompatible with what you're asking for, it looks to me like it's not quite the same thing. This is an existing toolkit that could be called on by any Cocoa or Carbon application, and it seems to me like this is a perfect example of where best to apply it.

  10. Re:Unfortunately still no tabs on Safari Beta Updated · · Score: 5, Insightful
    Here's an idea: let's re-evaluate what you *really* want the software to do here. Is it really the case that you need tabs, or can it more accurately be said that you just want some form of a multiple document interface [MDI]? If the latter is correct -- and for me, it is -- then are tabs (as implemented in the Gecko family of browsers) the best or only way to do this? Or are there other, possibly better ways to get to the same goal?

    It occurs to me that a better -- and arguably more "Cocoa-ish" -- way to present this would be a tray interface, like what you see in Mail.app. Seen this way, you could have a hierarchy of widgets in the tray, including:

    • currently open pages (the tabs, as available in Mozilla etc)
    • bookmarked links & folders of links
    • history links
    • "scrapbook" page[s]?

    If presented this way, you could browse open documents and bookmarks much as you can browse mail folders in Mail.app. If items in the tray could be browsed with "flippy triangles" (like in the Finder's list view), then you could zoom in on different kinds of URLs quickly. Plus, having a tray interface might even buy you enough screen real estate that you could even have thumbnail versions of some or all pages in the collection. Neat, huh?

    Personally, I agree with everyone that's asking for tabbed browsing, but only to the extent that I think that the web is easier to browse in a MDI style. But the more I think about this tray idea, the less I think that simple tabs are the best way to present this information. Trays. They're IMO the coolest & most innovative part of the Aqua interface, and they really aren't implemented all that often. This seems to me like a perfect place to introduce a tray interface, and if Apple decides to add a MDI option to Safari, my hope is that this is how they'll implement it.

    If you agree that this is a good idea, please do as I've done and submit the idea as feedback to Apple with Safari's bug reporter widget, or by using the bug reporter on Apple's site (sorry, I forget the url offhand). Now is the time to let them know what features you would hope for... :)

  11. Re:Serious question here... on Next OmniWeb to be based on Safari Engine? · · Score: 2
    As a case in point, check out Crazy Browser. Like Phoenix or Chimera, Crazy Browser is a "new" web browser, but this one is built on the Internet Explorer engine instead of Gecko. CB (a silly name, but hey what can you do) enhances IE with features like tabbed browsing & popup blocking, and yet the download is only 700kb because most of the grunt work is done by the IE libraries, so the CB code is probably all interface stuff (it's freeware, but not open source, so that's just a guess).

    Anyone interested in learning more about how IE can be extended (as closed source but semi-open APIs) may want to get in touch with the CB people, though I have no idea if they'd want to talk shop. *shrug*

    (Annoyingly, you can't get to the CB home page without being forced to accept a popup for one of this company's other products, PowerIE. Some kind of toolbar thing, I dunno. It looks like it might be interesting but having to learn about it through a popup like this is rude -- but then I'm not typing this from a popup blocking browser, so I get what I deserve I guess. Amusingly, PowerIE -- nothing but an IE extension, not a whole browser like CB is -- has a download almost as big as CB itself, which I think nicely illustrates how much being able to use shared html rendering libraries can help things here...)

  12. Re:Show some initiative on Making the Case for Better Bugtracking Tools? · · Score: 4, Informative
    No, please don't use Bugzilla -- its reputation far exceeds its actual quality. Bugzilla is an arcane, tightly bundled collection of hard to extend CGI scripts sitting on top of a bizarre MySQL schema. If it doesn't exactly meet your needs (i.e. you are not the Mozilla project), extending it can be a nightmare.

    May I humbly suggest that you take a look at RT: Request Tracker instead. RT is a general purpose ticketing system, suitable not only for bug tracking, but for all kinds of organized message exchange within an organization (i.e. help desk, sales force tracking, some aspects of inventory management, etc). RT allows users to collaborate via a web interface, email, or the command line. By providing multiple interaction interfaces, RT encourages users to work with the system by communicating the way they would already, rather than working against them by forcing them to adapt to a wholly new system. If you don't like the web interface, feel free to change it. If it's still not enough, people can just use email instead -- just cc: your RT account on ticket related mails, and include the ticket number in the subject line. Hey presto, people can do almost what they were doing in the first place.
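
    To give a feel for why "include the ticket number in the subject line" is enough for the mail side to work, here is a rough sketch of pulling a ticket number back out of a subject. This is not RT's own code (RT is Perl); the "[name #1234]" tag shape, the example.com name and the function name are assumptions for illustration, and the real tag depends on how a given RT instance is configured:

```python
import re

# Assumed tag shape: "[some-name #1234]" somewhere in the subject line.
TICKET_TAG = re.compile(r"\[(?P<name>[^\]#]+?)\s+#(?P<id>\d+)\]")


def ticket_id_from_subject(subject):
    """Return the ticket number found in a subject line, or None."""
    match = TICKET_TAG.search(subject)
    return int(match.group("id")) if match else None


print(ticket_id_from_subject("Re: [example.com #1234] Printer on fire"))  # 1234
print(ticket_id_from_subject("Lunch on Friday?"))                         # None
```

    Mail whose subject carries a tag can be threaded onto the existing ticket, and mail without one can be treated as a new request, so nobody has to leave their mail client to stay in the loop.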

    RT is written in clean, OO Perl making wise use of CPAN libraries instead of implementing everything from scratch. It will run on a variety of operating systems & databases (MySQL, PostgreSQL, Oracle). The system is well documented, easily extensible, and comes with a vibrant & supportive user community. It can even be integrated with things like pagers, so that the creation of critical tickets can send out a pager message to key personnel.

    All in all, RT is a very nice, very well engineered system that IMO is far more suitable for most users than Bugzilla, for which the suitable scope is much more restricted. That's why RT is now being used in, among other places, Perl's bug tracker at rt.perl.org.

    Disclaimer: My company uses RT, and I have met Jesse Vincent, RT maintainer, a handful of times, and even though I think it would be pretty cool if people switched to RT and bought support contracts from Jesse, I have nothing to gain if any of this happens. I just sincerely think that RT is better software than Bugzilla for almost all users, and would like to see development of the software continue to flourish and become accepted more widely. Spend a week messing around with RT and IMO you'll never want to go back to Bugzilla...

  13. Re:iTunes-iPod ... so ... *iPhoto*-??? on Apple To Introduce Video iPod? · · Score: 2
    I think you're on the right track, but keep pushing on the idea. As everyone seems to be howling, there doesn't seem to be a huge amount of interest in being able to watch movies on the run. Some maybe, but not much.

    But what about recording them?

    I just got the Sony Clie with the cheap-ass little camera on it, and man the thing is great. With a 128mb memory stick & provided that you have enough battery time remaining, you can record 2 hours of video. Now of course, nobody is going to shoot a movie like LOTR on this, but for recording brief snippets of everyday life it's a hell of a lot more fun to use than a traditional camcorder.

    So, if Apple's oh-so-clever design engineers could find a nice place to mount a little camera on an iPod's chassis, what then? If 128mb of Memorystick can hold 2 hours, that works out to -- let's round to keep things simple -- a minute per megabyte. At that rate, the 20gb iPod will be able to record for, what, almost two weeks or so? (My back-of-the-envelope estimate is 333 hours, which works out to just under 14 days.)
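
    For anyone who wants to check the envelope, here is the same arithmetic as a quick sketch. It assumes decimal gigabytes (20gb = 20,000mb) and takes the 2-hours-per-128mb figure at face value; the function name is just for illustration:

```python
MB_PER_GB = 1000  # decimal gigabytes, to match the round numbers above

stick_mb = 128
stick_minutes = 2 * 60
ipod_mb = 20 * MB_PER_GB


def recording_time(capacity_mb, mb_per_minute):
    """Return (hours, days) of recording at a given data rate."""
    minutes = capacity_mb / mb_per_minute
    hours = minutes / 60
    return hours, hours / 24


# Rounded rate: one megabyte per minute.
hours, days = recording_time(ipod_mb, 1.0)
print(f"rounded: {hours:.0f} hours, {days:.1f} days")  # rounded: 333 hours, 13.9 days

# Unrounded rate implied by 128 MB for 2 hours (about 1.07 MB per minute).
hours, days = recording_time(ipod_mb, stick_mb / stick_minutes)
print(f"exact:   {hours:.1f} hours, {days:.1f} days")  # exact:   312.5 hours, 13.0 days
```

    Either way the answer is "about two weeks", so the rounding doesn't change the conclusion.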

    At that rate, you can go for a higher resolution image & still be able to get many hours of video recorded on the device -- with battery life being your main constraint. Switch from video to still photography and you get even more use out of the thing. You could put a little screen on there to skim over what you've recorded, but the focus would probably be on getting this into iMovie or iPhoto.

    Don't just think iPhoto in your hand, think "iApps -- iTunes, iMovie, iPhoto, iCal, etc -- in your hand". :)

  14. Re:You misunderstand completely on E ~ mc^2 · · Score: 2

    Thank you. That was the clearest & most thorough answer to that question that I have ever encountered, and I've been reading [and using] less well-stated versions of this comment for years now. Every school board in the USA should be forced to read this before trying to impose creationism on their poor students... :)

  15. Re:This could make The Gimp cozy for MacHeads?? on GTK+OSX for Mac OS X Aqua · · Score: 2

    ...and to be honest, I should have mentioned that there is already an Aqua port of Vim (currently version 6.1.184, runs on all versions of OSX). So just to head off the person that will inevitably point out that Vim has been ported to a more or less Aqua-native form (menus on the screen border, not on the windows, etc), I realize that wasn't such a great example -- I just couldn't think of a better one at the time :-)

  16. Re:This could make The Gimp cozy for MacHeads?? on GTK+OSX for Mac OS X Aqua · · Score: 5, Insightful
    On one level you're right -- Photoshop is in most ways & by most opinions a superior tool compared to the Gimp, and most oldschool Mac users will not be impressed by the Gimp. On the other hand, a lot of newschool Mac users are oldschool Unix users, and a lot of those folks are only passingly interested in creative graphics software. For that segment, noodling around in the Gimp is just fine, and makes far more sense than shelling out a few hundred bucks for the professional grade graphic designers' software. In short, the Mac ecosystem is diverse enough to support both applications just fine.

    More importantly, the real gain here is the GTK+ toolkit, not just the most prominent application written with that toolkit. Being one of those unix/mac users, I'm not particularly interested in the Gimp -- but I'd love to be able to use an Aqua-native version of Gvim every day, and with a native GTK+ port there are now a huge number of other GTK+ apps that can be brought over to OSX without forcing users to set up X11 as well. As another commenter noted, no, these will not really have the right look & feel for OSX -- menus attached to windows instead of the screen border is a mistake here -- but as a bridge framework for bringing graphical Unix software to the Mac, this is far better than having to run X11 alongside Aqua.

  17. Obvious, when you think about it on Windows Security Holes Go Mostly Unexploited · · Score: 2
    Windows Security Holes Go Mostly Unexploited

    This from Wired magazine. Yes, and I can see near future headlines in the Wall Street Journal or Onion:

    Saudi Arabian Petroleum Fields Go Mostly Undrained

    Supply & demand, fellas. Obviously the currently exploited pool of vulnerabilities is keeping a lot of people -- malware authors, antivirus vendors, security professionals, etc -- plenty busy right now. What would be the point in exploiting & then distributing software that hits all the other, as yet ignored possibilities?

    Doing that would be like writing a song that hits every key on the piano or every fret on the guitar -- it would be impressive to your colleagues, but really the public would be happy with Britney Spears style pumping out of Nimda / CodeRed / etc variants.

    Just as the average academic cheater just wants to get by with a minimally passing grade, the average script kiddie is probably happy with ripping off & minimally modifying code that already does the job.

    Analyze things in supply & demand microeconomic terms -- the currently used set of exploits yields high profits through minimal development expense & maximal effectiveness in the wild, so there is no need to expend effort on coming up with cleverer attacks. Unless & until fundamental fixes for the flaws that common malware exploits are both available & widely applied, malware authors have no incentive to get more ambitious -- they're already living in "#4, there is no #4; #5, profit!"-land :)

  18. Re:MS was at USENIX/SAGE asking what makes a good on Microsoft Next Generation Shell · · Score: 2
    Given the context -- nice signature :)

    The funny thing is, from a certain point of view, this looks like both a validation & cancellation of what Apple has managed to do with OSX. Once OSX came out, Windows was the only remaining family of mainstream operating systems that was neither Unix based nor, if nothing else, equipped with a robust layer for using Unix tools (BeOS wasn't really Unix/Posix, but it was close enough for many purposes; everything else is even closer & usually at a pretty deep level).

    Legions of Unix fans were able to boast that they could take their shell skills to any other platform painlessly. The Macintosh found a whole new audience that had in the past mostly ignored it. And aside from efforts like Cygwin (which is imperfect at best and, besides, isn't available out of the box), Windows users were left out of the party. Not that that bothered most Unix fans.

    Unfortunately, it seems like Microsoft was paying attention to all of this. If they really do get something like the Unix command shell running as a native capability of some near-future version of Windows, you're right -- much of Unix's unique strength won't be quite so unique any more, and again you're right -- Unix unfortunately doesn't have as much to fall back on as one would maybe hope. As much as *I* would hope, anyway...

  19. Re:What will the Hoax theorists say? on To the Moon and Beyond · · Score: 1

    Rosetta, nah, that's that Egyptian thing. An obvious NASA/ESA plot to throw us offguard :)

  20. Re:A good thing, really. on Microsoft Next Generation Shell · · Score: 2
    On the other hand, I guess it just makes Windows easier to crack too ;)

    Actually, I think you've got a really good point there, but maybe not for the reason you seem to be getting at.

    What happens if, a couple of years from now, commonly available versions of Windows -- server versions at the minimum, consumer ones as a possibility -- start coming with a reasonable complement of standard-ish Unix-ish tools? Just to list a hypothetical set: sshd, perl, bash, a handful of utility programs (grep, tr, etc) and maybe some of the more specialized tools for networking, hardware monitoring, etc.

    Where would we, as users of an "ecosystem" of computer platforms, be? Better off? "Yay, I can take my 'leet hacker skills to Unix, Mac, or Windowland now! Joy!" Or worse? "This new ssh vulnerability allows remote compromise across all versions of Unix, Mac, and Windows. When coupled with recently discovered buffer overflow errors on the version of Perl commonly installed by default on many of these platforms, remote users can...."

    ?

    When the DNS root servers were recently DDOS attacked, a major factor preventing a complete [root] system wide failure was the heterogeneous nature of these servers, according to several post-attack analyses. The fact that each of them was running different hardware & software configurations prevented the attack from being universally effective, even though it was extremely effective against some servers and moderately effective against others.

    Broaden that to the internet as a whole. As much as I like the idea of being able to "port" my Unix originated shell skills to every platform I have to use, at the same time I'm worried about the situation we'll be in when writers of malicious software will be able to port their exploits as well. If the 2005 variant of CodeRed or Nimda does not just have to be confined to the Windows target, my bet is that it won't be -- we're *all* going to have to worry.

    Be careful what you wish for, you just might get it...

  21. Re:Spam Conference... on Spam Conference in Boston · · Score: 2
    Let them attend, I say. Let them heckle from the back of the room, saying "aw hell that won't work, if you do $this then I can just do $that." Hey presto, the researchers get a better awareness of the failure points, and the solutions ultimately developed are that much more robust.

    Think about it -- this is exactly the same argument that favors open source software over proprietary equivalents. "With enough eyes all bugs/security holes are shallow." Without exposure to real life spam & spammers, how is anyone ever going to know if new techniques work? If the conference is attended by both pro- and anti- spam advocates, we'll all get to the meat of the issues that much faster -- you might as well be confronted with the problems while a bunch of experts are in the same room to hash out a solution...

  22. Re:short answer on Fighting Back Against Messenger Popup SPAM · · Score: 2
    Pardon me for the creative citation editing here:
    This isn't really a free speech issue <cite argument in favor of suppressing speech /> Sorry. Can't, or at least shouldn't, be done.

    Sorry, you lost me there. If this "isn't really a free speech issue", then why are you defending this activity on free speech terms? I don't understand your thinking here. In what ways relevant to this context (broadly, spam) is commercial speech governed differently from non-commercial speech, such that your argument can be consistent with itself? I'm curious because, not knowing the fine points of the law, it looks to me like you're contradicting yourself here, and in the end I can't parse what conclusion you're trying to tease out.

  23. Re:The one that annoyed me on Removing Burstabit Spyware? · · Score: 3, Informative
    Unfortunately, considering the ways these spyware programs are written, their "official" uninstall instructions are unlikely to be enough. What to do? Google to the rescue! Their new webquotes beta service -- which shows you [a] the URL it thinks you're looking for, and [b] *what other pages say about that URL* -- is exactly what you need here. Follow that link and you'll find several explanations of how Lop works & how to remove it, and you don't have to take their "official" word for it.

    Google rules. Well, usually -- they're not turning up any hits for Burstabit yet, though I'm sure this article will itself become part of their index before too long. Not that that Google reference helps the person who submitted this story in the first place...

  24. Re:Score another for Linux? Not. on Bridging Unix and Windows At NASA · · Score: 5, Insightful
    That's nice. But if you actually read the article, the government *requires* them to use Microsoft software for tasks such as email. Can you honestly picture a department full of Unix nerds bending over backwards to accommodate Outlook because they *wanted* to? Especially when Ximian Evolution is available for much less pain?

    So while you make a good point, it doesn't seem to be NASA that you need to make your argument to. The problem sounds like it's upstream somewhere, and that itself is a huge problem: why is the federal government forcing its employees to use the software of a tried, convicted, and... well completely unpunished abusive monopoly? Don't take your aggression out on the people that came up with this hack, point it at their bosses & their bosses' bosses, who told them that this is what they have to do.

  25. Re:bad journalism alert on RC Car Craze: The Spam Connection · · Score: 2
    FYI, the Google trick isn't as robust as you might hope -- they've got the partner=google token in the URL or whatever, but they also are able to check referral data as well. If they ever decide to get strict about this, you may end up having to spoof both the URL and $ENV{HTTP_REFERER} (figuring out how to do this is left as an exercise for the Lynx user :).
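
    To illustrate what "getting strict" could look like on their end, here is a rough sketch of a check that wants both the URL token and the referral data to agree. Only the partner=google token comes from the above; the Referer rule, the function name and the news.example.com URL are assumptions, since I have no idea what their servers actually do:

```python
from urllib.parse import urlparse, parse_qs


def looks_like_google_visitor(request_url, referer):
    """Return True only if both the URL token and the Referer header agree.

    The Referer rule (host must be google.com or a subdomain of it)
    is an assumption made for this sketch.
    """
    query = parse_qs(urlparse(request_url).query)
    has_token = "google" in query.get("partner", [])

    referer_host = urlparse(referer or "").hostname or ""
    from_google = (
        referer_host == "google.com" or referer_host.endswith(".google.com")
    )

    return has_token and from_google


url = "http://news.example.com/story.html?partner=google"  # hypothetical URL

print(looks_like_google_visitor(url, "http://www.google.com/search?q=rc+cars"))  # True
print(looks_like_google_visitor(url, None))                                      # False
```

    With a check like that in place, pasting the token onto the URL by hand gets you nowhere unless the referrer matches too, which is the whole point about the trick not being robust.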

    And as for your other tricks, sure, they mask you a bit -- much better than simply randomizing your registration profile. But the thing to keep in mind is that, in the absence of a registration scheme, all they can ever get is a bunch of disconnected data points. By forcing people to register, even if just for one session, they can start connecting some of these dots. They might not learn as much as they'd like about you personally, but they might get a better idea of the demographics of people using your ISP (come on, there's not that much entropy in a dynamic IP address, they still know what virtual block you live on).

    In the end, it comes down to how much trouble is it worth to you to screw up their analysis. You really have to put in a lot of effort to get the results you're hoping for, and in the end what's the point? They still know more about you than they did without registration. Best case scenario that way -- they can't figure out who you are, so they sell what info they have to *all* the spammers, popup advertisers, etc. On the other hand, if you just roll with it, it's a lot less stressful for you, they get a better data set which can allow them to be more effective (profitable, efficient -- better chance that they won't have to start charging a subscription fee somewhere down the line), and the spam & ads you get at least stand a chance of not being so obnoxious.

    Hey, it's your choice and your business, do as you please. I just can't help feeling that fighting this is more than a little bit Quixotic... :)