Slashdot Mirror


Revolutionary Spam Firewall Developed

psy writes "physorg has a story on a new spam firewall developed at The University of Queensland. The new technology is the only true spam firewall in existence, according to co-developer Matthew Sullivan. "Existing anti-spam software filters out spam whereas ours puts up a firewall, stopping all email traffic and only allowing real mail through," said Mr Sullivan. "In addition, our technology is accurate and fast. We recently completed a successful trial of a key layer of the spam firewall and it processed the emails at 90 messages per second, misclassifying only one out of 25,000 emails." "It turned out that the software was even better than us, picking up spam we'd incorrectly classified as legitimate emails."

23 of 507 comments

  1. Not the first; not revolutionary by Anonymous Coward · · Score: 5, Informative

    I think Barracuda Networks would rather disagree with the idea that this is the "only true spam firewall in existence," considering that Barracuda's entire product line consists of spam firewalls.

    Damn fine spam firewalls, too, I might add. They handle around 115 messages per second, and can run up to eight filtering steps (including Bayesian analysis, which is similarly efficient to SVM, which the one in the article uses). Plus Barracuda's can do virus scanning.

    I'm not sure how this is revolutionary.

    1. Re:Not the first; not revolutionary by Greyfox · · Score: 5, Informative
      I believe the distinction is when the filtering takes place. If you wait for the spam to be placed on your hard drive and filter it out when you start your mail client, then it's filtering. If you reject the spam before the remote MTA drops the connection, then it's a firewall.

      I'm using Postfix at home and it's got some nifty features to allow you to do this sort of thing. You can write a simple SMTP server that listens on some port of 127.0.0.1 and configure Postfix to send the mail through that. Your server scans the e-mail and sends a reject or accept message back to Postfix, which sends it on to the remote MTA. Your SMTP server then feeds the mail into another Postfix server which listens on an odd port of 127.0.0.1 and doesn't have the restrictions that your publicly accessible Postfix server does. There are packages available for all sorts of scanning based on this ability. Since you reject the message at MTA time, you don't have to bother with sending a bounce message, either.
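      As a rough illustration of the accept/reject step described above, here is a minimal sketch of the decision a filter could hand back while the SMTP session is still open. The function names and the toy classifier are hypothetical stand-ins, not part of Postfix or of any real scanner; only the reply-code classes (2xx accept, 5xx permanent reject) come from SMTP itself.

```python
from typing import Callable


def smtp_reply_for(message: str, looks_like_spam: Callable[[str], bool]) -> str:
    """Return the SMTP reply line a filter would give for one message."""
    if looks_like_spam(message):
        # A 5xx reply is a permanent failure. Because it is issued during the
        # SMTP session, the sending side is responsible for any bounce.
        return "550 5.7.1 Message rejected as spam"
    # A 2xx reply means the message is accepted for delivery.
    return "250 2.0.0 Ok"


def toy_classifier(message: str) -> bool:
    """Hypothetical placeholder for a real scanner (SVM, Bayesian, etc.)."""
    return "viagra" in message.lower()


print(smtp_reply_for("Buy VIAGRA now", toy_classifier))
print(smtp_reply_for("Lunch at noon?", toy_classifier))
```

      A real setup would speak SMTP on the loopback port and relay accepted mail onward; the sketch only covers the verdict.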

      --

      I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

    2. Re:Not the first; not revolutionary by Weirdofreak · · Score: 2, Informative

      I'm reminded of the legend of DWIM. For those that don't know, it was basically an automated error-correction program - Do What I Mean. If it thought you'd typed something in wrong, it would replace it with what it thought you meant.

      Somebody tried to delete their backup files, which had $s appended. There were no backup files, so DWIM thought that somehow they'd mistakenly hit the $ key just after pressing *, and in fact meant to delete everything on the disk. And no, heaven forbid that it confirmed this assumption, it merely proceeded to wipe everything. The guy managed to abort it, but wasn't happy.

      Now why the hell would I want a computer to assume that it knows what is and isn't spam, and then not give me any way of verifying this? The software is fallible. When judging email that I don't want, the only infallible person is me. That one in 25,000 isn't likely to be important, but it sure would be nice if I was allowed to read it instead of just being told to sod off.

      And how can it be better than yourself at finding spam? If you read an email and don't consider it spam, there's a good chance you might actually WANT it. Then a machine comes along, tells you it's spam, and you just accept that blindly?

      Maybe by not reading the article I missed something vital, but that's how it seems to me.

    3. Re:Not the first; not revolutionary by naelurec · · Score: 4, Informative

      I do multi-layered protection. At the MTA level, I utilize some DNSRBL lists to block from known spam servers. In addition, I require HELO and reject people who are claiming to be my server. In addition, I will reject invalid recipient domains, etc.
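      For anyone unfamiliar with how these MTA-level checks work, here is a minimal sketch of two of them. DNS blocklists are queried by reversing the client's IPv4 octets and appending the list's zone; the zone `dnsbl.example.org` and both function names below are made up for illustration, and no actual DNS lookup is performed.

```python
import ipaddress


def dnsbl_query_name(client_ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 client in a blocklist zone."""
    # Validate the address, then reverse the octets: 192.0.2.99 -> 99.2.0.192
    octets = str(ipaddress.IPv4Address(client_ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def helo_claims_to_be_us(helo_name: str, my_hostnames: set) -> bool:
    """True when a remote client announces itself with one of our own names."""
    return helo_name.strip().lower().rstrip(".") in my_hostnames


print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
print(helo_claims_to_be_us("Mail.Example.com.", {"mail.example.com"}))
```

      If the constructed name resolves, the client is listed and the connection can be rejected before any message data is transferred.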

      From here I run accepted emails through AMaViS / SpamAssassin / ClamAV / Sophos Sweep (I have yet to see Sophos catch a virus that ClamAV did not detect, though ClamAV caught two that Sophos did not). Spam scoring over a set value (e.g. 8) is not delivered (but the postmaster is notified), spam scoring between 5 and 8 is delivered tagged, and items under that range get passed without tagging. Viruses are always blocked and reported.
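      The three-way split on the spam score can be sketched as below. The thresholds 5 and 8 are the ones mentioned above; the function name, the action labels, and the treatment of a score exactly on a boundary are assumptions made for the example.

```python
def route_by_score(score: float, tag_at: float = 5.0, quarantine_at: float = 8.0) -> str:
    """Map a spam score to one of three actions."""
    if score >= quarantine_at:
        # Not delivered to the user; the postmaster is notified instead.
        return "quarantine"
    if score >= tag_at:
        # Delivered, but marked as probable spam.
        return "tag"
    # Low scores pass through untouched.
    return "deliver"


for score in (1.3, 6.0, 9.2):
    print(score, route_by_score(score))
```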

      Overall this has reduced unwanted email significantly. On networks of 40-60 users, between 35-50% of email is rejected at the SMTP level, about another 10% or so is quarantined (either viruses/spam), another 10% or so is tagged but delivered and the rest is legit.

      I have yet to receive any complaints of false positives (granted, there is a risk that users do not know) but have had a lot of praise for the reduction in spam levels. I am not aware of any viruses penetrating.

      Check out http://jimsun.linxnet.com/misc/postfix-anti-UCE.txt for more info (this is Postfix-centric, but the ideas could be applied to other setups).

    4. Re:Not the first; not revolutionary by CustomDesigned · · Score: 2, Informative

      Your definition is a good one. But it still doesn't make this product the first - or revolutionary. Sendmail created the 'milter' interface many years ago precisely to make this kind of rejection of unwanted mail possible. There are many sendmail milters written in many languages, the most popular being C, Perl, and Python, in that order. I run a Python milter which removes Windows executables (except DOC and XLS), checks SPF, and checks content with DSPAM wrapped for Python. Of the 40000 spams a day we get, nearly all are rejected before SMTP DATA. Those flunking the content check are rejected before the connection closes - except when addressed to a 'screener', in which case they go to a spam mailbox. Screeners have the task of providing feedback to the Bayesian filter - relieving others in the company of the burden.

  2. Not a firewall by BarryNorton · · Score: 4, Informative

    This isn't a firewall as it doesn't filter based on addressing. Furthermore, the use of SVMs (support vector machines) to classify text is not new...

  3. Ciphertrust, too... by TrebleJunkie · · Score: 4, Informative

    I know! Ciphertrust's Ironmail works the same way... It stops ALL mail inbound, runs it through about a dozen different detection queues, only letting legitimate stuff through. I'd really like to see how this new one is otherwise unique.

    --

    Ed R.Zahurak

    You know, oblivion keeps looking better every day.

  4. Re:Spelling by random_culchie · · Score: 5, Informative

    Yes, and apparently there are 600,426,974,379,824,381,951 different ways to spell viagra!

    Will your algorithm do it with polynomial complexity ;)

  5. Re:Spelling by random_culchie · · Score: 2, Informative

    Then there is the old trick of putting HTML in the middle of dodgy words.
    Like: viag<!--xyz -->ra
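    One way a filter could defuse this trick is to remove HTML comments before matching keywords. A minimal sketch using Python's standard `re` module follows; the function name is made up, and a real filter would more likely render the HTML properly instead of relying on a regex.

```python
import re

# Non-greedy match so each comment is removed separately; DOTALL lets a
# comment span several lines.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_html_comments(body: str) -> str:
    """Remove HTML comments so split-up words are joined back together."""
    return HTML_COMMENT.sub("", body)


print(strip_html_comments("viag<!--xyz -->ra"))
```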

  6. Re:Spelling by random_culchie · · Score: 2, Informative

    Select Extrans from the drop down box :)

  7. Re:Support Vector Machine (SVM) by Anonymous Coward · · Score: 2, Informative

    Support vector machines are actually quite a good machine learning tool -- try Wikipedia: http://en.wikipedia.org/wiki/Support_vector_machine

  8. Re:Spelling by ninewands · · Score: 3, Informative
    Quoth the poster:
    Yes, and apparently there are 600,426,974,379,824,381,951 different ways to spell viagra.

    Actually, the number is 1,300,925,111,156,286,160,896. He missed a couple of possibilities and had to update the page.
  9. Re:Here's how it probably works by Santana · · Score: 4, Informative

    That's how spamd works, and yes, it works tremendously well. I used to get 300 spam messages daily. I now receive one or two every week.

    --
    The best way to predict the future is to invent it
  10. Re:Spelling by CommanderData · · Score: 4, Informative

    His algorithm doesn't need to. All it needs to do is check against an existing dictionary of words. If the word is not on the list, it is assumed to be misspelled. (If the good spelling of Viagra is in the dictionary, simply remove it so that any correctly spelled reference to Viagra counts as a misspelling too). If there are greater than X% misspellings in the e-mail it gets trashed. X can be a smaller percentage if the e-mail has any hyperlinks in it, because it is virtually guaranteed that someone is trying to sell you something...
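    A minimal sketch of that dictionary check, for the curious. The thresholds (30%, or 15% when the mail contains a link) and the tiny word list are invented for illustration, since the value of X is left open above; the function names are hypothetical too.

```python
import re
import string

URL = re.compile(r"https?://\S+", re.IGNORECASE)


def misspelled_fraction(text: str, dictionary: set) -> float:
    """Fraction of words in the text that are not in the dictionary."""
    # Drop URLs first so their pieces are not counted as misspelled words.
    tokens = (t.strip(string.punctuation).lower() for t in URL.sub(" ", text).split())
    words = [t for t in tokens if t]
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in dictionary)
    return unknown / len(words)


def should_trash(text: str, dictionary: set,
                 threshold: float = 0.30, link_threshold: float = 0.15) -> bool:
    """Trash the mail when too many words are unknown; be stricter if it has a link."""
    limit = link_threshold if URL.search(text) else threshold
    return misspelled_fraction(text, dictionary) > limit


words = {"lunch", "tomorrow", "at", "buy", "now"}
print(should_trash("lunch tomorrow at noon", words))
print(should_trash("lunch tomorrow at noon http://example.com", words))
print(should_trash("buy ch3ap v1agra now", words))
```

    Note that "viagra" is deliberately absent from the word list, as suggested above, so even the correct spelling counts against the message.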

    --
    Urge to post... fading... fading... RISING!... fading... fading... gone.
  11. Re:Here's how it probably works by hedronist · · Score: 4, Informative
    I think you're trying to describe greylisting. Although greylisting is amazingly effective, I don't believe that's what is being discussed here (the site is slashdotted).

    Our experience with greylisting has been (1) a 90%+ reduction in passed-through email (with no complaints from users about lost mail, yet), (2) a dramatic decrease in server load because SpamAssassin doesn't see the message until after it gets past greylisting, and (3) people rediscover how useful email is once you get all of the crap out of their inbox.

    Marketing Guy: What's the worst that could happen?
    Dilbert: Our beta product could turn into an evil robot that annihilates the galaxy.

  12. Re:1/25000 by ColdGrits · · Score: 3, Informative

    If missing one email is not acceptable to your business, then your business should not be using email at all - email is not, nor has it ever been, a guaranteed delivery mechanism.

    At our company, currently just over 50% of all inbound email is detected as spam. Thus more than 50% of all our inbound email is spam, and the true figure (allowing for the false negatives which slip through) is probably in excess of 60% (and rising).

    With a failure rate of 1 in 25,000, AND assuming that means a false positive rather than a false negative, then for our company, taking into account the volume of spam we receive, it means 1 email in > 55,000 is wrongly identified.

    I can assure you that our business is capable of coping with 1 missed email in > 55,000.

    We certainly do not do business-critical transactions via insecure, non-guaranteed, publicly transported email, and neither should your business!

    --
    People should not be afraid of their governments - Governments should be afraid of their people.
  13. Re:Why filter at firewall layer? by Anonymous Coward · · Score: 1, Informative

    Since you say it's Slashdotted, here's the text.
    Posted anon so no K-whoring.

    ----
    The email spam nightmare could be halted in cyberspace by a groundbreaking firewall developed at The University of Queensland.

    The new technology is the only true spam firewall in existence, according to co-developer Matthew Sullivan.

    "Existing anti-spam software filters out spam whereas ours puts up a firewall, stopping all email traffic and only allowing real mail through," said Mr Sullivan.

    "In addition, our technology is accurate and fast. We recently completed a successful trial of a key layer of the spam firewall and it processed the emails at 90 messages per second, misclassifying only one out of 25,000 emails."

    "It turned out that the software was even better than us, picking up spam we'd incorrectly classified as legitimate emails."

    A Specialist Systems Programmer at The University of Queensland, Mr Sullivan worked on the spam firewall concept largely in his spare time, only coming together this year to work on the project with Guy Di Mattina, a recent UQ Engineering honours graduate, and Dr Kevin Gates, a UQ mathematics lecturer.

    Pivotal to the trio's spam firewall is the unique method of using a Support Vector Machine (SVM) to categorise emails. It is the only anti-spam software that analyses emails as a whole picture, rather than based solely on components such as key words or phrases, said Mr Sullivan.

    "Using a SVM, we can train our spam firewall to accurately recognise legitimate emails to the extent that it can tell the difference between a pharmaceutical bulletin on Viagra and someone trying to sell Viagra," he said.

    UQ's main commercialisation company, UniQuest, has formed a start-up company based on the technology and is seeking investment to take the spam firewall to market.

    UniQuest Managing Director, David Henderson said the global cost of spam was estimated by the Radicati Group in 2003 to be $20.5 billion or $49 per user mailbox.

    "With spam escalating and companies losing valuable employee time to deleting spam, UniQuest hopes to get this revolutionary spam firewall technology on the market quickly but it just depends on the level of funding we receive," said Mr Henderson.

    Source: University of Queensland

  14. Revolutionary Mail Firewall? by Titusdot+Groan · · Score: 2, Informative
    Mail Firewalls are an entire business sector with many companies competing in this space. This space is tracked by Gartner and Meta Group. How in the hell is this revolutionary?

    Hell, there's even a product called the Mail Firewall that pops up if you google for mail firewall.

  15. Old news by Anonymous Coward · · Score: 3, Informative
    They are not the first on the block.

    Heuristic analysis - detects and blocks spam by various email characteristics

    Black lists - checks if the sending server is in RBL (Realtime Blackhole List), dial-up or open-relay servers

    DNS verification - checks if the sender is using a valid mail server

    Keyword blocking - blocks spam according to keywords in subject and body

    Anti-spoofing - blocks email masquerading as coming from within the organization - a common spam technique

    Cookies/web beacons - blocks email cookies which help spammers identify the recipient as a "live" email

    Header verifier - inspects various header signatures and blocks spam

    Textual analysis - categorizes spam according to textual content like mortgages, pornography, dental care, etc

    Spam signatures - an auto-updating spam database allows detection and blocking of spam according to smart signatures

    Spam URL filtering - blocks email with links to spam sources and sponsors

    Spam image filtering - blocks email containing spam associated images

    Auto-updating database - local or remote spam blocking database based on thousands of Spam collecting bots and web crawlers

    eSafe: http://www.esafe.com/esafe/anti-spam.asp

  16. Re:Here's how it probably works by slashname3 · · Score: 3, Informative

    You just described greylisting. And it works extremely well. It is something all ISPs should be forced to implement immediately.

    And for those that say this is a stop gap and won't be effective for very long, they are wrong.

    The whole idea is to increase the cost to the spammer of sending out millions of emails. By greylisting they have to resend the same message at least twice, possibly multiple times, since they don't know how long the delay is.

    On top of that, if you combine greylisting with an RBL which is fed from a spam trap, it is most likely that by the time the spammer resends the message to you, that machine is listed in the RBL. So on the second attempt you let the connection in, check the RBL, and reject the message.

    Add SpamAssassin as the next line of defense and the few messages that do get through will get tagged and dropped in the spam bucket.

    But the important part of all this is to increase the cost to the spammer. If they try to get around this then they have to maintain a list of sent messages that were rejected and resend. This takes time and resources to do, thus increasing the cost to the spammer.
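    For reference, the core of greylisting fits in a few lines. This is only a sketch: keying on the (client IP, sender, recipient) triplet is the usual approach but is not spelled out above, and the class name, the 300-second delay, and the reply texts are assumptions. Timestamps are passed in explicitly to keep the example easy to follow.

```python
class Greylist:
    """Temporarily reject the first delivery attempt from an unknown triplet."""

    def __init__(self, min_delay_seconds: int = 300):
        self.min_delay = min_delay_seconds
        self.first_seen = {}  # (ip, sender, recipient) -> time of first attempt

    def check(self, client_ip: str, sender: str, recipient: str, now: float) -> str:
        key = (client_ip, sender.lower(), recipient.lower())
        first = self.first_seen.get(key)
        if first is None:
            # Never seen before: remember it and ask the sender to retry.
            self.first_seen[key] = now
            return "451 4.7.1 Greylisted, please try again later"
        if now - first < self.min_delay:
            # Retried too quickly: still a temporary failure.
            return "451 4.7.1 Greylisted, please try again later"
        # A proper MTA queued the message and came back after the delay.
        return "250 2.0.0 Ok"


g = Greylist(min_delay_seconds=300)
print(g.check("192.0.2.1", "a@example.com", "b@example.org", now=0))
print(g.check("192.0.2.1", "a@example.com", "b@example.org", now=600))
```

    The 4xx reply is what pushes the cost onto the sender: a legitimate MTA queues and retries on its own, while a fire-and-forget spam tool has to keep state to do the same.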

  17. Nothing new..MXLogic was doing this 2 years ago by cubicleman · · Score: 2, Informative

    www.mxlogic.com

  18. Pahleez. Nothing new here. by Anonymous Coward · · Score: 1, Informative

    www.surbl.org nuff said?

  19. Sorry Guys, but it's been done a long time ago by joemapango · · Score: 2, Informative

    The firewall I use does exactly what this company is claiming their new product does. I've been running it for years. It's Open Source to boot. It's called messagewall, and I think it's great. My (other) mail server receives between 100 and 700 spams a day, out of which I actually receive 1 or 2. I like it because it rejects the mail if it is spam before the sending server can actually send it.

    The down side, you have to load, compile, and build it. It's not too bad, even for a non programmer like me.

    CC