Revolutionary Spam Firewall Developed
psy writes "physorg has a story on a new spam firewall developed at The University of Queensland.
The new technology is the only true spam firewall in existence, according to co-developer Matthew Sullivan.
"Existing anti-spam software filters out spam whereas ours puts up a firewall, stopping all email traffic and only allowing real mail through," said Mr Sullivan.
"In addition, our technology is accurate and fast. We recently completed a successful trial of a key layer of the spam firewall and it processed the emails at 90 messages per second, misclassifying only one out of 25,000 emails."
"It turned out that the software was even better than us, picking up spam we'd incorrectly classified as legitimate emails."
I think Barracuda Networks would rather disagree with the idea that this is the "only true spam firewall in existence," considering that Barracuda's entire product line consists of spam firewalls.
Damn fine spam firewalls, too, I might add. They handle around 115 messages per second, and can run up to eight filtering steps (including Bayesian analysis, which is about as efficient as the SVM approach used by the one in the article). Plus, Barracuda's can do virus scanning.
I'm not sure how this is revolutionary.
This isn't a firewall as it doesn't filter based on addressing. Furthermore, the use of SVMs (support vector machines) to classify text is not new...
I know! Ciphertrust's Ironmail works the same way... It stops ALL mail inbound, runs it through about a dozen different detection queues, only letting legitimate stuff through. I'd really like to see how this new one is otherwise unique.
Ed R.Zahurak
You know, oblivion keeps looking better every day.
Yes, and apparently there are 600,426,974,379,824,381,951 different ways to spell viagra!
;)
Will your algorithm do it with polynomial complexity?
Then there is the old trick of putting HTML in the middle of dodgy words.
Like: viag<!--xyz -->ra
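A filter can undo that trick by matching against the text a mail client would actually display. The sketch below is only an illustration: the function names are made up for this example, and the regular expressions are a rough stand-in for a real HTML parser.

```python
import re

# Matches HTML comments such as <!--xyz -->, including multi-line ones.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# Matches any remaining tag such as <b> or </span>.
HTML_TAG = re.compile(r"<[^>]+>")


def visible_text(html: str) -> str:
    """Return roughly what the reader sees: comments and tags removed."""
    without_comments = HTML_COMMENT.sub("", html)
    return HTML_TAG.sub("", without_comments)


def contains_keyword(html: str, keyword: str) -> bool:
    """Case-insensitive keyword check on the visible text, not the raw source."""
    return keyword.lower() in visible_text(html).lower()
```

A plain substring search on the raw source misses the word in the example above, while the check on the visible text finds it.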
Select Extrans from the drop down box :)
Support vector machines are actually quite a good machine learning tool -- try Wikipedia: http://en.wikipedia.org/wiki/Support_vector_machine
Actually, the number is 1,300,925,111,156,286,160,896. He missed a couple of possibilities and had to update the page.
utter rubbish
That's how spamd works, and yes, it works tremendously well. I used to get 300 spam messages daily. I now receive one or two a week.
The best way to predict the future is to invent it
His algorithm doesn't need to. All it needs to do is check against an existing dictionary of words. If the word is not on the list, it is assumed to be misspelled. (If the good spelling of Viagra is in the dictionary, simply remove it so that any correctly spelled reference to Viagra counts as a misspelling too). If there are greater than X% misspellings in the e-mail it gets trashed. X can be a smaller percentage if the e-mail has any hyperlinks in it, because it is virtually guaranteed that someone is trying to sell you something...
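A minimal sketch of that dictionary check, for what it's worth. The two thresholds stand in for "X" and are pure guesses, the tokenizer is crude, and all of the names are invented for the example.

```python
import re

URL = re.compile(r"https?://\S+", re.IGNORECASE)
TOKEN = re.compile(r"[A-Za-z0-9']+")


def misspelling_ratio(text: str, dictionary: set) -> float:
    """Fraction of tokens (URLs excluded) that are not in the dictionary."""
    words = [w.lower() for w in TOKEN.findall(URL.sub(" ", text))]
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in dictionary)
    return unknown / len(words)


def looks_like_spam(text: str, dictionary: set,
                    threshold: float = 0.30,
                    link_threshold: float = 0.15) -> bool:
    """Apply the stricter (assumed) threshold when the message has a hyperlink."""
    limit = link_threshold if URL.search(text) else threshold
    return misspelling_ratio(text, dictionary) > limit
```

Each word costs one set lookup, so the check is linear in the length of the message no matter how many ways there are to misspell a word.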
Urge to post... fading... fading... RISING!... fading... fading... gone.
Our experience with greylisting has been (1) a 90%+ reduction in passed-through email, with no complaints from users about lost mail (yet); (2) a dramatic decrease in server load, because SpamAssassin doesn't see the message until after it gets past greylisting; and (3) people rediscover how useful email is once you get all of the crap out of their inbox.
Marketing Guy: What's the worst that could happen?
Dilbert: Our beta product could turn into an evil robot that annihilates the galaxy.
If missing one email is not acceptable to your business, then your business should not be using email anyway - email is not, nor has it ever been, a guaranteed delivery mechanism.
At our company, currently just over 50% of all inbound email is detected as spam. Thus more than 50% of all our inbound email is spam, and the true figure (allowing for the false negatives which slip through) is probably in excess of 60% (and rising).
With a failure rate of 1 in 25,000, and assuming that means false positives rather than false negatives, then for our company, taking into account the volume of spam we receive, it means 1 email in > 55,000 is wrongly identified.
I can assure you that our business is capable of coping with 1 missed email in > 55,000.
We certainly do not do business-critical transactions via insecure, non-guaranteed, publicly transported email, and nor should your business!
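One reading that reproduces the "> 55,000" figure: treat the 1-in-25,000 error as one lost message per 25,000 legitimate emails, and assume spam is about 55% of inbound mail. Both of those are assumptions made for this sketch, not figures from the article.

```python
def inbound_per_false_positive(legit_per_error: int, spam_fraction: float) -> float:
    """Inbound messages (spam plus legitimate) per wrongly rejected legitimate one.

    Assumes the error rate applies to legitimate mail only.
    """
    legit_fraction = 1.0 - spam_fraction
    return legit_per_error / legit_fraction


if __name__ == "__main__":
    for spam_fraction in (0.55, 0.60):
        rate = inbound_per_false_positive(25_000, spam_fraction)
        print(f"{spam_fraction:.0%} spam: 1 in {rate:,.0f} inbound emails")
```

At 55% spam that works out to roughly 1 in 55,556 inbound emails, and at 60% spam to 1 in 62,500.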
People should not be afraid of their governments - Governments should be afraid of their people.
Since you say it's Slashdotted, here's the text.
Posted anon so no K-whoring.
----
The email spam nightmare could be halted in cyberspace by a groundbreaking firewall developed at The University of Queensland.
The new technology is the only true spam firewall in existence, according to co-developer Matthew Sullivan.
"Existing anti-spam software filters out spam whereas ours puts up a firewall, stopping all email traffic and only allowing real mail through," said Mr Sullivan.
"In addition, our technology is accurate and fast. We recently completed a successful trial of a key layer of the spam firewall and it processed the emails at 90 messages per second, misclassifying only one out of 25,000 emails."
"It turned out that the software was even better than us, picking up spam we'd incorrectly classified as legitimate emails."
A Specialist Systems Programmer at The University of Queensland, Mr Sullivan worked on the spam firewall concept largely in his spare time, only coming together this year to work on the project with Guy Di Mattina, a recent UQ Engineering honours graduate, and Dr Kevin Gates, a UQ mathematics lecturer.
Pivotal to the trio's spam firewall is the unique method of using a Support Vector Machine (SVM) to categorise emails. It is the only anti-spam software that analyses emails as a whole picture, rather than based solely on components such as key words or phrases, said Mr Sullivan.
"Using a SVM, we can train our spam firewall to accurately recognise legitimate emails to the extent that it can tell the difference between a pharmaceutical bulletin on Viagra and someone trying to sell Viagra," he said.
UQ's main commercialisation company, UniQuest, has formed a start-up company based on the technology and is seeking investment to take the spam firewall to market.
UniQuest Managing Director, David Henderson said the global cost of spam was estimated by the Radicati Group in 2003 to be $20.5 billion or $49 per user mailbox.
"With spam escalating and companies losing valuable employee time to deleting spam, UniQuest hopes to get this revolutionary spam firewall technology on the market quickly but it just depends on the level of funding we receive," said Mr Henderson.
Source: University of Queensland
Hell, there's even a product called the Mail Firewall that pops up if you google for mail firewall.
Heuristic analysis - detects and blocks spam by various email characteristics
Black lists - checks if the sending server is in RBL (Realtime Blackhole List), dial-up or open-relay servers
DNS verification - checks if the sender is using a valid mail server
Keyword blocking - blocks spam according to keywords in subject and body
Anti-spoofing - blocks email masquerading as coming from within the organization - a common spam technique
Cookies/web beacons - blocks email cookies which help spammers identify the recipient as a "live" email
Header verifier - inspects various header signatures and blocks spam
Textual analysis - categorizes spam according to textual content like mortgages, pornography, dental care, etc
Spam signatures - an auto-updating spam database allows detection and blocking of spam according to smart signatures
Spam URL filtering - blocks email with links to spam sources and sponsors
Spam image filtering - blocks email containing spam associated images
Auto-updating database - local or remote spam blocking database based on thousands of Spam collecting bots and web crawlers
http://www.esafe.com/esafe/anti-spam.asp (eSafe)
You just described greylisting. And it works extremely well. It is something all ISPs should be forced to implement immediately.
And for those that say this is a stop gap and won't be effective for very long, they are wrong.
The whole idea is to increase the cost to the spammer of sending out millions of emails. By greylisting they have to resend the same message at least twice, possibly multiple times, since they don't know how long the delay is.
On top of that, if you combine greylisting with an RBL that is fed from a spam trap, it is likely that by the time the spammer resends the message, the sending machine is listed in the RBL. So on the second attempt you accept the connection, check the RBL, and reject the message.
Add SpamAssassin as the next line of defense and the few messages that do get through will get tagged and dropped in the spam bucket.
But the important part of all this is to increase the cost to the spammer. If they try to get around this then they have to maintain a list of sent messages that were rejected and resend. This takes time and resources to do, thus increasing the cost to the spammer.
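For anyone who hasn't seen it, the greylisting-plus-RBL logic described above can be sketched roughly like this. The class name, the five-minute delay, and the in-memory set standing in for an RBL lookup are all assumptions for illustration; a real setup would query the RBL over DNS and expire old entries.

```python
import time


class Greylist:
    """Temporarily reject the first delivery attempt from each unseen triplet."""

    def __init__(self, min_delay: float = 300.0, clock=time.time):
        self.min_delay = min_delay  # seconds a sender must wait before retrying
        self.clock = clock          # injectable so the delay can be simulated
        self.first_seen = {}        # (ip, sender, recipient) -> first attempt time

    def check(self, client_ip: str, sender: str, recipient: str,
              blocklist=frozenset()) -> str:
        key = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        first = self.first_seen.setdefault(key, now)
        if now - first < self.min_delay:
            # Legitimate mail servers queue the message and retry later.
            return "451 temporarily rejected, try again later"
        if client_ip in blocklist:
            # By the time of the retry, a spam trap may have listed the sender.
            return "550 rejected, sender is blocklisted"
        return "250 accepted"
```

The first attempt from a new triplet always gets the 451 response, so a sender that never retries never gets through, and one that does retry has given the RBL time to catch up.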
www.mxlogic.com
www.surbl.org nuff said?
The firewall I use does exactly what this company is claiming their new product does. I've been running it for years. It's Open Source to boot. It's called messagewall, and I think it's great. My (other) mail server receives between 100 and 700 spams a day, out of which I actually receive 1 or 2. I like it because it rejects the mail if it is spam before the sending server can actually send it.
The downside is that you have to download, compile, and build it. It's not too bad, even for a non-programmer like me.
CC