A New Type Of Realtime Blocklist: The SURBL


Posted by timothy on Monday April 12, 2004 @09:02AM from the chicken-egg-spam dept.

Glamdrlng writes "The SURBL, or "Spam URI Realtime Blocklist", represents a nexus of RBLs and content filtering that may bring us one step closer to a spam magic bullet. While traditional RBLs perform a DNS lookup on the connecting mail server, SURBLs take this a step further by parsing the text of the email looking for URIs and doing a lookup on those web servers. They also prevent "joe jobs" by maintaining a whitelist of legitimate web servers whose domain names may show up in spam messages, e.g. eBay, PayPal, Microsoft, etc. The only requirement to implement the SURBL is a plugin on your MTA such as spamassassin that can parse the body of each email. While there is no MTA that directly supports SURBLs without a plugin, the author hints at one being in development."
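
The lookup described in the summary can be sketched roughly as follows. This is a minimal illustration, not the SURBL project's own code: the zone name `multi.surbl.org`, the local `WHITELIST` set, the helper names, and the naive "last two labels" domain reduction are all assumptions made for the example.

```python
import re
import socket

# Assumed zone name; the summary does not say which DNS zone is queried.
SURBL_ZONE = "multi.surbl.org"

# Hypothetical local whitelist of domains that are never looked up.
WHITELIST = {"ebay.com", "paypal.com", "microsoft.com"}

# Captures the host part of http/https URIs found in a message body.
URI_HOST_RE = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def extract_domains(body):
    """Return the set of domains that appear in URIs in the body."""
    domains = set()
    for host in URI_HOST_RE.findall(body):
        labels = host.lower().strip(".").split(".")
        if len(labels) >= 2:
            # Naive reduction: keep the last two labels.
            # This is wrong for names such as example.co.uk.
            domains.add(".".join(labels[-2:]))
    return domains


def query_name(domain, zone=SURBL_ZONE):
    """Build the DNS name to query for a given domain."""
    return f"{domain}.{zone}"


def is_listed(domain, zone=SURBL_ZONE):
    """True if the query name resolves to a 127.x.x.x answer."""
    try:
        address = socket.gethostbyname(query_name(domain, zone))
    except OSError:
        # No such name (or lookup failure): treat as not listed.
        return False
    return address.startswith("127.")


def listed_domains(body, lookup=is_listed):
    """Return the non-whitelisted domains in the body that the lookup flags."""
    return sorted(d for d in extract_domains(body) - WHITELIST if lookup(d))
```

Passing a stub as `lookup` (for example `lambda d: True`) exercises the parsing and whitelist logic without touching the network.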

Is this really a GOOD idea? by beh · 2004-04-12 09:03 · Score: 5, Interesting

Blocking URLs is an "ACTIVE" measure - and one that opens very bad
possibilities for abuse. While the whitelist would protect against
this, it will only protect the BIG players on the market - it can still
wreak havoc on small/medium enterprises - e.g. a competitor of a
(pretty much) 'niche' firm could get a spam out advertising the
COMPETITOR in order to get HIM blocked...

Or - the other way around - a company gets itself a whitelisting
(via a "fake" joe-job on itself) and then continues spamming...

Please stick to PASSIVE measures! They can't be abused...
1. Re:Is this really a GOOD idea? by acariquara · 2004-04-12 09:12 · Score: 5, Interesting
  
  Whatever happened to Bayesian Noise Reduction/Dobly algorithms? I was hoping for these to become better known and more widespread...
  snip snip from their page
  Feb 24: We broke 99.984% today and caught up with CRM114 =). DSPAM is now around ten times more accurate than a human. According to a study by Bill Yerazunis (CRM114), a correspondence secretary is approximately 99.84% accurate at filtering spam. As of today, DSPAM has classified 3140 spams and 3457 nonspams in my mailbox with only 1 false accept and 1 false reject. The false accept was caused by a bug in the BNR code which was fixed, so depending on how you count it, I am getting either 99.968% or 99.984% accuracy. These are from real mailbox statistics, and not based on some 'test corpus' mail sent in. As spammers continue to try and evade filters, statistical filters such as DSPAM continue to adapt easily maintaining their high levels of accuracy.
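
The accuracy figures quoted above can be rechecked from the message counts. The sketch below is only arithmetic on the numbers in the quote; the function names are made up for the example. With 6597 messages it gives roughly 99.970% for two errors and 99.985% for one, slightly above the quoted 99.968% and 99.984%, and the error-rate ratio between 99.84% and 99.984% is the "ten times" mentioned.

```python
def accuracy_percent(total, errors):
    """Percentage of messages classified correctly."""
    return 100.0 * (1 - errors / total)


def error_ratio(accuracy_a, accuracy_b):
    """How many times larger the error rate of A is than that of B."""
    return (100.0 - accuracy_a) / (100.0 - accuracy_b)


total = 3140 + 3457  # spams + nonspams from the quote

print(round(accuracy_percent(total, 2), 3))  # counting both errors
print(round(accuracy_percent(total, 1), 3))  # not counting the bug-induced error
print(round(error_ratio(99.84, 99.984), 1))  # human vs. filter error rate
```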
  
  And no, I am not posting an URL. If you want to get to the page, google for "Dobly" (yes, that is the actual spelling) and go to the first page.
  
  --
  Dear aunt, let's set so double the killer delete select all
2. Re:Is this really a GOOD idea? by jelle · 2004-04-12 11:53 · Score: 4, Interesting
  
  Emails with paragraphs of random words are not very easy to distinguish from emails with paragraphs of actual language in nonspam emails. But emails with dozens of random links are easier to distinguish from nonspam emails, so if the spammers start doing this, then you can filter out their spam even without having to check the SURBL by simply adding some points to the score of emails with a lot of links.
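
A link-count rule of the kind suggested above might look like this. It is a rough sketch: the thresholds, the point values, and the name `link_score` are invented for illustration and are not taken from any existing filter.

```python
import re

# Matches http/https URIs up to the next whitespace, quote, or angle bracket.
URI_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def link_score(body, free_links=5, points_per_link=0.5, max_points=5.0):
    """Return extra spam points for a body that contains many links.

    The first `free_links` links cost nothing; each further link adds
    `points_per_link`, capped at `max_points`.
    """
    link_count = len(URI_RE.findall(body))
    extra_links = max(0, link_count - free_links)
    return min(max_points, extra_links * points_per_link)
```

The cap keeps one rule from deciding the verdict alone, so a legitimate newsletter with many links can still be rescued by other rules or by an auto-whitelist.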
  
  And if you use the auto-whitelist feature, then it won't increase the false-positive count, except for people who receive a lot of emails with lots of random links from lots of different people.
  
  Plus, the spam detection software may very well be capable of distinguishing between the decoys and the real spam-links by analyzing the context of the URI. At least that will be a lot easier than analyzing the grammar in an email and detecting the nonsensical paragraphs and the nonsensical/typo-ed words in spam.
  
  Sure, it's not the final battle, but it looks like a very promising improvement in the fight against spam.
  
  --
  --- Hindsight is 20/20, but walking backwards is not the answer.
3. Re:Is this really a GOOD idea? by delstar+dotstar · 2004-04-12 16:32 · Score: 4, Interesting
  Words that can be more than one part of speech are used fairly infrequently. Ten in a row is a pretty good giveaway.
  
  that:
    adj (Not this one, that one)
    dem. pron. (Look at that)
    rel. pron. (birds that sing)

  can:
    noun (a can of whoopass)
    verb (The boss is gonna can your ass)
    modal (I can swim)

  one:
    adj (one fine morning)
    pron (the one that got away)

  part:
    noun (part of speech)
    verb (part the Red Sea)
    adj (part man, part machine)

  used:
    verb (I used a hammer on the kitten)
    adj (a used car)

  ten:
    adj (ten fingers)
    pron (ten in a row)

  row:
    noun (ten in a row)
    verb (row your boat)

  pretty:
    adj (a pretty girl)
    adv (a pretty good giveaway)
  
  OK, that was a little snarky. Anyhoo, spammers can just extend the stream-of-random-words technique and create "sentences" that are syntactically kosher but semantically empty: Colorless green dreams sleep furiously. Hell, they don't even need to create sentences -- they can just pinch real, human-generated text from any old web site.
The whitelist will always be limited by Anonymous Coward · 2004-04-12 09:03 · Score: 5, Interesting

There are millions of legitimate sites, and most of them aren't major sites like eBay, Yahoo, etc. If I want to do a joe-job on an enemy small site, I can cause them a lot of pain by including their link. They'll have a difficult time proving someone wasn't spamming on their behalf.
A plugin? by Pranjal · 2004-04-12 09:09 · Score: 4, Interesting

The only requirement to implement the SURBL is a plugin on your MTA such as spamassassin that can parse the body of each email.
Anything which requires extra software on the MTA or client side is not a simple requirement, as it cannot be implemented universally. This is doomed to fail.
Whitelist maintenance? by tepples · 2004-04-12 09:10 · Score: 4, Interesting

From the article:

This is a democratic effect, improved by manual de-selection of legitimate domains by SpamCop users when they submit their reports. More reports means more votes that a given site is indeed spam.

Though the article's author feels that "most SC users probably make an effort to uncheck legitimate domains to prevent false reporting," I have read reports that some mail server admins claim that SpamCop's users are rather likely to mistakenly report ham as spam. So the domain whitelist becomes important, but what practices have the SURBL administrators put in place to prevent corruption with respect to sites reported to whitelist at surbl dot org?
Then what happens when .... by Anonymous Coward · 2004-04-12 09:12 · Score: 5, Interesting

Spammers could then post their web sites as search URLs on Google, MSN, etc. If you block those URLs then lots of people would complain that they can't send Google entries. Even if you solved that, then what happens with sites like tinyurl.com? If you block them then you have liability and legal issues to think about. No doubt the spammers will script up a number of ways to cloak the marketer's site URLs.
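
One partial answer to the cloaking concern above is to look inside a URI before deciding what to look up. The sketch below is an assumption-laden illustration: `REDIRECTOR_HOSTS` is a hypothetical local list, the function names are invented, and the `search.example` redirect format in the usage note is made up. It only handles target URLs passed in the query string; a short link such as a tinyurl.com address carries no target in the URI itself and would have to be resolved some other way.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical list of hosts known to redirect elsewhere.
REDIRECTOR_HOSTS = {"tinyurl.com"}


def embedded_urls(uri):
    """Return any http/https URLs carried in the query string of uri."""
    found = []
    for values in parse_qs(urlparse(uri).query).values():
        for value in values:
            if value.lower().startswith(("http://", "https://")):
                found.append(value)
    return found


def is_redirector(uri):
    """True if the URI's host is a listed redirector or a subdomain of one."""
    host = (urlparse(uri).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in REDIRECTOR_HOSTS)
```

For example, `embedded_urls("http://search.example/redirect?to=http://spam.example/buy")` yields the inner address, which can then be checked instead of the wrapper's host.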
DoSes and things outside of one's control by Corvar · 2004-04-12 09:12 · Score: 5, Interesting

This type of system is very abusable.

I know I have gotten spam reports from places like SpamCop because people have included the URL of my website in their spam. My site had nothing to do with the spam other than that the spammer was using an article on the site to back up his point of view.

This type of system could very easily be abused to blackhole many mailing lists.
Re:My proposed solution to spam by hacker · 2004-04-12 09:58 · Score: 4, Interesting

1. Spam isn't primarily coming from legitimate SMTP relays like Yahoo or Hotmail
You're kidding, right?
At least 80% of our incoming spam, brute-force attacks, and other SMTP violations are coming from behind legitimate hosts like AOL, Verizon, Blueyonder, RoadRunner, and so on. Not forged IPs that pretend to be those hosts, but actual IPs that return to those MXs.
Look at today's list of brute-force attacks so far.. (as of Mon Apr 12 17:55:53 EDT 2004)
Every single one of these lists gets collected and reported, per day, per provider, and to date, not a single one of them has done anything to stop the abuse. In fact, it keeps increasing every day. The more we block, the faster they come at us.