A New Type Of Realtime Blocklist: The SURBL

Posted by timothy on Monday April 12, 2004 @09:02AM from the chicken-egg-spam dept.

Glamdrlng writes "The SURBL, or "Spam URI Realtime Blocklist", represents a nexus of RBLs and content filtering that may bring us one step closer to a spam magic bullet. While traditional RBLs perform a DNS lookup on the connecting mail server, SURBLs take this a step further by parsing the text of the email looking for URIs and doing a lookup on those web servers. They also prevent "joe jobs" by maintaining a whitelist of legitimate web servers whose domain names may show up in spam messages, e.g. eBay, PayPal, Microsoft, etc. The only requirement to implement the SURBL is a plugin for your MTA, such as SpamAssassin, that can parse the body of each email. While there is no MTA that directly supports SURBLs without a plugin, the author hints at one being in development."
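
As a rough illustration of the lookup described in the summary, here is a minimal Python sketch. It is not the SURBL reference implementation: the zone name `multi.surbl.org`, the tiny whitelist, and the "last two labels" domain rule are all assumptions made for the example (the last one is wrong for suffixes such as `co.uk`). The DNS query itself is passed in as a function so the sketch runs without network access.

```python
import re
from typing import Callable, List, Set

# Assumption: zone name used purely for illustration.
SURBL_ZONE = "multi.surbl.org"

# Hypothetical whitelist of well-known domains that spammers may mention.
WHITELIST: Set[str] = {"ebay.com", "paypal.com", "microsoft.com"}

# Captures the host part of http/https URIs found in a message body.
URI_HOST_RE = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def extract_domains(body: str) -> List[str]:
    """Return the last two labels of each URI host, without duplicates."""
    domains: List[str] = []
    for host in URI_HOST_RE.findall(body):
        labels = host.lower().strip(".").split(".")
        if len(labels) >= 2:  # skips bare numeric or single-label hosts
            domain = ".".join(labels[-2:])
            if domain not in domains:
                domains.append(domain)
    return domains


def query_names(body: str) -> List[str]:
    """Build the DNS names to look up, skipping whitelisted domains."""
    return [f"{d}.{SURBL_ZONE}" for d in extract_domains(body) if d not in WHITELIST]


def is_spam(body: str, is_listed: Callable[[str], bool]) -> bool:
    """True if any non-whitelisted domain in the body is on the blocklist."""
    return any(is_listed(name) for name in query_names(body))
```

In a real deployment `is_listed` would presumably wrap a DNS A-record query, with a "no such name" answer meaning the domain is not listed, as with conventional DNS blocklists.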

Re:Is this really a GOOD idea? by beh · 2004-04-12 09:08 · Score: 5, Insightful

One minor thing I missed before:

The advent of bayesian spamming brought spams that included whole paragraphs of random words - just so that your list would get more and more bloated...

How long do you think it will take spammers to add dozens of valid - but in the context of the spam nonsensical - URLs just to fill up the black-list and make it useless?
Re:It's a great idea by beh · 2004-04-12 09:11 · Score: 4, Insightful

...unless I would send out a spam with TONS of valid links on various sites that haven't got anything to do with the rest of the spam...

Boy - that list will be f***ed up pretty soon...
Present problem. by FreeLinux · 2004-04-12 09:12 · Score: 2, Insightful

Presently the only problem with this is that there are no plug-ins for the MTAs themselves yet. The plug-in is for SpamAssassin. That means that the message has to be transferred and passed on to SpamAssassin before it can be dropped or tagged, whereas the other RBLs allow you to drop the connection before the message is transferred. This problem will be solved once there are plug-ins for the MTAs themselves.

But, I have to ask, why aren't existing RBLs like Spamhaus effective? They should be far more effective than the ~40% that I am experiencing.
1. Re:Present problem. by Phroggy · 2004-04-12 13:18 · Score: 4, Insightful
  Presently the only problem with this is that there are no plug-ins for the MTAs themselves yet. The plug-in is for SpamAssassin. That means that the message has to be transferred and passed on to SpamAssassin before it can be dropped or tagged, whereas the other RBLs allow you to drop the connection before the message is transferred. This problem will be solved once there are plug-ins for the MTAs themselves.
  
  Sorry, but that's not because it's a SpamAssassin plugin vs an MTA plugin. That's because the SMTP protocol doesn't allow for what you describe.
  
  Let's say I'm an MTA. When you connect to me, the first thing you do is introduce yourself, then tell me the envelope sender and envelope recipient of the message you're about to send, then give me the full message including headers and body. My options for blocking the message are:
  
  1. Before you even connect, your IP could be blocked at the firewall level, so I'd never see you.

  2. After you connect, before you introduce yourself, I have your IP address, and can check it against a blacklist and/or whitelist, and give you an error and disconnect if I don't like what I find. I can also do reverse and forward DNS queries on your IP to make sure they agree.

  3. After you introduce yourself, I can compare your greeting against your reverse DNS, since that's how you should be introducing yourself. I can give you an error if I don't like it.

  4. After you give me the envelope sender, I can make sure that domain exists, etc. (Side note: Verisign wants to break this; ICANN is currently not letting them.)

  5. After you give me the envelope recipient, I can make sure that e-mail address is OK - if it's my domain name and the username is somebody I know I'll accept it, or if it's a valid domain name somewhere else and your IP is on my LAN I'll relay it. Otherwise I can give you an error.

  6. If we've gotten this far, I must now accept the entire message, including headers and body. If there's something in the headers I don't like, too bad! If there's something in the body I don't like, too bad! I have to let you send the whole message.

  7. After I've accepted the message, if there's a problem, I can generate a bounce message to send back to you, assuming the e-mail address you gave me actually works. If that fails, I'll send an e-mail to my postmaster explaining what happened. Or if that's too annoying, I could just delete your message and not tell anyone.
  
  Existing RBLs work at step 2. Filtering based on message content can't happen until step 7. You could build it into the MTA, but MTAs are complex enough as it is; using something else (SpamAssassin, Procmail, whatever) is a better idea.
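
The seven stages listed above can be modeled as a small Python sketch that answers "what is the earliest point at which a given check could run?". It only encodes the model described in this comment (content is not available for filtering until after acceptance). The stage names and the `NEW_INFO` table are labels invented for the example, not part of any MTA's API.

```python
from enum import IntEnum
from typing import Dict, Set


class Stage(IntEnum):
    """The seven points at which a message could be refused, in the order listed above."""
    FIREWALL = 1
    CONNECT = 2
    GREETING = 3
    ENVELOPE_SENDER = 4
    ENVELOPE_RECIPIENT = 5
    MESSAGE_DATA = 6
    AFTER_ACCEPT = 7


# What the MTA newly learns at each stage, following the comment's description.
# Assumption: the firewall acts before the MTA sees anything, and headers/body
# only become usable for filtering once the whole message has been accepted.
NEW_INFO: Dict[Stage, Set[str]] = {
    Stage.FIREWALL: set(),
    Stage.CONNECT: {"client_ip"},
    Stage.GREETING: {"helo_name"},
    Stage.ENVELOPE_SENDER: {"sender"},
    Stage.ENVELOPE_RECIPIENT: {"recipient"},
    Stage.MESSAGE_DATA: set(),
    Stage.AFTER_ACCEPT: {"headers", "body"},
}


def earliest_stage(required: Set[str]) -> Stage:
    """Return the first stage at which every required item is known."""
    known: Set[str] = set()
    for stage in Stage:
        known |= NEW_INFO[stage]
        if required <= known:
            return stage
    raise ValueError(f"never available: {sorted(required - known)}")
```

Under this model an IP-based RBL needs only `client_ip`, while a URI blocklist needs `body`, which is why the two land at opposite ends of the conversation.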
  --
  $x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
  $x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
Re:Time to dig out this old post. by interiot · 2004-04-12 09:33 · Score: 4, Insightful
- (x) Users of email will not put up with it
We'll see.
- (x) Eternal arms race involved in all filtering approaches
One of the few constants is that there will be a way for money to get from the target back to the original spammer or seller. (Well, it's possible something more complex is going on and that's not the real goal of spam, but at the least, it's something that's remained constant for years, which is notable in the world of spam.) So "following the money" is really based on an acceptance of the above criticism, and a realization that the arms race can never get around the money stream.
Filters may lead to arms races, but does anyone NOT use them right now? There are few alternatives, namely things like making email non-anonymous / PKI, enacting large legal penalties along with huge international support, rejecting email from anyone you don't know, ....
- (x) Whitelists suck
Actually, it's a blacklist. Blacklists may suck, but it's possible they suck less than spam, and the proliferation of RBLs kind of implies that.
Sure, there might be a way to stop spam once and for all and then blacklists would be hated, but the very presence of an antispam-rejection-template implies that there won't be a magic bullet for a long time to come.
- (x) Sorry dude, but I don't think it would work.
The only way it CAN'T work is if money isn't the real goal of spammers, or if they make it hard enough to "follow the money" that other methods are easier/nicer.
Uh... No. by FreeLinux · 2004-04-12 09:34 · Score: 2, Insightful

Sounds like a great idea, especially for home users or some such, but as soon as you look at the bigger picture things start to break down. First of all, what about legitimate mailing lists? Some of them have hundreds of thousands of addresses. You want the administrator to have to go through and click a web page for each and every address on the list? Never gonna happen.

What about corporate use? Many legitimate emails go to a dozen recipients, almost like a mailing list. Think of the lost productivity with the senders clicking web pages for each reply and forward. Think of the dreaded Everyone group. Well, that would be an advantage, but you start to see what I mean.

Your idea is very similar in concept to a few others in that it requires a cost: someone reading a picture and clicking a button for the message to transfer. This scheme is better implemented by the various proposals that invoke a computational cost for each message transferred, like those from AOL, Yahoo, and Microsoft, but even these proposals all have major drawbacks and no one is rushing to implement them.
"Everytime the user needs to send a mail" by markv242 · 2004-04-12 09:36 · Score: 2, Insightful

See the above list? Your post fits into:

(x) Requires immediate total cooperation from everybody at once

Also, accessibility, custom SMTP clients, yadda yadda yadda... but you've already realized your mistake so I'll stop now.
Re:Spam is unavoidable by rw2 · 2004-04-12 09:37 · Score: 4, Insightful

We can't ever have a workable spam filter because of the adaptability of spam.

This is because the solutions of the day focus on content instead of anonymity.

I've said it before, I'll probably say it again, get rid of unauthenticated email and the spam problem becomes a thousand times easier to fight. SPF and various RMX solutions exist in design today. If people want the spam problem to go away, that can be done today. Unfortunately people would rather piss and moan and call for legislation or perfect solutions than deal with these good ones today.

In the case of spam the perfect is the enemy of the good enough. We should stop spam today.
Re:A plugin? by interiot · 2004-04-12 09:45 · Score: 2, Insightful

So that explains why RBLs are so unpopular, right?
ANY technical solution is going to require extra work on the client side, so rejecting this outright is kind of ridiculous unless you're advocating a purely legal, market-based, or vigilante solution.
Spam is getting to be such a problem that techies are setting up things like SpamAssassin for themselves and friends, and major ISPs are using RBLs. So this isn't really a problem.
joe jobs by selfabuse · 2004-04-12 09:45 · Score: 2, Insightful

One thing I've noticed a lot recently is spammers including a big list of domains that have nothing to do with them in the text of their junk mail. In the past 2 weeks, I've probably got about 10 SpamCop reports for my customers, and in every case, my customer has had nothing to do with the junk mail, except for being listed in a list of about 15 URLs that are not associated with the spammer. This system here says it has a whitelist for places like eBay, PayPal, etc., but what about smaller people? They'd get blocked, and potentially lose business, due to something that they had absolutely nothing to do with.
Not sure you're getting it by juhnke · 2004-04-12 09:50 · Score: 2, Insightful

Having just configured my email server with a ROBUST array of RBLs, I think this is a fantastic idea. I've been using the body_checks feature in Postfix and manually adding violating URIs to create my own blacklist for several months now. I would love to benefit from a shared list. I don't care much for the whitelist feature; that seems to me to create a backdoor for the spammer.

Combined, RBLs and private blacklisting (IPs and addresses) allow me to block 6000-plus spams A DAY. That's for a mere 150-plus users. Server-side spam blocking using only Bayesian processing is an immense processor drain, as is server-side virus scanning.

Look at it this way. Spammers need to make money. To do so, they must present you with a URI to complete a transaction. They cannot easily change this URI without incurring cost, so it will always be in the spam. Spammers who try to include too much "sales" content in their spams instead of a URI will be caught by a secondary Bayesian filter.

P.S. We have been successfully blocking encrypted URIs for months now. It's an easy rule to set up and legitimate users will never encrypt a URI. It's really quite beautiful.
Re:A plugin? by MrChuck · 2004-04-12 09:51 · Score: 2, Insightful

You don't want MTAs to stop being MTAs. They really SHOULDN'T be looking at message bodies and then making complex decisions (is http://1593985/ a decimal version of a member of the list?) or doing regexes.
If you want bloat and dysfunction like this, look at Exchange or Notes or Gropewise. It's a GUI, it's a calendar, it's a database, it's an MTA! well, it doesn't scale and it tastes like floorwax.
This is why sendmail developed the MILTER interface. Firewall-1 had a proprietary scanning interface (easy to develop clients with their kit, but a bitch to find out the SERVER's protocol and use that). "the future" (in 1999) promised a vendor free spec. Which still isn't there.
MILTER allows sendmail to speak to an external process that can do things to the headers/envelope or bodies. External can mean to another box (or group of boxes) or just another program.
This also lets you make decisions before the SMTP session closes.
I just wish other MTAs had this available.
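
The "decimal version" question above is the kind of normalization an external filter would have to do before comparing a URI host against a list. A minimal Python sketch, assuming only that an all-digit host is a 32-bit IPv4 address written as a single integer (`normalize_host` is a name made up for this example):

```python
import ipaddress
from urllib.parse import urlsplit


def normalize_host(url: str) -> str:
    """Return the URL's host, rewriting an all-digit host as a dotted quad."""
    host = urlsplit(url).hostname or ""
    if host.isascii() and host.isdigit():
        value = int(host)
        if value < 2**32:  # fits in an IPv4 address
            return str(ipaddress.IPv4Address(value))
    return host
```

With this, `normalize_host("http://1593985/")` gives `0.24.82.129`, which can then be checked against an IP-based list like any other address. Other obfuscations (octal or hex hosts, percent-encoding) are not handled here.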
Re:No system that uses the content of an email... by interiot · 2004-04-12 09:56 · Score: 3, Insightful

The unlimited-email-addresses idea basically comes down to requiring a password before someone can send you email (e.g. if everyone accepted *@their.domain.com, spammers would just pick some random characters for the left side. So you have to have some sort of checksum or hash built into the characters, but if everyone uses the same algorithm, spammers would be able to generate their own random list again. So the only way to make it work universally is to salt the hash with something like a private key).
If people have to get passwords from you before they can contact you, then... what do you do if you're an open source author... or if an ex from college wants to hook up again and googles you, and finds your website, but STILL can't contact you... or you want to sign up for match.com so that random women can email you.
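
The keyed-hash idea described above can be sketched with an HMAC. This is only an illustration of the principle, not a scheme anyone in the thread proposed in detail: the `user+label.tag@domain` layout, the 8-character tag, and the key value are all assumptions.

```python
import hashlib
import hmac

# Hypothetical private key; only the mailbox owner's server knows it.
SECRET_KEY = b"replace-with-a-private-key"
TAG_LEN = 8  # assumption: truncated hex digest used as the "password"


def _tag(label: str, key: bytes) -> str:
    """Keyed hash of the label, truncated to TAG_LEN hex characters."""
    return hmac.new(key, label.encode("utf-8"), hashlib.sha256).hexdigest()[:TAG_LEN]


def make_address(user: str, label: str, domain: str, key: bytes = SECRET_KEY) -> str:
    """Issue an address for one correspondent, e.g. me+shop.1a2b3c4d@example.org."""
    label = label.lower()
    return f"{user}+{label}.{_tag(label, key)}@{domain}"


def is_valid(address: str, key: bytes = SECRET_KEY) -> bool:
    """Accept mail only if the tag matches the keyed hash of the label."""
    local, _, _domain = address.partition("@")
    _user, plus, rest = local.partition("+")
    label, dot, tag = rest.rpartition(".")
    if not (plus and dot):
        return False  # no label/tag present: a guessed address
    return hmac.compare_digest(tag.encode("utf-8"), _tag(label, key).encode("utf-8"))
```

A spammer who picks random characters for the left side fails the check without the key, which is also exactly the drawback raised above: a stranger with a legitimate reason to write fails it too.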
Re:Won't catch JavaScript-constructed URL's by julesh · 2004-04-12 09:57 · Score: 2, Insightful

But if it this approach catches on significantly, the more sophisticated spammers will just build their URLs in scripts.

And the rest of us will just block all e-mail that contains scripts. Yeah, I can't wait for that to happen...
Re:Yet Another Stupid Spam Idea (YASSI) by ChaosDiscord · 2004-04-12 10:13 · Score: 2, Insightful

You don't have content-based filtering on other primary methods of communication. It's a federal crime to go through mail; (at least before Patriot) you needed a court order to tap phones. E-mail should be an equally sacred communication medium that shouldn't be subject to "strip searches" before it hits your inbox.

Ummmm, the hell? It's perfectly legal to go through mail. My own mail, naturally. And it's legal to tell someone else (say, a secretary) to go through yours and filter it. Ditto phone calls. I agree, I wouldn't want the government or any other random person or organization rummaging through my email. But I'm more than happy to run a program to do it myself. I appreciate ISPs that offer me the service. (I'm less keen on ISPs that make the service mandatory, but that's another issue.)
And this whole boneheaded scheme will NEVER stop spam in the first place, so let's stop pursuing these efforts.

Attacking the source varies from extremely difficult to impossible. Spam filtering systems (especially multiple technique systems like SpamAssassin) are a good stopgap measure. Sure, the spammer is still wasting my bandwidth, that sucks, but having it disappear into my IN.spam folder reduces my irritation. Ultimately even legal measures have limitations as spammers move overseas. To suggest that we should just give up and drown in spam (I'm at a 2:1 spam:ham ratio these days) until we get a full solution is foolish. Much like medicine, sometimes a cure isn't a real option; all you can do is treat the symptoms. I hope a real cure is on the horizon, we should certainly look for one, but in the meantime I'll treat my own symptoms.

--
Search 2010 Gen Con events
Re:No system that uses the content of an email... by interiot · 2004-04-12 12:52 · Score: 2, Insightful
Ahh, this guy states the problem succinctly:
- My "disposable" addresses aren't very disposable. ... When retiring an address has undesired effects, it's no longer disposable.
Re:Is this really a GOOD idea? by dgatwood · 2004-04-12 13:41 · Score: 2, Insightful
Random words are -easy- to distinguish heuristically from actual text. It just takes a good bit of time.... Start with these simple rules:
1. All sentences must contain at least one verb.
2. All singular nouns other than proper nouns are almost always preceded by an article or a possessive noun/pronoun. There may be words in-between, but these will generally be adjectives (or adverbs modifying the adjectives).
3. Prepositions are almost always followed by a noun within a handful of words, all of which are generally adjectives (or occasionally adverbs modifying those adjectives)
4. Sentences are rarely more than about twenty words long.
5. Sentences are rarely less than about five words long.
6. Words that can be more than one part of speech are used fairly infrequently. Ten in a row is a pretty good giveaway.
The English language, if you include slang, jargon, etc. is 3 million words or more. However, the core of the language (removing slang and specialized terminology) only contains about 600k word forms (616,500 from OED2), mostly by adding trivial endings to a core of about 300k words. Of those, only about 200k words are actually in common use anywhere in the world. The average educated person knows about 20k. The average educated person uses only about 2k in any given week. So start with a core of about 20k words, including all articles and pronouns. That should be enough to do a fairly accurate job of determining the legitimacy of a string of text.
For each unknown word, assign it a probability of being a noun, verb, adjective, or adverb based on its ending and its position within the sentence. Give it only a 95% probability of being valid so that several in a row will dramatically lower the probability of a sentence being legitimate.
From this, you should be able to easily come up with a heuristic that determines with a high degree of confidence whether a string of text obeys each of the above rules. All sentences should obey most of those rules. Most sentences should obey all of those rules. Three sentences in a row that break more than one rule (even with vocabulary limitations) and it is probably either random noise, Ayn Rand, or unreadable gibberish that you won't understand anyway. Some would argue that all three are an equivalence class, but this is somewhat subjective.
If random text generators get good enough to beat those rules, they will no longer be truly random and will run a high risk of being intelligible sentences---and potentially offensive ones at that---substantially reducing the likelihood of spammers using such generators.
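
A small part of this can be sketched in Python: the two sentence-length rules (4 and 5) and the 95% discount for unknown words. The part-of-speech rules (1 to 3 and 6) would need a real tagger and are left out. The word pattern and function names are assumptions for the example, and the caller supplies the known-word set that stands in for the 20k-word core.

```python
import re
from typing import List, Set

UNKNOWN_WORD_PROB = 0.95          # from the comment: unknown words count as 95% valid
MIN_WORDS, MAX_WORDS = 5, 20      # rules 5 and 4: rarely shorter or longer than this

WORD_RE = re.compile(r"[A-Za-z']+")


def split_sentences(text: str) -> List[str]:
    """Naive split on sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def breaks_length_rule(sentence: str) -> bool:
    """True if the sentence is shorter than 5 or longer than 20 words."""
    n = len(WORD_RE.findall(sentence))
    return n < MIN_WORDS or n > MAX_WORDS


def vocabulary_score(sentence: str, known: Set[str]) -> float:
    """Multiply in 0.95 for every word that is not in the known vocabulary."""
    score = 1.0
    for word in WORD_RE.findall(sentence.lower()):
        if word not in known:
            score *= UNKNOWN_WORD_PROB
    return score
```

Ten unknown words in one sentence bring the score down to roughly 0.60, which is the "several in a row" effect the comment is after.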
--
Check out my sci-fi/humor trilogy at PatriotsBooks.
Could be good, could be bad. by autopr0n · 2004-04-12 15:03 · Score: 4, Insightful

I see one major problem with this, which is that spammers might now be able to cause problems for legitimate websites simply by including their URL in a spam.

I'm a little sensitive to this since a spammer is actually joe-jobbing one of my domains (not autopr0n), and I get hundreds of "user unknown" messages every day, along with a handful of messages telling me "my" email was blocked. It's really irritating.

But, if it's done right, it could work out pretty well. In fact, this would actually be effective against a lot of the current Spam out there, and kill Spam with off-site images.

Anyway, let me throw one countermeasure out there. Suppose spammers start including commonly mailed URLs (such as those on hotornot, yahoo, etc.) in their spams in order to decrease the usefulness of these things. If this thing gets popular, expect to see a lot of spam include a lot of random URLs the way they now include lots of random words. You'll also start to see things like "JavaScript decryption" and other techniques to prevent machines from figuring out exactly which URL is being advertised and which ones are random noise.

--
autopr0n is like, down and stuff.