Slashdot Mirror


A New Type Of Realtime Blocklist: The SURBL

Glamdrlng writes "The SURBL, or "Spam URI Realtime Blocklist", represents a nexus of RBL's and content filtering that may bring us one step closer to a spam magic bullet. While traditional RBL's perform a DNS lookup on the connecting mail server, SURBL's take this a step further by parsing the text of the email looking for URI's and doing a lookup on those web servers. They also prevent "joe jobs" by maintaining a whitelist of legitimate web servers whose domain names may show up in spam messages, e.g. EBay, Paypal, Microsoft, etc. The only requirement to implement the SURBL is a plugin on your MTA such as spamassassin that can parse the body of each email. While there is no MTA that directly supports SURBL's without a plugin, the author hints at one being in development."
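The lookup described in the summary can be sketched roughly as follows. This is a minimal illustration, not the SURBL author's implementation: the zone name `multi.surbl.org`, the two-label domain reduction, and the helper names are assumptions made for the example, and the whitelist is a hypothetical local stand-in for the one the summary mentions.

```python
import re
import socket
from typing import Callable, List, Set

# Assumed zone name for illustration; consult the SURBL site for the zone to use.
SURBL_ZONE = "multi.surbl.org"

# Hypothetical local whitelist of domains that are never looked up or flagged.
WHITELIST: Set[str] = {"ebay.com", "paypal.com", "microsoft.com"}

# Captures the host part of http/https URIs found in the message body.
URI_RE = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def extract_domains(body: str) -> List[str]:
    """Return the last two labels of each URI host, deduplicated in order."""
    seen: List[str] = []
    for host in URI_RE.findall(body):
        labels = host.lower().strip(".").split(".")
        domain = ".".join(labels[-2:])  # naive: ignores suffixes such as co.uk
        if domain not in seen:
            seen.append(domain)
    return seen


def dns_listed(name: str) -> bool:
    """True if the name resolves; blocklists answer with an A record when listed."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:
        return False


def listed_domains(body: str, is_listed: Callable[[str], bool] = dns_listed) -> List[str]:
    """Return the non-whitelisted domains from the body that the zone lists."""
    hits = []
    for domain in extract_domains(body):
        if domain in WHITELIST:
            continue  # whitelisted domains are skipped before any DNS query
        if is_listed(f"{domain}.{SURBL_ZONE}"):
            hits.append(domain)
    return hits
```

Passing a stub for `is_listed` lets the parsing and whitelist logic be exercised without any network access.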

28 of 219 comments

  1. Is this really a GOOD idea? by beh · · Score: 5, Interesting

    Blocking URLs is an "ACTIVE" measure - and one that opens very bad
    possibilities for abuse. While the White-List would protect against
    this, it will only protect the BIG players on the market - it can still
    wreak havoc on small/medium enterprises - e.g. a competitor of a
    (pretty much) 'niche' firm could get a spam out advertising the
    COMPETITOR in order to get HIM blocked...

    Or - the other way around - a company gets itself a whitelisting
    (via a "fake" joe-job on itself) and then continues spamming...

    Please stick to PASSIVE measures! They can't be abused...

    1. Re:Is this really a GOOD idea? by acariquara · · Score: 5, Interesting
      Whatever happened to Bayesian Noise Reduction/Dobly algorithms? I was hoping these would become better known and more widespread...

      snip snip from their page

      Feb 24: We broke 99.984% today and caught up with CRM114 =). DSPAM is now around ten times more accurate than a human. According to a study by Bill Yerazunis (CRM114), a correspondence secretary is approximately 99.84% accurate at filtering spam. As of today, DSPAM has classified 3140 spams and 3457 nonspams in my mailbox with only 1 false accept and 1 false reject. The false accept was caused by a bug in the BNR code which was fixed, so depending on how you count it, I am getting either 99.968% or 99.984% accuracy. These are from real mailbox statistics, and not based on some 'test corpus' mail sent in. As spammers continue to try and evade filters, statistical filters such as DSPAM continue to adapt easily maintaining their high levels of accuracy.

      And no, I am not posting an URL. If you want to get to the page, google for "Dobly" (yes, that is the actual spelling) and go to the first page.

      --
      Dear aunt, let's set so double the killer delete select all
    2. Re:Is this really a GOOD idea? by DocSnyder · · Score: 3, Interesting
      Blocking URLs is an "ACTIVE" measure - and one that opens very bad possibilities for abuse.

      The SURBL is not blocking URLs but the IPs where spamvertised URLs are hosted. I've been doing this for about half a year, too - it's really effective in filtering spam as most spammers choose "bulletproof" ISPs whose netblocks are listed on SPEWS and SBL for that reason. Take Chinanet, for example - an email that includes a link hosted at Chinanet is almost always spam.

      I'd recommend not a single SURBL list but several ones, ranging from an in-progress DNSBL to a SPEWS-/SBL-like blacklist with the latter fed manually.

      If SURBL gains acceptance, spammers could choose bulletproof ISPs and have most of their spam emails filtered due to SURBL listings, or choose white-hat ISPs and get kicked off instead of filtered.
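The approach in the comment above, checking the IP that hosts a linked site against IP-based blacklists, could look something like this. It is only a sketch: the zone names are placeholders, the function names are made up for the example, and the reversed-octet query name is the usual DNSBL convention rather than anything stated in the thread.

```python
import ipaddress
import socket
from typing import Callable, List


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build a DNSBL query name: IPv4 octets reversed, followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def resolves(name: str) -> bool:
    """True if the name has an A record, which a DNSBL uses to signal a listing."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:
        return False


def zones_listing_host(
    host: str,
    zones: List[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
    is_listed: Callable[[str], bool] = resolves,
) -> List[str]:
    """Resolve the linked host to an IP, then return the zones that list that IP."""
    ip = resolve(host)
    return [zone for zone in zones if is_listed(dnsbl_query_name(ip, zone))]
```

Querying several zones at once matches the suggestion of using more than one list, each with its own listing policy.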

    3. Re:Is this really a GOOD idea? by jelle · · Score: 4, Interesting

      Emails with paragraphs of random words are not very easy to distinguish from emails with paragraphs of actual language in nonspam emails. But emails with dozens of random links are easier to distinguish from nonspam emails, so if the spammers start doing this, then you can filter out their spam even without having to check the SURBL by simply adding some points to the score of emails with a lot of links.

      And if you use the auto-whitelist feature, then it won't increase the false-positive count, except for people who receive a lot of emails with lots of random links from lots of different people.

      Plus, the spam detection software may very well be capable of distinguishing between the decoys and the real spam-links by analyzing the context of the URI. At least that will be a lot easier than analyzing the grammar in an email and detecting the nonsensical paragraphs and the nonsensical/typo-ed words in spam.

      Sure, it's not the final battle, but it looks like a very promising improvement in the fight against spam.

      --
      --- Hindsight is 20/20, but walking backwards is not the answer.
    4. Re:Is this really a GOOD idea? by delstar+dotstar · · Score: 4, Interesting
      Words that can be more than one part of speech are used fairly infrequently. Ten in a row is a pretty good giveaway.
      • that:
        1. adj (Not this one, that one)
        2. dem. pron. (Look at that )
        3. rel. pron. (birds that sing)
      • can:
        1. noun (a can of whoopass)
        2. verb (The boss is gonna can your ass)
        3. modal (I can swim)
      • one
        1. adj ( one fine morning)
        2. pron (the one that got away)
      • part
        1. noun ( part of speech)
        2. verb ( part the Red Sea)
        3. adj ( part man, part machine)
      • used:
        1. verb (I used a hammer on the kitten)
        2. adj (a used car)
      • ten
        1. adj ( ten fingers)
        2. pron ( ten in a row)
      • row
        1. noun (ten in a row )
        2. verb ( row your boat)
      • pretty
        1. adj (a pretty girl)
        2. adv (a pretty good giveaway)
      OK, that was a little snarky. Anyhoo, spammers can just extend the stream-of-random-words technique and create "sentences" that are syntactically kosher but semantically empty: Colorless green dreams sleep furiously. Hell, they don't even need to create sentences -- they can just pinch real, human-generated text from any old web site.
  2. The whitelist will always be limited by Anonymous Coward · · Score: 5, Interesting

    There are millions of legitimate sites, and most of them aren't major sites like ebay, yahoo, etc. If I want to do a joe-job on an enemy small site, I can cause them a lot of pain by including their link. They'll have a difficult time proving someone wasn't spamming on their behalf.

  3. A plugin? by Pranjal · · Score: 4, Interesting

    The only requirement to implement the SURBL is a plugin on your MTA such as spamassassin that can parse the body of each email.
    Anything which requires extra software on the MTA or client side is not a simple requirement as it cannot be implemented universally. This is doomed to fail.

  4. Whitelist maintenance? by tepples · · Score: 4, Interesting

    From the article:

    This is a democratic effect, improved by manual de-selection of legitimate domains by SpamCop users when they submit their reports. More reports means more votes that a given site is indeed spam.

    Though the article's author feels that "most SC users probably make an effort to uncheck legitimate domains to prevent false reporting," I have read reports that some mail server admins claim that SpamCop's users are rather likely to mistakenly report ham as spam. So the domain whitelist becomes important, but what practices have the SURBL administrators put in place to prevent corruption with respect to sites reported to whitelist at surbl dot org?

  5. Then what happens when .... by Anonymous Coward · · Score: 5, Interesting

    Spammers could then post their web sites as search URLs on Google, MSN, etc. If you block those URLs then lots of people would complain that they can't send Google entries. Even if you solved that, then what happens with sites like tinyurl.com? If you block them then you have liability and legal issues to think about. No doubt the spammers will script up a number of ways to cloak the marketeers' site URLs.

  6. Works for me by Frisky070802 · · Score: 3, Interesting

    I am continually adding certain domain names to my spam filter, if found in text. I'd love it for this tool to do it for me, as long as I can trust the low false positive rate.

    --
    Mencken had it right. So glad that's old news.
  7. DOSes and things outside of ones control by Corvar · · Score: 5, Interesting

    This type of system is very abusable.

    I know I have gotten spam reports from places like spam cop because people have included the URL of my website in their spam. My site had nothing to do with the spam other than the spammer was using an article on the site to back up his point of view.

    This type of system could very easily be abused to blackhole many mailing lists.

  8. Spam is unavoidable by Klatoo55 · · Score: 2, Interesting

    We can't ever have a workable spam filter because of the adaptability of spam. However much you try, the spammers will come up with a way to circumvent your block. How long do you think that it would take for the spammers to figure out how to send emails that the whitelist software would mistake for legit? Nothing short of a trained monkey going through your inbox will sort this out effectively.

    --
    ------- "A true friend stabs you in the front." -Eliot
  9. so-called "remove me" links by imroy · · Score: 3, Interesting

    I've been playing with a honeypot email account for the last couple of months. Those "remove me from your list" links sure are a good way to get more spam (Spammers are lying scum). I hope this SURBL suggestion doesn't get implemented at the ISP level. Then I wouldn't be able to go to the spammer's site (carefully editing the URL as needed, and with Mozilla) and sign up my honeypot account for more penis enlargement spam!

  10. My proposed solution to spam by GillBates0 · · Score: 3, Interesting
    I don't know if anybody has tried this yet....and if not, why not:

    My suggestion is to present the user with those images containing a word (like the one used by Yahoo! etc during registration) every time the user needs to send a mail (before clicking Send). This is a reasonably difficult Turing-type test which could weed out a majority of automated scripts/spambots.

    An immediate problem with this scheme that I see is that for the words to be sufficiently random and crack-proof, they would have to be served in real-time to the mail program, and could need tweaks in current mail programs. A static list coded into the program might be too easy to break. This isn't too impractical, since an Internet connection is assumed during most email transactions.

    Another problem, of course, is that it will not work with text-based mailers like PINE, but as long as it weeds out all the spam sent from all the freebie mail accounts we could see an improvement.

    Comments/Suggestions?

    --
    An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
    1. Re:My proposed solution to spam by mabu · · Score: 2, Interesting

      This scheme doesn't work because:

      1. Spam isn't primarily coming from legitimate SMTP relays like Yahoo or Hotmail

      2. Ultimately to make such a system work, the mail would end up having to be flagged as "approved" by completing the process you suggest, which basically turns the scheme into a trusted-computing system (aka "whitelist"), and if you're going to go that route, you might as well call a spade a spade.

      And since we're calling a spade a spade, the way to do it is to require all SMTP servers to have a "license". Create a regulatory body in the same manner the TLDs are done (but with some competence) and endorse a sanctioned SMTP whitelist. Then when you get e-mail, you can choose to accept only mail from licensed SMTP servers.

      Mark my words: This WILL happen. It's just a matter of time. It's the only way to stop spam. All the challenge-response systems and all the content-based filters ultimately work NOT because of what they block, but because of the rules they use to determine what is legitimate.

    2. Re:My proposed solution to spam by Anonymous Coward · · Score: 1, Interesting
      My suggestion is to present the user with those images containing a word [...] everytime the user needs to send a mail

      This is not interesting but plain nonsense.

      Spammers don't use "freebie mail accounts", they don't need to. Even if the mails seem to originate from such accounts, they most probably weren't sent from there:

      If you know an open relay server, you just need to connect to it, and you can send any mail you wish, with any "From:"-Header you would like the receiver to see. Maybe hundreds or thousands of mails in a second.

      If you programmed one of those virii we see every day, you may use the luser's mail address and send mails from his account, again with a fake address or his own. And of course there are many other ways to send spam...

      What you are maybe really asking for is a change of the protocol, but that would not look like "words in images" at all, but like some kind of cryptographic extension that authenticated machines and programs could perform. Anything else you asked for would just hurt the normal user, but no spammer.

    3. Re:My proposed solution to spam by hacker · · Score: 4, Interesting
      1. Spam isn't primarily coming from legitimate SMTP relays like Yahoo or Hotmail
      You're kidding, right?

      At least 80% of our incoming spam, brute-force attacks, and other SMTP violations are coming from behind legitimate hosts like AOL, Verizon, Blueyonder, RoadRunner, and so on. Not forged IPs that pretend to be those hosts, but actual IPs that return to those MXs.

      Look at today's list of brute-force attacks so far.. (as of Mon Apr 12 17:55:53 EDT 2004)

      Every single one of these lists gets collected and reported, per day, per provider, and to date, not a single one of them has done anything to stop the abuse. In fact, it keeps increasing every day. The more we block, the faster they come at us.

  11. Bayesian spamming by Anonymous Coward · · Score: 1, Interesting
    How long do you think it will take spammers to add dozens of valid - but in the context of the spam nonsensical - URLs just to fill up the black-list and make it useless?

    Good point, but I think it will take very little time for developers to enhance spamassassin to mark anything as spam if it has more than, say, 5 URIs in it that don't point to the same domain. (If this feature isn't in spamassassin already.)
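The rule proposed in the comment above might be approximated like this. It is a sketch, not a spamassassin rule: it reads "URIs that don't point to the same domain" as "number of distinct hostnames", and the function names and the default limit of 5 are chosen only for illustration.

```python
import re
from typing import Set
from urllib.parse import urlsplit

# Matches http/https URIs up to the next whitespace, quote, or angle bracket.
URI_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def distinct_link_hosts(body: str) -> Set[str]:
    """Collect the distinct hostnames of all http/https links in the body."""
    hosts: Set[str] = set()
    for uri in URI_RE.findall(body):
        try:
            host = urlsplit(uri).hostname
        except ValueError:
            continue  # skip malformed URIs instead of failing the whole message
        if host:
            hosts.add(host)
    return hosts


def too_many_domains(body: str, limit: int = 5) -> bool:
    """True if the body links to more than `limit` distinct hosts."""
    return len(distinct_link_hosts(body)) > limit
```

A message with many links to one host, such as a newsletter, stays under the limit, while a message padded with decoy links to unrelated sites trips it.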
  12. Re:Too much work!!! by mabu · · Score: 2, Interesting

    I agree with you, but in some cases, such as APNIC networks, unless you have reason to communicate with China or Korea, it's much easier to simply put a 218.* reject in your sendmail access file and avoid all the overhead of calling the RBLs.

    One problem we're seeing now is that some of the RBLs like Spamcop, automatically expire a blacklisted entry after X days. The spammers take advantage of this by playing around in huge Asian-Pacific blocks of IP space that give them plenty of addresses from which to rotate their spamming. One way around this is to blacklist the entire rogue regions, and then let the legitimate operations in those spaces contact you for permission.

    For example, if Bellsouth is operating in the 68.* domain, and the lion's share of their IP space are DULs which shouldn't be sending port 25 traffic, it's a lot easier to BL the entire block and then redirect users to a form where they can submit legitimate SMTP relays and have them whitelisted.

    The problem I have with RBLs (even though I love them) is that they're single-IP-based, when there are some areas that just need to be wholesale blocked, and I've yet to figure out how to configure Bind to easily resolve IP lookups on blocks of addresses.
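Block-level rejection with individually whitelisted relays, as described above, can be illustrated outside of sendmail or Bind. The sketch below treats 218.* and 68.* as /8 networks, which is one reading of the comment, and the whitelisted address is a made-up example.

```python
import ipaddress

# The two blocks named in the comment, interpreted as /8 networks.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("218.0.0.0/8"),
    ipaddress.ip_network("68.0.0.0/8"),
]

# Hypothetical relay that asked for, and was granted, an exception.
WHITELISTED_RELAYS = {ipaddress.ip_address("68.1.2.3")}


def should_reject(ip: str) -> bool:
    """Reject any address inside a blocked network unless it is whitelisted."""
    addr = ipaddress.ip_address(ip)
    if addr in WHITELISTED_RELAYS:
        return False
    return any(addr in network for network in BLOCKED_NETWORKS)
```

Checking the whitelist first is what lets a legitimate relay inside a blocked range get through after submitting the form.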

  13. Re:Time to dig out this old post. by ajs · · Score: 2, Interesting

    Not only is it both, but the suck factor seems to be heavily in the whitelist camp.

    Take for example the spammer who wants to get his spam through to me. He peppers his document with HREFs to Yahoo!, Hotmail, CNN.com, NASA.gov and a dozen or so other sites that are likely in the whitelist.

    Now I look at it and he manages to squeak by the initial origin lookup (e.g. he would have passed through traditional RBLs) and the body check finds that *most* of the entries in the body are good sites, and only one of them is suspicious.

    Why maintain a whitelist at all if you're going to have to turn the gain down to the point that 20 good entries are drowned out by one bad?
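The trade-off in the comment above can be made concrete with two toy decision rules. Neither is taken from SURBL or any real filter; the function names, the 0.5 threshold, and the example domains are all assumptions for the sake of the comparison.

```python
from typing import Iterable, Set


def averaged_is_spam(domains: Iterable[str], blacklist: Set[str], threshold: float = 0.5) -> bool:
    """Naive rule: flag only if the blacklisted share of domains exceeds the threshold."""
    domains = list(domains)
    if not domains:
        return False
    bad = sum(1 for d in domains if d in blacklist)
    return bad / len(domains) > threshold


def any_hit_is_spam(domains: Iterable[str], whitelist: Set[str], blacklist: Set[str]) -> bool:
    """Strict rule: one blacklisted, non-whitelisted domain decides the message."""
    return any(d in blacklist and d not in whitelist for d in domains)


# Twenty whitelisted decoys plus one spamvertised domain.
decoys = [f"good{i}.example" for i in range(20)]
message_domains = decoys + ["pills.example"]
whitelist = set(decoys)
blacklist = {"pills.example"}
```

Under the averaged rule the decoys drown out the single bad entry; under the strict rule they contribute nothing, so the whitelist only serves to stop listed-by-mistake domains from triggering a hit.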

  14. Re:Too much work!!! by Anonymous Coward · · Score: 2, Interesting

    For example, if Bellsouth is operating in the 68.* domain, and the lion's share of their IP space are DULs which shouldn't be sending port 25 traffic, it's a lot easier to BL the entire block and then redirect users to a form where they can submit legitimate SMTP relays and have them whitelisted.

    Assuming you set up and honor a whitelist form, maybe. I regularly set up legitimate businesses on DSL connections from BellSouth. These are business accounts with static IPs, but certain organizations like RoadRunner and AOL have decided that ALL BellSouth IPs should be blacklisted and they aren't interested in making exceptions. This became such a problem that my company has set up an SMTP relay on another ISP for our customers to be able to send mail. That's something we never wanted to do and would love to stop but...

  15. Re:We adjust the frequency of the shields, by chris_mahan · · Score: 2, Interesting

    Naw, all they got to do is get link results off google for random words for each email they send out, that way, each email is a little different, and nearly all the links are valid.

    --

    "Piter, too, is dead."

  16. Re:No system that uses the content of an email... by interiot · · Score: 2, Interesting
    But if there are invariants associated with spam, then systems will be at least partially effective.

    Currently, spammers can create new spam relays only so fast.

    Currently, spammers want to receive money via credit card over the internet.

    Currently, it's hard enough to effectively spam that there aren't tens of thousands who are actively doing it, so blacklisting certain credit card vendor IDs could work.

    Currently, spammers want to make it harder to "follow the money" so they use crazy javascript stuff on the front page of their websites, and the crazy javascript is one clue that the trail you're following is spammy. (add it to all the other clues you find, and you have a score that you can use to make a yes/no determination)

  17. Re:No system that uses the content of an email... by Electrum · · Score: 2, Interesting

    It costs me $35 to buy my own domain and a one off payment of about $30 to zoneedit to set up the mail forwarding.

    Use a registrar like directNIC that has $15 domains and free email forwarding.

    But note that you don't have to have your own domain to use that method. MTAs like qmail offer extension addresses (user-*@example.com). Also check out spamgourmet for a more advanced approach.
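For the extension addresses mentioned above, a filter or delivery script needs to split the local part into the base user and the extension. The sketch below assumes the `user-extension@domain` form shown in the comment, with `-` as the separator; the function name is hypothetical and this is not qmail's own code.

```python
from typing import Optional, Tuple


def split_extension(address: str, separator: str = "-") -> Tuple[str, Optional[str], str]:
    """Split 'user-ext@domain' into (user, ext, domain); ext is None if absent."""
    local, _, domain = address.rpartition("@")
    if not local:
        raise ValueError(f"not an email address: {address!r}")
    # Only the first separator divides user from extension; the rest stays in ext.
    user, found, extension = local.partition(separator)
    return user, (extension if found else None), domain
```

Handing out a different extension to each site makes it possible to see which one leaked the address, and to drop that single extension afterwards.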

  18. Re:Time to dig out this old post. by Carnildo · · Score: 3, Interesting

    Additional problem:

    (x) The whitelist feature can be abused

    As anyone who's spent any amount of time reading Slashdot comments should know, there are open redirect URLs on a number of sites that would be whitelisted under this proposal. On Slashdot, they were used to hide references to goatse. In spam, they can be used to whitelist spam URLs.
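One possible countermeasure to the open-redirect trick described above is to look for URLs embedded inside a link and check those hosts as well. The following is a rough sketch under that assumption; the function name and the example redirect format are invented, and real redirectors may encode their targets in ways this does not catch.

```python
import re
from typing import List
from urllib.parse import unquote, urlsplit

SCHEME_RE = re.compile(r"https?://", re.IGNORECASE)
# Characters that end an embedded URL candidate inside a query string.
STOP_RE = re.compile(r"[\s&\"'<>]")


def all_hosts(url: str) -> List[str]:
    """Return the host of the URL and of every http(s) URL embedded inside it."""
    decoded = unquote(url)  # undo one level of percent-encoding
    hosts: List[str] = []
    for match in SCHEME_RE.finditer(decoded):
        candidate = STOP_RE.split(decoded[match.start():], maxsplit=1)[0]
        try:
            host = urlsplit(candidate).hostname
        except ValueError:
            continue  # ignore fragments that do not parse as URLs
        if host:
            hosts.append(host)
    return hosts
```

With this, a link through a whitelisted redirector still exposes the spamvertised host, which can then be looked up like any other.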

    --
    "They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
  19. Re:We adjust the frequency of the shields, by Carnildo · · Score: 2, Interesting

    Another way to defeat this method would be to hack web servers, and put on files that redirect to the desired site. This has a lot of implications - legal and technical - but again gets into the same situation as before where blacklisting the site in the email would blacklist legitimate sites.

    You don't need to hack into them. I know that Yahoo has an open redirect URL -- it was used to disguise a link to goatse a while back -- and I suspect that most other major web sites have similar URLs.

    --
    "They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
  20. Re:Too much work!!! by swb · · Score: 2, Interesting

    Do you have a *good* reliable geographic IP delegation table? I can never find one, and if I do, it's old, or grossly inadequate.

    I'd love to have one; I wouldn't necessarily *blacklist* APNIC, but I would definitely rate-limit the entire APNIC to 28.8kbps into my network. I'm not sure it would "end" anything, but it would slow down spammers and/or cause them to give up on us.

  21. Check out my Anonymous E-mail by KalvinB · · Score: 2, Interesting

    check out the anonymous e-mail through www.icarusindie.com

    Instead of a picture I just present a riddle or other question.

    A human can search Google for the answer in order to be able to send their anonymous message. A program would need to be written and trained to be able to do that specifically for my web-site. I'm confident only someone with an academic interest in such a challenge would do it. And so far it hasn't been abused.

    I use the same type of challenge but render the text to an image and add some noise on the Indie-Mail sign up page to keep bots off.

    I also use a server generated ChallengeID that must be present which prevents anyone from using any page but the one I offer to even attempt to submit the form. If you don't use my page, the challenge file isn't generated on the server and without the file the server will ignore the request to process the form. You are also never sent the question number or question in text form. Everything the server needs to know about what question you're supposed to supply the answer to is stored in the server generated file that never leaves the server. And everything you need to know is in an image.
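The server-side challenge scheme described above could be sketched as follows. This is not the poster's code: an in-memory dictionary stands in for the server-generated challenge file, the class and method names are invented, and one-time use of each ID is an assumption added for the example.

```python
import hmac
import secrets
from typing import Dict


class ChallengeStore:
    """Keeps expected answers on the server, keyed by a random challenge ID."""

    def __init__(self) -> None:
        self._pending: Dict[str, str] = {}

    @staticmethod
    def _normalize(answer: str) -> str:
        return answer.strip().lower()

    def issue(self, expected_answer: str) -> str:
        """Record the expected answer and return the ID to embed in the form."""
        challenge_id = secrets.token_urlsafe(16)
        self._pending[challenge_id] = self._normalize(expected_answer)
        return challenge_id

    def verify(self, challenge_id: str, answer: str) -> bool:
        """Accept only a known ID with the right answer; each ID works once."""
        expected = self._pending.pop(challenge_id, None)
        if expected is None:
            return False  # form was not served by this server, or ID already used
        given = self._normalize(answer)
        return hmac.compare_digest(expected.encode(), given.encode())
```

Because only the ID travels to the client, a bot that posts directly to the form handler without loading the page has nothing valid to submit.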

    So far that hasn't been broken either. And if it is, I can adapt faster than bots can.

    Ben