Slashdot Mirror


Researchers Create Highly Predictive Blacklists

Grablets writes "Using a link analysis algorithm similar to Google PageRank, researchers at the SANS Institute and SRI International have created a new Internet network defense service that rethinks the way network blacklists are formulated and distributed. The service, called Highly Predictive Blacklisting, exploits the relationships between networks that have been attacked by similar Internet sources as a means for predicting which attack sources are likely to attack which networks in the future. A free experimental version is currently available."


  1. Re:Hmm... by elnico · · Score: 2, Interesting

    Somehow, I doubt identifying "troubling" sites is the limiting factor in Chinese internet censorship. More likely, the things holding back the censors are international pressure/attention, circumvention by their people, and the censors' own sense of decency, if that exists.

  2. Not really. by khasim · · Score: 4, Interesting

    So if this isn't predictive, what is? Would you rather they develop an algorithm that identifies blacklist-worthy addresses before they make their first attack?

    Ummmm, yes. If you can identify them BEFORE they make their first attack then that would qualify as "predictive".

    It captures the fact that "true" attackers mostly attack "true" (that is, weak or high profile) targets, whereas those targets are mostly attacked by "true" attackers.

    Not in my experience. The attacks are usually automated scripts running on zombies that randomly scan addresses (or search their immediate networks) looking for known vulnerabilities.

    Thus some isolated attack by a never-before-detected attacker on a never-before-attacked target has very little predictive potential in the eyes of the algorithm, whereas even just a few attacks by a never-before-seen attacker on several oft-attacked targets raises a huge red flag.

    That is the opposite of how their system was described. They looked for matches amongst IP addresses and then "predicted" that if your example machine attacked one firewall, it should be blacklisted for the other firewalls that closely matched that list.
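
    The matching described above can be sketched roughly as follows. This is not the researchers' actual algorithm (the summary only says it resembles PageRank); it is a one-step approximation that assumes each network reports the set of source IPs that attacked it, uses Jaccard overlap as the similarity measure, and scores an unseen source by the similarity of the networks that did see it. The function name, the data layout, and the addresses are all hypothetical.

```python
def predicted_blacklist(reports, target, top_n=10):
    """Rank attack sources the target has not seen yet.

    reports: dict mapping a network name to the set of source IPs
             that attacked it (hypothetical layout).
    target:  the network we are building a blacklist for.
    """
    mine = reports[target]
    scores = {}
    for other, seen in reports.items():
        if other == target:
            continue
        union = mine | seen
        if not union:
            continue
        # Jaccard overlap: how closely the two attacker lists match.
        similarity = len(mine & seen) / len(union)
        if similarity == 0:
            continue
        # Sources seen by a similar network, but not yet by the target,
        # inherit that network's similarity as a score.
        for attacker in seen - mine:
            scores[attacker] = scores.get(attacker, 0.0) + similarity
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [ip for ip, _ in ranked[:top_n]]


reports = {
    "net-a": {"192.0.2.1", "192.0.2.2"},
    "net-b": {"192.0.2.1", "192.0.2.2", "192.0.2.3"},
    "net-c": {"192.0.2.99"},
}
print(predicted_blacklist(reports, "net-a"))
```

    Note that a source no network has reported yet never gets a score, which is the limitation being argued about in this thread.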

    Now a real predictive system would look at more factors.

    #1. Who was attacking.

    #2. How did the attacker(s) gain access to the machines used in the attack.

    #3. What other machines are vulnerable to #2 that are available to #1.

    Example - Spam zombies often appear in ranges of home addresses from the large ISPs. So machines in those ranges are given an increased score in SpamAssassin, whether they have ever sent spam before or not. See #1 and #2 and #3.

    1. Re:Not really. by mcrbids · · Score: 5, Interesting

      Ummmm, yes. If you can identify them BEFORE they make their first attack then that would qualify as "predictive".

      Stock analysts make daily predictions based on past behavior. This is not only predictive, but if it wasn't for this past analysis, the predictions would be largely meaningless and highly inaccurate. Or do you want a computer program that can predict what you'll think before you actually think it?

      Not in my experience. The attacks are usually automated scripts running on zombies that randomly scan addresses (or search their immediate networks) looking for known vulnerabilities.

      How many high-profile hosts have you overseen? In my experience, the random attacks you mention are found everywhere. But high-profile hosts are their own deal. I've seen very carefully crafted spam attacks directed at one of my client ISPs (one of the largest regional ISPs in my area) that would last anywhere from 3 to 8 hours. A typical spam attack would entail perhaps 250,000 deliverable messages. It was a constant game of cat and mouse with firewall rules and automated responses.

      I'd implement an anti-spam technology which would work for anywhere from a few days to a few months, while logging the repeated attempts to crack my solution. And then, the measure would be defeated and I'd be back to the drawing board while the mail cluster's load average spiked to 20.0 or so and users complained.

      One of my more successful ideas I called "Double Dribble". I'd identify spam that had been sent to a non-deliverable address, then returned to sender, then bounced with an invalid return address. I'd calculate the success rate of the source IP address and within 5 minutes or so, I'd have a spam source identified and blocked with a dynamic DNS RBL.
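
      A minimal sketch of the success-rate idea, not the original Double Dribble code: it assumes the mail system can report, per message, the source IP and whether the message turned out to be deliverable. The class name, the 5-minute window, and both thresholds are illustrative assumptions.

```python
from collections import defaultdict


class BounceTracker:
    """Track per-IP delivery outcomes inside a sliding time window."""

    def __init__(self, window_seconds=300, min_messages=20, min_success_rate=0.5):
        self.window_seconds = window_seconds      # assumed: about 5 minutes
        self.min_messages = min_messages          # assumed: ignore tiny samples
        self.min_success_rate = min_success_rate  # assumed cut-off
        self.events = defaultdict(list)           # ip -> [(timestamp, delivered)]

    def record(self, ip, timestamp, delivered):
        """Store one outcome: delivered is True for a deliverable message."""
        self.events[ip].append((timestamp, delivered))

    def should_block(self, ip, now):
        """Return True when the IP's recent success rate is too low."""
        recent = [ok for ts, ok in self.events[ip]
                  if now - ts <= self.window_seconds]
        if len(recent) < self.min_messages:
            return False  # not enough evidence inside the window
        return sum(recent) / len(recent) < self.min_success_rate


tracker = BounceTracker(window_seconds=300, min_messages=10)
for second in range(10):
    # Only the first 2 of 10 messages were deliverable.
    tracker.record("192.0.2.50", second, delivered=(second < 2))
print(tracker.should_block("192.0.2.50", now=60))
```

      The `min_messages` floor is what a sender exploits by spreading mail thinly across many addresses: an IP that sends only a handful of messages never produces enough samples to be judged.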

      That solution held off the spammer for almost a full year, until he/she/it began randomizing sending addresses so well that each IP address would send only maybe 10 emails every 24 hours, well below the threshold of Double Dribble. The address pool was insane - well over 100,000 unique IP addresses logged over a 24 hour period.

      Then greylisting was implemented, which stopped the spam dead in its tracks, and completely nullified the spam that Double Dribble couldn't stop. That's when I turned over the account to another party. I still use greylisting personally with great success.
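
      For readers unfamiliar with greylisting, the core rule is: temporarily reject the first delivery attempt from an unknown (client IP, sender, recipient) triplet, and accept it only if the client retries after a minimum delay. The sketch below shows just that rule; the class name, the 300-second delay, and the return values are assumptions, and real implementations also expire old entries and whitelist known-good senders.

```python
class Greylist:
    """Defer the first attempt from each triplet; accept a later retry."""

    def __init__(self, min_delay=300):
        self.min_delay = min_delay  # assumed: seconds a sender must wait
        self.first_seen = {}        # triplet -> time of the first attempt

    def check(self, ip, sender, recipient, now):
        key = (ip, sender.lower(), recipient.lower())
        first = self.first_seen.get(key)
        if first is None:
            # Unknown triplet: remember it and ask the client to retry.
            self.first_seen[key] = now
            return "defer"
        if now - first < self.min_delay:
            # Retried too quickly; keep deferring.
            return "defer"
        return "accept"


greylist = Greylist(min_delay=300)
print(greylist.check("192.0.2.7", "a@example.com", "b@example.org", now=0))
print(greylist.check("192.0.2.7", "a@example.com", "b@example.org", now=600))
```

      A legitimate mail server queues the message and retries, so it passes on the second attempt; software that fires once and moves on never does.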

      Now a real predictive system would look at more factors.

      #1. Who was attacking.

      #2. How did the attacker(s) gain access to the machines used in the attack.

      #3. What other machines are vulnerable to #2 that are available to #1.

      No. A Real system would find out:

      1) Who was attacking.

      2) Send out the Russian Mafia after them to bust a few kneecaps.

      3) What other machines are attacking that haven't been attacked by the Russian Mafia.

      4) Send Chuck Norris after any attackers who are part of the Russian Mafia.

      5) Scan for Natalie Portman donkey porn and send a copy to you.

      6) ???

      7) Profit!

      --
      I have no problem with your religion until you decide it's reason to deprive others of the truth.
    2. Re:Not really. by nabsltd · · Score: 2, Interesting

      Then greylisting was implemented, which stopped the spam dead in its tracks, and completely nullified the spam that Double Dribble couldn't stop. That's when I turned over the account to another party. I still use greylisting personally with great success.

      For me, between greylisting and requiring strict RFC compliance for the "HELO" parameter, pretty much no spam gets through to even be looked at by SpamAssassin.

      For the "HELO" parameter, almost every spambot uses one of:

      • something that isn't a fully qualified domain name ("laptop", "Notebook", and "PC-200806211153" are some recent examples)
      • an IP address

      Neither of these is acceptable (according to section 2.3.5 of the SMTP RFC) as the "HELO" parameter.

      Then, I throw out a few more bogus things, like:

      • my host/domain name
      • my public IP address
      • domain literals (i.e., an IP address surrounded by square brackets) that have an IP address in a bogon range

      At this point, the e-mail gets to face greylisting, ClamAV, and SpamAssassin. About 1 in 100 "bad emails" get through to the end users.
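
      The HELO checks listed above could be sketched like this. It is not the poster's actual configuration: the local names and address are placeholders, the function name is hypothetical, and "bogon range" is approximated here by "not globally routable" according to Python's standard `ipaddress` module.

```python
import ipaddress

MY_NAMES = {"mail.example.org", "example.org"}  # hypothetical local names
MY_ADDRESSES = {"198.51.100.25"}                # hypothetical public IP


def _parse_ip(text):
    """Return an ip_address object, or None if text is not an IP."""
    try:
        return ipaddress.ip_address(text)
    except ValueError:
        return None


def helo_rejection_reason(helo):
    """Return a reason to reject the HELO argument, or None to let it pass."""
    name = helo.strip().lower()
    if name.startswith("[") and name.endswith("]"):
        inner = name[1:-1]
        if inner.startswith("ipv6:"):
            inner = inner[len("ipv6:"):]
        addr = _parse_ip(inner)
        if addr is None:
            return "malformed domain literal"
        if str(addr) in MY_ADDRESSES:
            return "claims to be my public IP address"
        if not addr.is_global:
            return "domain literal in a bogon range"
        return None
    if _parse_ip(name) is not None:
        return "bare IP address"
    if name.rstrip(".") in MY_NAMES:
        return "claims to be my host/domain name"
    if "." not in name.strip("."):
        return "not a fully qualified domain name"
    return None


for example in ("laptop", "PC-200806211153", "[10.1.2.3]", "mx1.example.com"):
    print(example, "->", helo_rejection_reason(example))
```

      Anything that returns None here would go on to greylisting, ClamAV, and SpamAssassin.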

  3. Enumerating Badness by giminy · · Score: 5, Interesting

    Every time I read about some new whiz-bang security tool, I look back to Marcus Ranum's terrific The Six Dumbest Ideas in Computer Security article.

    This idea meets three of the 'dumb' criteria:

    1) Default Permit. Use of firewalls (even 'intelligent' firewalls) allows all traffic through, except that traffic that looks somehow bad.
    2) Enumerating Badness. Kind of like #1, you're blacklisting the bad stuff. There's a helpful chart in the article to show why this is dumb.
    6) Action is Better than Inaction. 'Nuff said.

    Reid

    --
    The Right Reverend K. Reid Wightman,
    1. Re:Enumerating Badness by RAMMS+EIN · · Score: 4, Interesting

      Well, two points here.

      First of all, security and spam are not the same. If one security threat makes it through to you, your security has been compromised. If one spam message makes it through to you, it's a little annoying, but no disaster. If, on the other hand, your "spam filtering" causes a legitimate message not to reach you, this is much worse. For spam, you err on the safe side by letting the message through. In security, you err on the safe side by blocking it.

      Secondly, while mjr's 6 "dumb ideas" aren't going to give you perfect security, it's not obvious how you _would_ get that, nor that you should not implement any of those ideas. For example: enumerating badness is certainly not going to allow you to recognize and stop all badness. However, it isn't clear how you _would_ do that. How do you determine if something should or shouldn't be allowed to enter your system? Perhaps having a list of things you _don't_ want on your system could be helpful.

      Enumerating badness certainly seems to work pretty well for email. With software, you can (really!) get away with making a list of what _is_ allowed on your system, and refuse everything else. With email, you actually _want_ messages you have never seen before from people you have never seen before, about things you have never talked about before. At least, most people do. On the other hand, spammers will often send lots of somehow similar messages. My spam filter, which I train based on lists of good and bad messages, correctly recognizes all good messages and something like 99% (it varies a bit) of bad messages. It doesn't keep the spam out, but it reduces it by a factor of 100, without losing me any good messages. Is this a Dumb Idea?

      --
      Please correct me if I got my facts wrong.
  4. Re:Babies out with the bath water. by initialE · · Score: 3, Interesting

    Half of us here are for sender authentication, or at least verification. And half of us are for privacy and anonymity. These, to me, are conflicting goals. The sad thing is that there is overlap, that people want their privacy, not realizing that spam is exactly what that privacy brings. It surprises me that people can laugh at the implementations of DRM (But Bob and Eve are the same person! Hilarity ensues...) and not know that this is a very similar issue right here, (Bob wants his rights protected, but he doesn't want any riff raff Eve out there to contact him. But Bob and Eve are the same person! Not so funny now?) and it, like DRM, could very well be unsolvable.

    --
    Starbucks, Harbuckle of Breath.
  5. Re:Not really that "predictive". by rocketman768 · · Score: 2, Interesting

    What the heck does "highly" predictive mean?

    "Honey, the weatherman is on and he is highly predicting some storms in the evening."

    Maybe "highly effective" prediction?