Researchers Create Highly Predictive Blacklists
Grablets writes "Using a link analysis algorithm similar to Google PageRank, researchers at the SANS Institute and SRI International have created a new Internet network defense service that rethinks the way network blacklists are formulated and distributed. The service, called Highly Predictive Blacklisting, exploits the relationships between networks that have been attacked by similar Internet sources as a means for predicting which attack sources are likely to attack which networks in the future. A free experimental version is currently available."
They take X firewall logs ...
Then they look for matches in attacking IP addresses between the logs ...
And if any IP addresses appear in log A (which is very similar to log B) ... then those IP addresses are "predicted" as likely to attack the firewall from which log B was obtained.
Logical - yes.
Predictive - no.
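For what it's worth, here is a rough sketch of the steps as described in this comment. It is not the actual HPB algorithm (the summary says that one uses PageRank-style link analysis); it swaps in plain Jaccard overlap between logs, and the function names, the 0.5 threshold, and the sample addresses are all made up for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two sets of attacker IPs, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_blacklists(logs, threshold=0.5):
    """For each network, suggest IPs seen by a similar network but not by it.

    logs maps a network name to the set of source IPs in its firewall log.
    threshold is an assumed cutoff for calling two logs "very similar".
    """
    predicted = {name: set() for name in logs}
    for (name_a, ips_a), (name_b, ips_b) in combinations(logs.items(), 2):
        if jaccard(ips_a, ips_b) >= threshold:
            predicted[name_a] |= ips_b - ips_a
            predicted[name_b] |= ips_a - ips_b
    return predicted


# Hypothetical logs using documentation-only address ranges.
logs = {
    "A": {"192.0.2.1", "192.0.2.2", "192.0.2.3"},
    "B": {"192.0.2.1", "192.0.2.2", "192.0.2.4"},
    "C": {"198.51.100.9"},
}
result = predict_blacklists(logs)
```

With these sample logs, A and B overlap enough that each is handed the address only the other one saw, while C shares nothing with anyone and gets no predictions.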
This sounds ripe for abuse. For example, a heavy censorship nation like China could use this to block critical sites that they claim are 'attacking' them far more efficiently than their current human-based censoring.
http://twitter.com/OLDTELEGRAM
The problem with ANY "predictive" statistics (like racial profiling, for one glaring example) is that even when they become accurate enough to produce useful information, they tend to produce too many false positives.
And often (again using racial profiling as a good example), even a few false positives are too many.
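To put rough numbers on the false-positive point: the three rates below are assumptions picked for illustration, not measurements from any blacklist, but the arithmetic is the usual base-rate calculation.

```python
def precision(base_rate, true_positive_rate, false_positive_rate):
    """Fraction of flagged sources that are actually hostile."""
    flagged_bad = base_rate * true_positive_rate
    flagged_good = (1 - base_rate) * false_positive_rate
    return flagged_bad / (flagged_bad + flagged_good)


# Assumed: 1 in 1000 sources is hostile, the predictor catches 99% of them,
# and it wrongly flags 1% of the innocent ones.
p = precision(base_rate=0.001, true_positive_rate=0.99, false_positive_rate=0.01)
print(f"{p:.1%} of flagged sources are actually hostile")
```

Under those assumptions only about 9% of the flagged sources are real attackers; the rest are false positives, even though the predictor itself looks 99% accurate.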
This isn't going to work in the real world. Too many users you want to hear from at an ISP won't like it when the virus-victim spammers get their whole network preventatively banned.
Stop fixing the mail protocols we have today. It's time to replace them with some form of sender authentication.
Ummmm, yes. If you can identify them BEFORE they make their first attack then that would qualify as "predictive".
Not in my experience. The attacks are usually automated scripts running on zombies that randomly scan addresses (or search their immediate networks) looking for known vulnerabilities.
That is the opposite of how their system was described. They looked for matches amongst IP addresses and then "predicted" that if your example machine attacked one firewall it should be blacklisted for the other firewalls that closely matched that list.
Now a real predictive system would look at more factors.
#1. Who was attacking.
#2. How did the attacker(s) gain access to the machines used in the attack.
#3. What other machines are vulnerable to #2 that are available to #1.
Example - Spam zombies often appear in ranges of home addresses from the large ISPs. So machines in those ranges are given an increased score in SpamAssassin, whether or not they have ever sent spam before. See #1 and #2 and #3.
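A minimal sketch of that kind of range-based scoring, in Python rather than actual SpamAssassin rules. The ranges (documentation-only blocks), the penalty value, and the function name are all hypothetical.

```python
import ipaddress

# Hypothetical "home user" ranges; these are documentation-only blocks.
RESIDENTIAL_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]
RANGE_PENALTY = 2.0  # assumed score bump, not a real SpamAssassin value


def range_score(ip):
    """Score added purely for where the sender sits, not for what it has done."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in RESIDENTIAL_RANGES):
        return RANGE_PENALTY
    return 0.0


print(range_score("192.0.2.77"))   # inside a listed range
print(range_score("203.0.113.5"))  # outside every listed range
```

The point is that the first address picks up a penalty before it has sent a single message, which is what makes the scoring predictive rather than reactive.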
Cause I come from the QuakeWorld days, and HPB means High Ping Bastard to me.
Yes, it does. Look at the spam zombies on the major ISP networks.
Now do the math about whether there are more home users on the big ISP networks or whether there are more companies running their own email servers.
If you're getting spam, 99.9%+ of the time it will be from a cracked machine on a home system easily identified as such.
Likewise, 99.9%+ of the legitimate email will not be coming from an ISP's home user block. If it is coming from that ISP's block, it will come from their mail servers.
Predictive goes both ways. Identifying what is probably good and identifying what is probably bad.
Every time I read about some new whiz-bang security tool, I look back to Marcus Ranum's terrific The Six Dumbest Ideas in Computer Security article.
This idea meets three of the 'dumb' criteria:
1) Default Permit. Use of firewalls (even 'intelligent' firewalls) allows all traffic through, except that traffic that looks somehow bad.
2) Enumerating Badness. Kind of like #1, you're blacklisting the bad stuff. There's a helpful chart in the article to show why this is dumb.
6) Action is Better than Inaction. 'Nuff said.
Reid
The Right Reverend K. Reid Wightman,
Your post advocates a
(x) technical ( ) legislative ( ) market-based ( ) vigilante
approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)
( ) Spammers can easily use it to harvest email addresses
( ) Mailing lists and other legitimate email uses would be affected
( ) No one will be able to find the guy or collect the money
( ) It is defenseless against brute force attacks
(x) It will stop spam for two weeks and then we'll be stuck with it
(x) Users of email will not put up with it
( ) Microsoft will not put up with it
( ) The police will not put up with it
(x) Requires too much cooperation from spammers
(x) Requires immediate total cooperation from everybody at once
( ) Many email users cannot afford to lose business or alienate potential employers
( ) Spammers don't care about invalid addresses in their lists
( ) Anyone could anonymously destroy anyone else's career or business
Specifically, your plan fails to account for
( ) Laws expressly prohibiting it
(x) Lack of centrally controlling authority for email
( ) Open relays in foreign countries
( ) Ease of searching tiny alphanumeric address space of all email addresses
( ) Asshats
( ) Jurisdictional problems
( ) Unpopularity of weird new taxes
( ) Public reluctance to accept weird new forms of money
(x) Huge existing software investment in SMTP
(x) Susceptibility of protocols other than SMTP to attack
(x) Willingness of users to install OS patches received by email
(x) Armies of worm riddled broadband-connected Windows boxes
( ) Eternal arms race involved in all filtering approaches
( ) Extreme profitability of spam
(x) Joe jobs and/or identity theft
( ) Technically illiterate politicians
( ) Extreme stupidity on the part of people who do business with spammers
( ) Dishonesty on the part of spammers themselves
( ) Bandwidth costs that are unaffected by client filtering
( ) Outlook
and the following philosophical objections may also apply:
(x) Ideas similar to yours are easy to come up with, yet none have ever been shown practical
( ) Any scheme based on opt-out is unacceptable
( ) SMTP headers should not be the subject of legislation
( ) Blacklists suck
( ) Whitelists suck
( ) We should be able to talk about Viagra without being censored
( ) Countermeasures should not involve wire fraud or credit card fraud
( ) Countermeasures should not involve sabotage of public networks
(x) Countermeasures must work if phased in gradually
( ) Sending email should be free
( ) Why should we have to trust you and your servers?
( ) Incompatibility with open source or open source licenses
( ) Feel-good measures do nothing to solve the problem
( ) Temporary/one-time email addresses are cumbersome
( ) I don't want the government reading my email
( ) Killing them that way is not slow and painful enough
Furthermore, this is what I think about you:
(x) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, assh0le! I'm going to find out where you live and burn your house down!
... then they could warn the poor bastard she's going to attack next.
There is no knowledge that is not power.
I should not have written "any", much less emphasized it. Nevertheless, there is a strong tendency, and it behooves designers to take this into account. Probably I have become cynical, because so many otherwise intelligent people do not quite grasp the subtleties of predictive statistics and refuse to acknowledge this problem, even though it can be demonstrated with nothing more than simple middle-school-level math.
People who do not have at least a basic grasp of statistics should never be allowed to be politicians. Half or even most of what they do involves statistics to some degree.
to have someone poison the 'predictive' list, and suddenly everyone behind such a system would lose access to google, the pirate bay, or demonoid? (c'mon, those are like, the only 3 sites other than slashdot I use!)
"the new HPB service will employ a link analysis algorithm to cross-compare firewall logs"
..
Snoooze
davecb5620@gmail.com
"Well, two points here .. First of all, security and spam are not the same"
Identifying spam is actually 'enumerating badness', which does lead to losing legitimate messages.
davecb5620@gmail.com
That's what we really need... (baggsy on the acronym BTW)
A network of mathematical values which define reputation relative to one another. We have a number of attempts at this in place just now, not the least of which are Slashdot Karma, Google Pagerank, Stumbleupon etc. The thing is that what may be a good reputation to one person may well be the antithesis to another, so simple averaging is inappropriate. Richard Dawkins for example is someone who will have a very high reputation among certain groups and very low among others.
I should be able to see a relative reputation of someone/thing based on those other things which I hold in esteem and the things/people which they hold in esteem.
Decidedly non trivial. We haven't actually worked it out in The Real World (tm) either, relying on branding instead.
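One way the "relative to what I hold in esteem" part might be sketched is a random walk with restart (personalized PageRank) over an esteem graph. This is only an illustration under assumptions: the graph, the names other than Dawkins, the damping value, and the function name are invented, and it handles only non-negative esteem, so it cannot express active distrust.

```python
def relative_reputation(esteem, me, damping=0.85, iterations=50):
    """Reputation of every node as seen from `me`.

    esteem maps each node to {other_node: non-negative weight}.
    The walk follows esteem links and keeps restarting at `me`, so nodes
    that `me` cannot reach through esteem end up with a score of zero.
    """
    nodes = set(esteem)
    for targets in esteem.values():
        nodes |= set(targets)
    score = {n: 0.0 for n in nodes}
    score[me] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        nxt[me] += 1.0 - damping
        for node, s in score.items():
            targets = esteem.get(node, {})
            total = sum(targets.values())
            if total == 0:
                nxt[me] += damping * s  # dead end: the walk restarts at me
                continue
            for target, weight in targets.items():
                nxt[target] += damping * s * weight / total
        score = nxt
    return score


# Hypothetical esteem graph: who holds whom in esteem.
esteem = {
    "alice": {"dawkins": 1.0, "carol": 1.0},
    "carol": {"dawkins": 1.0},
    "bob": {"preacher": 1.0},
}
alice_view = relative_reputation(esteem, "alice")
bob_view = relative_reputation(esteem, "bob")
```

Here Dawkins scores well from alice's point of view and not at all from bob's, with no global average involved. The hard parts (negative opinions, sybils, getting people to publish their esteem at all) are exactly what this leaves out.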
Wait until somebody spoofs somebody else's IP address and throws "attacks" with it at a few of the networks that submit logs. That would effectively get the spoofed address blocked, as the system would predict that the host is an attacker. Since a bare packet (a TCP SYN, say) can carry almost any source IP we want, we could get creative and spoof the addresses of the submitting members or even dshield itself.
Suck my double-precision floats, AC!
DRM: Terminator crops for your mind!