Slashdot Mirror


Future Hack: New Cybersecurity Tool Predicts Breaches Before They Happen

An anonymous reader writes: A new research paper (PDF) outlines security software that scans and scrapes web sites (past and present) to identify patterns leading up to a security breach. It then accurately predicts what websites will be hacked in the future. The tool has an accuracy of up to 66%. Quoting: "The algorithm is designed to automatically detect whether a Web server is likely to become malicious in the future by analyzing a wide array of the site's characteristics: For example, what software does the server run? What keywords are present? How are the Web pages structured? If your website has a whole lot in common with another website that ended up hacked, the classifier will predict a gloomy future. The classifier itself always updates and evolves, the researchers wrote. It can 'quickly adapt to emerging threats.'"

23 of 33 comments

  1. Nothing New Here by sehlat · · Score: 1

    Precrime Division has had it for years.

  2. Isn't the correct answer: by jmauro · · Score: 1

    Given enough time all of the sites on the Internet will eventually be hacked?

    1. Re:Isn't the correct answer: by mark-t · · Score: 1

      Not necessarily true... some sites on the internet are not of general interest to enough people to ever draw the attention of somebody who might even want to hack them.

    2. Re:Isn't the correct answer: by Penguinisto · · Score: 1

      Exception:
      My ancient and long-dead first domain/site ever had never got hacked, and it never will: I shuttered it in 2001 (-ish) when I sold the domain name (spark.org). ;)

      --
      Quo usque tandem abutere, Nimbus, patientia nostra?
    3. Re:Isn't the correct answer: by K.+S.+Kyosuke · · Score: 1

      You seem to be assuming that being an HTTP server implies having security holes.

      --
      Ezekiel 23:20
    4. Re:Isn't the correct answer: by bloodhawk · · Score: 1

      a large percentage of attacks are performed by automated tools searching for targets. They don't give a shit if the site is of huge interest or your Granny's blog talking about how cute her poodle is. check your logs, even your home computers will be receiving regular port scans, and knocks on various ports/protocols to see if there is anything to attack.

    5. Re:Isn't the correct answer: by vux984 · · Score: 3, Insightful

      The premise was "given enough time...".

      By taking the site down, you limited the time.

      That's not an "exception", that's violating the premise.

  3. Mostly Wordpress, then. 50% accurate: all sites by raymorris · · Score: 5, Informative

    I see that of the top "features" they identified, most are just various tags that mean Wordpress is in use. So they learned that Wordpress sites tend to get hacked. Duh. The Wordpress team isn't interested in security. I demonstrated an exploit for a serious vulnerability in Wordpress and submitted it to their bug tracker. For two years it sat, with one WP developer saying "it can't be exploited" - even though I attached an exploit directly to the tracker issue. Two years later, the vulnerability was added to a 'sploit kit and thousands of sites were compromised over the course of just a few days. That's when WP finally got around to patching the clear and significant vulnerability.

    I see TFA claims "66% accuracy". "All sites will be hacked at some point" is about 50% accurate. I bet we could have 66% accuracy simply by saying "sites running PHP 5.2 or below will be hacked."

  4. 16% Improvement! by mythosaz · · Score: 2

    That's like a 16% improvement over the quarter I flip...

  5. Re:WordPress? by Penguinisto · · Score: 1

    True - and how is it that they say they're not counting vulns when that is precisely what they're doing (albeit counting past vulns and extrapolating...)

    --
    Quo usque tandem abutere, Nimbus, patientia nostra?
  6. 66%? Worthless trash... by gweihir · · Score: 3, Interesting

    I can predict for most sites that they will be hacked eventually, because they do not have anything resembling a secure set-up. But predicting when? That is impossible. Likely this tool gets even its pathetic 66% only due to cherry-picked test data (also known as "lying" in scientific circles).

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    1. Re:66%? Worthless trash... by iiii · · Score: 1

      My algorithm does better than 66% and I'm open sourcing it right here...
      (Predicts whether site will be hacked between now and the destruction of earth)

      public boolean willSiteBeHacked(Vector whateverYouFeelLike) {
              return true;
      }

      You can't disprove my claim.

      --
      Light cup, beer drink, thin so chain, neck turtle fat, man I won't say it again
    2. Re:66%? Worthless trash... by ThatAblaze · · Score: 1

      I'm pretty sure your algorithm would be worse than 50%. It basically amounts to "which event comes first? A) site gets hacked or B) site gets taken down."

      I think more sites get taken down every day than get hacked.

  7. ... accurately predicts .. by CaptainDork · · Score: 1

    66% = "could happen."

    --
    It little behooves the best of us to comment on the rest of us.
  8. Runs PHP? by certain+death · · Score: 1

    100% chance it will be hacked and used as a launching point for EVARYTHANG!!!

    --
    "My immediate reaction is "WTF? What kind of moron doesn't make things 64-bit safe to begin with?" Linus
  9. Results? by manu0601 · · Score: 1

    Is there a page somewhere where I can query the results to see how my own site goes?

  10. What a coincidence. by Kazoo+the+Clown · · Score: 1

    66% of all websites get hacked. So if you predict EVERY website will get hacked, you'll be right 66% of the time.
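
    The arithmetic behind that quip: a predictor that answers "hacked" for every site is right exactly as often as sites actually do get hacked. A minimal Java sketch, assuming a made-up list of outcomes (the class name, method name, and data are hypothetical and not taken from the paper):

```java
public class BaselineSketch {
    // Accuracy of a predictor that says "will be hacked" for every site:
    // it is correct on exactly the sites that really were hacked.
    public static double alwaysHackedAccuracy(boolean[] wasHacked) {
        if (wasHacked.length == 0) {
            return 0.0;
        }
        int correct = 0;
        for (boolean hacked : wasHacked) {
            if (hacked) {
                correct++;
            }
        }
        return (double) correct / wasHacked.length;
    }

    public static void main(String[] args) {
        // Hypothetical outcomes: two of three sites ended up hacked.
        boolean[] outcomes = {true, true, false};
        System.out.println("always-hacked accuracy = " + alwaysHackedAccuracy(outcomes));
    }
}
```

    Any claimed accuracy only means something when compared against this base rate.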

    1. Re:What a coincidence. by aaronb1138 · · Score: 1

      Wouldn't it just be easier to aggregate information from social media sites using a weighted system. Just put 4Chan at the top of the weighting, with Facebook next and use separate weighting scales for positive versus negative mention counts. Both are valid predictors, so it should work and get closer.

      I'm glad one of my side jobs is setting up IPS / IDP and similar security on firewalls. I'll never be thirsting for work.

  11. In totally unrelated news by Mr.+Freeman · · Score: 1

    New cyber security tool doesn't work!

    --
    -1 disagree is not a modifier for a reason. -1 troll, flamebait, redundant, overrated are NOT acceptable substitutes.
  12. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  13. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  14. It's a confidence score. Normal for binary decisio by raymorris · · Score: 1

    The "inferred third value" is almost certainly the probability/score/confidence level, and it's normally included for machine-learning or any classifier algorithm, such as one that makes a yes/no decision based on a numeric value within a range. You'll see it a lot with spam filters. It's required because the USER chooses at which threshold they wish to take certain actions.

    I'm going to use the spam filter example because that's one many people are familiar with, specifically Spamassassin. It will score a message like this:
    Body includes the word "free": 2 points
    HTML and text parts are different: 1 point
    Sent through an open relay: 2 points
    Tiny font: 1 point
    From address default whitelist: -3 points

    Adding up the scores, the total score for that email is 3 points. The server admin can configure how many points are required before an email is placed in the spam box, and how many are required before the email is deleted outright. Note that the choice of how high the score needs to be to be considered spam is completely separate from the algorithm generating those scores. One admin might be very tough on spam and decide that anything over 2 points is treated as spam. Another admin might be more lenient and set it to 4, so anything 4 or higher is treated as spam. The ROC informs the admin as to the results of different settings. A threshold of 2 will obviously have more false positives than a threshold of 4.
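
    The split between scoring and thresholding described above can be sketched in a few lines of Java. This is an illustration only, not Spamassassin's actual code: the class and method names are hypothetical, the rule labels are shorthand for the five rules listed, and treating "at or above the threshold" as spam is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SpamScoreSketch {
    // Rule scores copied from the example above; the labels are illustrative only.
    public static Map<String, Integer> exampleRules() {
        Map<String, Integer> rules = new LinkedHashMap<>();
        rules.put("body contains 'free'", 2);
        rules.put("HTML and text parts differ", 1);
        rules.put("sent through an open relay", 2);
        rules.put("tiny font", 1);
        rules.put("From address on default whitelist", -3);
        return rules;
    }

    // The algorithm's job ends here: it only produces a number.
    public static int totalScore(Map<String, Integer> matchedRules) {
        int total = 0;
        for (int points : matchedRules.values()) {
            total += points;
        }
        return total;
    }

    // The admin's job: choose the threshold (assumption: at or above counts as spam).
    public static boolean isSpam(int score, int adminThreshold) {
        return score >= adminThreshold;
    }

    public static void main(String[] args) {
        int score = totalScore(exampleRules());
        System.out.println("score = " + score);
        System.out.println("strict admin (threshold 2): spam = " + isSpam(score, 2));
        System.out.println("lenient admin (threshold 4): spam = " + isSpam(score, 4));
    }
}
```

    The same message is spam for the strict admin and not for the lenient one, even though the scoring code never changed.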

    Note again the choice of threshold to take some action is selected by the USER, not by the group who designed the algorithm. In the case of this predictive tool, a web hosting company might choose to have the following policies:

    No site with a risk score over 80 can be hosted on our servers.
    Any site with a score over 40 will be informed and our security team will offer assistance in making the site more secure.

    Those policies of what to do at different score thresholds are completely separate from the algorithm; the team who wrote the paper doesn't choose the thresholds for specific actions. Instead, the graph informs the web hosting company "at a risk score of 80, you can expect 5% false positives. At a risk score of 40, you can expect 15% false positives".
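
    A rough Java sketch of that separation, assuming the hypothetical hosting policy above (over 80: refuse, over 40: offer help). The false positive rate shown is the kind of figure one would read off the graph for a given threshold. The class, enum, and method names are invented for illustration, and the scores and outcomes in main are made-up data, not results from the paper.

```java
public class RiskPolicySketch {
    public enum Action { REFUSE_HOSTING, OFFER_SECURITY_HELP, NO_ACTION }

    // Thresholds from the hypothetical hosting policy; they belong to the user, not the classifier.
    static final int REFUSE_OVER = 80;
    static final int NOTIFY_OVER = 40;

    public static Action actionFor(int riskScore) {
        if (riskScore > REFUSE_OVER) {
            return Action.REFUSE_HOSTING;
        }
        if (riskScore > NOTIFY_OVER) {
            return Action.OFFER_SECURITY_HELP;
        }
        return Action.NO_ACTION;
    }

    // False positive rate at a threshold: the share of never-hacked sites that still score over it.
    public static double falsePositiveRate(int[] scores, boolean[] wasHacked, int threshold) {
        int clean = 0;
        int flaggedClean = 0;
        for (int i = 0; i < scores.length; i++) {
            if (!wasHacked[i]) {
                clean++;
                if (scores[i] > threshold) {
                    flaggedClean++;
                }
            }
        }
        return clean == 0 ? 0.0 : (double) flaggedClean / clean;
    }

    public static void main(String[] args) {
        // Made-up risk scores and outcomes, purely for illustration.
        int[] scores = {95, 85, 70, 55, 45, 30, 20, 10};
        boolean[] wasHacked = {true, false, true, false, false, false, true, false};
        for (int threshold : new int[] {REFUSE_OVER, NOTIFY_OVER}) {
            System.out.println("threshold " + threshold + ": false positive rate = "
                    + falsePositiveRate(scores, wasHacked, threshold));
        }
        System.out.println("score 62 -> " + actionFor(62));
    }
}
```

    Lowering the threshold catches more sites that will be hacked but also flags more clean ones, which is the trade-off the hosting company has to pick for itself.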

  15. The tool has an accuracy of up to 66% by TemporalBeing · · Score: 1

    So in other words it could be 0% accurate...

    --
    Truth is like the sun. You can shut it out for a time, but it ain't goin' away. - Elvis Presley (source: imdb.com)