Slashdot Mirror


Using Graph Theory To Predict NCAA Tournament Outcomes

New submitter SocratesJedi writes "Like many technically-minded people, I don't have a lot of time to keep up with sports. Nevertheless, trying to predict the outcome of the NCAA men's basketball tournament is a fun activity to share with friends, family and colleagues. This year, I abandoned my usual strategy of quasi-randomly choosing teams and instead modeled the win-loss history of all Division I teams as a weighted network. The network included information from 5242 games played during the 2011-2012 season. From this, teams can be ranked using tools from graph theory and those rankings can be used to predict tournament outcomes. Without any a priori information, this method accurately identified all the #1 seeds in the top 5 best teams. It also predicts that at least one underdog, Belmont (#14 seed), will reach the Elite Eight. Although the ultimate test will be how well it predicts tournament outcomes, initial benchmarks suggest 70-80% accuracy would not be unreasonable."
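
One way to turn a win-loss network into rankings is a PageRank-style iteration, sketched below. This is an illustration only, not necessarily the method the submitter used: the edge direction (each loss passes credit from loser to winner), the `damping` value, and the per-game `weight` are all assumptions.

```python
from collections import defaultdict


def rank_teams(games, damping=0.85, iterations=100):
    """PageRank-style rating on a win-loss graph.

    games: list of (winner, loser, weight) tuples. The weight is a
    hypothetical edge weight, e.g. 1.0 per game.
    Returns (teams sorted best-first, score per team).
    """
    teams = sorted({team for w, l, _ in games for team in (w, l)})
    edges = defaultdict(list)        # loser -> [(winner, weight), ...]
    out_weight = defaultdict(float)  # total weight of each team's losses
    for winner, loser, weight in games:
        edges[loser].append((winner, weight))
        out_weight[loser] += weight

    n = len(teams)
    score = {team: 1.0 / n for team in teams}
    for _ in range(iterations):
        # Undefeated teams have no outgoing edges; spread their score evenly.
        dangling = sum(score[t] for t in teams if out_weight[t] == 0)
        new = {t: (1 - damping) / n + damping * dangling / n for t in teams}
        for loser, targets in edges.items():
            for winner, weight in targets:
                new[winner] += damping * score[loser] * weight / out_weight[loser]
        score = new
    ranking = sorted(teams, key=lambda t: score[t], reverse=True)
    return ranking, score
```

With three hypothetical teams, `rank_teams([("A", "B", 1.0), ("A", "C", 1.0), ("B", "C", 1.0)])` ranks A first, then B, then C.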

25 of 91 comments

  1. past history by Collin · · Score: 5, Insightful

    Wouldn't running the algorithm against past years' records and testing against past tournament results be the best possible test to tune the algorithm?

    1. Re:past history by PatDev · · Score: 5, Insightful

      I worked in a research group in college that worked on exactly this problem - predicting NCAA tournaments with a graph-theoretic approach. That is exactly how you test the algorithm. And the cited estimate of 70-80% accuracy seems made up. People who research the field know that there is far less certainty than that. At something like 20% confidence, your prediction should be something like 20%-90%.

      The problem stems from the fact that we traditionally predict a team will win if it is a stronger or better team, and we use our graph theory to produce relative team ratings. And if each game of the tournament were played over and over again with the winner of the majority going to the next round, then our methods would work even better. As it stands though, we are trying to predict a single sampling from a probability distribution - which will necessarily have error. Informally, the real tournament has upsets (when a weaker team beats a stronger one). Our algorithms can't predict these; the best they can do is gain a better understanding than humans of which team is better.

      Add to that the fact that the tournament is structured hierarchically - a mis-prediction in the first round prevents you from even attempting to predict later games (and by NCAA bracket scoring, that counts the same as mis-predicting those later games). So early upsets can potentially have large negative outcomes on brackets.
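
      The cascade described above can be shown with a toy example. This is a minimal sketch, assuming one point per correct pick; the four-team bracket and its team names are hypothetical.

```python
def correct_picks_by_round(picks_by_round, winners_by_round):
    """Count, for each round, how many picked teams actually won that round."""
    return [
        len(set(picks) & set(winners))
        for picks, winners in zip(picks_by_round, winners_by_round)
    ]


# Hypothetical four-team bracket: A plays B and C plays D, then the final.
picks = [["A", "C"], ["A"]]    # bracket filled in before the tournament
actual = [["B", "C"], ["C"]]   # B upsets A in round one, C wins the final
hits = correct_picks_by_round(picks, actual)
```

      Here `hits` is `[1, 0]`: the single round-one miss on A also costs the final, because an eliminated team cannot win a later game.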

  2. Predicting the top is easy by elrous0 · · Score: 4, Insightful

    Everyone knows who the big names are who are likely to make it to the final four. It's predicting how things will go at the middle and bottom, where teams are much more likely to be evenly matched, that's really hard.

    --
    SJW: Someone who has run out of real oppression, and has to fake it.
    1. Re:Predicting the top is easy by Bill,+Shooter+of+Bul · · Score: 2

      You mean like penciling in Butler for the championship two years in a row? Or the final four matchup of Butler vs VCU?

      --
      Well.. maybe. Or Maybe not. But Definitely not sort of.
  3. 70-80%? by Anonymous Coward · · Score: 2, Informative

    Okay, you can get 50% accuracy just by flipping a coin.
    If you go with "the higher seed wins", you get to 85% or so. Color me unimpressed.

    1. Re:70-80%? by MyLongNickName · · Score: 2

      Should be lower seed (I am the AC).

      --
      See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year
    2. Re:70-80%? by MyLongNickName · · Score: 4, Informative

      And my numbers are off. In 2011, 43 times out of 63, the lower seed won for about a 68% win rate.

      --
      See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year
  4. Re:How is this news? by Lunix+Nutcase · · Score: 2

    It's not. This is just a puff piece trying to drive hits to their site by mentioning the NCAA tournament.

  5. Re:Just take last years results by JayBean · · Score: 5, Insightful

    That may work for pro sports, but not for college sports. In fact, because teams usually lose their nucleus after winning it all (players declare for the draft), it is rare for a team to make it to the final game two or more years in a row.

  6. As a sports fan by jayhawk88 · · Score: 3, Interesting

    Some problems I see. Disclaimer: I know there's a margin of error here as the author said, and I know my observations will be based largely on anecdotal evidence, making it inferior. But if sports were so easy to predict there would be no sports gambling.

    - That's probably too far for Belmont; a #14 has only ever gotten as far as the Sweet 16, twice (Cleveland State '86, Chattanooga '97). Lowest seed to make an Elite 8 is Missouri in 2002 as a #12. Belmont is actually going to be one of the more popular upset picks, but they would have to upset two far superior teams in 3 days.

    - It's a bit too "chalk". #1 seeds generally survive the first two games (undefeated against #16's, 55-14 v. #8's, 59-6 v. #9's), but the #2's have it worse (only four losses v. #15's, but 58-21 v. #7's and 29-21 v. #10's). I know two #12's, a #13 and a #14 doesn't seem like "chalk" but historically it's much more likely that we'll see more #5-7 or #10-11's. To have only one #2 not make the Elite 8 and all the #1's would be almost unheard of.

    - A #12 always beats a #5, but three of them doing so in one year would seem unlikely, as they're only 39-89 overall.

    - Some of the other first round matchups seem a bit improbable. It has every #6 and every #7 winning, for example.
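
    The #12-over-#5 point can be put in rough numbers. This is a back-of-the-envelope sketch, assuming the four #12 vs. #5 games in a year are independent and each follows the 39-89 historical record quoted above; neither assumption is exactly true.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


wins, losses = 39, 89                  # #12 seeds' record against #5 seeds
p_upset = wins / (wins + losses)       # about 0.30 per game
p_three_plus = prob_at_least(3, 4, p_upset)
```

    Under those assumptions `p_three_plus` comes out to roughly 9%, so three #12 upsets in one year is possible but a long shot.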

    1. Re:As a sports fan by kenrblan · · Score: 2

      I didn't read the article (yet), but I put together a game result predictor a couple of years ago that I ran against the tournament field with about an 83% success rate for the whole tournament. It was in the 93% range for the first two rounds. My algorithm utilized season long team statistics to get a team's baseline and then incorporated strength of schedule and seeding components. Just like you mentioned about how far a team has historically progressed from a specific seed, I used historical analysis of seed matchups as another component. Essentially those historical #12 beating #5 type of matchups included a slight scoring boost to the worse seed. In some pairings, that modifier kicked the scoring over the top, but in others it didn't. It turned out to be quite accurate and even predicted the Murray State win over Vanderbilt, among others.

      I might make another run at tournament prediction this year using some different statistical metrics that are game pace independent rather than the raw scoring and defense that I used before. Game prediction simulators present unique challenges and are quite fun to work on, especially for nerds who also like sports.

      --
      Make everything as simple as possible, but not simpler. - Albert Einstein
    2. Re:As a sports fan by bjourne · · Score: 2

      It is not hard to create a model that works perfectly on observed data. But then you run into the problem of overfitting and your model loses any general predictability it had. To counter overfitting you need to have separate datasets for training and testing, otherwise the model will depend on random details in the data. The proof of the pudding is in the eating and if your model is good enough, you should be able to make money on sports betting with it.
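
      A held-out test of the kind described above might look like this. It is a minimal sketch: the chronological 80/20 split, the `fit` interface, and the toy `fit_most_wins` model are all assumptions made for illustration.

```python
def holdout_accuracy(games, fit, train_fraction=0.8):
    """Fit on the earliest games, then measure accuracy on the rest.

    games: chronologically ordered list of (team_a, team_b, winner).
    fit: function taking training games and returning a predictor
         predict(team_a, team_b) -> predicted winner.
    """
    cut = int(len(games) * train_fraction)
    train, test = games[:cut], games[cut:]
    if not test:
        raise ValueError("no games left over for testing")
    predict = fit(train)
    hits = sum(predict(a, b) == winner for a, b, winner in test)
    return hits / len(test)


def fit_most_wins(train):
    """Toy model: pick whichever team won more games in the training set."""
    wins = {}
    for _, _, winner in train:
        wins[winner] = wins.get(winner, 0) + 1
    return lambda a, b: a if wins.get(a, 0) >= wins.get(b, 0) else b
```

      The test games never reach `fit`, so a model that merely memorizes its training data gets no credit for it.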

    3. Re:As a sports fan by ThatsNotPudding · · Score: 2

      If there is a core of deep, personal knowledge about early upsets in the NCAA BB Tourney, it would definitely be at Kansas University (KU). Oh; I meant University of Kansas: UK. No, wait... what?

  7. The joys of single elimination by PPalmgren · · Score: 2

    March Madness is notoriously hard to predict, partly because of the number of teams involved and also because of the single elimination system that I love so much. It's prevalent in few sports and makes each game mean a lot more, also opening the door for Cinderella to take her 15 minutes of fame. 7-game playoff rounds like they have in Baseball and the NBA tend to nullify those outliers. I honestly think that's a big reason for the success of the NFL too - every game and every play means a hell of a lot more when the best possible record is 19-0.
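
    How much a 7-game round dampens upsets can be sketched with a short recursion. The 60% single-game win probability used in the note below is a made-up figure, and the model assumes every game in the series is independent with the same odds.

```python
from functools import lru_cache


def series_win_prob(p, wins_needed):
    """Probability the favorite is first to reach wins_needed wins.

    p: the favorite's probability of winning any single game.
    """
    @lru_cache(maxsize=None)
    def win(a, b):
        # a, b = wins still needed by the favorite and by the underdog
        if a == 0:
            return 1.0
        if b == 0:
            return 0.0
        return p * win(a - 1, b) + (1 - p) * win(a, b - 1)

    return win(wins_needed, wins_needed)
```

    With `p = 0.6`, `series_win_prob(0.6, 1)` is 0.6 for a single game, while `series_win_prob(0.6, 4)` (first to four wins, i.e. best of seven) is about 0.71.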

  8. Re:Just take last years results by MonsterTrimble · · Score: 2

    I disagree - how good a team is can vary wildly year to year. Coaching changes, injuries, age, experience and so on can play huge roles in how a team performs, especially on a collegiate level where there is so much growth between juniors and seniors in terms of development. This is less so in professional sports but still relevant.

    --
    I call it 'The Aristocrats'
  9. Doesn't matter if it works by Anonymous Coward · · Score: 2, Insightful

    Can you write a windows installer for it and sell it to gamblers?

  10. Re:Morale of the story... by kenrblan · · Score: 2

    Not quite. Picking winners =/= winning at gambling. Margin of victory, aka the spread, comes into play. That is a bit harder to account for in these types of situations involving so many human variables. Granted, being able to identify some potential upsets could allow someone to bet big on those and potentially become rich.

    --
    Make everything as simple as possible, but not simpler. - Albert Einstein
  11. I'll go with the squid by nightcats · · Score: 2

    Just have to find one with 32 tentacles. Or a large appetite.

    --
    Development is programmable; Discovery is not programmable. (Fuller)
  12. Re:Just take last years results by UnknowingFool · · Score: 2

    Yes but last year's tournament had 2 small schools, Butler and VCU, in the Final Four. While Butler made it to the championship 2 years in a row, they were a surprise both times. VCU has never made it that far in the tournament and there were some TV pundits that said they should not have been selected for the tournament at all when the bracket was announced. VCU got to the Final Four after the same pundits predicted they would lose in the next game for every single game.

    --
    Well, there's spam egg sausage and spam, that's not got much spam in it.
  13. Not enough time? by babyrat · · Score: 3, Insightful

    You don't have time to follow sports, but you have time to model "information from 5242 games played during the 2011-2012 season".

    You could be honest and just say you don't really care, but get involved in the playoffs because everyone else is talking about it.

    I'm guessing your level 80 warlock probably doesn't 'have time' either. :)

  14. Re:Call me when it works for stocks. by NatasRevol · · Score: 2

    Yes. If fairly valued at a PE of say 25 or so (which is still low for their growth rate), their stock should be at $875 or so.

    MOT, INTC, EMC, JNPR are all similarly valued. But have much lower growth rates.

    BIDU is the only large tech company with a similar growth rate. Its PE is 46, which would put AAPL's stock price at $1615.
    VMware has lower growth, but a PE of 60. AAPL would be at $2100 if similarly valued.
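
    The arithmetic behind these figures is price = PE x earnings per share. In the sketch below, the earnings-per-share value is not stated in the comment; it is only implied by the $875 figure at a PE of 25.

```python
def implied_price(pe_ratio, earnings_per_share):
    """Share price implied by a price-to-earnings ratio."""
    return pe_ratio * earnings_per_share


# Assumption: EPS backed out of the comment's own "$875 at a PE of 25".
implied_eps = 875 / 25                      # 35.0
price_at_pe_46 = implied_price(46, implied_eps)
price_at_pe_60 = implied_price(60, implied_eps)
```

    That gives about $1610 at a PE of 46 and $2100 at a PE of 60, in line with the rounded figures above.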

    http://www.google.com/finance#stockscreener

    --
    There are two types of people in the world: Those who crave closure
  15. Re:Morale of the story... by Bill,+Shooter+of+Bul · · Score: 2

    Ah, you would think that the casino sports book odds were the most accurate available and only determined by scientific study of the sports.

    BZZZT! Wrong. Casinos need to make a profit. So they determine the *initial* odds by studying the sport, but then change the odds in reaction to the bets that are placed. They try to have equal amounts on both sides of a bet. They pay less to the winners than they get from the losers.

    What's the point of pointing that out? Well, you have some pro gamblers who actually do make an incredible living off of betting on sports who use the above factlet. They simply move the odds the casino gives by placing money on the other side of the bet. So if they want the odds to go up on team A winning, they place a large bet on the opposite team B, and the casino increases the odds on team A winning in order to attract gamblers to help balance out the bet on team B. They then place an even larger bet on team A with the odds they really wanted in the first place. If team A wins, it will cover the loss of the first bet on team B. If team B wins, they obviously lose more money than they win.
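
    Why the book wants equal money on both sides can be seen in a small sketch. The stakes and the 0.9 payout ratio are hypothetical numbers chosen for illustration, not real sportsbook terms.

```python
def book_profit(stake_a, stake_b, payout_ratio, winner):
    """Book's profit once the game is decided.

    payout_ratio: profit paid per unit staked on the winning side
    (below 1.0 means winners collect less than the losers paid in).
    """
    if winner == "A":
        return stake_b - stake_a * payout_ratio
    return stake_a - stake_b * payout_ratio


balanced = [book_profit(1000, 1000, 0.9, w) for w in ("A", "B")]
lopsided = [book_profit(1500, 500, 0.9, w) for w in ("A", "B")]
```

    With balanced stakes the book clears 100 whichever team wins. With the lopsided stakes it loses 850 if A wins and makes 1050 if B wins, which is the exposure that moving the odds is meant to remove.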

    --
    Well.. maybe. Or Maybe not. But Definitely not sort of.
  16. Re:"Like many technically-minded people, I don't.. by Hatta · · Score: 2

    At least in Skyrim, you're an interactive participant. That, and Skyrim isn't just a polite way for people to act out their base tribalistic instincts.

    --
    Give me Classic Slashdot or give me death!
  17. Comment removed by account_deleted · · Score: 2

    Comment removed based on user account deletion

  18. Re:Nerd by tragedy · · Score: 2

    You know, for stereotypical nerd behaviour like communicating to each other in incomprehensible jargon and obscure references that other people don't get, obsessive behaviour, dressing up in ridiculous costumes for gatherings, etc, I've come to realize that nothing beats a hard-core sports fan.