Using Graph Theory To Predict NCAA Tournament Outcomes
New submitter SocratesJedi writes "Like many technically-minded people, I don't have a lot of time to keep up with sports. Nevertheless, trying to predict the outcome of the NCAA men's basketball tournament is a fun activity to share with friends, family and colleagues. This year, I abandoned my usual strategy of quasi-randomly choosing teams and instead modeled the win-loss history of all Division I teams as a weighted network. The network included information from 5242 games played during the 2011-2012 season. From this, teams can be ranked using tools from graph theory, and those rankings can be used to predict tournament outcomes. Without any a priori information, this method placed all four #1 seeds among its top five teams. It also predicts that at least one underdog, Belmont (#14 seed), will reach the Elite Eight. Although the ultimate test will be how well it predicts tournament outcomes, initial benchmarks suggest 70-80% accuracy would not be unreasonable."
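The submission doesn't say exactly which graph-theoretic ranking was used. Here is a minimal sketch of one plausible version, assuming a PageRank-style method (a swapped-in technique, not necessarily the submitter's): every game adds a directed edge from the loser to the winner, weighted by the margin of victory, and teams are ranked by their stationary score. The `rank_teams` name, the damping factor of 0.85, and the four-team sample are hypothetical.

```python
from collections import defaultdict

def rank_teams(games, damping=0.85, iterations=100):
    """PageRank-style ranking (an assumed method, not necessarily the submitter's).

    games: list of (winner, loser, margin) tuples. Each game adds an edge
    loser -> winner weighted by the margin, so credit flows toward winners.
    """
    teams = sorted({t for w, l, _ in games for t in (w, l)})
    out_weight = defaultdict(float)   # total edge weight leaving each team
    edges = defaultdict(float)        # (loser, winner) -> summed margins
    for winner, loser, margin in games:
        edges[(loser, winner)] += margin
        out_weight[loser] += margin

    n = len(teams)
    score = {t: 1.0 / n for t in teams}
    for _ in range(iterations):
        new = {t: (1.0 - damping) / n for t in teams}
        # Undefeated teams have no outgoing edges; spread their score evenly.
        dangling = sum(score[t] for t in teams if out_weight[t] == 0)
        for t in teams:
            new[t] += damping * dangling / n
        # Each loser passes its score to the teams that beat it, by margin share.
        for (loser, winner), w in edges.items():
            new[winner] += damping * score[loser] * w / out_weight[loser]
        score = new
    return sorted(teams, key=score.get, reverse=True), score

# Hypothetical mini-season: (winner, loser, point margin)
games = [("A", "B", 10), ("A", "C", 5), ("B", "C", 3), ("C", "D", 8), ("B", "D", 12)]
order, scores = rank_teams(games)
print(order)
```

Weighting edges by margin means a blowout transfers more credit than a one-point win; a pure win-loss network would simply set every margin to 1.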
Wouldn't running the algorithm against past years' records and testing against past tournament results be the best possible test to tune the algorithm?
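A rough sketch of that backtest, assuming each past season is stored as (regular-season games, tournament games): build ratings from the regular season only, count how often the higher-rated team won each tournament game, and average over seasons. The data layout and the trivial `win_count_ratings` stand-in model are hypothetical; any rating function, such as a graph-based one, could be passed in instead.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Game = Tuple[str, str]  # (winner, loser)

def season_accuracy(ratings: Dict[str, float], tourney: List[Game]) -> float:
    """Fraction of tournament games won by the higher-rated team (ties count as misses)."""
    correct = sum(1 for w, l in tourney if ratings.get(w, 0.0) > ratings.get(l, 0.0))
    return correct / len(tourney)

def backtest(seasons: Dict[int, Tuple[List[Game], List[Game]]],
             build_ratings: Callable[[List[Game]], Dict[str, float]]) -> float:
    """Average tournament accuracy over past seasons.

    seasons maps year -> (regular_season_games, tournament_games). Ratings are
    built only from the regular season, so tournament results never leak into
    the model being tested.
    """
    return mean(season_accuracy(build_ratings(reg), tourney)
                for reg, tourney in seasons.values())

def win_count_ratings(games: List[Game]) -> Dict[str, float]:
    """Hypothetical stand-in model: a team's rating is its number of wins."""
    ratings: Dict[str, float] = {}
    for w, l in games:
        ratings[w] = ratings.get(w, 0.0) + 1
        ratings.setdefault(l, 0.0)
    return ratings

# Made-up data for two seasons.
seasons = {
    2010: ([("A", "B"), ("A", "C"), ("B", "C")], [("A", "B"), ("C", "B")]),
    2011: ([("X", "Y")], [("X", "Y")]),
}
print(backtest(seasons, win_count_ratings))
```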
Everyone knows who the big names are who are likely to make it to the final four. It's predicting how things will go at the middle and bottom, where teams are much more likely to be evenly matched, that's really hard.
Okay, you can get 50% accuracy just by flipping a coin.
If you go with "the higher seed wins", you get to 85% or so. Color me unimpressed.
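The seed baseline is easy to compute for any set of past games. A minimal sketch, assuming each game is recorded as (winner's seed, loser's seed); the sample games below are made up, and the 85% figure above is the commenter's estimate, not something this code reproduces.

```python
def higher_seed_accuracy(games):
    """Accuracy of the 'better seed always wins' rule.

    games: list of (winner_seed, loser_seed); a lower number is a better seed.
    Same-seed games (e.g. two #1s in the Final Four) are skipped because
    the rule makes no pick there.
    """
    decided = [(w, l) for w, l in games if w != l]
    correct = sum(1 for w, l in decided if w < l)
    return correct / len(decided)

# Hypothetical results: one 12-over-5 upset, plus a #1 vs #1 game that is skipped.
sample = [(1, 16), (2, 15), (12, 5), (7, 10), (1, 1)]
print(higher_seed_accuracy(sample))
```

A coin flip would average 0.5 on the same games, so any model has to clear the seed baseline, not the coin, to be interesting.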
You can get very reasonable results by just taking last year's results. This works for most sports.
People have been doing this, knowingly or unknowingly, since the inception of sports gambling.
Some problems I see. Disclaimer: I know there's a margin of error here as the author said, and I know my observations will be based largely on anecdotal evidence, making it inferior. But if sports were so easy to predict there would be no sports gambling.
- That's probably too far for Belmont; a #14 has only ever gotten as far as the Sweet 16, twice (Cleveland State '86, Chattanooga '97). The lowest seed to make an Elite 8 is Missouri in 2002 as a #12. Belmont is actually going to be one of the more popular upset picks, but they would have to upset two far superior teams within 3 days.
- It's a bit too "chalk". #1 seeds generally survive the first two games (undefeated against #16's, 55-14 v. #8's, 59-6 v. #9's), but the #2's have it worse (only four losses v. #15's, but 58-21 v. #7's and 29-21 v. #10's). I know that picking two #12's, a #13 and a #14 doesn't seem like "chalk", but historically it's much more likely that we'll see more #5-7 or #10-11 seeds advancing. Having all the #1's and all but one #2 make the Elite 8 would be almost unheard of.
- Some #12 beats a #5 nearly every year, but three of them doing so in one year would seem unlikely, as they're only 39-89 overall.
- Some of the other first-round matchups seem a bit improbable. It has every #6 and every #7 winning, for example.
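To put the records quoted above on a common scale, here is a small sketch that turns them into win rates and, assuming the four 12-vs-5 games in a year are independent and each follows the historical rate, computes the chance of at least one and of at least three #12 upsets. The independence assumption is mine, not the commenter's.

```python
from math import comb

def win_pct(wins, losses):
    """Historical win rate from a W-L record."""
    return wins / (wins + losses)

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): at least k wins in n independent games."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p12 = win_pct(39, 89)                      # #12 vs #5 record, per the comment
any_upset = prob_at_least(1, 4, p12)       # four 12-vs-5 games each year
three_or_more = prob_at_least(3, 4, p12)

print(f"#12 win rate vs #5: {p12:.3f}")
print(f"P(at least one #12 upset): {any_upset:.3f}")
print(f"P(three or more): {three_or_more:.3f}")
print(f"#1 vs #9 win rate: {win_pct(59, 6):.3f}")
```

With 39-89 (about a 30% win rate), this gives roughly a 77% chance of at least one #12 upset in a given year and under 9% for three or more.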
... have a lot of time to keep up with sports."
Yes, if you enjoy sports, you must not be technically-minded. 'Tis for the plebes...
But, I bet you have time for Skyrim!
...you're rich! 70-80% accuracy beats the 70-80% of people who don't know/use/master graph theory, thus you're gonna win 70-80% of online bets.
March Madness is notoriously hard to predict, partly because of the number of teams involved and also because of the single-elimination system that I love so much. It's prevalent in few sports and makes each game mean a lot more, also opening the door for Cinderella to take her 15 minutes of fame. 7-game playoff rounds like they have in baseball and the NBA tend to nullify those outliers. I honestly think that's a big reason for the success of the NFL too: every game and every play means a hell of a lot more when the best possible record is 19-0.
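The point about seven-game series washing out upsets can be made concrete. A sketch assuming independent games with a constant per-game win probability p (a simplification; the 0.6 figure is hypothetical):

```python
from math import comb

def series_win_prob(p, games=7):
    """Probability that a team with per-game win probability p wins a
    best-of-`games` series, assuming independent games with constant p."""
    need = games // 2 + 1
    # Sum over k = number of losses before the clinching win (0 .. need-1):
    # choose where the k losses fall among the first need-1+k games.
    return sum(comb(need - 1 + k, k) * p**need * (1 - p)**k for k in range(need))

print(series_win_prob(0.6, games=1))  # single elimination: just p
print(series_win_prob(0.6, games=7))
```

Under these assumptions, a team that wins 60% of single games wins a best-of-7 series about 71% of the time, so the underdog's chance drops from 40% to roughly 29%.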
Can you write a Windows installer for it and sell it to gamblers?
Like many technically-minded people, I don't have a lot of time to keep up with sports.
The word you're looking for is "Nerd". It's OK to say it, it's in the title-bar of Slashdot.
Just have to find one with 32 tentacles. Or a large appetite.
You're some way behind the curve if you want to make money sports betting on this. Basketball teams have an extreme non-stationarity problem (how well a team plays keeps changing over the season), which inevitably means methods using past statistics will never be that successful. I know of professional basketball modellers who pay an army of students and the like to watch college games while clicking on hand-held devices to record second-by-second data on passes, interceptions etc. This data is then fed into their models and provides a very accurate picture of how a team is playing right now. They are then able to handicap the games and look for value where the line is wrong.
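A toy illustration of the non-stationarity problem, using hypothetical game-by-game point margins for a team that started badly and improved: the season-long average barely registers the change, while an exponentially weighted average (alpha = 0.3, an arbitrary choice) tracks recent form. This only sketches why past statistics lag; it says nothing about the play-by-play models described above.

```python
def season_average(values):
    """Plain mean over the whole season: every game counts equally."""
    return sum(values) / len(values)

def ewma(values, alpha=0.3):
    """Exponentially weighted moving average; larger alpha weights recent games more."""
    avg = values[0]
    for v in values[1:]:
        avg = alpha * v + (1 - alpha) * avg
    return avg

# Hypothetical point margins, in chronological order.
margins = [-10, -8, -12, -5, 2, 6, 9, 11, 10, 12]
print(f"season average: {season_average(margins):.2f}")
print(f"recency-weighted: {ewma(margins):.2f}")
```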
Between 70 and 80%. That's a HUGE difference. That means that compared to the other computerized systems out there you're either totally awesome or really suck.
That's like saying, "I did a lap in a Formula 1 car, and I'm either 15 seconds ahead of last year's world champion, or I'm a minute behind the field."
You haven't done this before, have you?
His statistical reasoning is always well described, so that if you disagree with his results, at least you understand why you disagree. He's got "picks" and a description of the system used to generate them.
The original article is an interesting network analysis exercise, but it is really limited by its assumption of no a priori quality data. (Any time you beat Kentucky or North Carolina or other perennial powerhouses, that's almost always a quality win.) Sagarin and LRMC follow similar logic, but without an explicit network piece.
See AAPL's PE vs growth rate for example.
You mean how low their PE is? By rights their stock should be up around $600/share.
You don't have time to follow sports, but you have time to model "information from 5242 games played during the 2011-2012 season".
You could be honest and just say you don't really care, but get involved in the playoffs because everyone else is talking about it.
I'm guessing your level 80 warlock probably doesn't 'have time' either. :)
It seems in office pools I do the best by picking favorite team colors.
Yes. If fairly valued at a PE of say 25 or so (which is still low for their growth rate), their stock should be at $875 or so.
MOT, INTC, EMC, JNPR are all similarly valued. But have much lower growth rates.
BIDU is the only large tech company with a similar growth rate. Its PE is 46, which would put AAPL's stock price at $1615.
VMware has lower growth, but a PE of 60. AAPL would be at $2100 if similarly valued.
http://www.google.com/finance#stockscreener
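The arithmetic behind these figures is just price = P/E times EPS. In this sketch the EPS of about $35 is inferred from the comment's own pairing of a 25 PE with $875, not taken from Apple's financials. With that EPS, a PE of 46 gives $1,610 rather than the quoted $1,615, so the original presumably used a slightly different EPS or rounding.

```python
def implied_price(pe, eps):
    """Share price implied by applying a P/E multiple to earnings per share."""
    return pe * eps

# EPS inferred from the comment's own figures: $875 / 25 = $35 (an assumption).
eps = 875 / 25
for label, pe in [("PE of 25", 25), ("BIDU-like PE", 46), ("VMware-like PE", 60)]:
    print(f"{label}: ${implied_price(pe, eps):,.0f}")
```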
Yeah, like when someone intentionally throws a game. As long as people are gambling (somewhere) and money is to be made, there is an opportunity and incentive to cheat. Get your graph theory to account for that!
Or maybe regression analysis would work better, like the analysis Levitt used to find cheating in sumo wrestling and by teachers on US student tests in his book Freakonomics. (Awesome book BTW) ;)
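Levitt's actual analyses were more careful than anything shown here. As a loose sketch of the regression idea only, one could fit point margin against a pre-game rating gap by ordinary least squares and flag games whose residual exceeds some number of residual standard deviations. The data and the threshold of 2 are hypothetical, and a flagged game is only surprising, not evidence of a thrown game.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def flag_outliers(rating_diffs, margins, threshold=2.0):
    """Indices of games whose margin is far from what the rating gap predicts,
    measured in residual standard deviations."""
    a, b = fit_line(rating_diffs, margins)
    resid = [m - (a + b * d) for d, m in zip(rating_diffs, margins)]
    # Residual standard error with n - 2 degrees of freedom (two fitted parameters).
    sd = (sum(r * r for r in resid) / (len(resid) - 2)) ** 0.5
    return [i for i, r in enumerate(resid) if abs(r) > threshold * sd]

# Hypothetical games: margin is about twice the rating gap, except game 7,
# where the heavy favorite lost by 10.
diffs = list(range(1, 11))
margins = [2 * d for d in diffs]
margins[7] = -10
print(flag_outliers(diffs, margins))
```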
At least in Skyrim, you're an interactive participant. That, and Skyrim isn't just a polite way for people to act out their base tribalistic instincts.
Skyrim isn't a polite way to act out my base tribalistic instincts? Am I offending people when I slaughter entire towns of virtual Argonian civilians?
There's too much data and too many variables. Even just inputting all the known, public data might significantly improve the accuracy, but there's also lots of unknown private data that can influence games. Algorithms like this can't account for things like the coach's son getting killed in an automobile accident the night before a game, or the star center getting hit with a bad flu. And when you make it complex enough to take in all that data, it still has to get all that data somewhere, which means it has to have access to all news feeds, and it has to be accurate at knowing which ones are applicable, etc. etc... or you have to manually input all that data, which would take a horrific amount of time. In the end, it's so much easier to just intuitively account for things like that without using a computer, which I believe is why human experts are just as good as computers at predicting outcomes. We don't calculate the hard statistics as well, but we can account for the human element.
... modeled the win-loss history of all Division I teams as a weighted network. The network included information from 5242 games played during the 2011-2012 season. From this, teams can be ranked using tools from graph theory ...
... you obviously don't have enough time to keep up with sports.
Regarding the bracket, the four No. 1 seeds march along, undefeated, until they meet in the Final Four. While this can happen, it seems like a trivial and unsophisticated result to me.
The problem with these predictions is, of course, that it is the most likely scenario. There are enough other things that can happen that it is probably a worse than 50-50 shot, but there isn't another scenario that is more likely. Really, all any algorithm can do to beat picking the better seed every time is try to find spots where teams are seeded either higher or lower than they should be, and the very top and bottom of the list are probably not the most likely spots for this to happen.
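The idea of hunting for mis-seeded teams can be sketched by comparing a model's overall rank with the rank implied by each team's seed. This assumes four teams per seed line (ignoring play-in games), uses the midpoint 4s - 1.5 as seed s's implied overall rank, and relies on a hypothetical model ranking; the team names and numbers are made up.

```python
def seed_discrepancies(seeds, model_rank):
    """Teams ordered by how strongly the model disagrees with the seeding.

    seeds: team -> seed (1-16). model_rank: team -> overall rank (1 = best).
    Assumes four teams per seed line, so seed s covers overall ranks
    4s-3 .. 4s; the midpoint 4s - 1.5 serves as the seed's implied rank.
    Positive gap: the model rates the team better than its seed (possible underseed).
    """
    gaps = {t: (4 * s - 1.5) - model_rank[t] for t, s in seeds.items()}
    return sorted(gaps.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical field: a #14 the model likes, a #3 it doesn't, a #1 it agrees with.
seeds = {"A": 14, "B": 3, "C": 1}
model_rank = {"A": 20, "B": 25, "C": 2}
print(seed_discrepancies(seeds, model_rank))
```

Sorting by the absolute gap puts the model's strongest disagreements first, which is where a bracket built from it would differ most from simply picking the better seed.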