ESPN and TopCoder Run College Football Algorithm Challenge
Mike writes with a timely link to a story about the ESPN/TopCoder Winning Formula Challenge, a combination of fantasy football and competitive programming. The goal is to write an algorithm to predict the outcome of college football games using a collection of historical data provided by the tournament organizers. The season is broken up into 3-4 week chunks that are used to evaluate the results. Prizes will total $100,000.
What if someone has not played that team before, such as the game with Appalachian St.? Also, the algorithm is based strictly on historical data, kind of like a Black-Scholes model for historical prices of stocks, and should be used as a starting point, not an end-all prediction method. There are too many things to consider, such as: crowd noise (maybe it's a new stadium?); player experience (there could be a lot of veterans on the team, which may mean less nervousness, i.e. fewer dropped passes, missed tackles, turnovers); coaches (new playbook and different style of offense); and so on.
Anything and Everything about the Net
If someone truly came up with an algorithm worth its salt for predicting football games, why not just go to Las Vegas and make a lot more than $100,000? :P
They tell you you can't use more than 1024MB of RAM, which is fair enough. They tell you you can't use more than 9 minutes of real time, but they don't tell you on what CPU it will run. They tell you what the name of the class and method should be, but they don't tell you what language they want - from the description it sounds like Java or C# (terrible languages for writing this kind of code), but they don't say which. The InfoWorld link specifies C++, C#.NET, VB.NET, Java, and Python, which improves the description slightly. Why anyone would pick any of these languages when the problem is one ideally suited to Lisp, Prolog, or even something like CLIPS is beyond me.
I vaguely remember this being the reason I ignored Top Coder in the past - a good programmer picks the best tool for the job, and Top Coder only provides the option of using some of the worst tools for any job.
I am TheRaven on Soylent News
I vaguely remember this being the reason I ignored Top Coder in the past - a good programmer picks the best tool for the job, and Top Coder only provides the option of using some of the worst tools for any job.
Fair enough, but in the "real world" you often don't get that option. I suppose if you want a pure computer science competition, you should absolutely get to choose your language. A good computer scientist will pick the right language. A good hired gun will work within the constraints given.
It seems like you must be assuming a particular approach to the problem. Can you expand on why you think this problem is best solved with the languages you listed?
It's a shame that ESPN chose to do this through TopCoder, as TopCoder's general practices are poison for a machine learning contest. TopCoder chose to impose a one-gigabyte memory limit and a nine-minute runtime on any approach to this problem, which murders most machine learning tactics right out the door. It's a shame they didn't do this themselves on the Netflix model, where contestants just submit predictions.
This contest isn't to get football predictions. It's to get football predictions under arbitrary RAM and CPU caps. ESPN's staff wouldn't face such restrictions when using the work; there is literally no reason for this limitation to exist.
This contest's design precludes most modern approaches to machine learning to no appreciable benefit, and is therefore fundamentally flawed. ESPN is going to get seriously quality-limited results.
Very disappointing.
StoneCypher is Full of BS
Start with Home team 21 Away team 17:
+7 to the score of the higher-ranked team for every 12 positions they are ahead of the other, based on a general ranking (I don't think they give you this information). Overall maybe 10 lines of code. Put Ohio State #1, put Temple #117. The rest of the rankings are an exercise for the reader. You won't pick major upsets, but you're not going to be too far off the mark otherwise.
That's about as close as you're going to get. College football varies too much to get exact scoring. A #1 team has 15 starters leave for graduation or the NFL, and they have nothing but freshmen and sophomores starting. Any data on the previous year isn't valid for the next year.
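For anyone who wants to try the heuristic above, here is a minimal sketch in Python. The function name and parameters are made up for illustration, and it assumes "every 12 positions" means full steps of 12 (integer division), which the post doesn't spell out. The rankings themselves still have to come from somewhere else.

```python
def predict_score(home_rank, away_rank,
                  base_home=21, base_away=17,
                  points_per_step=7, ranks_per_step=12):
    """Predict (home_score, away_score) from two ranking positions.

    Lower rank number means a better team (#1 is best).
    Assumption: only complete steps of `ranks_per_step` earn a bonus.
    """
    gap = abs(home_rank - away_rank)
    bonus = points_per_step * (gap // ranks_per_step)

    home_score, away_score = base_home, base_away
    if home_rank < away_rank:      # home team is ranked higher
        home_score += bonus
    elif away_rank < home_rank:    # away team is ranked higher
        away_score += bonus
    return home_score, away_score


# Ohio State (#1) hosting Temple (#117): gap of 116 is 9 full steps of 12.
print(predict_score(1, 117))
```

As the post says, this never picks a major upset, since the higher-ranked team only ever gains points.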
Meh. The currently running Google Code Jam let people use whatever they wanted. In the last round (round 3), there were 1000 people. 3 used Haskell, 3 used Lisp, 1 used OCaml. Of those, 1 Haskell and 2 Lisp users got through to the round of 500 (and one of those was reid, who could have advanced using baling wire and twine). The remainder of advancers used C++, Java, Pascal, Python, and a couple other boring procedural languages. One poor fool used VB (me).
I think a good programmer can solve these kinds of problems in any language. Sure some competitors might be more comfortable in one language or another, but in the end the meat of the solution is going to be the same anyways.
For TopCoder, their framework and style make it hard to support a lot of different languages. And in terms of whether they picked the right ones, the statistics of what people pick when they have the choice suggest that they chose well.
Let's not stir that bag of worms...
I think a good programmer can solve these kinds of problems in any language. Sure some competitors might be more comfortable in one language or another, but in the end the meat of the solution is going to be the same anyways.
Yes, the meat of the solution is going to be the same, but the implementation may not be. The same algorithm written in different languages will run in different amounts of time. I'm sure you know this already, but everything from the language type (compiled, interpreted, etc.) to memory management will affect running time.
If you don't need a garbage collector (which I don't think you would for this kind of task) then don't use a language that enforces one. If you don't want the overhead of OO programming then don't use a language that was designed for OO programming.
A few years back some of the more hardcore programmers would have argued that assembly was the optimal way of doing it. What the GP was getting at was that some languages just aren't suited to solve the problem because either they don't offer appropriate tools, or they offer useless inappropriate tools.
I just pooped your party.
TopCoder is popular in China now; many universities join this programming competition.
Sure there are differences in how much code it will take to implement an idea, and there are differences in runtimes, available libraries and whatnot.
I guess my point would be that for an interesting algorithm contest, none of those things are going to be much more than tie breakers. Now obviously if the problem is, "multiply these 100 digit numbers" it's going to be easier in a language that has an arbitrary precision math class - but I wouldn't count that as much of an interesting contest. Similarly there might be problems that are nigh unsolvable in a certain language due to speed, but again that's not the case for a well designed contest. Things like garbage collection and whatnot are only going to enter into a contest if it's very poorly designed (too tight of time bounds or a bad judge, etc.).
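To make the "100 digit numbers" example concrete, here is a rough sketch of what you end up writing by hand when the language has no arbitrary precision type: schoolbook multiplication on digit strings. Python is used only for illustration, and the function name is hypothetical. Python's built-in int is already arbitrary precision, so the one-liner at the bottom does the same job there.

```python
def multiply_digit_strings(a: str, b: str) -> str:
    """Schoolbook multiplication of two non-negative decimal digit strings."""
    # result[k] holds the digit for 10**k (least significant first).
    result = [0] * (len(a) + len(b))
    for i, digit_a in enumerate(reversed(a)):
        carry = 0
        for j, digit_b in enumerate(reversed(b)):
            total = result[i + j] + int(digit_a) * int(digit_b) + carry
            result[i + j] = total % 10
            carry = total // 10
        # Leftover carry goes into the next, so far untouched, position.
        result[i + len(b)] += carry
    digits = "".join(str(d) for d in reversed(result)).lstrip("0")
    return digits or "0"


x = "9" * 100   # a 100-digit number
y = "7" * 100   # another 100-digit number

# With built-in big integers the whole "problem" is one line:
assert multiply_digit_strings(x, y) == str(int(x) * int(y))
```

Either way it's a dozen lines, which is kind of the point: it makes the task easier or harder, not more interesting.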
I guess my other point is that if "TopCoder only provides the option of using some of the worst tools for any job", and "good programmer picks the best tool for the job", then it's strange that the same set of guys that win at TopCoder tend to win at more open programming competitions (GCJ, the last ICFP, ICPC), and they do it - in large part - with those same horrible languages.
To the extent that language is important, and that the best programmers would tend to choose the best language, I think empirically we'd have to conclude that some C variant is the best language for algorithm competitions.
My own view is that that's not really the case, and that its dominance is a combination of unrelated circumstances about its availability and popularity combined with the fact that language choice just isn't that important.
Let's not stir that bag of worms...
TopCoder is coding for hire. You write the code in the language that whoever's paying for it wants to maintain it in.
Apparently the best algorithms submitted so far get about 75% of their win/loss predictions correct, which would be more than enough to make real money in a state such as Nevada that allows gambling on sports.
75% win/loss ratio is shit when gambling on college sports. There's no probability here - there is a major bias between teams and one could easily guess 75% by just choosing the higher ranked team. And win/loss means nothing since Vegas won't pay you money for choosing Ohio State over Montana State. They'll give you a spread, say Ohio State will win by 32, and you can choose to either agree with them or disagree. Obviously, if you pull for Montana State you win if they win, but you will win even if Montana State keeps the score closer than 32. Or you can pull for Ohio State and you win if they 'cover the spread', winning by 32 or more.
But honestly, the probability is that you are going to lose. Any time you gamble with the hope that some 18-22 year old, who could be hung over, just broke up with his girlfriend, or has the flu, is going to come through for you, you are a sucker.
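The spread bet described above can be sketched in a few lines of Python. The function and argument names are hypothetical, and it follows the post's wording literally: the favorite "covers" by winning by the spread or more, and the underdog bet wins otherwise. How a real sportsbook handles a margin exactly equal to the spread is not modeled here.

```python
def settle_spread_bet(favorite_score, underdog_score, spread, bet_on_favorite):
    """Return True if the bet wins under the rule described in the post.

    spread: points the favorite is expected to win by (e.g. 32).
    bet_on_favorite: True to back the favorite, False to back the underdog.
    """
    margin = favorite_score - underdog_score   # negative if the underdog wins
    favorite_covers = margin >= spread         # "winning by 32 or more"
    return favorite_covers if bet_on_favorite else not favorite_covers


# Ohio State favored by 32 over Montana State, final score 38-10 (margin 28):
# Ohio State won the game, but the Montana State bet is the one that pays.
print(settle_spread_bet(38, 10, 32, bet_on_favorite=True))
print(settle_spread_bet(38, 10, 32, bet_on_favorite=False))
```

This is why a win/loss prediction rate doesn't translate to betting: the pick that matters is which side of the spread the margin lands on, not who wins.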
"Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry