Mystery MLB Team Moves To Supercomputing For Their Moneyball Analysis
An anonymous reader writes "A mystery [Major League Baseball] team has made a sizable investment in Cray's latest effort at bringing graph analytics at extreme scale to bat. Nicole Hemsoth writes that what the team is looking for is a "hypothesis machine" that will allow them to integrate multiple, deep data wells and pose several questions against the same data. They are looking for platforms that allow users to look at facets of a given dataset, adding new cuts to see how certain conditions affect the reflection of a hypothesized reality."
"Supercomputer, is baseball still boring as fuck?" "YES, DAVE."
I'm sorry, but not even robots could make this game interesting. I say we sell it to the Japanese while they're still game.
I can take it or leave it, but a minor league game from wooden bleachers is a much better time for me.
What I find amusing is the obsession with statistics, considering the randomness of any particular game. But then I don't follow any particular team; it's the spectacle of seeing it done. (And the main thing I appreciate is how unimaginable it is relative to my own abilities.)
They feared that it could be used to suppress protest or support unpopular rule.
Sorry, I RTFA, it was unavoidable.
Looks like they might actually use the horsepower.
My best guess is it's the Cubs.
They are looking for minority investors in the club right now, and the cost of ballpark improvements is a smoke screen for taking on the cost of big data. Theo has not been the same without Tessie, and it's not cheap to recreate the analysis that system is capable of performing.
I really wonder what the value of such a system is compared to updating / refining Nate Silver's PECOTA odds to play out hypothetical teams and transactions over a 5 year period. There is so much data available about players at this point, it's almost possible to predict regressions on a macro level.
they've been buying wins for almost 20 years now, nothing new
I'm a nerd, not a hacker, but yeah. I was on the fencing team.
It is likely the Boston Red Sox. There was talk of this at the Analytics conference in Boston a month ago.
How have the Sox been buying wins since 1994?
It will be better to purchase from an owner who is a good farmer and a good builder.
...why haven't they been doing this from the start?
I am very small, utmostly microscopic.
"They are looking for platforms that allow users to look at facets of a given dataset, adding new cuts to see how certain conditions affect the reflection of a hypothesized reality."
Hypothesized reality? Oh you mean if a coach wanted to give a player performance enhancing drugs that they know they can hide to analyze the wins, or do you mean simulating reduced gravity because you plan to bilk the entire nation in taxes to pay for the next baseball stadium on the moon?
I don't think baseball needs a supercomputer to analyze just how bored I am watching men be paid millions of dollars to stand around 90% of the time in a grassy field, especially when that cost translates to the average American family spending hundreds at the ballpark for a single game.
They need to calculate what to do when players go on paternity leave.
"Fascism should more properly be called corporatism because it is the merger of state and corporate power." -- Mussolini
One of the great pleasures of baseball is that it generates a vast amount of data for the analytically minded to use and abuse to their heart's content.
This purchase is presumably related to MLB's recent announcement of a new system that will constantly track and measure the movement of the ball and every player on the field. Supposedly this is going to generate several terabytes of information each game, and some team has decided to buy a Cray as a way of processing all that data. Whether that's a better idea than the proverbial Beowulf cluster I don't know, but that seems to be this team's thinking.
Most, maybe all, baseball teams have been doing some variant of advanced analytics for quite some time now. Most of this work is proprietary and secret, but there's been a lot of "open source" (or at least publicly available) work that's probably along the same lines. Sabermetricians (baseball stat people -- from "SABR", the Society for American Baseball Research) have gotten very good at measuring offense, and reasonably good at predicting hitters' future numbers. Nate Silver's PECOTA system is the most famous, but there are others that work about as well (ZiPS and Cairo being the ones I've spent time with, plus the "dumb as the monkey on Friends" system called Marcel). Pitching numbers are understood pretty well, at least as they relate to the Three True Outcomes, which are the results of a batter v. pitcher matchup that don't involve any defensive players (i.e., walks, strikeouts, and home runs).
The next great frontier of analytics is defense. There's been a lot of work in this field over the last decade, but the problem has always been in getting good data. If a ball is hit towards the shortstop and the shortstop doesn't get to it, why is that? Is it because the ball was hit too hard? Is it because the shortstop was badly positioned by his coaches? Is it because the shortstop isn't very good? Data that's not much more than "groundball to shortstop" can't really answer that question, but the new tracking system promises to answer that sort of question in full by precisely measuring reaction times, routes to the ball, and so forth. This in turn might lead to greater and greater changes in defensive positioning, different emphases in player acquisition, maybe even in-game changes based on small changes in wind patterns or whatever.
Some of what we're already learning about defense is very surprising. For example, there has been a lot of work done recently on catchers' ability to "frame" pitches, that is, to make a borderline pitch look good. The most current results suggest that the pitch-framing difference between the best and worst catchers might be worth something on the order of 5 wins. That's roughly the difference between having a random scrub and an All-Star as your right fielder, and all from a catcher's ability (or inability) to fool the umpire. It's shocking.
As for what team this is, when the news first broke it was claimed that the purchasing team "would surprise most people". That rules out the teams that are well-known to be friendly to advanced analytics -- starting with the Red Sox, Yankees, Cubs, and A's. The best guess I've seen is that it's the Phillies -- they have tons of cash and seem to be very behind on analytics, and seem likely to just go out and buy a supercomputer rather than have the MIT grads in their analytics department jerry-rig a bunch of Debian boxes into something cooler and weirder.
Psychohistory called, and they want their 5% of the profits.
A boffin to explain how these blinky light things work and if they can run Hadoop on the item of searing white hot technology (a LEO III) they have in the basement. In the hope that it can stop the English Cricket Team losing to the Dutch!
I was also a fencer. But I've spent the past 10 years playing catcher on a beer-league softball team.
General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
I thought they already knew which teams are going to win, like wrestling?
they = the Yankees and Red Sox; the Red Sox didn't start until the 2000s