Mystery MLB Team Moves To Supercomputing For Their Moneyball Analysis
An anonymous reader writes "A mystery [Major League Baseball] team has made a sizable investment in Cray's latest effort at bringing graph analytics at extreme scale to bat. Nicole Hemsoth writes that what the team is looking for is a "hypothesis machine" that will allow them to integrate multiple, deep data wells and pose several questions against the same data. They are looking for platforms that allow users to look at facets of a given dataset, adding new cuts to see how certain conditions affect the reflection of a hypothesized reality."
"Supercomputer, is baseball still boring as fuck?" "YES, DAVE."
I'm sorry, but not even robots could make this game interesting. I say we sell it to the Japanese while they're still game.
I can take it or leave it, but a minor league game from wooden bleachers is a much better time for me.
What I find amusing is the obsession with statistics, considering the randomness of any particular game. But then I don't follow any particular team; it's the spectacle of seeing it done. (And the main thing I appreciate is how unimaginable it is relative to my own abilities.)
They feared that it could be used to suppress protest or support unpopular rule.
Sorry, I RTFA, it was unavoidable.
Looks like they might actually use the horsepower.
My best guess is it's the Cubs.
They are looking for minority investors in the club right now, and the cost of ballpark improvements is a smoke screen for taking on the cost of big data. Theo has not been the same without Tessie, and it's not cheap to recreate the analysis that system is capable of performing.
I really wonder what the value of such a system is compared to updating / refining Nate Silver's PECOTA odds to play out hypothetical teams and transactions over a 5 year period. There is so much data available about players at this point, it's almost possible to predict regressions on a macro level.
they've been buying wins for almost 20 years now, nothing new
I'm a nerd, not a hacker, but yeah. I was on the fencing team.
Why bother with moneyball? If your stadium is more than 10 years old, just whine you need a new one to provide the revenue to be competitive. You can threaten to leave for another city, promise to get an All-Star game, or just quit spending money on decent players for a while to convince the fan base that you really aren't competitive.
The Twins did a combination of all these things, but of course the owners decided that more money in their pockets was the real goal: the new money from their shiny, taxpayer-financed stadium hasn't bought new players. They have played .407 ball in each of the last two seasons and are working on a similar outcome this season, already making themselves comfortable in last place at 1-3.
It is likely the Boston Red Sox. There was talk of this at the Analytics conference in Boston a month ago.
How have the sox been buying wins since 1994?
It will be better to purchase from an owner who is a good farmer and a good builder.
...why haven't they been doing this from the start?
I am very small, utmostly microscopic.
"They are looking for platforms that allow users to look at facets of a given dataset, adding new cuts to see how certain conditions affect the reflection of a hypothesized reality."
Hypothesized reality? Oh you mean if a coach wanted to give a player performance enhancing drugs that they know they can hide to analyze the wins, or do you mean simulating reduced gravity because you plan to bilk the entire nation in taxes to pay for the next baseball stadium on the moon?
I don't think baseball needs a supercomputer to analyze just how bored I am watching men be paid millions of dollars to stand around 90% of the time in a grassy field, especially when that cost translates to the average American family spending hundreds at the ballpark for a single game.
They need to calculate what to do when players go on paternity leave.
"Fascism should more properly be called corporatism because it is the merger of state and corporate power." -- Mussolini
One of the great pleasures of baseball is that it generates a vast amount of data for the analytically minded to use and abuse to their heart's content.
This purchase is presumably related to MLB's recent announcement of a new system that will constantly track and measure the movement of the ball and every player on the field. Supposedly this is going to generate several terabytes of information each game, and some team has decided to buy a Cray as a way of processing all that data. Whether that's a better idea than the proverbial Beowulf cluster I don't know, but that seems to be this team's thinking.
Most, maybe all, baseball teams have been doing some variant of advanced analytics for quite some time now. Most of this work is proprietary and secret, but there's been a lot of "open source" (or at least publicly available) work that's probably along the same lines. Sabermetricians (baseball stat people -- from "SABR", the Society for American Baseball Research) have gotten very good at measuring offense, and reasonably good at predicting hitters' future numbers. Nate Silver's PECOTA system is the most famous, but there are others that work about as well (ZiPS and Cairo being the ones I've spent time with, plus the "dumb as the monkey on Friends" system called Marcel). Pitching numbers are understood pretty well, at least as they relate to the Three True Outcomes, which are the results of a batter v. pitcher matchup that don't involve any defensive players (i.e., walks, strikeouts, and home runs).
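For anyone who wants to see the Three True Outcomes idea as arithmetic, here is a minimal Python sketch. It only computes the share of plate appearances that end in a walk, strikeout, or home run. The function name and the season line are made up for illustration; this is not how PECOTA, ZiPS, or any team's system works.

```python
def three_true_outcome_rate(walks, strikeouts, home_runs, plate_appearances):
    """Fraction of plate appearances ending in a walk, strikeout, or home run.

    These are the outcomes that involve no defensive players.
    """
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    return (walks + strikeouts + home_runs) / plate_appearances


# Hypothetical season line, invented for illustration only.
rate = three_true_outcome_rate(
    walks=60, strikeouts=150, home_runs=30, plate_appearances=600
)
print(f"{rate:.3f}")
```

Everything left over after those three outcomes is a ball in play, which is where the defense comes in.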
The next great frontier of analytics is defense. There's been a lot of work in this field over the last decade, but the problem has always been in getting good data. If a ball is hit towards the shortstop and the shortstop doesn't get to it, why is that? Is it because the ball was hit too hard? Is it because the shortstop was badly positioned by his coaches? Is it because the shortstop isn't very good? Data that's not much more than "groundball to shortstop" can't really answer that question, but the new tracking system promises to answer that sort of question in full by precisely measuring reaction times, routes to the ball, and so forth. This in turn might lead to greater and greater changes in defensive positioning, different emphases in player acquisition, maybe even in-game changes based on small changes in wind patterns or whatever.
Some of what we're already learning about defense is very surprising. For example, there has been a lot of work done recently on catchers' ability to "frame" pitches, that is, to make a borderline pitch look good. The most current results suggest that the pitch-framing difference between the best and worst catcher might be worth something on the order of 5 wins. That's roughly the difference between having a random scrub and an All-Star as your right fielder, and all from a catcher's ability (or inability) to fool the umpire. It's shocking.
As for what team this is, when the news first broke it was claimed that the purchasing team "would surprise most people". That rules out the teams that are well-known to be friendly to advanced analytics -- starting with the Red Sox, Yankees, Cubs, and A's. The best guess I've seen is that it's the Phillies -- they have tons of cash and seem to be very behind on analytics, and seem likely to just go out and buy a supercomputer rather than have the MIT grads in their analytics department jerry-rig a bunch of Debian boxes into something cooler and weirder.
I don't understand the attraction. Unless you have a family member on the team, who gives a fuck?
I don't watch other people swim, I go to the pool and swim myself.
Same with biking, or any other kind of sport.
Psychohistory called, and they want their 5% of the profits.
A boffin to explain how these blinky light things work and if they can run Hadoop on the item of searing white hot technology (a LEO III) they have in the basement. In the hope that it can stop the English Cricket Team losing to the Dutch!
I was also a fencer. But I've spent the past 10 years playing catcher on a beer-league softball team.
General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
I thought they already knew which teams are going to win, like wrestling?
Sports psychology and team culture are probably more important to the success of an organization, and like a bad classic Star Trek scene where Kirk asks the killer probe computer "what is love?", the Cray-powered data mining system is going to struggle with this.
Players are not factory machines with predictable performance. The .300 hitter who swatted 46 home runs last year and signed a $40 million contract can get snapped by TMZ stepping out with his mistress in November, get sued for divorce in December, lose half his estate in February and be expected to show up for Spring Training in March ready to play ball. Let's see how the computer models his mental state when his insomnia has him sleeping 4 hours a night, the court gives his house to his ex-wife, the mistress has left him, he's started drinking and eating poorly, and now he's struggling at bat simply to not look foolish, let alone be the home run hero he was last year.
You want two words to confound the computer?
"Tiger Woods."
Every player on the PGA tour should have sent a thank-you card to Elin Nordegren for the spurned-wife rampage she went on when Tiger's dalliances were discovered, and the psychological nuclear disintegration it caused to Tiger's game. The man imploded and has largely never recovered.
These are wildly extreme examples. Most players are just streaky in general, and it's all due to the wetware sitting atop their shoulders. Look at how Alabama's football team handles mental conditioning with a full-time, on-staff sports psychologist:
http://cw.ua.edu/2012/01/26/sports-psychologist-teaches-mental-toughness-to-young-team/
"They are looking for platforms that allow users to look at facets of a given dataset, adding new cuts to see how certain conditions affect the reflection of a hypothesized reality."
WTF does that actually mean?
they = the yankees and red sox, the red sox didn't start until the 2000's