Slashdot Mirror


Alternate Baseball Universes

Jamie found a NYTimes op-ed by a grad student and a professor from Cornell, outlining some research they did into alternate baseball universes. The goal was to find out how unlikely in fact was Joe DiMaggio's 56-game hitting streak, played out in the 1941 season. No one since has even come close to that record. The math guys ran simulations of the entire history of baseball from 1885 on — 10,000 of them. For each simulation they put each player up to the plate for each at-bat in each game in each year, just like it happened; and they rolled the dice on him, based on his actual hitting stats for that season. (Their algorithm sounds far simpler than whatever the Strat-O-Matic guys use.) The result: Joltin' Joe's record is not merely likely, it's basically a sure thing. Every alternate universe produced a streak of 39 games or better; one reached 109 games. Joe DiMaggio was not the likeliest player in the history of the game to accomplish the record, not by a long shot.
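The dice-rolling idea in the summary can be sketched in a few lines. This is a minimal illustration, not the Cornell authors' code: it simulates one hypothetical player with a fixed batting average and a fixed number of at-bats per game, whereas the op-ed describes using each real player's season stats. The function names and the sample numbers (.350 average, 150 games, 4 at-bats) are assumptions made up for the example.

```python
import random

def longest_streak(hit_flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for got_hit in hit_flags:
        current = current + 1 if got_hit else 0
        best = max(best, current)
    return best

def simulate_season(batting_avg, games, at_bats_per_game, rng):
    """Roll the dice on every at-bat; return the longest hitting streak in games."""
    flags = []
    for _ in range(games):
        # A game extends the streak if at least one at-bat produces a hit.
        got_hit = any(rng.random() < batting_avg for _ in range(at_bats_per_game))
        flags.append(got_hit)
    return longest_streak(flags)

def simulate_many(batting_avg, games, at_bats_per_game, seasons, seed=0):
    """Replay the same season many times and collect the longest streak of each."""
    rng = random.Random(seed)
    return [simulate_season(batting_avg, games, at_bats_per_game, rng)
            for _ in range(seasons)]

if __name__ == "__main__":
    # Hypothetical inputs, not any real player's line.
    streaks = simulate_many(0.350, 150, 4, seasons=1_000, seed=1)
    print("longest streak in any simulated season:", max(streaks))
    print("seasons with a streak of 30+ games:", sum(s >= 30 for s in streaks))
```

Repeating this over every player and every season, then taking the longest streak in each replay of history, is roughly the experiment the summary describes.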

10 of 229 comments

  1. Re:If its so likely, they why hasn't it happened? by hedwards · · Score: 5, Insightful

    The most likely reason is that statistics isn't the appropriate method by which to study this problem.

    This sort of study is really more about curiosity; it doesn't deal with things like changes to the way the game is played. For instance, early on and for quite a while afterward, it was common for a pitcher to pitch 9 innings every game, and in many cases to pitch both games of a doubleheader. That meant more opportunity for errors, and since batters got time to rest up, that style of play gave the batter a bit of an edge which doesn't exist today.

    That also doesn't include the variety of pitching which players see today or the fact that a player might get to see 3 different pitchers in a single game.

    Even the length of the season has an effect on how players play. None of those things are easily quantified, much less analyzed by statisticians.

  2. too simplistic by ndenissen · · Score: 5, Insightful

    From reading the article (which is light on the details) it seems like they used nothing but batting average, at bats, and games played.

    The problem is this doesn't control for variance in the quality of pitching. The chance of going that many games without running into a hot pitcher isn't accounted for.

    Imagine you average a 75% chance of getting a hit in any individual game. If you face three average pitchers, your chance of hitting in all three games is (.75)^3, but if you face a good pitcher, an average pitcher, and a bad pitcher, it might be (.5)(.75)(1.0), which gives a different probability despite the same average chance per game.
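    The arithmetic above is easy to check. A small sketch, assuming the games are independent; the function name is made up for illustration:

```python
from math import prod

def chance_of_hit_in_every_game(per_game_chances):
    """Probability of a hit in every game, assuming independent games."""
    return prod(per_game_chances)

uniform = [0.75, 0.75, 0.75]   # three average pitchers
mixed = [0.5, 0.75, 1.0]       # good, average, and bad pitcher

# Both line-ups give the same average per-game chance of a hit.
print(sum(uniform) / 3, sum(mixed) / 3)

print(chance_of_hit_in_every_game(uniform))  # 0.421875
print(chance_of_hit_in_every_game(mixed))    # 0.375
```

    Same average, lower chance of keeping the streak alive once the pitchers vary.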

    In order to be realistic the calculation would need to account for the deviation from average in the ability of the pitchers (which would likely be higher 100 years ago because of fewer players and segregation, and now because of expansion, as compared to the 1950s).

    What they don't report is how often there are long (but not record) streaks in their model, so there is no way of knowing how accurately it reproduces reality.

  3. Re:How to Make Baseball Even MORE Boring? by pchan- · · Score: 5, Funny

    You don't understand. Baseball is so boring, the fans find the statistics exciting!

  4. Re:How to Make Baseball Even MORE Boring? by garett_spencley · · Score: 5, Funny

    I was once at a friend's BBQ and a lot of the other guests were really into sports and talking a lot about their various sporting events etc. I made a comment about how baseball was one of those sports that is fun to play but boring as hell to watch. One of the guys responded with, simply, "I disagree". To which I replied "You're right. It's pretty boring to play too." He wasn't very amused.

    Talk about a great way to make an awkward social event even more awkward :(

  5. Re:If its so likely, they why hasn't it happened? by ByteSlicer · · Score: 5, Insightful

    Because baseball players aren't dice?

  6. Another unreported weird fact by mrfantasy · · Score: 5, Funny

    In every simulation, a ground ball went between Bill Buckner's legs in the 1986 World Series.

    --

    -- Of course I'm paranoid. I'm a sysadmin.

  7. Re:You can't do statistics with a random # generat by kevinatilusa · · Score: 5, Informative

    It is not mathematically sound to do statistics with a random number generator. Computers do not actually generate random numbers, but instead, they can only make pseudo-random numbers that have a certain distribution. Any 'simulation' done in this way will always have a bias. In order to get correct statistics, you must actually compute the statistics.

    Sure, the proper way to put it mathematically would have been "we did a Monte-Carlo based simulation of the probability distribution of the longest hitting streak under our model due to the intractability of direct computation", but this is an editorial in the New York Times, not a mathematical journal!

    As a side note, just because a computation is performed on a set of pseudorandom numbers does not mean it is biased...usually the whole point of pseudorandomness is that the discrepancy between computations involving them and identical computations involving true random numbers will typically be quite small.

  8. Re:If its so likely, they why hasn't it happened? by Frequency+Domain · · Score: 5, Informative

    No bashing, it's not a bad question. The answer is because it still qualifies as a "rare event". The thing that's kind of counter-intuitive, but easy to demonstrate, is that having a particular rare event happen is rare, but having some rare event happen is common.

    A good illustration of this is the so-called "birthday paradox", which asks what's the probability of having duplicate birthdays in a group of n people (whose birthdays are independent of each other). Think of adding the people to the room one by one. The first person doesn't have any chance of having a duplicate birthday, because there's nobody else in the room. The second person has 1/365 chances of duplicating, 364/365 of missing the first one. Let's follow up on the misses, they're easier to work with. In general, if we've got k people in the room without a duplicate, that means they've used up k of the 365 days in the year, and the next person we introduce to the room has to miss all of those days to avoid a duplication. So the probability of everybody missing everybody else, by the time we get up to n people in the room, is (365/365)*(364/365)*(363/365)*...*((365-n+1)/365), which starts diving towards zero really fast. The probability of having one or more duplicates is 1 - P(no duplicates), which correspondingly climbs to one really fast. If you write a short program to do the exact calculations, you'll find that by the time you have 23 people in the room the probability is greater than 0.5 of having a duplicate, and by the time you get 57 people it's greater than 0.99!

    If you pick one particular person and ask what's the probability of duplicating that birthday it remains quite small. That's the difference between having a particular rare event rather than having some rare event. For a large enough group, some pair of people will almost surely share a birthday but the odds of it being you (or any other designated person) remain quite small.
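    Here is one way the "short program" could look. A sketch assuming 365 equally likely, independent birthdays; the function names are made up for illustration:

```python
def p_some_duplicate(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_no_duplicate = 1.0
    for k in range(n):
        # The (k+1)-th person must miss the k days already taken.
        p_no_duplicate *= (days - k) / days
    return 1.0 - p_no_duplicate

def p_matches_you(others, days=365):
    """Probability that at least one of `others` people shares one
    designated person's birthday."""
    return 1.0 - ((days - 1) / days) ** others

for n in (22, 23, 56, 57):
    # Some pair matching vs. somebody matching one designated member of the group.
    print(n, round(p_some_duplicate(n), 4), round(p_matches_you(n - 1), 4))
```

    The first column of probabilities crosses 0.5 at 23 people and 0.99 at 57, while the second stays small throughout.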

    Just to preserve my computing geek cred, this is why you need collision resolution for hashing algorithms. You don't know which entries will share hash values, but collisions are almost certain to happen by the time you've loaded 3 * sqrt(Hash Table Capacity) values, e.g., if your hash table has capacity 10000 you will almost surely see a duplicate within the first 300 entries.
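    The same calculation covers the hash table claim. A sketch assuming an idealized hash function that spreads keys uniformly and independently over the slots, which real hash functions only approximate; the function name is made up for illustration:

```python
import math

def p_collision(entries, capacity):
    """Probability that at least two of `entries` keys land in the same slot,
    assuming uniform, independent hash values."""
    p_none = 1.0
    for k in range(entries):
        p_none *= (capacity - k) / capacity
    return 1.0 - p_none

capacity = 10_000
entries = 3 * math.isqrt(capacity)  # 3 * sqrt(capacity) = 300
print(entries, p_collision(entries, capacity))
```

    For these numbers the probability comes out at roughly 0.99, so "almost surely" is fair.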

  9. Does Joe DiMaggio's Streak Deserve an Asterisk? by harryjohnston · · Score: 5, Informative

    This seems relevant:

    http://abcnews.go.com/Technology/WhosCounting/story?id=3694104&page=1

    Disclaimer: I'm not an American, so I know next to nothing about baseball - and care less!

  10. Re:If its so likely, they why hasn't it happened? by Anonymous Coward · · Score: 5, Interesting

    The early years tended to be batting competitions (in some ways like today's) rather than pitching competitions

    If by "early years", you mean 1920 and later, yeah.

    Otherwise, buddy, you're way off base.

    NL year-by-year stats.

    Look at those ERAs pre-1920. Before 1920, the ERA in the NL never significantly exceeded 3.00. After 1920, it never dropped below 3.3 or so, with the exception of a 2.99 in 1968, after which MLB made changes to the rules, amongst them lowering the acceptable height of the pitcher's mound.

    The time prior to 1920 was marked by pitchers such as Cy Young, Mordecai Brown, Walter Johnson, Ed Walsh, Christy Mathewson. You've probably heard of most of them.

    Here are the single-season MLB ERA leaders. Outside of Bob Gibson in the aforementioned 1968, you have to go all the way to Greg Maddux in 1994 at #48 all time to find a season after 1920 on the list. Barely 10 of the 100 lowest single-season ERAs in MLB history occurred after 1920. And that's only because Pedro Martinez in 2000 and Ron Guidry in 1978 tied with 9 others for #100 on the list. So only 8 of the best single-season ERAs happened after 1920.

    You need to research "dead ball era", and the response by baseball to "Black Sox". (Hint: just like the response to the 1994 strike, it involves the ball...)

    The fact that you got a +5 out of such a demonstrably incorrect post is a major indictment of the baseball knowledge of the Slashdot faithful.