Alternate Baseball Universes
Jamie found a NYTimes op-ed by a grad student and a professor from Cornell, outlining some research they did into alternate baseball universes. The goal was to find out just how unlikely Joe DiMaggio's 56-game hitting streak, set in the 1941 season, really was. No one since has even come close to that record. The math guys ran 10,000 simulations of the entire history of baseball from 1885 on. For each simulation they put each player up to the plate for each at-bat in each game in each year, just like it happened; and they rolled the dice on him, based on his actual hitting stats for that season. (Their algorithm sounds far simpler than whatever the Strat-O-Matic guys use.) The result: Joltin' Joe's record is not merely likely, it's basically a sure thing. Every alternate universe produced a streak of 39 games or better; one reached 109 games. Joe DiMaggio was not the likeliest player in the history of the game to accomplish the record, not by a long shot.
I know the statisticians among you are going to bash me with a cluestick for such a naive question, but I'll ask anyway - if this event is so likely to occur, then why hasn't it happened again?
We all know what to do, but we don't know how to get re-elected once we have done it
This doesn't take into account that once a player achieves an impressive hit streak he gets more media attention, people start asking him about DiMaggio's record, and every time he steps up to the plate he's a bit more nervous about it than the last time, making it slightly less likely that he'll get a hit.
The global economy is a great thing until you feel it locally.
I think this is basically a form of statistical proof that there is, indeed, a God.
Talk about the statistics of anyone at bat..
"He who can destroy a thing, controls a thing." --Paul Atreides, Dune
A batter with, say, a .300 batting average does not have a 30% chance of getting a hit each time he's up to bat. There are inevitably going to be days when you face someone like Randy Johnson or Roger Clemens or (cross fingers for 2008) Felix Hernandez. Couple that with some fantastic defenses and it's not surprising that a good batter goes a day without a hit.
So players have some days with a smaller chance of getting a hit than others (say, when the 2007 Mariners were running Horacio Ramirez out there with our god-awful defense). There's a reason advanced baseball statistics are more complex than what these guys did.
i found a typo streak!!!
"very alternate universe produced a steak f 39 games or better"
2 typos in a row!!!
unfortunately, not many of my comments are insightful, so with my batting average, you will have to refer to a parallel universe
there you will find that this comment contains something worthwhile reading. sorry
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
One of the key points mentioned in this article is when the hitting streak occurs. They mention that it was much more likely to occur during the early 1900s, which is known as the deadball era. The baseball wasn't as springy and they tended to use the same ball during the entire game. During that time it was more efficient to try to knock the ball between the holes in the fielders and get a double or single than to try to hit it out of the park.
I think it would be more impressive to take a subset of the data, and compare from 1930 up until the present. Of course, there have been other major changes too: glove sizes, the introduction of the slider as a pitch, steroid use.
From reading the article (which is light on the details) it seems like they used nothing but batting average, at bats, and games played.
The problem is this doesn't control for variance in the quality of pitching. The chance of going that many games without running into a hot pitcher isn't accounted for.
Imagine you average a 75% chance of getting a hit in any individual game. If you face three average pitchers, your chances are (.75)^3 but if you face a good pitcher an average pitcher and a bad pitcher it might be (.5)(.75)(1.0) which gives a different probability, despite the same average number of hits.
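To put numbers on the three-game example above, here is a minimal sketch. The per-game probabilities are the ones from the comment; the function name is made up for illustration, and games are assumed to be independent.

```python
from math import prod

def streak_probability(per_game_probs):
    """Probability of a hit in every game, assuming independent games."""
    return prod(per_game_probs)

# Three average pitchers: 75% chance of a hit in each game.
average_pitchers = streak_probability([0.75, 0.75, 0.75])

# Good, average and bad pitcher: same mean (75%), different spread.
mixed_pitchers = streak_probability([0.5, 0.75, 1.0])

print(f"three average pitchers: {average_pitchers:.4f}")
print(f"good, average, bad:     {mixed_pitchers:.4f}")
```

The mean per-game chance is the same in both cases, yet the product is smaller once the probabilities are spread out.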
In order to be realistic the calculation would need to account for the deviation from average in the ability of the pitchers (which would likely be higher 100 years ago because of fewer players and segregation, and now because of expansion, as compared to the 1950s).
What they don't report is how often there are long (but not record) streaks in their model, so there is no way of knowing how accurately it reproduces reality.
From the descriptions I've seen of their research, it seems that they're treating all games identically for the purpose of determining a typical season's behavior. While this may be necessary to make the computation tractable, it's not realistic, and introduces a sizable bias towards long hitting streaks.
In reality, a league is typically very imbalanced from team to team and from pitcher to pitcher (probably even more so in the game of the early 20th century than now). It's easier to get hits off of two successive average pitchers than it is to get hits both off of a very good and a very bad pitcher. For example (to oversimplify a good deal):
Say the league is split 50/50 between "good" pitchers (pitchers you'll get a hit off of 50% of games) and "bad" pitchers (pitchers you'll get a hit off of 80% of games). In a typical 20 game stretch, you'll encounter 10 good pitchers and 10 bad ones, and your odds of getting a hit in all 20 games would be (0.50)^10(0.80)^10, about 1/9537.
Under their analysis as I understand it, they'd replace all the pitchers by mediocre pitchers who you'd get a hit off of 65% of the time, and your odds would be (0.65)^20, about 1/5517.
This one assumption almost doubled your chances of getting a hit in all 20 games.
There are other biases as well going the other way (ignoring the effect of hitting slumps, for example), but this one jumped out at me.
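The 20-game arithmetic above is easy to check. This is only a sketch of the oversimplified example (a 50/50 split of pitchers, independent games); the function name is invented for illustration.

```python
def all_hits_probability(p_good, p_bad, games_each):
    """Chance of a hit in every game when facing `games_each` good
    pitchers and `games_each` bad ones, assuming independent games."""
    return (p_good ** games_each) * (p_bad ** games_each)

mixed = all_hits_probability(0.50, 0.80, 10)   # 10 good + 10 bad pitchers
uniform = 0.65 ** 20                           # 20 identical average pitchers

print(f"mixed pitchers:   about 1 in {1 / mixed:.0f}")
print(f"uniform pitchers: about 1 in {1 / uniform:.0f}")
print(f"uniform / mixed:  {uniform / mixed:.2f}")
```

The direction of the effect is not specific to these numbers: for a fixed arithmetic mean, the product of the per-game probabilities is largest when they are all equal (the AM-GM inequality), so averaging away the pitcher differences can only raise the computed odds of a streak.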
Shouldn't we say that the probability of it happening was 1.0, because it did happen?
It seems to me that if their experiments report anything else, then either their models are erroneously inaccurate, or they got something else wrong.
... they didn't take into account my 162 game hitting streak in "The Bigs" on PS3. With settings on easy.
We figured out a long time ago that it's easier to elect seven judges than to elect 132 legislators.
Isn't this the same thing as saying that a large number of monkeys typing for a large period of time is more likely to properly re-create the complete works of Tolkien ... by accident?
Involve baseball. There, fixed.
Also, go Jays.
Free the Quark 3 from asymptotic confinement! Bring your charm! Don't get down! All colours and flavours welcome!
Our simulations did something very much like this, except instead of a coin, we used random numbers generated by a computer.
It is not mathematically sound to do statistics with a random number generator. Computers do not actually generate random numbers, but instead, they can only make pseudo-random numbers that have a certain distribution.
Any 'simulation' done in this way will always have a bias.
In order to get correct statistics, you must actually compute the statistics.
maybe. for nerds? i doubt it. "the math guys"... ok, definitely not news for nerds -- too dismissive of the experts
Any guest worker system is indistinguishable from indentured servitude.
They took a bunch of measured statistics, ran a simulation with outcomes biased using said statistics, and then acted surprised when the simulation results ended up pretty close to what actually happened?
In every simulation, a ground ball went between Bill Buckner's legs in the 1986 World Series.
-- Of course I'm paranoid. I'm a sysadmin.
I think it's safe to say that most all statistics uses a random number generator. Computers *do* have the capability to produce true random numbers, as shown with /dev/random, which relies on an entropy pool and is suitable for cryptographic key generation.
By assuming the hitter's probability of getting a hit is equal to his season average the researchers don't take into account that most, if not all, batters have a higher batting average at some points in the season than they do in others. As one with experience in Monte Carlo simulations I know that taking that into account would complicate the analysis considerably, but I suspect their results would be a bit different if they even did something as simple as using a 10-game moving average of the batter's average.
This seems relevant:
http://abcnews.go.com/Technology/WhosCounting/story?id=3694104&page=1
Disclaimer: I'm not an American, so I know next to nothing about baseball - and care less!
No, it won't always have a "bias". Bias is a technical term here, and in fact there is not likely to be bias. Bias is where the long-run average value of simulated variables is not equal to the actual average value of the thing you're simulating. For example, rolling a chipped die to draw numbers uniformly from 1 to 6 will probably cause this.
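A rough illustration of bias in that sense, using Python's built-in generator. The "chipped" die here is a made-up one that lands on 6 half again as often as on the other faces; the weights and the seed are arbitrary assumptions.

```python
import random

def mean_of_rolls(weights, n_rolls, seed):
    """Average face value over n_rolls of a die with the given face weights."""
    rng = random.Random(seed)
    faces = [1, 2, 3, 4, 5, 6]
    rolls = rng.choices(faces, weights=weights, k=n_rolls)
    return sum(rolls) / n_rolls

# Fair die: the long-run average should settle near the true mean of 3.5.
fair_mean = mean_of_rolls([1, 1, 1, 1, 1, 1], 100_000, seed=1)

# Hypothetical chipped die: the long-run average drifts away from 3.5.
chipped_mean = mean_of_rolls([1, 1, 1, 1, 1, 1.5], 100_000, seed=1)

print(f"fair die mean:    {fair_mean:.3f}")
print(f"chipped die mean: {chipped_mean:.3f}")
```

The pseudo-random draws from the fair die are not the source of any bias; the bias comes from sampling the wrong distribution.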
The problem with pseudorandom number generation tends to be dependence between samples (barring a more serious bug, which has happened... but this is always a problem, and there can also be bugs in the rest of the code anyway). Now this correlation is a problem for cryptography maybe, since there is intense interest in every bit of entropy in a very short signal, and a lot of clever guys hacking at it.
However in statistics, you basically just use the random numbers as "fuel" for a sequence of very stupid computations (more or less, glorified averages and averages of squares, &c.). The functions used in statistics are just too stupid to find out that the numbers have inter-dependence, so that they tend to give the same results for pseudo-random numbers as for truly random numbers. This is thanks to a lot of hard work from many fields, to improve pseudorandom number generators.
In fact, and as a tangent, theoretical computer scientists tend to believe that any randomness in an algorithm can be replaced by deterministic functions! (although they don't believe this as widely as they believe P!=NP). Since we can consider any statistical procedure an algorithm, the effect (at least philosophically) that this would have on many applied fields is mind-boggling. I would love to see a proof and some general techniques for this "derandomization" - if there were one, we could finally absolve ourselves of our state of sin. (It would also imho inform the "free will" debate a bit.)
A lot of people think baseball is boring - today it is, but take it from a geezer, not always so.
/. - and this covers the alternate universe part as well.
I blame television. I can no longer watch a ball game on TV. Might as well be Entertainment Tonight. They used to have a camera behind the backstop so you could see the pitch, the swing (from behind) and the infield. Another camera to go to the outfield, and maybe one for the infield. The game has strategy. It has finesse. It even has - to use a term no longer apparently known in the software world - elegance.
Now, it's unshaven bums (I'm an unshaven bum today, so OK) close-up, on the mound, with bad hair cuts curling from under the caps, with their follicles in high def while they spit and scratch. High res, high def and no sense of a team at work - if there still are any!
Baseball is a noble game of strategy, ruined by Madison Avenue's need to sell multi-hundred dollar sneakers to our kids.
Want to check out a good ball game? Here you go - http://www.startrek.com/startrek/view/series/DS9/episode/103565.html
So, if baseball's good enough for Klingons and Vulcans and Ferengi, it's good enough for
Now - get off my lawn!
Pathological kinda promises Path + Logical - but instead, you get stuck with pathetic.
Tickets, hot dogs and beer would be a lot more affordable.
Actually, baseball is very exciting compared to cricket.
Excuse me, but please get off my Pennisetum Clandestinum, eh!
The parent and the post immediately after it make the same point. The parent is modded 1 and the one after is modded 5.
The point is that the statisticians didn't accurately model the chance of a hitless game.
Yeah, and so is the Cubs winning the World Series more than once in a hundred years
What?
Tell that to the casinos.
See, I think the quote you quoted really doesn't sum up the article at all, or at least, it's very misleading. The article doesn't say that streaks of 56 games are a common occurrence. What it says is that the fact that baseball history has a record of a 56 game hitting streak isn't that rare, given players' historical batting averages.
Think of it this way. I flip a coin 10 million times and find the longest streak of heads. Say it's 50 (I have no idea how likely that is and i'm too lazy to actually simulate it, so i'm just going to make up numbers). I flip another 10 million coins, and find that the longest streak is 53. I flip another 10 million coins and find the longest streak is 40. I do this another 10,000 times and record the record in each case, and get a distribution for the longest streak in any given 10 million flips. Now, i'm not going to claim that 50 heads in a row isn't an extremely rare streak. But i can say what the odds are of getting a streak of at least 50 heads when i flip 10 million coins.
That's what these researchers did. Simulated 10,000 seasons from 1871-2005 (using players' historical batting averages as their likelihood to get a hit) and found what the longest hitting streak in each 'alternate history' is. Their claim is that 56 games is not an unusually high number (in fact it looks to be right about the median.) So sure, DiMaggio's streak was great, and incredibly rare. But it's interesting to see what the odds are of having a hitting streak that long.
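A toy version of that procedure, for anyone who wants to play with it. This is not the researchers' code: it follows a single hypothetical player with a fixed batting average and a fixed number of at-bats per game (both numbers below are assumptions), and records the longest streak in each simulated "universe".

```python
import random

def per_game_hit_probability(batting_average, at_bats_per_game):
    """Chance of at least one hit in a game, assuming independent at-bats."""
    return 1.0 - (1.0 - batting_average) ** at_bats_per_game

def longest_streak(p_hit_game, n_games, rng):
    """Longest run of consecutive games with a hit in one simulated season."""
    longest = current = 0
    for _ in range(n_games):
        if rng.random() < p_hit_game:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest

def simulate_records(batting_average, at_bats_per_game, n_games,
                     n_universes, seed=0):
    """Sorted list of the longest streak from each simulated universe."""
    rng = random.Random(seed)
    p = per_game_hit_probability(batting_average, at_bats_per_game)
    return sorted(longest_streak(p, n_games, rng) for _ in range(n_universes))

# Hypothetical .350 hitter, 4 at-bats a game, 150 games, 1,000 universes.
records = simulate_records(0.350, 4, 150, 1000)
print("median longest streak:", records[len(records) // 2])
print("best longest streak:  ", records[-1])
```

The real study would have needed every player in every season, which is what makes a long record somewhere in history much more likely than a long streak for any one player.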
If there's anything more important than my ego around, i want it caught and shot now.
Say hello to my little sig.
There is no such thing as true randomness. You can't measure something without affecting it.
Consciousness can affect randomness, as this Princeton page proves.
There's no feedback here.
:)
Don't forget that the makeup of teams, the behavior of other players, and even the rules of baseball all depend on what happens in the game. If someone was setting a 109-game hitting streak in the 1890s, then they would be facing more determined pitchers and probably better pitchers by the time they were more than 20 or 30 games into the run. It seems pretty good odds that would have changed their batting average for that year.
How are real hitting streaks distributed in time? Do they bunch up in the 1800s and early 1900s the way their simulations did?
Were there changes in the rules between the 1890s and the 1940s that might have reduced the effect of this kind of feedback mechanism?
Someone please mod parent Troll.
http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
(if he was still alive) to read that no one has hit .400+ since 1930, since he hit .406 in '41.
Baseball is one of those things that if your father didn't like, you probably won't either. It seems to be a passion that is passed down from generation to generation.
Today, it seems a bit archaic or even boring to some because of the slower pace, but some of us appreciate it for those very same qualities.
Computers do not actually generate random numbers
That'll be a surprise to the multiple true random number generators built into most operating systems. There are many sources of random data in a computer. Timing between keystrokes, timing of mouse movements, network latency between packets, and of course hardware random number generators that use thermal noise as their source.
So to put it mildly, computers can, and DO generate truly random numbers that are completely unpredictable and free from bias.
(Oh, BTW, to do a Monte-Carlo simulation (which the referenced article is) you actually don't need true random numbers, you only need a pseudo-random source that's free from bias. Those pseudo-random sources do exist, and aren't even that difficult to code.)
AccountKiller
You just have to slow down. Put away the x-box and the crackberry. It has its own pace and ebb and flow. Drink a few cold ones, fire up the grill. A nice Saturday or Sunday relaxing watching a game is never wasted. Or go out to a park and see a real live game. Get out of your parents' basement and get some sun. Even better, call in to work for a mental health day and go to a park. If you don't have access to a major league park a minor league game can be fun too.
In an ADD culture, a nice relaxing baseball game can be great.
putting the 'B' in LGBTQ+
Interesting comparison made on this page, but I'm not sure if it is accurate. http://en.wikipedia.org/wiki/Don_Bradman#World_sport_context
You think that's a building. Now this is a building.
Among all batters and at-bats in the major leagues, the batting average hovers right around
It describes "the death of the
A top-quality hitter in an early era would have many more mediocre pitchers to exploit than in a later era. Similarly, a top-quality pitcher in an early era would have many more mediocre hitters to exploit than in a later one. It's easier to get hits off of two successive average pitchers than it is to get hits both off of a very good and a very bad pitcher. For example...
[pertinent example]
Your complaint about the inhomogeneity's affecting the outcome of the calculation would be more valid if the distribution of player abilities, match-ups, and the like were not uniform; but in fact they are. They do indeed make that assumption as you note, but it is a justified assumption.
What baffles me is why they are running "simulations" on this sort of thing rather than dealing with closed-form probability. It's cool to get the computers to simulate baseball, and there's a market for that entertainment value, but you don't need to plink the pseudo-random number generator millions of times to find your way around the problem.
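For a single player with a constant per-game hit probability, the chance of a streak can indeed be computed exactly with no random numbers at all. The sketch below is a small dynamic program rather than a true closed form, and it assumes independent games with the same probability every day; the function name and the example numbers are made up.

```python
def prob_streak_at_least(n_games, streak_len, p):
    """Exact probability of at least one run of `streak_len` consecutive
    games with a hit in `n_games` independent games, each with hit
    probability p."""
    # state[k]: probability the current run is exactly k games long
    # and no run of streak_len has been completed so far.
    state = [0.0] * streak_len
    state[0] = 1.0
    reached = 0.0
    for _ in range(n_games):
        new_state = [0.0] * streak_len
        for k, prob in enumerate(state):
            if prob == 0.0:
                continue
            new_state[0] += prob * (1.0 - p)   # hitless game resets the run
            if k + 1 == streak_len:
                reached += prob * p            # streak completed: absorb
            else:
                new_state[k + 1] += prob * p   # run grows by one game
        state = new_state
    return reached

# Hypothetical player who gets a hit in 80% of games, over 150 games.
print(prob_streak_at_least(150, 56, 0.80))
```

Extending this to every player in every season, each with a different probability, is where the exact approach gets unwieldy and a simulation starts to look attractive.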
Bitches don't know 'bout my Blum Blum Shub.
Giving grad students unlimited computational power does NOT provide quality science.
Your assertion of slump effects sounds similar to "the myth of the hot hand" in analysis of basketball statistics. When you examine streaks, you find that the probability of making a shot following one successful shot is not any higher (or lower!) than the probability of making one after an unsuccessful shot. Same for "cold" hands. The distribution of shooting streaks really is, upon careful analysis, what you would expect based on examining only the mean. You don't gain anything by accounting for streaks, whether short or long or of highly-skilled or mediocre shooters.
I know the above analysis has been done for basketball, and I suspect there exists but have not seen a similar analysis of baseball hitting streaks. I suspect that using, say, a 10-game moving average as you suggest (or similar streak/slump accounting tactic) would be as useless in baseball as it is in basketball. (I'm much more familiar with baseball than basketball, both stats and as a sport).
So intuitively, you'd expect slumps and streaks to influence subsequent attempts. There's even a good body of argument from psychology, climate, environment, physiology (fatigue) etc. that there might. Actually examining the statistics, however, reveals that it just ain't so, and I think that is an earth-shatteringly important point; it's not about something trivial like sports, but about the power of statistical insight we can gain from physical reality. (Stephen Jay Gould himself said he wouldn't begrudge anyone for thinking that baseball was as "earth-shatteringly important" a topic as anything else...)
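The kind of check described above boils down to comparing the success rate right after a success with the rate right after a failure. A minimal sketch, run here on simulated attempts that are independent by construction (the 30% success rate and the seed are arbitrary assumptions), so it only shows what "no hot hand" looks like, not what real data say:

```python
import random

def conditional_hit_rates(outcomes):
    """Return (rate after a success, rate after a failure) for a
    sequence of True/False outcomes."""
    pairs = list(zip(outcomes, outcomes[1:]))
    after_hit = [nxt for prev, nxt in pairs if prev]
    after_miss = [nxt for prev, nxt in pairs if not prev]
    return sum(after_hit) / len(after_hit), sum(after_miss) / len(after_miss)

rng = random.Random(42)
outcomes = [rng.random() < 0.3 for _ in range(200_000)]
rate_after_hit, rate_after_miss = conditional_hit_rates(outcomes)

print(f"rate after a hit:  {rate_after_hit:.3f}")
print(f"rate after a miss: {rate_after_miss:.3f}")
```

Feeding real at-bat sequences into the same function would be the actual test; a clear gap between the two rates would be evidence of streakiness.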
Now that I have waxed poetic, I see that my captcha is "predict".
None of those things are truly random. Unless you are dealing with quantum effects, you are not dealing with something truly random.
In particular, timing between keystrokes is not at all random. In fact, one can use the timings between keypresses to figure out who the typist is!
The cake is a pie
... simulation runs have not yet identified an alternate universe in which a Slashdotter gets a date with Jessica Alba.
Have gnu, will travel.
That sounds like good eatin'.
Unless you are dealing with quantum effects, you are not dealing with something truly random.
From wikipedia on "electronic (thermal) noise":
In any electronic circuit, there exist random variations in current or voltage caused by the random movement of the electrons carrying the current as they are jolted around by thermal energy.
Is that quantum mechanical enough for you?
As for network latency between packets, while it may not be random on a quantum-mechanical level, it's still unpredictable unless you can get on the same lan segment as the target computer. The keyboard timings are taken on a small enough time scale that they're quite unpredictable, and not related to the typist.
AccountKiller
"Timing between keystrokes"
"timing of mouse movements"
"network latency between packets"
"hardware random number generators that use thermal noise as its source"
The term "truly random" is not something to be tossed around. The only truly random thing is the measurement of an unentangled quantum state. Something like timing between keystrokes clearly doesn't fit the bill (depends on age, sex etc.), unless it is inside the box and Schrödinger's cat is walking on it, of course.
Network latency between packets might depend on your income (do you have money for a good qual. cable - with low stable latency?).
You might argue that the last one fits the bill, but I'm pretty sure you can't prove the outcome is unbiased - way too many factors to consider to know/prove how random it really is (although that might sound counter intuitive).
I guess you can get random number generators based on some radioactive source, which actually would be truly random.
To sum up: Don't say something is truly random just because it "feels" random.
After the streak ended, he started a new 16 game hitting streak. That means he hit safely in 72 of 73 games.
During the streak Joe DiMaggio had a batting average of .408, a slugging average of .717, he faced four (4) future hall of fame pitchers, and he played in the 1941 All-Star Game (he went one-for-four, scored a run, and drove in a run). Source is http://www.baseball-almanac.com/feats/feats3.shtml
During Joe DiMaggio's streak, Ted Williams actually had a higher batting average. Williams batted .412 and finished with a .406 average for the year.
Joe DiMaggio had a 61 game hitting streak while playing for the San Francisco Seals in the Pacific Coast League in 1933.
This isn't how modern statistics is done. The pseudo-random number generators used in statistical research are entirely predictable when their initial seed is known, but are otherwise statistically random. They must obey certain requirements of "statistical randomness" that make the output look like pure entropy for essentially any form of real statistical examination, other than an attempt at determining the sequence directly. Monte Carlo computation is always done with PRNGs so that the experiments are actually repeatable.
The modern PRNG, something like the Mersenne Twister algorithm, is random enough that if you repeated the experiment done in the article with a true entropy source a bunch of times, and compared them to the results from running the PRNG-based simulation, you should find the same distribution of results. If you don't, you'd have a statistical find much, much more interesting than baseball. The point of a good algorithm is that it won't have a visible "bias" in results.
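That comparison is easy to try on a small scale. In the sketch below, random.Random is Python's Mersenne Twister and random.SystemRandom draws from the operating system's entropy source (which is an OS-level generator, not necessarily "true" randomness in the strict sense). The experiment itself, a hit in each of 5 games at 75% per game, is a made-up stand-in for the article's simulation.

```python
import random

def estimate_all_hits(rng, p_hit, n_games, n_trials):
    """Monte Carlo estimate of the chance of a hit in every one of
    n_games games, using whatever generator is passed in."""
    successes = 0
    for _ in range(n_trials):
        if all(rng.random() < p_hit for _ in range(n_games)):
            successes += 1
    return successes / n_trials

exact = 0.75 ** 5
mt_estimate = estimate_all_hits(random.Random(12345), 0.75, 5, 20_000)
os_estimate = estimate_all_hits(random.SystemRandom(), 0.75, 5, 20_000)

print(f"exact value:            {exact:.4f}")
print(f"Mersenne Twister (PRNG): {mt_estimate:.4f}")
print(f"OS entropy source:       {os_estimate:.4f}")
```

Only the seeded run is repeatable, which is the practical reason given above for using a PRNG in the first place.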
Now, it is possible that a really bad PRNG could impact research results, and they have in the past. The RANDU algorithm is a particularly good example, but it was really, really, really astronomically bad. I doubt the authors used something like this.
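For the curious, RANDU's flaw is simple enough to demonstrate in a few lines. The generator is x[n+1] = 65539 * x[n] mod 2^31, and because 65539 = 2^16 + 3, every output is an exact linear combination of the previous two. The seed below is arbitrary.

```python
def randu(seed, count):
    """RANDU: x[n+1] = 65539 * x[n] mod 2**31 (the seed should be odd)."""
    values = []
    x = seed
    for _ in range(count):
        x = (65539 * x) % 2**31
        values.append(x)
    return values

xs = randu(1, 1000)

# Count consecutive triples that break x[k+2] = 6*x[k+1] - 9*x[k] (mod 2**31).
violations = sum(
    1 for a, b, c in zip(xs, xs[1:], xs[2:])
    if (c - 6 * b + 9 * a) % 2**31 != 0
)
print("triples breaking the linear relation:", violations)  # prints 0
```

A relation like that means consecutive triples of outputs fall on a small number of planes in 3-D space, which is exactly the kind of structure a simulation can trip over.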
-twb
As impressive as DiMaggio's streak is, I'm much more curious about Ted Williams' consecutive games on base streak. Joe D holds second place in this stat: 74. Williams got on base in 152 out of 154 games in 1949, with a streak of 84 in there.
It is considered a MAJOR achievement if someone goes 60 consecutive games on base (rarer than even a 40 game hitting streak).
Since they ran the stats, they could have tested both...
I've also performed about 10,000 *ahem* "simulations" in which I hit a home run.
In real life though I rarely make it to first. Oh wait. We're talking about baseball?
I applaud the author for using (not "utilizing") a word that is slightly shorter than the correct word. "Alternative" would be a reasonable description of a universe that did not happen. "Alternate" means something like "oscillating". Presumably the author did not mean to imply that we are hopping back and forth between two universes. A pedantic quibble you say? Why yes, yes it is.
The man who bought 1 lottery ticket was much less likely to have won than the guy that bought 10, but yet most lottery winners only purchased a single ticket.
When you are trying to figure out why an unlikely event happened to a particular person, you must also analyze the parameters of that particular person. In other words, since 99.9% of lottery players only buy a single ticket, it turns out that only slightly less than 99.9% of winners are single-ticket purchases.
So, it really isn't too surprising that someone with a decent batting average ended up with this record because there are far more "decent" batters than there are "exceptional" batters and the difference between exceptional and decent plays a small role in probability to this extreme.
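The lottery numbers can be worked through directly. The comment only says 99.9% of players buy a single ticket; the sketch assumes, purely for illustration, that the remaining 0.1% all buy 10 tickets and that every ticket is equally likely to win.

```python
def share_of_winners(population_share, tickets_each):
    """Given each group's share of players and tickets bought per player,
    return each group's share of winners (all tickets equally likely)."""
    ticket_mass = [s * t for s, t in zip(population_share, tickets_each)]
    total = sum(ticket_mass)
    return [m / total for m in ticket_mass]

single, ten = share_of_winners([0.999, 0.001], [1, 10])
print(f"winners who bought 1 ticket:   {single:.2%}")
print(f"winners who bought 10 tickets: {ten:.2%}")
```

Under those assumptions about 99% of winners are single-ticket buyers, even though each of them was ten times less likely to win than a ten-ticket buyer.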
MODS - Parent post is not informative, it is flat-out wrong. A pseudo-random number generator is considered unacceptable if it can't pass a Turing-like test - if I gave you two sequences where one was pseudo-random and the other was "truly" random, you would be unable to tell which was which using any statistical test you can dream up. If one of the sequences yielded biased results for some known distributional property, that would itself be grounds for rejecting it.
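One of the simplest tests of that kind is a chi-square test for uniformity. The sketch below applies it to simulated die rolls from Python's built-in generator; it is a single toy check with an arbitrary seed and sample size, nowhere near the batteries of tests real generators are put through.

```python
import random

def chi_square_uniform(samples, n_bins):
    """Chi-square statistic for integer samples in range(n_bins)
    against a uniform distribution."""
    expected = len(samples) / n_bins
    counts = [0] * n_bins
    for s in samples:
        counts[s] += 1
    return sum((c - expected) ** 2 / expected for c in counts)

rng = random.Random(2008)
rolls = [rng.randrange(6) for _ in range(60_000)]
statistic = chi_square_uniform(rolls, 6)

# With 6 bins there are 5 degrees of freedom; a statistic above
# roughly 11.07 would be suspicious at the 5% level.
print(f"chi-square statistic: {statistic:.2f}")
```

A generator that failed checks like this one would be rejected long before anyone used it for a baseball simulation.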
You're saying that computers _typically_ use thermal noise as a source of entropy for random-number generation?
The only one I'm aware of would be the Yamaha DX7 (and their ilk); that's how the random-noise low-frequency oscillator is fed.
Do daemons dream of electric sleep()?
It is not mathematically sound to do statistics with a random number generator.
Why do people like you get modded up, as you sound like you never even heard the word "Chi-square" before. People don't get modded up for being correct but for *sounding* knowledgeable.
You just got troll'd!
Modern Intel motherboards (i810 forward) and AMD motherboards (768 forward) have a hardware RNG (Random Number Generator) that IIRC is based on diode noise. That's straight up quantum randomness, and most modern Linux distros automatically detect and use it if available.
Range Voting: preference intensity matters
Joe D was my first cousin once removed. That is, he was my father's mother's sister's son. Yeah, I'm a geezer for sure. Unfortunately, as much as I love baseball, none of Joe's genes filtered down my way. I couldn't hit a big-league fastball on the best day I ever had.
This ain't rocket surgery.
Go back to bed, Chris.
"The result: Joltin' Joe's record is not merely likely, it's basically a sure thing. Every alternate universe produced a steak of 39 games or better; one reached 109 games. Joe DiMaggio was not the likeliest player in the history of the game to accomplish the record, not by a long shot."
Is this just poorly written, or is their conclusion really this silly? The article seemed to say that they just took the player's batting average, and calculated how likely it is that he would get at least one hit in a game. How is this worthy of an academic paper? Basically, their outcome is: The higher a player's batting average, the more likely he is to have a streak longer than DiMaggio. Genius.
It doesn't account for any of the subtleties which make this type of streak rare and special. Off the top of my head, these include:
- Batting (and athletic performance in general) is streaky. Players always have a hot streak, where they bat .400 for a month, .300 for a few months, and a month at .156 thrown somewhere in there.
- Everyone has an "off night", which is all it takes to break the streak.
- If you're playing that well, they pitch around you. You might get intentionally walked, or thrown garbage and walked. So, you get fewer real at-bats to work with in some games.
- Some pitchers you just don't match up well against. That left hander with nasty junk is all but un-hittable.
- As the streak gets longer, the pressure gets higher, which impacts your performance. Just ask Paul Molitor, who had a 39 game streak in 1987, or Roger Maris who had health problems related to the pressure of beating Babe Ruth's single season home run record in 1961.
- As the streak gets longer, pitchers are more aware of it and pitch to you differently because of it.
Any of the above can put an end to a streak, all it takes is one game.
A conclusion of "it's basically a sure thing" is obviously horseshit, given the fact that the only guy who has even come close enough to the record to talk about is Molitor, and he wasn't even close. If your computer simulation says it's a sure thing, common sense says you have a flawed computer simulation.
The guys who dominated their simulations never approached a streak of DiMaggio's duration in real life:
Ty Cobb: 40
Willie Keeler: 44
And, the one that should have been a real big clue to reassess their analysis:
Hugh Duffy: 27
"Something like timing between keystrokes clearly doesn't fit the bill(depends on age, sex etc.)"
You're not processing it correctly.
Yes, you can definitely find a correlation between typing speed and age. Can't really question that. But if you're using keystrokes as a random number source, you don't use the high bits. You use the low bits. If there's an even number of microseconds between keystrokes, it's a 0. If there's an odd number of microseconds between keystrokes, it's a 1.
I would find it extremely unlikely that *that* particular number is biased based on age, sex, etc.
Obviously the same things can be used for network latency and mouse movement timing.
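The low-bit trick described above looks something like this. The interval values are invented for illustration, and real entropy collectors do considerably more (mixing, entropy estimation) than this sketch.

```python
def parity_bits(intervals_us):
    """Keep only the lowest bit of each interval (in microseconds):
    0 for an even count, 1 for an odd count."""
    return [interval & 1 for interval in intervals_us]

# Hypothetical inter-keystroke intervals in microseconds (made-up values).
intervals = [182344, 95127, 240018, 131771, 88902]
bits = parity_bits(intervals)
print(bits)  # [0, 1, 0, 1, 0]
```

A slow typist and a fast typist would produce very different high-order digits here, but the parity of the microsecond count carries essentially none of that information.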
Breaking Into the Industry - A development log about starting a game studio.
Okay, okay, but what are the odds that Joe DiMaggio would have such a streak, and land Marilyn Monroe? Somebody needs to get on that simulation asap. Here are my statistics, by the way...
-- thinkyhead software and media
I just picked up Deep Space Nine Season One, so I gotta say... ...this is not linear.
but what did he do? Did he hit the ball 56 times in a row or what. Baseball isn't very popular around here.
-- Make America hate again!
Just in case there are any other baseball nerds here, check out this game - Out of the Park Baseball. Allows you to simulate baseball history as in the article, plus play as GM of a team. (I'm not involved with the game in any way, just a fan.)
I'm not American, but I did play baseball a few times at school, so I know the basics. But I have no idea what is meant by a "hitting streak". He scored a home run in every game? Every innings? Or just got off home base? Or what?
So ... the model says streaks are probable. Reality says they're rare. So which is the more reasonable conclusion: that we're in a rare reality or that the model is not an accurate reflection of reality? Seems to me that the latter is the better choice. Like seers of old, I don't care how interesting the theory and show of a model are, the only thing that matters is: "are the predictions accurate"
Baseball Overview
(This is a quick overview. I am omitting exceptions and details not needed for pedagogy.)
In baseball, there are two nine-person teams and four bases (home, first base, second base, and third base). The offensive team members take turns at batting, one at a time, with the other members off the field and not in play, while all nine members of the defensive team are in the field. The goal in baseball is to hit the ball thrown by the pitcher of the opposing team, then run around the bases, both starting and ending at home ("circling the bases"). This results in a "run," or one point for the batter/runner's team. Once the batter has hit the ball, however, the defensive team in the field can get the batter/runner "out," requiring him to leave the field without scoring, in one of three main ways:
1. By touching him with the ball when the batter/runner is not touching a base;
2. By catching a ball the batter/runner has hit in the air, before it has hit the ground; or
3. By touching a base to which the batter/runner must run (e.g., first base if he's just hit the ball) while holding the ball, before the batter/runner can touch the base.
Definition: A hit occurs when a batter/runner hits the ball and reaches a base safely (i.e., without being made out). Yes, this can be confusing; it's not sufficient for the batter just to hit (i.e., make contact with) the ball; he also has to reach a base safely as a result.
There are lots of other ways the batter can be made out, but the most significant is the "strike out." I won't bore you with its rules now, but it involves an inability to hit the ball thrown by the pitcher to begin with. Much of the strategy, and resulting fascination with the game, involves the game-within-a-game between the pitcher and the hitter.
When the defensive team has made three batter/runners out it is called a half-inning, and the teams exchange places (i.e., the offensive team takes the field and becomes the defensive team). Not surprisingly, when both teams have made three outs it is called an inning and, by rule, there are nine innings in a baseball game. As a result, every member of each team will be able to bat a minimum of three times per game; the average is something over four, I think. Thus, a batter has an opportunity to get a hit between four and five times per game.
A Hitting Streak
Definition: A hitting streak is a run of consecutive games in which a particular batter gets at least one hit. As it happens, at the professional level only the very best players get a hit three out of every ten times they go to bat (a success rate, or "batting average," of 0.300); such "three hundred" hitters are greatly sought after, even though they still fail at their job 70% of the time. (It's a hard game to play well; this is one reason baseball is called "the game of failure.") A typical player might have a batting average of 0.275. Another of the fascinations with the game is how this seemingly trivial difference in batting average between the great and the average player, 0.025, or one hit every forty times at bat, significantly affects play. But I digress.
A three-hundred hitter has a likelihood of 1 - (0.700)^4.5 = 1 - 0.201 = 0.799 of getting at least one hit in an average game (if we assume that he is at-bat an average of 4.5 times per game). His chance of doing so in 56 consecutive games, however, is (0.799)^56 = 3.52E-6. As others have mentioned, however, there are additional subtleties in the game. For example, if a pitcher is good, the opposing team will get few hits and quickly get its three outs per half-inning. This means, however, that there is a feedback effect: The average number of at-bats a player will have against a good pitcher is less than against a poor one, because his teammates are making outs instead of their own hits. This makes it harder for him to get the hit in the game to keep his streak alive -- instead of his 4.5 at-bats, he may get onl
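The arithmetic in the hitting-streak calculation above can be checked with a short Python sketch. It uses the same simplifying assumptions as the post: every at-bat is an independent trial with a fixed success probability equal to the batting average, and the batter gets the same (possibly fractional) number of at-bats in every game. The function names are made up for illustration.

```python
def p_hit_in_game(batting_average: float, at_bats: float) -> float:
    """Chance of at least one hit in a game, assuming independent at-bats."""
    # Probability of going hitless in every at-bat, subtracted from 1.
    return 1.0 - (1.0 - batting_average) ** at_bats


def p_streak(batting_average: float, at_bats: float, games: int) -> float:
    """Chance of getting at least one hit in each of `games` consecutive games."""
    return p_hit_in_game(batting_average, at_bats) ** games


# The figures used in the post: a .300 hitter, 4.5 at-bats per game, 56 games.
p_game = p_hit_in_game(0.300, 4.5)
p_56 = p_streak(0.300, 4.5, 56)
print(f"at least one hit in a game: {p_game:.3f}")
print(f"56 games in a row:          {p_56:.2e}")
```

This should reproduce roughly 0.799 per game and a few in a million for one specific 56-game stretch. Lowering `at_bats` shows how quickly the streak probability falls when a batter gets fewer trips to the plate, which is the feedback effect described above.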
Just because it's quantum mechanical, it doesn't mean it is truly random. Great care must be taken before claiming that. The measured state has to be completely unentangled, and considering how difficult that is even when you try, I seriously doubt electronic noise qualifies. It depends on temperature, for example. What values would you look at in an electronic circuit to extract true randomness? Can you point to something that is not somehow entangled with the environment? No, you can't.
Network latency is a really bad idea. Is he/she running BitTorrent? Well, that's more likely if it's a teenager...
Keyboard timing is a laughably bad idea. Sure, it is "hard" to predict keyboard timings, but with an arbitrarily large amount of computing power and data samples you could probably do better than random guessing. Hence it is not truly random.
Now, I don't think you need truly random numbers to run Monte Carlo. I mean, you can't prove anything with that anyway, so you don't really gain anything from truly random numbers. The model doesn't seem to be very good, and Monte Carlo is at its heart not useful for proofs.
Stats don't take into account the pressure that would build on a person as the streak got higher.
If you have ever had to hit free throws at the end of a basketball game you know what I mean.
To have that pressure over the course of a few weeks would be unbelievable and draining, both emotionally and physically.
Even if one had the skill I don't know how many people could take the pressure.
It also doesn't seem to account for pitching match-ups, weather, etc.
They don't take into consideration the increasing pressure that will be on you to continue that hitting streak, which may adversely affect your hitting ability.
OK, so, since DiMaggio had his 56 game streak 60 years ago, what are the odds (by their measurement) of a 55 game streak occurring some time between then and now?
Almost guaranteed, by their method.
What is the likelihood that we'd see no such streaks?
Almost none by their method.
Did we see any such streaks? No, no we didn't.
Pretty fatal flaw.
Something says their probability calculations were probably a bit off. There are two things they're not taking into account: First, as has been mentioned elsewhere, baseball is not a one-shot game. Things happen as a streak picks up, and you necessarily end up with less and less data about what those things are as a streak progresses (since you have fewer and fewer samples of behavior on a 20-game streak, a 30 game streak, and so on). The game changes, the players change, the pitches change.
Second, players, pitchers, and the game itself have changed over the last 60 years. One reason we may not be seeing DiMaggio's records hit again, is that the skill levels are just so rarified.
I'll make millions for reinventing (true) random...
Joltin' Joe was not affected by the pressure late in the streak. For many people, the knowledge that they were in a history-making streak would affect their average hitting rate. Some excel; some crumble.
She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
The competition was also much weaker in the earlier years of "Major League" baseball. Until 1947, all the best Latino and African-American players were excluded. Could you imagine what the competition would look like today if the major leagues were re-segregated? Pujols, Santana, Bonds, A-Rod, Jeter, Ortiz, Ramirez, etc. etc. ... all gone. The "Major Leagues" would be a joke, and it would be debatable which league was better.
And even in 1947, there were only two non-whites in the majors, Jackie Robinson and Larry Doby. It was symbolic, but most of the best non-whites were still excluded. IIRC, the Tigers and Red Sox didn't hire their first non-white players until 1959. I've read that, if you had the talent of Willie Mays or Hank Aaron, sure you could get a job. But if you were just an every day player, or a backup, there was no room for you on the roster.
I once posted the question to rec.sport.baseball on Usenet: At what date could we say that, substantially, the number of non-white players in the majors was based on ability and not skin color? The consensus, FWIW, was the 1970s.
Just use a bubble out of the game "Trouble" and hook up a camera.
It would be interesting to see if they could say the same thing about some star money managers. How much is skill, and how much is luck?
Others have already mentioned technical changes in the game affecting the probabilities. History also changes the probabilities. In 1942 we were in World War II, and there was a draft. Some good players were already gone, making it easier for the outstanding remaining players to set individual records.
Contribute to civilization: ari.aynrand.org/donate
The Walrus, a Canadian magazine, ran an article about this a while ago:
http://www.walrusmagazine.com/articles/2007.10-joe-dimaggio-56-games/
I don't have time to re-read it right now, but as I recall the basic thesis was that it's *highly* likely that the streak didn't really happen, and that it was...ahem...aided along by willing assistance on the part of officials.
Good grief. What would Charlie Brown say? Our nation turns its lonely eyes to you.
So...is there a difference between this type of assistance and the...blurg...assistance the modern home run kings had in their pursuit of Hank Aaron's record?
Skot Nelson music is my saviour / i was maimed by rock and roll
Damn it, Scotty, I'm a doctor, not a baseball player!
Too bad there's not a "disinformative" mod for posts which propagate misinformation based on ignorance rather than trolling.
What do Bobo Newsom and Mario Mendoza have over Joltin' Joe and his streak? Membership in the All-Time 1337 Hall of Fame.
I would like to see the mathematical proof, LOL. As a researcher, "unlikely" just doesn't quite cut it for me. It is easy to say that it is unlikely that P = NP, but I doubt I will run away with $1 million for that.
Anyways... I actually don't think it is that unlikely that there could be some person out there for whom the probability of hitting a key on an odd microsecond is 50.000000 (continue with 10^(10^(number of atoms in the universe)) 0s) 0000001%.
However, it is important to note that it is not my job to prove there is; it is your job to prove there isn't and never will be (if you want to say that it is truly random).
Obviously the same things can be said for network latency and mouse movement timing.
And, for that matter, quantum random number generators.
Breaking Into the Industry - A development log about starting a game studio.