Alternate Baseball Universes
Jamie found a NYTimes op-ed by a grad student and a professor from Cornell, outlining some research they did into alternate baseball universes. The goal was to find out just how unlikely Joe DiMaggio's 56-game hitting streak, set in the 1941 season, really was. No one since has even come close to that record. The math guys ran 10,000 simulations of the entire history of baseball from 1885 on. For each simulation they put each player up to the plate for each at-bat in each game in each year, just like it happened, and they rolled the dice on him, based on his actual hitting stats for that season. (Their algorithm sounds far simpler than whatever the Strat-O-Matic guys use.) The result: a long streak like Joltin' Joe's is not merely likely, it's basically a sure thing. Every alternate universe produced a streak of 39 games or better; one reached 109 games. And Joe DiMaggio was not the likeliest player in the history of the game to accomplish the record, not by a long shot.
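For a feel of what "rolling the dice" on a hitter could look like, here is a minimal sketch of one simulated season for one player. It is not the researchers' actual code: the function names, the fixed number of at-bats per game, and the sample numbers (a .350 average, 4 at-bats, 150 games) are all assumptions made up for illustration.

```python
import random

def longest_streak(games):
    """Length of the longest run of consecutive games with a hit."""
    best = run = 0
    for got_hit in games:
        run = run + 1 if got_hit else 0
        best = max(best, run)
    return best

def p_hit_in_game(avg, at_bats):
    """Chance of at least one hit in a game, treating at-bats as independent."""
    return 1 - (1 - avg) ** at_bats

def simulate_season(avg, at_bats_per_game, n_games, rng):
    """Roll the dice once per game and return the longest hitting streak."""
    p = p_hit_in_game(avg, at_bats_per_game)
    return longest_streak(rng.random() < p for _ in range(n_games))

if __name__ == "__main__":
    rng = random.Random(1941)  # fixed seed so reruns give the same universes
    # Hypothetical player: .350 average, 4 at-bats a game, 150 games.
    streaks = [simulate_season(0.350, 4, 150, rng) for _ in range(10_000)]
    print("longest streak over 10,000 simulated seasons:", max(streaks))
```

The real study did this for every player in every season, using each player's actual stats, and then looked at the longest streak anywhere in each simulated history.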
A good illustration of this is the so-called "birthday paradox", which asks what's the probability of having duplicate birthdays in a group of n people (whose birthdays are independent of each other). Think of adding the people to the room one by one. The first person doesn't have any chance of having a duplicate birthday, because there's nobody else in the room. The second person has a 1/365 chance of duplicating the first person's birthday and a 364/365 chance of missing it. Let's follow up on the misses, since they're easier to work with. In general, if we've got k people in the room without a duplicate, that means they've used up k of the 365 days in the year, and the next person we introduce to the room has to miss all of those days to avoid a duplication. So the probability of everybody missing everybody else, by the time we get up to n people in the room, is (365/365)*(364/365)*(363/365)*...*((365-n+1)/365), which starts diving towards zero really fast. The probability of having one or more duplicates is 1 - P(no duplicates), which correspondingly climbs to one really fast. If you write a short program to do the exact calculations, you'll find that by the time you have 23 people in the room the probability of having a duplicate is greater than 0.5, and by the time you get 57 people it's greater than 0.99!
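The short program mentioned above might look something like this. It is a sketch that just multiplies out the product of misses, assuming 365 equally likely birthdays; the function name is my own.

```python
from fractions import Fraction

def p_shared_birthday(n, days=365):
    """Exact probability that at least two of n people share a birthday."""
    p_no_dup = Fraction(1)
    for k in range(n):
        # Person k+1 must miss the k days already taken.
        p_no_dup *= Fraction(days - k, days)
    return 1 - p_no_dup

if __name__ == "__main__":
    for n in (10, 22, 23, 56, 57):
        print(n, round(float(p_shared_birthday(n)), 4))
```

Using Fraction keeps the arithmetic exact until the final conversion to float, so the crossover points at 23 and 57 people are not an artifact of rounding.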
If instead you pick one particular person and ask for the probability that someone else duplicates that birthday, it remains quite small. That's the difference between a particular rare event happening and some rare event happening. For a large enough group, some pair of people will almost surely share a birthday, but the odds of it being you (or any other designated person) remain quite small.
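To put a number on the designated-person case: each of the other n - 1 people independently misses your birthday with probability 364/365, so the chance that at least one of them matches it is 1 - (364/365)^(n-1). A quick sketch, with a function name of my own choosing and the same 365-day assumption:

```python
def p_match_for_designated(n, days=365):
    """Probability that someone among the other n - 1 people shares
    one designated person's birthday."""
    return 1 - ((days - 1) / days) ** (n - 1)

if __name__ == "__main__":
    for n in (23, 57):
        print(n, round(p_match_for_designated(n), 4))
```

With 23 people in the room that comes to roughly 6%, and with 57 people roughly 14%, even though some pair in the room is better than even money at 23 and nearly certain at 57.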
Just to preserve my computing geek cred, this is why you need collision resolution for hash tables. You don't know which entries will share hash values, but a collision is very likely (roughly a 99% chance, if the hash spreads keys uniformly) by the time you've loaded 3 * sqrt(Hash Table Capacity) values. For example, if your hash table has capacity 10000, you will very probably see a collision within the first 300 entries.
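The hash table figure is the same birthday calculation with the table capacity standing in for the 365 days. A small sketch to check it, assuming an idealized hash that drops each key into a uniformly random slot (the function name is mine):

```python
import math

def collision_probability(n_keys, capacity):
    """Probability that at least two of n_keys land in the same slot,
    assuming each key hashes to a uniformly random slot."""
    p_no_collision = 1.0
    for k in range(n_keys):
        p_no_collision *= 1 - k / capacity
    return 1 - p_no_collision

if __name__ == "__main__":
    capacity = 10000
    n_keys = 3 * math.isqrt(capacity)  # 300 keys for a 10000-slot table
    print(n_keys, round(collision_probability(n_keys, capacity), 4))
```

Real hash functions and real key sets are not perfectly uniform, so treat the result as a ballpark rather than a guarantee.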
This seems relevant:
http://abcnews.go.com/Technology/WhosCounting/story?id=3694104&page=1
Disclaimer: I'm not an American, so I know next to nothing about baseball - and care less!