Distributed Computing World Climate Simulation
Burnt Offerings writes: "The BBC reports that scientists at climateprediction.com are nearing the completion and public release in late summer of a distributed computing project that simulates the world's climate from 1950-2050 AD. It seems that each user's simulation will have different initial conditions built into their runtime simulation, and a single completed simulation from 1950-2050 AD takes on average eight months (Doh!), assuming average household computing power. The results will be sent back to the project's team, who will select the models that reproduced the 'real' climate patterns that occurred between 1950 and 2000. I presume they will then use these validated models to help extrapolate the world's climate from 2000-2050. Pretty cool (or should I say warm? or hot?)."
It's generally regarded as a Bayesian technique. Actually, there's far more to Bayesian statistics than bootstrapping, but it's the part I spend a lot of time working with. In fact, I suppose that bootstrapping isn't fundamentally a Bayesian process, but it is highly empirical so it appeals to the same "crowd" as more decidedly Bayesian techniques. By the by, "Bayesian" statistics are statistics that make heavy use of Bayes' Rule to incorporate prior knowledge not included in your measured data.
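For reference, Bayes' Rule in its usual form, writing H for a hypothesis and D for the measured data (the symbols are chosen here purely for illustration):

```latex
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}
```

P(H) is the prior, i.e. whatever you knew before looking at the measurements, and P(H | D) is what you believe once the data has been taken into account.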
My background - you develop a program to predict something biological. Let us say, to pick a problem on the same order of difficulty as predicting the weather, that you're trying to predict the three-dimensional conformation that proteins assume, based on their sequence.
Now, okay, you have a bunch of known sequences, which other people (personally, I do both the data mining and some crystallography) have attached to known structures. So, what do you do?
Well, you could fiddle with your program until it predicts really well on those sequences, and announce that it was good. This is "Bad Science", as the parent-poster points out, since the criteria are arbitrary - you have a tendency to "discover" random noise in the data, and you have no way of validating your results.
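So, second option. Instead, you split the data in half at random (actually into more than two pieces, but conceptually in half). You take one half, and you make the model predict as well as you can on that data. Then, you VALIDATE ON THE OTHER HALF OF THE DATA. You *never* change the model on the basis of the second half of the data - that is arbitrary/bad/cheating. This is called "bootstrapping", at least loosely; strictly, splitting the data and testing on the held-out part is hold-out validation (cross-validation when you rotate through several pieces), while the statistical bootstrap proper resamples the data with replacement. It has nothing to do with compiler installation.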
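A minimal sketch of the split-and-validate procedure described above, in Python. Everything in it is made up for illustration: the data is synthetic, the "model" is a single threshold, and the names `split_holdout`, `fit_threshold` and `accuracy` are hypothetical helpers, not anyone's real prediction code. The point is only the discipline: fit on one half, score on the other, and never go back and tune.

```python
import random


def split_holdout(records, train_fraction=0.5, seed=0):
    """Shuffle a copy of the records and split it into (train, held_out)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(threshold, data):
    """Fraction of (x, label) pairs where 'x >= threshold' matches the label."""
    correct = sum((x >= threshold) == label for x, label in data)
    return correct / len(data)


def fit_threshold(train):
    """Toy 'model': choose the threshold that scores best on the training half."""
    candidates = sorted(x for x, _ in train)
    return max(candidates, key=lambda t: accuracy(t, train))


# Synthetic, noisy data (an assumption for the example, not real measurements).
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.uniform(0.0, 1.0)
    label = x + rng.gauss(0.0, 0.15) >= 0.5
    data.append((x, label))

train, held_out = split_holdout(data)

# Fit using ONLY the training half.
threshold = fit_threshold(train)

# Score once on the held-out half; do not adjust the model afterwards.
train_acc = accuracy(threshold, train)
held_out_acc = accuracy(threshold, held_out)

print(f"threshold={threshold:.3f}")
print(f"accuracy on training half: {train_acc:.3f}")
print(f"accuracy on held-out half: {held_out_acc:.3f}")
```

The held-out number is the honest one. The training number is usually at least a little flattering, because the threshold was picked to look good on exactly that data.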
So, as far as most scientists (as opposed to mathematicians) are concerned, the important question is - does this work? In the biological sciences, I can say categorically, yes, this bootstrapping technique has a proven track record. It does work. Obviously, you can screw up (using non-representative data is a good start) but the technique, when properly applied, is sound.
So, I assume it would work for predicting the weather, as well. By work I mean - you would know how well your software predicted the weather. Bootstrapping is not a means of predicting the weather in and of itself, merely of honestly evaluating the effectiveness of a weather prediction mechanism you already have.
The good and new comes from no quarter where it is looked for, and is always something different from what is expected.