Distributed Computing World Climate Simulation
Burnt Offerings writes: "The BBC reports that scientists at climateprediction.com are nearing completion of a distributed computing project, due for public release in late summer, that simulates the world's climate from 1950 to 2050 AD. It seems that each user's simulation will have different initial conditions built into it, and a single complete run from 1950 to 2050 AD takes on average eight months (Doh!) on an average household computer. The results will be sent back to the project's team, who will select the models that reproduced the climate actually observed between 1950 and 2000. I presume they will then use these validated models to help extrapolate the world's climate from 2000 to 2050. Pretty cool (or should I say warm? or hot?)."
Weather is chaotic, but climate is ... well, okay, climate might be chaotic, but we really don't know; and if it is chaotic, it is still only chaotic on timescales longer than 50 years.
Predicting climate 50 years into the future is computationally difficult, but it isn't impossible the way that predicting the weather 50 years ahead would be.
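The weather/climate distinction can be illustrated with a toy chaotic system. The sketch below uses the logistic map, an assumption chosen purely for illustration (it is not a climate model, and r = 3.9 and the starting values are arbitrary): two runs that start a hair apart soon disagree completely step by step, yet their long-run averages agree closely. It shows the idea, not that the real climate behaves this way.

```python
def logistic_trajectory(x0, r=3.9, steps=1000):
    """Iterate the logistic map x -> r*x*(1 - x), a toy chaotic system (not a climate model)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(r * x * (1.0 - x))
    return xs


# Two runs whose starting states differ by one part in 10^10.
a = logistic_trajectory(0.2, steps=200_000)
b = logistic_trajectory(0.2 + 1e-10, steps=200_000)

# "Weather": the individual states agree at first, then decorrelate completely.
early_gap = abs(a[10] - b[10])
late_gap = max(abs(x - y) for x, y in zip(a[100:300], b[100:300]))

# "Climate": the long-run averages of the same two runs are nearly identical.
mean_a = sum(a) / len(a)
mean_b = sum(b) / len(b)

print(f"gap after 10 steps:         {early_gap:.1e}")
print(f"largest gap, steps 100-300: {late_gap:.2f}")
print(f"long-run means:             {mean_a:.3f} vs {mean_b:.3f}")
```

The small early gap is guaranteed here because one step of the map can stretch a difference by at most a factor of r; the stretching compounds until the two runs are effectively unrelated.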
Bootstrapping is often lumped in with Bayesian techniques. Actually, there's far more to Bayesian statistics than bootstrapping, but it's the part I spend a lot of time working with. In fact, strictly speaking bootstrapping isn't a Bayesian process, but it is highly empirical, so it appeals to the same "crowd" as more decidedly Bayesian techniques. By the by, "Bayesian" statistics are statistics that make heavy use of Bayes' Rule to incorporate prior knowledge that isn't contained in your measured data.
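For reference, here is Bayes' Rule in the form usually used for inference. The symbols are chosen here for illustration: θ stands for the unknown quantity (model parameters, say) and D for the measured data.

```latex
$$
P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}
$$
```

The prior P(θ) is where knowledge not contained in the data enters; the likelihood P(D | θ) is the only term that depends on the measurements.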
My background: suppose you develop a program to predict something biological. Let's say, to pick a problem of roughly the same difficulty as predicting the weather, that you're trying to predict the three-dimensional conformation that proteins assume, based on their sequence.
Now, okay, you have a bunch of known sequences, which other people (personally, I do both the data mining and some crystallography) have matched to known structures. So, what do you do?
Well, you could fiddle with your program until it predicts really well on those sequences, and announce that it was good. This is "Bad Science", as the parent poster points out, since the criteria are arbitrary: you tend to "discover" random noise in the data (overfitting), and you have no way of validating your results.
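So, second option. Instead, you split the data in half at random (in practice into more than two pieces, but conceptually in half). You take one half and tune the model to predict as well as you can on that data. Then you VALIDATE ON THE OTHER HALF OF THE DATA. You *never* change the model on the basis of the second half of the data; that is arbitrary/bad/cheating. This is what I mean by "bootstrapping", although the standard name for it is cross-validation (hold-out validation, or k-fold cross-validation when the data are split into k pieces); bootstrapping in the strict statistical sense means resampling the data with replacement. Either way, it has nothing to do with compiler installation.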
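The split-and-validate procedure, in its k-fold form, can be sketched in Python. This is a minimal illustration under stated assumptions: a straight-line least-squares fit stands in for "the model" (a real structure predictor would be far more complex), and the data are synthetic. The property that matters is that each fold's score comes from a model fitted without ever seeing that fold.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; a stand-in for 'the model'."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted line on the given points."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def k_fold_cv(xs, ys, k=5, seed=0):
    """Return the held-out error for each of k folds.

    The indices are shuffled once and dealt into k folds. Each fold is held
    out in turn: the model is fitted on the other k-1 folds only and scored
    on the held-out fold, which it never saw during fitting.
    """
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors.append(mse(model, [xs[i] for i in held_out], [ys[i] for i in held_out]))
    return errors


# Hypothetical synthetic data: y = 2 + 3x plus Gaussian noise with sd 1.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]
errors = k_fold_cv(xs, ys, k=5)
print([round(e, 2) for e in errors])
```

Because the noise variance here is 1, held-out errors near 1 indicate the line generalizes; a model that had memorized noise would score well on its training folds but noticeably worse on the held-out ones.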
So, as far as most scientists (as opposed to mathematicians) are concerned, the important question is: does this work? In the biological sciences, I can say categorically that it does; this bootstrapping technique has a proven track record. Obviously, you can screw up (using non-representative data is a good start), but the technique, when properly applied, is sound.
So, I assume it would work for predicting the weather as well. By "work" I mean that you would know how well your software predicted the weather. Bootstrapping is not a means of predicting the weather in and of itself, merely of honestly evaluating the effectiveness of a weather-prediction method you already have.
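For contrast, here is bootstrapping in the strict statistical sense (resampling with replacement), used in the spirit of honestly evaluating an existing predictor: it puts an error bar on a validation score. The sketch assumes you already have one score per held-out case; the 70-correct-out-of-100 figure below is made up purely for illustration.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    Each resample draws len(values) items *with replacement* from the
    original scores; the spread of the resampled means estimates how much
    the score would vary with a different sample of test cases.
    """
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo = means[int(n_resamples * alpha / 2)]
    hi = means[int(n_resamples * (1 - alpha / 2)) - 1]
    return lo, hi


# Hypothetical validation result: 1 = correct prediction, 0 = wrong, 100 held-out cases.
scores = [1] * 70 + [0] * 30
lo, hi = bootstrap_ci(scores)
print(f"accuracy 0.70, 95% bootstrap interval roughly {lo:.2f} to {hi:.2f}")
```

The interval says how much the 70% figure could move with a different draw of test cases; it does not make the predictor itself any better.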
No. The term 'starting conditions' appears in the BBC article, but if you go to the website it says:
In large-scale simulations such as these, there are often bits of physics/chemistry/weather that have to be put in by hand (parameterized), usually because the relevant science would be too expensive to calculate directly, or happens on scales smaller than the simulation can resolve. While it's usually quite doable to come up with reasonable models for these unresolved effects, the models often contain parameters that could plausibly take a range of values.
Running an ensemble of models, each with different parameter values, allows the model parameters to be calibrated against 50 years of data; this gives some confidence in the models' predictive power for the next 50 years.
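The calibrate-then-project idea can be sketched with a deliberately tiny, hypothetical one-parameter model. Everything here is an illustrative assumption: the linear "forcing" ramp, the hidden true sensitivity of 0.8, the noise level, the parameter range and the acceptance tolerance. Real ensemble projects compare many physical diagnostics, not a single time series. The structure is what matters: run many parameter settings, keep those whose 1950-2000 hindcast matches the observations, and use only those to project forward.

```python
import random


def toy_model(sensitivity, years):
    """Hypothetical one-parameter 'climate model': response = sensitivity * forcing.

    The linear forcing ramp (0.02 per year since 1950) is invented for illustration.
    """
    return [sensitivity * 0.02 * (y - 1950) for y in years]


def rmse(a, b):
    """Root-mean-square difference between two equal-length series."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


def calibrate(observed, hindcast_years, candidates, tolerance):
    """Keep the ensemble members whose hindcast matches the observations."""
    return [s for s in candidates
            if rmse(toy_model(s, hindcast_years), observed) <= tolerance]


rng = random.Random(42)
hindcast_years = list(range(1950, 2001))
# Synthetic "observations" from a hidden true sensitivity of 0.8 plus noise.
observed = [v + rng.gauss(0, 0.05) for v in toy_model(0.8, hindcast_years)]

# Ensemble: 1000 members, each with a different value of the uncertain parameter.
candidates = [rng.uniform(0.2, 2.0) for _ in range(1000)]
kept = calibrate(observed, hindcast_years, candidates, tolerance=0.1)

# Only the calibrated members are used to project forward.
forecast_2050 = [toy_model(s, [2050])[0] for s in kept]
print(f"{len(kept)} of {len(candidates)} members kept")
print(f"2050 range: {min(forecast_2050):.2f} to {max(forecast_2050):.2f}")
```

The spread of the kept members' 2050 values is the useful output: calibration narrows the plausible parameter range, but it does not collapse it to a single answer.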
This sort of parameter estimation by calibration is very common for models of complex systems, and not just for computer models. Ideally, of course, one wants to get to the point where such tuning isn't necessary and all the science can be calculated directly from first principles, but these model calibrations are often useful steps along the way.