Distributed Computing World Climate Simulation
Burnt Offerings writes: "The BBC reports that scientists at climateprediction.com are nearing completion of a distributed computing project, due for public release in late summer, that simulates the world's climate from 1950 to 2050 AD. It seems that each user's simulation will have different initial conditions built in, and a single completed simulation from 1950 to 2050 AD takes on average eight months (Doh!), assuming average household computing power. The results will be sent back to the project's team, who will select the models that reproduced the 'real' climate patterns that occurred from 1950 to 2000. I presume they will then use these validated models to help extrapolate the world's climate from 2000 to 2050. Pretty cool (or should I say warm? or hot?)."
If it wasn't, we'd have accurate forecasts up to a few months in advance. As it is, I find forecasts are routinely wrong about even tomorrow's weather. What happened to the whole "butterfly flapping its wings in Singapore affects the weather in Kansas" thing? I don't see how initial conditions would tell them much; I bet even random quantum events have a very strong influence on weather models over 50 years. I'd put the odds of success for this distributed computing project around the same as SETI.
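For what it's worth, the "butterfly" sensitivity this comment refers to is easy to reproduce in a toy system. The sketch below uses the logistic map, which is an assumption chosen purely for illustration: it is not a climate model and not anything the project runs. It starts two runs whose initial values differ by one part in ten billion. The names `logistic_orbit`, `R`, and `STEPS` are hypothetical.

```python
R = 4.0        # logistic-map parameter in its chaotic regime (illustrative choice)
STEPS = 60     # number of iterations to run (illustrative choice)


def logistic_orbit(x0, steps=STEPS, r=R):
    """Return [x0, x1, ..., x_steps] for x_{n+1} = r * x_n * (1 - x_n)."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit


# Two runs that differ only by a tiny perturbation of the starting value.
a = logistic_orbit(0.4)
b = logistic_orbit(0.4 + 1e-10)

# Absolute difference between the two runs at every step.
gaps = [abs(x - y) for x, y in zip(a, b)]

print(f"gap after 1 step:        {gaps[1]:.2e}")
print(f"largest gap in {STEPS} steps: {max(gaps):.2e}")
```

This only shows that individual trajectories diverge quickly. It says nothing either way about whether long-run averages can be predicted.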
Do you have a big RS/6000 or two sitting around, or a sizeable Linux cluster or two connected via fiber to the National Climatic Data Center in Silver Spring, MD, that you can use to crunch a few dozen gigabytes of data a couple of times a day to help out?
Speaking as someone who builds clusters to run mesoscale atmospheric models, the amount of data that has to be passed back and forth between the compute nodes of a cluster requires gigabit bandwidth to keep decent processors happy. I don't see how a WAN-based distributed computing project without massive bandwidth and nearly isochronous data transmission is going to be of any use in producing a working forecast. Most atmospheric models I've seen require frequent communication between the nodes to keep the processors busy. In an average run for an area the size of a couple of average states for a 36-hour forecast, the traffic on the network in a five-node cluster approaches a terabyte.
From the FAQ:
Won't all these computers being left on for 24 hours a day have a detrimental impact on the Climate System?
Assume a computer running 24hrs/day requires, on average, 50W of power. If 100,000 computers join the Casino-21 project, the project will require 5,000kW of power. There are 24 hours in a day, so each day the project will consume 120,000kW-hrs, or 432,000,000kJ of energy.
That's a big number, so let's try and put it in perspective by calculating how much energy is necessary to boil water for a cup of tea. Assuming a specific heat of water of 4.19 kJ/(kg-K), 0.237kg/cup of water, a necessary temperature rise from 20 degrees Celsius to 100 degrees Celsius, and that only one cup of water is boiled for each cup of tea, then about 80kJ/cup of energy are necessary (assuming our kettle is 100% efficient). This means that running the Casino-21 project for one day is equivalent to boiling water for 5,400,000 cups of tea.
Is 5,400,000 cups of tea a lot? According to the Tea Council, some 37 million people in the United Kingdom drink, on average, 3.4 cups of tea per day. That's nearly 126 million cups of tea per day in the UK alone!
Each day, about 23 times more energy will be spent boiling water for tea in the United Kingdom than would be used by the computers involved in the Casino-21 project. More seriously, a rough calculation suggests that 100,000 computers running 24hrs/day for one year at a power consumption of 50W will contribute approximately 0.0001% of the total amount of CO2 generated in one year. This is not an insignificant amount, but seems (to us) a worthwhile investment to better understand the climate system.
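The FAQ's arithmetic can be re-derived in a few lines. This is only a check of the quoted numbers: every input is taken from the FAQ text above, the variable names are made up for the sketch, and the 80 kJ/cup figure is the FAQ's own rounding of the computed value.

```python
# Inputs quoted from the FAQ.
WATTS_PER_COMPUTER = 50
COMPUTERS = 100_000
HOURS_PER_DAY = 24

SPECIFIC_HEAT_KJ_PER_KG_K = 4.19
KG_PER_CUP = 0.237
TEMP_RISE_K = 100 - 20          # from 20 C to 100 C

UK_TEA_DRINKERS = 37_000_000
CUPS_PER_DRINKER_PER_DAY = 3.4

# Project energy use per day.
project_kw = WATTS_PER_COMPUTER * COMPUTERS / 1000       # 5,000 kW
project_kwh_per_day = project_kw * HOURS_PER_DAY         # 120,000 kWh
project_kj_per_day = project_kwh_per_day * 3600          # 1 kWh = 3,600 kJ

# Energy to heat one cup of water, before the FAQ rounds it to 80 kJ.
kj_per_cup = SPECIFIC_HEAT_KJ_PER_KG_K * KG_PER_CUP * TEMP_RISE_K

# Cups-of-tea equivalent, using the FAQ's rounded 80 kJ/cup.
cups_equivalent = project_kj_per_day / 80

# UK tea consumption and the ratio the FAQ reports as "about 23 times".
uk_cups_per_day = UK_TEA_DRINKERS * CUPS_PER_DRINKER_PER_DAY
ratio = uk_cups_per_day / cups_equivalent

print(f"project energy per day: {project_kj_per_day:,.0f} kJ")
print(f"energy per cup:         {kj_per_cup:.1f} kJ")
print(f"cups equivalent:        {cups_equivalent:,.0f}")
print(f"UK cups per day:        {uk_cups_per_day:,.0f}")
print(f"UK tea / project ratio: {ratio:.1f}")
```

The quoted figures hold up: 432,000,000 kJ per day, 5,400,000 cups, and a ratio a little over 23.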
I run NO Distributed Computing (DC) project unless it follows these rules:
1. Must be non-profit. If it is for profit, I must get a cut.
A. Example: Seti@Home is run by UC Berkeley.
B. United Devices is for profit (think about it: drug companies will make money). However, Easynews.com gives me 2 free gigs of access a month for running it. Hey, all I want is a piece, and I am getting it.
2. A DC project must be bug free. This may seem like a bloody obvious sort of thing. But considering the state of software releases nowadays, one might think I am asking for a miracle! Seriously, I understand the point of Version 2 releases and stuff like that. As long as it is handled competently and professionally I probably will forgive them. But I will have zero patience for a DC project that crashes my machine or keeps me from running ANY app. And that leads me to rule 3...
3. A DC project must take a back seat to... everything. It must also be maintenance free.
Does this require any explanation?
4. Finally, it must be controversy free.
I have yet to come across a /. article accusing a DC app of loading in spyware, or a trojan of any sort. But I have faith that it will come.
It's generally regarded as a Bayesian technique. Actually, there's far more to Bayesian statistics than bootstrapping, but it's the part I spend a lot of time working with. In fact, I suppose that bootstrapping isn't fundamentally a Bayesian process, but it is highly empirical, so it appeals to the same "crowd" as more decidedly Bayesian techniques. By the by, "Bayesian" statistics are statistics that make heavy use of Bayes' Rule to incorporate prior knowledge not included in your measured data.
My background - you develop a program to predict something biological. Let us say, to pick a problem on the same order of difficulty as predicting the weather, that you're trying to predict the three-dimensional conformation that proteins assume, based on their sequence.
Now, okay, you have a bunch of known sequences, which other people (personally, I do both the data mining and some crystallography) have attached to known structures. So, what do you do?
Well, you could fiddle with your program until it predicts really well on those sequences, and announce that it was good. This is "Bad Science", as the parent-poster points out, since the criteria are arbitrary - you have a tendency to "discover" random noise in the data, and you have no way of validating your results.
So, second option. Instead, you split the data in half at random (actually into more than 2 pieces, but conceptually in half.) You take one half, and you make the model predict as well as you can on that data. Then, you VALIDATE ON THE OTHER HALF OF THE DATA. You *never* change the model on the basis of the second half of the data - that is arbitrary/bad/cheating. This is called "bootstrapping". It has nothing to do with compiler installation.
So, as far as most scientists (as opposed to mathematicians) are concerned, the important question is - does this work? In the biological sciences, I can say categorically, yes, this bootstrapping technique has a proven track record. It does work. Obviously, you can screw up (using non-representative data is a good start) but the technique, when properly applied, is sound.
So, I assume it would work for predicting the weather, as well. By work I mean - you would know how well your software predicted the weather. Bootstrapping is not a means of predicting the weather in and of itself, merely of honestly evaluating the effectiveness of a weather prediction mechanism you already have.
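The split-then-validate procedure described in this comment can be sketched as follows. The comment calls it "bootstrapping"; a single random split like this is more commonly called hold-out validation, and that is what the code does. Everything in it is an assumption for illustration: the data are synthetic (a straight line plus noise), the "model" is a least-squares line, and the names `fit_line` and `rmse` are hypothetical.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(points, slope, intercept):
    """Root-mean-square error of the fitted line on the given points."""
    squared = [(y - (slope * x + intercept)) ** 2 for x, y in points]
    return (sum(squared) / len(squared)) ** 0.5


rng = random.Random(0)  # fixed seed so the split is repeatable

# Synthetic stand-in for "known sequences with known structures".
data = [(x, 2.0 * x + rng.gauss(0.0, 1.0)) for x in range(100)]

# Split at random into a tuning half and a held-out half.
rng.shuffle(data)
train, held_out = data[:50], data[50:]

# Tune the model on the first half ONLY.
slope, intercept = fit_line(train)

# Score it on the second half, and never adjust the model afterwards.
train_error = rmse(train, slope, intercept)
held_out_error = rmse(held_out, slope, intercept)

print(f"fitted slope:   {slope:.3f}")
print(f"train RMSE:     {train_error:.3f}")
print(f"held-out RMSE:  {held_out_error:.3f}")
```

The held-out error is the honest number to report, since the model never saw those points while it was being tuned.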
The good and new comes from no quarter where it is looked for, and is always something different from what is expected.
From the FAQ:
"Many people have complained about the screensaver aspect of the Casino-21 client, and rightfully so. Screensavers only run when a computer has been idle for a period of time, are resource-hungry and place a limit on the platforms that can be supported. A background client will run whenever there is spare processing power, can be made more efficient than a screensaver and will support many more platforms. Following all of your suggestions, the Casino-21 client will be designed to run in the background. An additional client will be provided to view the progress of your climate simulation, and will be able to be run in screensaver mode when applicable."
So... running the screensaver is not necessary.
As below, so above and beyond, I imagine drawn beyond the lines of reason. Push the envelope. Watch it bend.
Perhaps it's not impossible, but no-one has been able to do it yet. That's why they're resorting to this...
Can anybody read between the lines here? They're essentially saying, "Every climate model we have (that predicts global warming) wasn't able to accurately predict the global warming 1900-2000. We're fresh out of ideas, so let's run a couple of million models with varying random values. When one of them (inevitably) comes pretty close, we can cling to that as 'proving' it to be a working model and use its results as convincing evidence that we must cut CO2 production or we will all die."
I'm not giving these jokers a minute of my CPU time. They are guessing. They don't have a workable model, so instead of trying to keep thinking, they're in a rush to get a "verified" (by past events) model within a year so they can try to use the results to push their political agenda. The fact that a few of the millions of models they run correctly guess the last 50 years of climate change is no indication that they will predict future climate change, unless there is a reasonable belief that the models were based on some logic. These models are based on random guesses at chaotic values.
Trust me, the results are already known. It will show global warming for 2000-2050. Can you imagine the coup if the random model that happened to guess 1950-2000 also showed global cooling of 5 degrees in the next 5 decades? How much you wanna bet that that result would NEVER see the light of day...
Spend your CPU cycles on SETI...
Oh come on, didn't you learn anything in freshman physics? Or didn't you take it? NOTHING anyone has done has been original. It was all based on previous work. Go search in Google for a wacky guy named Lorentz and his crazy transformations. In fact, look up "galileo spacetime". Your ignorant ass will realise that Einstein wasn't the first person to think of relativity. You might also want to look up a fellow named Hermann Minkowski.
Like typical slashdot troll scum, you don't know how to use google.
That sounds a little better. I did go to their website, and saw that they were going to use one of their four models, but I didn't dig farther to see that the journalists (as per usual) didn't understand what they were copying into their notebooks.
But what the researchers should be doing first is back-testing by using the first 25 years as calibration and the second 25 as a check on the extrapolation. Then doing it the other way around. Or maybe the distributed software does that, and all the permutations in-between.
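The two-way back-test suggested here (calibrate on one 25-year block, check on the other, then swap) might look something like the sketch below. It is a toy under stated assumptions: the "observations" are synthetic (a made-up 0.01-degree-per-year trend plus noise), the "model" is a straight-line trend, and the names `fit_trend`, `rms_error`, and `backtest` are hypothetical. It is not the project's procedure.

```python
import random


def fit_trend(years, temps):
    """Least-squares straight line through (year, temperature) pairs."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(temps) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, temps))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rms_error(years, temps, slope, intercept):
    """RMS difference between the fitted line and the given values."""
    squared = [(t - (slope * y + intercept)) ** 2 for y, t in zip(years, temps)]
    return (sum(squared) / len(squared)) ** 0.5


def backtest(calibration, check):
    """Fit on the calibration block, then score on the untouched check block."""
    slope, intercept = fit_trend(*calibration)
    return rms_error(*check, slope, intercept)


rng = random.Random(1)  # fixed seed so the run is repeatable

# Synthetic stand-in for 50 years of observations, 1950 through 1999.
years = list(range(1950, 2000))
temps = [0.01 * (y - 1950) + rng.gauss(0.0, 0.1) for y in years]

first = (years[:25], temps[:25])    # 1950-1974
second = (years[25:], temps[25:])   # 1975-1999

forward_error = backtest(first, second)   # calibrate early, check late
reverse_error = backtest(second, first)   # and the other way around

print(f"calibrate 1950-1974, check 1975-1999: RMS error {forward_error:.3f}")
print(f"calibrate 1975-1999, check 1950-1974: RMS error {reverse_error:.3f}")
```

If the two errors differ sharply, that hints the model is fitting one period's quirks instead of something that carries over.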
At any rate, where it should fall on its ass is in the prediction of weather that actually makes a difference: hurricanes and tornadoes, which have crucial features that won't be well modeled, if at all, by the large differential boxes they selected. It will also run afoul of interference from random volcanic eruptions on a Pinatubo-Mount St. Helens ashfall scale, which happen on a decade or so time scale, the timing and location of which would be critical to the rest of the test run.
So I'm going to stick with my attitude that this is a tragic waste of CPU cycles that might actually go towards developing a drug that might actually save a life.
--Blair
P.S. SETI is likewise a waste; if we do hear a beep in the darkness, our only logical reaction will be to band together 6 billion of us as one to build the biggest, nastiest zero-time-of-flight weapon we can create, then hunker down in the sweaty dark to wait to fire it. Anyone coming that far is going to be wanting to make a buck off of it, taking chunks of the planet or slaves, and they're going to be ready for casual resistance.