Same Programs + Different Computers = Different Weather Forecasts

Posted by timothy on Sunday July 28, 2013 @01:26AM from the climate-change-without-leaving-the-room dept.

knorthern knight writes "Most major weather services (US NWS, Britain's Met Office, etc.) have their own supercomputers, and their own weather models. But there are some models which are used globally. A new paper has been published, comparing outputs from one such program on different machines around the world. Apparently, the same code, running on different machines, can produce different outputs due to accumulation of differing round-off errors. The handling of floating-point numbers in computing is a field in its own right. The paper apparently deals with 10-day weather forecasts. Weather forecasts are generally done in steps of 1 hour. That is, the output from hour 1 is used as the starting condition for the hour 2 forecast, the output from hour 2 is used as the starting condition for hour 3, and so on. The paper is paywalled, but the abstract says: 'The global model program (GMP) of the Global/Regional Integrated Model system (GRIMs) is tested on 10 different computer systems having different central processing unit (CPU) architectures or compilers. There exist differences in the results for different compilers, parallel libraries, and optimization levels, primarily due to the treatment of rounding errors by the different software systems. The system dependency, which is the standard deviation of the 500-hPa geopotential height averaged over the globe, increases with time. However, its fractional tendency, which is the change of the standard deviation relative to the value itself, remains nearly zero with time. In a seasonal prediction framework, the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.'"

Re:Have these people never heard of IEEE754???? by cnettel · 2013-07-28 01:35 · Score: 5, Insightful

No, it isn't, when the system itself is not well-conditioned. And I bet you don't want your compiler to build a real codebase under a strict IEEE754 interpretation, as that will disallow almost any optimization. Even if you did, "trivial" rearrangements that don't affect the theoretical analysis of stability, correctness or condition number will still introduce different rounding perturbations. Perturb weather or some other chaotic system, and you will get a completely different trajectory.
That said, many applied fields, including meteorology, could benefit from more disciplined computational science approaches. But don't expect all that much of a difference.
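
To make the point about "trivial" rearrangements concrete, here is a minimal sketch. It is not taken from the paper or from GRIMs: it iterates the logistic map, a standard toy chaotic system, with two algebraically identical update formulas. The function names, the parameter r = 3.9 and the starting value 0.2 are illustrative assumptions, and the hand-written rearrangement merely stands in for what a compiler might do at a different optimization level.

```python
def step_factored(x, r=3.9):
    # r * x * (1 - x), evaluated left to right
    return r * x * (1.0 - x)

def step_expanded(x, r=3.9):
    # The same polynomial written out: r*x - r*x*x
    return r * x - r * x * x

def max_gap(x0=0.2, steps=500):
    """Largest absolute gap between the two trajectories over `steps` iterations."""
    a = b = x0
    worst = 0.0
    for _ in range(steps):
        a = step_factored(a)   # output of one step feeds the next,
        b = step_expanded(b)   # as in an hour-by-hour forecast
        worst = max(worst, abs(a - b))
    return worst

if __name__ == "__main__":
    print("first step gap:", abs(step_factored(0.2) - step_expanded(0.2)))
    print("largest gap over 500 steps:", max_gap())
```

After one step the two formulas agree to within rounding error; after a few hundred steps the trajectories have nothing to do with each other. Neither run is "the right one": both are legitimate roundings of the same mathematics.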

Re:Damn you people by Anonymous Coward · 2013-07-28 03:38 · Score: 5, Insightful

Precision is the point. In a chaotic system, nearby trajectories diverge exponentially. This means that if a value of 9.3440281 returns 3.5 in one calculation, a value of 9.344028147 in another can return something completely different (say, 8.1). Now you say: well, let's just make it more precise then! So you put in the value of 9.34402814672 and get yet another completely different result (1.7), and so on*. If you weren't dealing with mathematical chaos, each extra digit would simply refine the result (e.g. 3.5, 3.45, 3.467, etc.).
* Note: I should be careful with this layman's description to point out that more precise values technically do shrink the window down. But since the divergence is exponential in the first place, this might not ever do you any good in a realistic setting. See Lyapunov exponents and mathematical chaos.
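
The footnote's caveat can be sketched in a few lines, again using the logistic map (here at r = 4) as an assumed stand-in for a chaotic system; none of this comes from the paper. The helper `steps_until_divergence`, the starting value 0.3 and the 0.1 threshold are hypothetical choices for illustration. It counts how many iterations two runs stay close when their starting values differ by `delta`.

```python
def logistic(x):
    # Logistic map at r = 4, a textbook chaotic system (toy model, assumed)
    return 4.0 * x * (1.0 - x)

def steps_until_divergence(delta, x0=0.3, threshold=0.1, max_steps=1000):
    """Count iterations until two runs started `delta` apart differ by more than `threshold`."""
    a, b = x0, x0 + delta
    for n in range(1, max_steps + 1):
        a, b = logistic(a), logistic(b)
        if abs(a - b) > threshold:
            return n
    return max_steps  # never exceeded the threshold within max_steps

if __name__ == "__main__":
    for delta in (1e-3, 1e-6, 1e-9, 1e-12):
        print(f"initial error {delta:g}: agree for {steps_until_divergence(delta)} steps")
```

For this map the gap roughly doubles each step, so making the starting value a thousand times more precise buys only about ten more steps of agreement. That is the sense in which extra precision "shrinks the window" without helping much.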

Re:Have these people never heard of IEEE754???? by Anonymous Coward · 2013-07-28 03:54 · Score: 5, Insightful

Almost nothing you do with IEEE754 floating point numbers is correct in the strict mathematical sense. You can't even represent 0.1 (1/10) exactly as a binary IEEE754 floating point number. There are entire series of lectures on the topic of scientific computing with floating point numbers. The errors are usually small enough that a few simple rules keep you safe (e.g., never compare floating point numbers for equality), but when you do many iterations, the errors can accumulate and mess with your results, and if you then do the calculations in a different order, the accumulated error will mess with your results in a different way. That's what's happening here.
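
For anyone who wants to see this first-hand, a short sketch in Python (whose `float` is an IEEE754 double on essentially every platform) covers the three claims above: 0.1 is not stored exactly, repeated additions drift, and the grouping of the operations changes the result. The variable names are only illustrative.

```python
from decimal import Decimal
import math

# The double nearest to 0.1 is not exactly one tenth;
# Decimal(float) prints the exact stored value.
print(Decimal(0.1))

# Ten additions of 0.1 do not land exactly on 1.0
total = sum([0.1] * 10)
print(total == 1.0)               # False

# Same three numbers, different grouping, different rounding
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)              # False

# Compare with a tolerance instead of testing for equality
print(math.isclose(total, 1.0))   # True
print(math.isclose(left, right))  # True
```

A compiler or parallel library that regroups a sum is doing the `left` versus `right` swap many times per time step, which is how the ordering ends up mattering.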