Same Programs + Different Computers = Different Weather Forecasts
knorthern knight writes "Most major weather services (US NWS, Britain's Met Office, etc) have their own supercomputers, and their own weather models. But there are some models which are used globally. A new paper has been published, comparing outputs from one such program on different machines around the world. Apparently, the same code, running on different machines, can produce different outputs due to accumulation of differing round-off errors. The handling of floating-point numbers in computing is a field in its own right. The paper apparently deals with 10-day weather forecasts. Weather forecasts are generally done in steps of 1 hour. I.e. the output from hour 1 is used as the starting condition for the hour 2 forecast. The output from hour 2 is used as the starting condition for hour 3, etc. The paper is paywalled, but the abstract says: 'The global model program (GMP) of the Global/Regional Integrated Model system (GRIMs) is tested on 10 different computer systems having different central processing unit (CPU) architectures or compilers. There exist differences in the results for different compilers, parallel libraries, and optimization levels, primarily due to the treatment of rounding errors by the different software systems. The system dependency, which is the standard deviation of the 500-hPa geopotential height averaged over the globe, increases with time. However, its fractional tendency, which is the change of the standard deviation relative to the value itself, remains nearly zero with time. In a seasonal prediction framework, the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.'"
Why don't you use 128bit Integers to represent some form of fixed point? I highly doubt you need any more precision than that.
WTF are these amateurs doing? This is a solved problem and has been for several decades. Base float is solved. How to condition your computations so that order remains the same or does not impact the results is solved. Pathetic.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Almost all CFD (Computational Fluid Dynamics) simulations use time marching of the Navier-Stokes equations. Despite being very nonlinear and very hard, one great thing about them is that they naturally parallelize very well. They partition the solution domain into many subdomains and distribute the finite volume mesh associated with each subdomain to a different node. Each mesh is also parallelized using GPUs. At the end of the day these threads complete execution at slightly different times and post updates asynchronously. So even if you use the same OS and the same basic cluster, if you run it twice you get two different results if you run it far enough, like 10 days. I am totally not surprised that if you change the OS or architecture or big-endian/little-endian things or the math processor or the GPU brand, the solutions differ a lot when you make a 10-day forecast.
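A minimal sketch of the "updates arrive in a different order each run" effect, in Python. The data and the names (`values`, `running_sum`) are made up for illustration and have nothing to do with any real CFD code; the only point is that adding the same numbers in a different order can change the rounded total.

```python
import math
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical per-subdomain contributions spanning many orders of magnitude.
values = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-8, 8)
          for _ in range(10_000)]

def running_sum(xs):
    """Plain left-to-right accumulation; every addition rounds."""
    total = 0.0
    for x in xs:
        total += x
    return total

naive_sums = set()
exact_sums = set()
for _ in range(20):
    random.shuffle(values)               # a different "arrival order" each time
    naive_sums.add(running_sum(values))
    exact_sums.add(math.fsum(values))    # tracks partial sums exactly, rounds once

print(len(naive_sums), "distinct totals from plain accumulation")
print(len(exact_sums), "distinct total(s) from math.fsum")
```

`math.fsum` is order-independent here because it only rounds once at the end, but that kind of exact accumulation is not what a time-stepping solver normally does between nodes.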
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
When doing spice simulations of a circuit many years ago, we ran across one interesting feature. When using the exact same inputs and the exact same executable, the sim would converge and run on one machine, but it would fail to converge on another. It just happened that one of the machines was an Intel server, and the other was an AMD, and we attributed it to ever so slightly different round off errors between the floating point implementation of the two. It didn't help that we were trying to simulate a bad circuit design that was on the hairy edge of convergence, but it was eye opening that you could not guarantee 100% identical results between different hardware platforms.
The x86 architecture, since the 8087 coprocessor, has double precision 64 bit floats, and a special 80 bit float--some compilers call this long double and use 128 bits to store this. How does this compare to other architectures?
The people writing this code ought to've known better.
This very effect was noted in weather simulations back in the 1960s. Read Chaos: Making a New Science, by James Gleick.
Rounding errors are orders of magnitude smaller than measurement errors, they are not the precision bottleneck.
This problem has been known since at least the 1970s, and it was weather simulation that discovered it. It led to the field of chaos theory.
With an early simulation, they ran their program and got a result. They saved their initial variables and then ran it the next day and got a completely different result.
Looking into it, they found out that when they saved their initial values, they only saved the first 5 digits or so of their numbers. It was the tiny bit at the end that made the results completely different.
This was terribly shocking. Everybody felt that tiny differences would melt away into some averaging process, and never be an influence. Instead, it multiplied up to dominate the entire result.
To give yourself a feel for what's going on, imagine knocking a billiard ball on a table that's miles wide. How accurate must your initial angle be to knock it into a pocket on the other side? Now imagine a normal table with balls bouncing around for half an hour. Each time a ball hits another, the angle deviation multiplies. In short order with two different (very minor differences) angles, some balls are completely missing other balls. There's your entire butterfly effect.
Now imagine the other famous realm of the butterfly effect -- "time travel". You go back and make the slightest deviation in one single particle, one single quantum of energy, and in short order atmospheric molecules are bouncing around differently, this multiplies up to different weather, people are having sex at different times, different eggs are being fertilized by different sperm, and in not very long an entirely different generation starts getting born. (I read once that even if you took a temperature, pressure, wind direction, humidity measurement every cubic foot, you could only predict the weather accurately to about a month. The tiniest molecular deviation would probably get you another few days on top of that if you were lucky.)
Even if the current people in these parallel worlds lived more or less the same, their kids would be completely different. That's why all these "parallel world" stories are such a joke. You would literally need a Q-like being tracking multiple worlds, forcing things to stay more or less along similar paths.
Here's the funnest part -- if quantum "wave collapse" is truly random, then even a god setting up identical initial conditions wouldn't produce identical results in parallel worlds. (Interestingly, the mechanism on the "other side" doing the "randomization" could be deterministic, but that would not save Einstein's concept of Reality vs. Locality. It was particles that were Real, not the meta-particles running the "simulation" of them.)
(-1: Post disagrees with my already-settled worldview) is not a valid mod option.
This is a good time to review some problems in met codes. The first real problem is that the science is poorly understood. If the model is poorly constructed conditioning is one of the least of your problems. By and large, the push for V&V came from the met world. The second thing is that the spatial resolution is 'way too big. And, long before IEEE 754, it was anecdotally known that you lose a digit whenever you change systems (hardware or software).
D. E. (Steve) Stevenson, Ph.D. Emeritus Associate Professor,School of Computing,Clemson University.
The summary says, "There exist differences in the results for different compilers, parallel libraries, and optimization levels,". That doesn't mean different computers, although different computers were used. It means that they weren't running the same code path and same order of operations so differences should have been expected.
Unfortunately, any information regarding whether the differences are significant for local or even regional weather prediction is behind the paywall.
Certainly not a problem for climate "scientists" all over the planet and their crazy predictions out 10 or 20 years.
In other words, they all gave different answers, but each one was equally certain that *it* was right.
They really need to standardize on what butterflies to use.
So, predictions of global warming and increasing weather variability are really just artifacts of round-off errors.
My Slashdot tagline for today:
1 + 1 = 3, for large values of 1.
If you don't believe that statement, look at this:
http://hpcugent.github.io/easybuild/
and then the diagram in here:
http://hpcbios.readthedocs.org/en/latest/HPCBIOS_2012-92.html
Put the equivalent of that diagram into scattered wiki instructions and ask any 2 people to come up with the same build;
how tough would that be? Only the people who have tried it, know really well what it means...
btw.
In modern HPC systems it is common to provide 3 MPI stacks (IntelMPI, OpenMPI, MVAPICH) and a bunch of compilers;
ah, and that's just the first two layers from the bottom of that diagram! Are you surprised you have fireworks on the top?
In molecular dynamics simulations, kinetics are known to be approximate and states at a given time are not considered directly correlated with that time point; we only hope to get statistically correct distribution of states across ensembles. Consequently, differences in rounding between wildly different compiler/hardware architectures are expected. However, deterministic behavior of the system is achieved by employing higher precisions for accumulation steps, which ensures that averages over a sufficiently long time (big enough sample) are the same no matter what hardware is employed. Consequently a tremendous speed-up is possible running CUDA code on consumer grade nvidia cards which have far fewer double precision execution units than single float precision units. So, we have deterministic trajectories but nobody expects these to match real-world processes on a time-function basis :-)
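A rough Python sketch of why the accumulation step gets the higher precision. Single precision is emulated by round-tripping through `struct`, and the step value and step count are arbitrary assumptions, not taken from any MD package.

```python
import struct

def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

step = to_f32(0.1)      # per-step contribution, stored in single precision
n_steps = 1_000_000

single_acc = 0.0        # accumulator rounded to single precision after every add
double_acc = 0.0        # accumulator kept in double precision
for _ in range(n_steps):
    single_acc = to_f32(single_acc + step)
    double_acc += step

print("single-precision accumulator:", single_acc)
print("double-precision accumulator:", double_acc)
```

The per-step values are identical in both cases; only the precision of the running total differs, which is the part that needs the wider type.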
Beyond about 3 days (based on the Meteorology classes I took in college) most forecasts are just a shitty guess. Looking at a 10 days forecast is like calling your local psychic hotline. Sometimes they're right, but it's just a lucky guess.
btw. "Consistency of Floating-Point Results using the Intel® Compiler or Why doesn’t my application always give the same answer?"
ref. http://software.intel.com/sites/default/files/m/4/4/6/9/4/39386-FP_Consistency_102511.pdf
Basically they should be happy their code ported to two different architectures and ran all the way. Expecting the same results for processes behaving chaotically is asking for too much.
I'm no programming expert, but isn't this basically Computer Science 101 stuff?
I once saw a piece of code written by a mathematician which said "pow(x, -1)". Ugh. I wonder if meteorologists know better.
It doesn't mean anything. You must not listen to it. Global Warming is still happening, and the models are all correct and agree with each other.
97% of all scientists agree that we should stop generating CO2 NOW, otherwise humanity will be responsible for the greatest environmental catastrophe ever to hit the Earth. There is no need for any further examination of the science - what we need is ACTION.
Slashdot should not be supporting denier propaganda in this way. This story should be removed immediately.
QED - quod erat demonstrandum! Or to paraphrase - the proof is in the pudding... :-)
it's called Binary Coded Decimal (BCD) and it works well. plenty of banks still use it because it's reliable and works. it's plenty slower but it's accurate regardless of the processor it's used on.
Anons need not reply. Questions end with a question mark.
oh wait, you cannot compute Pi on any machine because it's transcendental
n/a
Have gnu, will travel.
our bank account
Pretty much most iterative simulation systems like weather simulation will behave this way. When the result of one step of the simulation is the input for another step any rounding error will possibly get amplified.
Also see Butterfly Effect https://en.wikipedia.org/wiki/Butterfly_effect (not the movie!).
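To see the amplification in a few lines, here is a toy sketch in Python. It uses the logistic map as a stand-in for "each step feeds the next"; that is an assumption made for illustration only, since a real weather model is vastly more complicated.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate x -> r*x*(1-x), feeding each output back in as the next input."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_orbit(0.2, 100)
b = logistic_orbit(0.2 + 1e-15, 100)   # differs only around the 15th decimal

gaps = [abs(x - y) for x, y in zip(a, b)]
for step in (0, 20, 40, 60, 80, 100):
    print(f"step {step:3d}: gap = {gaps[step]:.3e}")
```

The starting difference is about the size of a double-precision rounding error, and the printed gaps show how quickly it stops being negligible.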
BCD solves the problem of binary not being able to unambiguously represent certain decimal fractions. BCD solves little for scientific computing. You still need to round and in parallel programs you still gather & round in non-deterministic order. The non-determinism of the particular program won't go away if rewritten to use BCD.
Floating Point arithmetic is not associative.
Everyone who reads Stack Overflow knows this, because everyone who doesn't know this posts to Stack Overflow asking why they get weird results.
Everyone who does numerical simulation or scientific programming work knows this because they've torn their hair out at least once wondering if they have a subtle bug or if it's just round-off error.
Everyone who does cross-platform work knows this because different platforms implement compilers (and IEEE-754) in slightly different ways.
Everyone who does parallel programming knows this because holy shit will you see round-off differences when you throw many minutes of TFlops at a problem and it sequences differently every time.
Everyone who looks at the standards knows this because for Chrissakes, Fused-Multiply-Add is standards compliant but _optional_.
Why is this remotely news?
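For anyone who has not run into it yet, the non-associativity fits in three lines of Python (plain IEEE-754 doubles, nothing platform-specific assumed):

```python
left = (0.1 + 0.2) + 0.3    # group the first two terms
right = 0.1 + (0.2 + 0.3)   # group the last two terms

print(left, right, left == right)
```

Same three operands, same operation, different grouping, and the comparison comes out False.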
Edward Lorenz discovered that floating point truncation causes weather simulations to diverge massively back in 1961.
This was the foundation of Chaos Theory and it was Lorenz who created the term "Butterfly Effect"
http://www.ganssle.com/articles/achaos.htm
The butterfly that changed the weather for the world was not in Texas. Rather, it was at the end of a floating point word.
This is a common problem with all serious scientific codes. If it's important, you rerun test cases any time you change compiler flags or system software and compare results to make sure the changes are within an acceptable tolerance. They're never the same, so if the change is larger than the threshold, human examination and judgement are required to determine if the change is acceptable. It's not uncommon to discover latent bugs that didn't appear until the machine actually did what the programmer wrote.
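A bare-bones sketch of that kind of tolerance check, in Python. The function name, the tolerance, and the sample numbers are all hypothetical; a real regression suite would pick thresholds per field and per test case.

```python
import math

def out_of_tolerance(reference, candidate, rel_tol=1e-9, abs_tol=0.0):
    """Return (index, ref, new) for every value that drifted past the tolerance."""
    if len(reference) != len(candidate):
        raise ValueError("runs produced different numbers of values")
    return [(i, ref, new)
            for i, (ref, new) in enumerate(zip(reference, candidate))
            if not math.isclose(ref, new, rel_tol=rel_tol, abs_tol=abs_tol)]

# Made-up outputs of one test case before and after a compiler-flag change.
baseline = [5500.0, 5512.25, 5498.75]
rebuilt = [5500.0, 5512.250000001, 5499.75]

flagged = out_of_tolerance(baseline, rebuilt)
print(flagged)   # anything listed here goes to a human for judgement
```

Note that the check is never for equality: the second value differs too, but by less than the threshold, so only the third one is flagged.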
Anyone who thinks that this is a solved problem, or ever will be a solved problem doesn't understand the many issues involved which range from algorithm choice to order of execution and intermediate result truncation.
FWIW So far as I know, no x86 systems provide 128 bit floating point. Power, Sparc and Z series are the only options for that I'm aware of. I spent a good bit of time investigating this when some code of mine had convergence issues.
It also demonstrates the problem with modeling climate change. But of course, if you already know what the answer you want is, you can just modify things until you get the answer you want.
So... of all the hardware tested, which one more accurately predicted the actual weather?
That's what I want to know!
Endianness, floating-point representation, long-short INT, roundoff, machine error have been known from the 1970's as posted above.
Trouble now is that a 'new' kid is in town: the Geographer (Geo-groper).
Geographer + computer + stolen (borrowed) code (the Geographer does not even understand INT32) = Shitty output.
The UN IPCC 'Reports' are ripe with shit that they call 'science.'
And the Politicos and Warmer-boys just eat it up faster than it dribbles out.
But who is to complain? The National Science Foundation (lots of Geo-gropers) throw money at this shit like the Treasury can't print enough money fast enough.
Like I wrote: SFB.
Excuse my completely uneducated, non-scientific response but, this is, in essence about weather forecasting, right? It would seem to a layman like myself you are a group of highly trained scientists of different genres looking to be as accurate as possible. There is one variable that I'm sure none of you wants to admit. I highly respect, appreciate, & admire what you do for the common good. With all the super computers, past data, software and modeling, there's a fly in the soup. You guys, at the end of the day, still have to have a little luck to be correct! Mother Nature has no part in anything to do with science. Chaos is by definition impossible to predict! It's just a thought I wanted to throw out there. Anyone who hunts or fishes passionately knows what I'm alluding to. Everything is in "perfect" condition for the hunt & the game is nowhere to be found. I'm not being critical, but you can't be 100% accurate with anything to do with nature. Thanks for all you do!
just use doubles.
may be slightly slower, but you won't have this problem.
I didn't know anyone was still using the old Pentiums anymore.
This is what chaotic systems do. Not to worry, it doesn't change the accuracy of the forecast.
A better article...
From what I can gather, although the code was well scrubbed so that the single processor, threaded and message passing (MPI) versions produce the same binary result indicating no vectorization errors, machine rounding differences caused problems.
Since all the platforms were IEEE754 compliant and the code was mostly written in Fortran 90, I'm assuming that one of the main contributors to this rounding is the evaluation order of terms and perhaps the way that the double Fourier series and spherical harmonics were written.
Both SPH and DFS operations use sine/cosine evaluation which vary a great deal from platform to platform (since generally they only round within 1ulp, not within 1/2ulp of an infinitely precise result).
I remember many moons ago, when I was working on fixed-point FFT accelerators, we were lazy and generated sine/cosine tables using the host platform (x86) and neglected to worry about the fact that using different compilers and different optimization levels on the same platform we got twiddle-factor tables that were different (off-by-one).
With one bug report, we eventually tracked it down to different intrinsics being used (x87 FSIN w/ math or FSINCOS), and sometimes libraries were used. Ack... In later library releases we compiled in a whole bunch of pregenerated tables to avoid this problem.
Of course putting in a table or designing your own FSIN function for a spherical harmonic or Fourier series numerical library solver might be a bit out of scope (not to mention tank the performance), so I'm sure that's why they didn't bother to make the code platform independent w/ respect to transcendental functions, although with Fortran 90, it seems like they could have fixed the evaluation order issues (with appropriate parentheses to force a certain evaluation order, something you can't do in C).
"The paper is paywalled" - then don't link to it! Link only to open, accessible content. If someone wants to brick themselves up in a paywall ghetto, don't give them publicity.
Handbrake transcodes video as a multi-threaded application. I have yet to try it, but if I re-encoded the same video multiple times from the same source, would I get a different file size based on an MD5 or SHA1 checksum?
Life is not for the lazy.
Here's a simple example. Try 0.5 - 0.4 - 0.1 on a calculator or calculator app. If it is using the FPU it will probably get a non-zero result. This is why calculators, including ours, are normally implemented using decimal arithmetic rather than the FPU.
All IEEE754 would do is ensure that each FPU based calculator would yield the same non-zero result.
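The same subtraction in Python, once with binary doubles and once with the standard `decimal` module. This is only a sketch of the binary-versus-decimal difference, not a description of how any particular calculator is implemented.

```python
from decimal import Decimal

# Binary floating point: 0.4 and 0.1 have no exact base-2 representation.
binary_result = 0.5 - 0.4 - 0.1

# Decimal arithmetic: all three operands are represented exactly.
decimal_result = Decimal("0.5") - Decimal("0.4") - Decimal("0.1")

print("binary :", binary_result)
print("decimal:", decimal_result)
```

The binary result is tiny but not zero, and any IEEE754 double implementation should print the same tiny value.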
It is surprising how quickly certain rounding errors can add up. I've had the dubious pleasure of writing an insurance rating algorithm based on multiplying tables of factors. The difference between half-up and banker's round at 6 decimal places makes for rating errors totalling > 50% of the expected premium in a surprisingly small number of calculations. It's one thing to know about error propagation from a theoretical standpoint, but it's quite another to see it happen in real life.
I sympathize with the weather forecasters.
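The two rounding modes mentioned above can be compared directly with Python's `decimal` module. The factor below is a made-up value sitting exactly on a tie at the sixth decimal place; this sketch makes no attempt to reproduce the 50% figure, it only shows where a single step starts to differ.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

SIX_PLACES = Decimal("0.000001")
factor = Decimal("1.0000005")   # hypothetical rating factor, exactly on a tie

half_up = factor.quantize(SIX_PLACES, rounding=ROUND_HALF_UP)
bankers = factor.quantize(SIX_PLACES, rounding=ROUND_HALF_EVEN)

print("half-up :", half_up)
print("banker's:", bankers)
```

Multiply enough rounded factors together and a one-unit difference in the last place at each step has room to compound.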
That's correct. Addition is not associative with floating point numbers. So, 1,000,000 + 1,000 + 1,000 is not necessarily the same as 1,000 + 1,000 + 1,000,000.
Also, x * x + x != x * (x + 1) but many compilers make this substitution to reduce code size or increase FPU throughput.
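A quick way to check the `x * x + x` versus `x * (x + 1)` claim is to let Python hunt for a counterexample. The seed and sample count are arbitrary, and no particular value of `x` is assumed in advance.

```python
import random

random.seed(1)  # arbitrary seed, just to make the search repeatable

mismatch = None
for _ in range(10_000):
    x = random.random()
    if x * x + x != x * (x + 1.0):   # algebraically equal, rounded differently
        mismatch = x
        break

if mismatch is not None:
    print("x           =", repr(mismatch))
    print("x * x + x   =", repr(mismatch * mismatch + mismatch))
    print("x * (x + 1) =", repr(mismatch * (mismatch + 1.0)))
```

Python evaluates both forms exactly as written, so any difference it finds comes from rounding alone, not from a compiler rewriting the expression.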
Didn't we know this? Take forecasts with a grain of salt because they could be wrong?
Be seeing you...
This problem is not going to go away unless/until computers start doing their math rationally and symbolically. That is, with fractional results stored as fractions with no rounding. Where irrational constants are used in calculations, they'll have to be carried through in symbolic form as you would using pencil and paper. That is, the computer actually stores a representation of pi/2, NOT 1.570796327.
Of course, that leaves the 'minor matter' of performance.
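Both halves of that trade-off show up in a short Python sketch using the standard `fractions` module. The iteration used for the second half is an arbitrary toy (the logistic map), chosen only because it squares the denominator at every step.

```python
from fractions import Fraction

# Exactness: ten tenths add up to one as fractions, but not as doubles.
float_total = 0.0
exact_total = Fraction(0)
for _ in range(10):
    float_total += 0.1
    exact_total += Fraction(1, 10)
print(float_total == 1.0, exact_total == 1)

# Cost: feed a rational result back into itself and the denominator explodes.
x = Fraction(1, 3)
for _ in range(5):
    x = 4 * x * (1 - x)
print(x.denominator)
```

After only five iterations the denominator is already 3**32, which hints at what an hourly time step over ten days would do to memory and run time.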
Actually, it's worse.
many predictions - most, nowadays, are out-and-out fraud.
But you mustn't say this.....
Well, if the system dynamics are governed entirely by random perturbations, then fraud is of no consequence, just as legitimate prediction is of no consequence.
While cretins dribble on about the importance of using 64-bit, 80-bit, 128-bit, or one million bits of floating point precision, there used to be this little mathematical discipline known as 'Numerical Analysis' that has a little bit to say about the issue. For god's sake, does no-one in IT actually get a proper education these days?
1) Weather prediction algorithms are SUPPOSED to have minor inaccuracies introduced into the data set. This is the whole idea. Run the calculation, say, three or four times with minor noise values added to the input values (from your weather station collection points). If the predictions from each run vary greatly from one another, this is indicative that the prediction is essentially junk. If each prediction is fairly close to the others, this is indicative that the computer program MAY be giving a fairly accurate weather forecast.
2) The above method is actually used to decide the accuracy of longer term weather forecasts. Forecasts close to the present time are expected to be highly accurate. It is the fall-off in accuracy as the prediction time increases that is of interest to the meteorologists.
3) Weather prediction software should NOT be vulnerable to issues of precision, rounding or whatever. The software should have been written by people with a proper understanding of the mathematical theories of numerical analysis. To make this clear to those of you too thick to get it, here's a neat example:
MPEG1 and MPEG2 used floating point methods in video compression/decompression, and as a consequence compression was inefficient, frequently incorrect against desired targets, and had video decoders that would produce different results given the same data streams to display. Then proper mathematicians got involved. They dropped the cretinous "always use doubles" method of junk programming. They examined the mathematical 'space' the algorithms needed to operate in, and created mathematically correct INTEGER methods to handle compression/decompression. Unlike with MPEG2, every MPEG4 (H264) decoder produces the same result.
There are no rules in maths that say floating point is better/more correct than integer, or that doubles are better than singles. Indeed, a lack of understanding of the principles of numerical analysis means that thick headed programmers can make all kinds of dreadful mistakes by the simple order in which the calculations are carried out (even if said order would be OK if each value carried infinite precision). Too many programmers are proud to be crap at maths. These crap programmers are the ones that ALWAYS use doubles, and will go to even greater 'precision' if their code doesn't work as they expect.
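The perturbed-input idea from points 1) and 2) can be sketched in Python with a toy model. Everything here is an assumption for illustration: the "forecast" is the logistic map, the noise level is arbitrary, and eight ensemble members is just a convenient number.

```python
import random
import statistics

def forecast(x0, steps, r=3.9):
    """Toy 'model': iterate the logistic map from one initial condition."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

random.seed(0)
observed = 0.4                        # made-up "measured" initial state
members = [forecast(observed + random.gauss(0.0, 1e-6), 60)
           for _ in range(8)]         # same model, slightly noisy inputs

# Ensemble spread (standard deviation across members) at each lead time.
spread = [statistics.pstdev([m[t] for m in members]) for t in range(61)]
for t in (1, 10, 20, 40, 60):
    print(f"lead time {t:2d}: spread = {spread[t]:.3e}")
```

Small spread at short lead times and large spread at long ones is the fall-off in confidence described in point 2).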
As long as you don't exceed the capacity of the fraction bits, floating point operations on pure integers represented as floating point numbers are actually exact, so that's not a good example. 1002000, 1001000, 1000000, 2000 and 1000 are all exactly representable as IEEE754 floating point numbers, so the order really doesn't matter in this case.
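Both cases are easy to check in Python with IEEE754 doubles. The second set of operands is chosen (as an assumption for illustration) to sit just past the 53-bit limit, where the spacing between adjacent doubles is 2.

```python
# Operands that fit in the fraction bits: every partial sum is exact.
small_first = 1_000.0 + 1_000.0 + 1_000_000.0
large_first = 1_000_000.0 + 1_000.0 + 1_000.0
print(small_first == large_first)

# Operands past 2**53: adding 1.0 to 1e16 lands on a tie and rounds back down.
big = 1e16
grouped_left = (big + 1.0) + 1.0      # each +1.0 is lost to rounding
grouped_right = big + (1.0 + 1.0)     # 2.0 is exactly one spacing, so it survives
print(grouped_left == grouped_right)
```

The first comparison prints True and the second prints False, so the order only starts to matter once a partial sum no longer fits.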
I've seen Microsoft Access do the same thing. Apparently Person-B had loaded a slightly different OS date-handler DLL because they found a bug for date patterns of a specific country they happened to be interested in once. A specific spot on a report that calculated date difference thus produced slightly different answers than if run on the PC of Person-A, making the final totals not add up the same.
Table-ized A.I.
So how much confidence should we have in calculations that purport to predict average global temperatures and sea levels 50 or 100 years from now?
And that doesn't help if you are trying to do operations that produce repeating numbers in base 10. You're just trading one set of problem numbers for a different set of problem numbers.
Yes and no. You get rounding in either base when you have insignificant significant digits. However by not doing a conversion from one base to another you avoid a second opportunity for rounding errors.
Also numbers with repeating digits can be expressed as a fraction. In our calculator a fraction is a basic data type. If an operation includes a fraction we will try to produce a result that is a fraction. This can sometimes avoid a rounding error.
... would still not solve some of the larger problems inherent in weather prediction ...
I'm not suggesting a solution to these problems. I am just providing a simple example of how an FPU or IEEE754 can get things wrong.
This has been known for at least 20 years.
I realize that they use Navier-Stokes equations and samples of current atmospheric conditions, and then propagate based on models and probability to estimate the next hour's forecast, and then the next and so on. In the end though, you can approximate their accuracy by saying '95% accurate today, 95% of 95% tomorrow, 95% of 95% of 95% the next day, and so on.' After two weeks, you are at 50/50. The thing is: they aren't 95% accurate, they are only about 90%. Also because of the nature of the calculations, they don't use high precision (hundreds/thousands/billions of digits) for calculations. You are stuck using the limits of the bit width of the registers in the CPU/ALU (64 bits on a 64 bit processor). Some might yelp about 'double precision' but if you start arbitrarily expanding the number of bits, you may as well use either high or arbitrary precision.
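The back-of-the-envelope compounding above is easy to tabulate in Python. This just takes the rule of thumb at face value (accuracy multiplies day over day); it is not a real forecast-skill model, and the function name is made up.

```python
def compounded(daily, days):
    """Accuracy after `days` days if each day keeps `daily` of the previous one."""
    return daily ** days

for daily in (0.95, 0.90):
    row = [round(compounded(daily, d), 3) for d in (1, 3, 7, 14)]
    print(f"daily {daily:.2f}: days 1/3/7/14 ->", row)
```

At 95% per day the two-week figure lands a little under one half, which matches the "50/50" claim; at 90% per day it is well below that.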
1. I calculate my personal and corporation tax using four digit decimals (with a two digit Pound/Penny (or Dollar/Cent)system this seems ok)
2. Her Majesty's Revenue & Customs sometimes calculates to two decimals and sometimes does not in the same tax calculation.
3. Hence having paid my estimate of due tax, I have got demands for one penny to four pennies sent by post (cost say 15 pound to Revenue to issue pay or punishment warning letter) or face punishment and fines. I duly pay one penny at local Post Office in cash, who charge HMRC 4 pounds or so to transmit the penny due. They (the Post Office staffers) laugh and say this is a very regular occurrence. Thus penny rounding error costs HMRC say 19 or 20 pounds of spend to collect. It would appear they cannot just 'not balance the books and not collect' [computer instruction in calculation to disregard sums due of less than xx pennies] due to stringency of reporting to parliament they have done all possible to collect due tax. (A tick box syndrome on HMRC officials reporting to government)
Regards Eion MacDonald
I should have written "repeating decimal" not "repeating digits".
http://en.wikipedia.org/wiki/Repeating_decimal
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html "What Every Computer Scientist Should Know About Floating-Point Arithmetic", by David Goldberg, published in the March, 1991 issue of Computing Surveys
I don't know why anyone thought this was surprising (it would have been surprising if they didn't get different results, given that some use GPUs, some don't, etc.). What does tend to get "amusing" is that even with the same processor folks get different results (sometimes due to software issues, chip rev issues, or actual hardware bugs that go undetected ... but are minor enough to remain so unless someone gets really careful and whips out the old logic analyzer).
Once I was challenged to resolve a mismatch in the 19th digit in a customer's CFD code.
He had a constant in the deck (FORTRAN)... "PI=3.14".
These codes are full of such cruft. Some have been pressed into global warming climate modeling use. Half of the community cries foul. The other half wants more budget.
The code needs to be opened up.....!