Same Programs + Different Computers = Different Weather Forecasts
knorthern knight writes "Most major weather services (US NWS, Britain's Met Office, etc) have their own supercomputers, and their own weather models. But there are some models which are used globally. A new paper has been published, comparing outputs from one such program on different machines around the world. Apparently, the same code, running on different machines, can produce different outputs due to accumulation of differing round-off errors. The handling of floating-point numbers in computing is a field in its own right. The paper apparently deals with 10-day weather forecasts. Weather forecasts are generally done in steps of 1 hour. I.e. the output from hour 1 is used as the starting condition for the hour 2 forecast. The output from hour 2 is used as the starting condition for hour 3, etc. The paper is paywalled, but the abstract says: 'The global model program (GMP) of the Global/Regional Integrated Model system (GRIMs) is tested on 10 different computer systems having different central processing unit (CPU) architectures or compilers. There exist differences in the results for different compilers, parallel libraries, and optimization levels, primarily due to the treatment of rounding errors by the different software systems. The system dependency, which is the standard deviation of the 500-hPa geopotential height averaged over the globe, increases with time. However, its fractional tendency, which is the change of the standard deviation relative to the value itself, remains nearly zero with time. In a seasonal prediction framework, the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.'"
Why don't you use 128-bit integers to represent some form of fixed point? I highly doubt you need any more precision than that.
Almost all CFD (Computational Fluid Dynamics) simulations use time marching of the Navier-Stokes equations. Despite being very nonlinear and very hard, one great thing about them is that they parallelize naturally. They partition the solution domain into many subdomains and distribute the finite volume mesh associated with each subdomain to a different node. Each mesh is also parallelized using GPUs. At the end of the day these threads complete execution at slightly different times and post updates asynchronously. So even if you use the same OS and the same basic cluster, two runs will give two different results if you run far enough, like 10 days. I am not at all surprised that if you change the OS, the architecture, the endianness, the math processor, or the GPU brand, the solutions differ a lot when you make a 10-day forecast.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
That said, many applied fields, including meteorology, could benefit from more well-disciplined computational science approaches. But don't expect all that much of a difference.
When doing SPICE simulations of a circuit many years ago, we ran across one interesting feature. When using the exact same inputs and the exact same executable, the sim would converge and run on one machine, but it would fail to converge on another. It just happened that one of the machines was an Intel server and the other was an AMD, and we attributed it to ever so slightly different round-off errors between the floating point implementations of the two. It didn't help that we were trying to simulate a bad circuit design that was on the hairy edge of convergence, but it was eye opening that you could not guarantee 100% identical results between different hardware platforms.
Yes... because that never rounds off numbers.
https://en.wikipedia.org/wiki/IEEE_floating_point#Rounding_rules
Technology, the cause of and solution to all of life's problems.
The x86 architecture, since the 8087 coprocessor, has had double-precision 64-bit floats and a special 80-bit float. Some compilers call this long double and use 128 bits to store it. How does this compare to other architectures?
I was in particular thinking about the section on rounding in IEEE754. You are also overlooking that badly conditioned != behaves in a random fashion. My guess is they did not involve the numerics people in the optimization process, which is a complete fail when you know your problem is not well conditioned.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
When floating point roundoff errors grow big enough to affect the outcome of the simulation, you have long since reached the point where you are not predicting anything useful any longer. It is not exactly a problem if the results differ at that point.
This very effect was noted in weather simulations back in the 1960s. Read Chaos: Making a New Science, by James Gleick.
It doesn't help you that individual operations are rounded deterministically if the order of your operations is non-deterministic. You cannot expect bit-identical results if you parallelize or allow any level of operation reordering. Even a very well-written code might implement a reduce operation in different hierarchies depending on memory layout. Enforcing all these things to be done in exactly the same order, with full IEEE754 compliance, carries a significant performance cost. By taking numerical aspects into account, you can ensure that your result is not invalid or unreasonable. However, for a chaotic problem where a machine epsilon difference in input data might be enough for a macroscopically different end result, there is nothing you can do and still expect reasonable utilization of modern architectures.
That is the problem when people start compiling with things like -ffast-math.
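A minimal Python sketch of the reduce-order point above. The helper `chunked_sum` and the four-element data set are made up for illustration; each chunk stands in for the partial sum one worker would produce, and no real parallel library is involved.

```python
def chunked_sum(values, chunk_size):
    """Sum each contiguous chunk left to right, then sum the partial results."""
    partials = []
    for start in range(0, len(values), chunk_size):
        subtotal = 0.0
        for v in values[start:start + chunk_size]:
            subtotal += v
        partials.append(subtotal)
    total = 0.0
    for p in partials:
        total += p
    return total


# Hypothetical data chosen so the effect is visible; the exact sum is 2.0.
data = [1.0, 1e16, -1e16, 1.0]

one_worker = chunked_sum(data, 4)   # one worker sums everything in order
two_workers = chunked_sum(data, 2)  # two workers, partial sums combined at the end

print(one_worker, two_workers)
```

With this data the single-chunk run returns 1.0 and the two-chunk run returns 0.0, while the exact sum is 2.0. Every individual addition is rounded deterministically; only the grouping changed.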
I wish I still had my mod points from a few days ago, because this post deserves some.
This problem has been known since at least the 1970s, and it was weather simulation that discovered it. It led to the field of chaos theory.
With an early simulation, they ran their program and got a result. They saved their initial variables and then ran it the next day and got a completely different result.
Looking into it, they found out that when they saved their initial values, they only saved the first 5 digits or so of their numbers. It was the tiny bit at the end that made the results completely different.
This was terribly shocking. Everybody felt that tiny differences would melt away into some averaging process, and never be an influence. Instead, it multiplied up to dominate the entire result.
To give yourself a feel for what's going on, imagine knocking a billiard ball on a table that's miles wide. How accurate must your initial angle be to knock it into a pocket on the other side? Now imagine a normal table with balls bouncing around for half an hour. Each time a ball hits another, the angle deviation multiplies. In short order with two different (very minor differences) angles, some balls are completely missing other balls. There's your entire butterfly effect.
Now imagine the other famous realm of the butterfly effect -- "time travel". You go back and make the slightest deviation in one single particle, one single quantum of energy, and in short order atmospheric molecules are bouncing around differently, this multiplies up to different weather, people are having sex at different times, different eggs are being fertilized by different sperm, and in not very long an entirely different generation starts getting born. (I read once that even if you took a temperature, pressure, wind direction, humidity measurement every cubic foot, you could only predict the weather accurately to about a month. The tiniest molecular deviation would probably get you another few days on top of that if you were lucky.)
Even if the current people in these parallel worlds lived more or less the same, their kids would be completely different. That's why all these "parallel world" stories are such a joke. You would literally need a Q-like being tracking multiple worlds, forcing things to stay more or less along similar paths.
Here's the funnest part -- if quantum "wave collapse" is truly random, then even a god setting up identical initial conditions wouldn't produce identical results in parallel worlds. (Interestingly, the mechanism on the "other side" doing the "randomization" could be deterministic, but that would not save Einstein's concept of Reality vs. Locality. It was particles that were Real, not the meta-particles running the "simulation" of them.)
(-1: Post disagrees with my already-settled worldview) is not a valid mod option.
This is a good time to review some problems in met codes. The first real problem is that the science is poorly understood. If the model is poorly constructed, conditioning is one of the least of your problems. By and large, the push for V&V came from the met world. The second thing is that the spatial resolution is way too big. And, long before IEEE 754, it was anecdotally known that you lose a digit whenever you change systems (hardware or software).
D. E. (Steve) Stevenson, Ph.D. Emeritus Associate Professor, School of Computing, Clemson University.
WTF are these amateurs doing? This is a solved problem and has been for several decades. Base float is solved. How to condition your computations so that order remains the same or does not impact the results is solved. Pathetic.
I ran into this once when working on support for an AIX compiler - got a bug report that we were doing floating point wrong because the code gave different results on AIX than some other machine (HP I think). After looking into it, it turned out that the algorithm accumulated roundoff errors quite badly, and basically wasn't working right on _any_ platform, but would give different results due to slightly different handling of round-off on the different platforms.
The problem is, this kind of code is very often written by scientists, who have most likely never heard of this issue, or forgot about it, or thought they handled it right but didn't - it's not their area of expertise, so it's not surprising if you think about it. I only hope that for engineering software that designs bridges, airplanes, etc, they realized that they better have it looked over by someone who knows what they are doing.
BTW, this is one reason why I take all the global warming predictions with a big grain of salt - they are all based on computer simulations which are difficult if not impossible to validate, and given what I've seen, I don't trust the results from them at all.
Those tiny rounding errors are causing different forecasts.
So are the measurement errors, and to a much higher degree. The roundoff errors just don't matter.
How accurate and reliable can these forecasts be in reality then?
Once they reach the point where errors have accumulated to this degree, not at all. Everybody knows that.
In other words, they all gave different answers, but each one was equally certain that *it* was right.
They didn't predict the rain correctly yesterday here, that's why I believe those predictions are obviously incorrect.
So are you saying that enforcing predictable and correct answers has a significant performance cost?
FCKGW 09F9 42
They really need to standardize on what butterflies to use.
At first I agreed with you and thought the GP wasn't aware of the concept of chaos (small errors in input give large errors in output). However, that's not what he wrote. He correctly pointed out that the rounding error is much smaller than the error from the initial measurement. Logically it should be the dominant error that first leads to chaotic behavior. The problem then seems to be over-belief in the forecast due to not accounting correctly for the measurement error. Long before any rounding errors start to play a role one should have stopped the simulation as it didn't predict anything useful anyway.
Measurement errors are involved once at boundary conditions. Precision errors propagate in the computations. So, even if a single precision error is orders of magnitude smaller than the measurement errors, precision errors can have an impact on the result depending on the computations involved in solving the problem.
Achille Talon
Hop!
WTF are these amateurs doing?
Enjoying decent performance. Doing weather forecasts slower than real time is a lot easier but somewhat less useful.
My interpretation of the abstract (I cannot access the actual paper) is that they could not show that any particular compiler or architecture made the predictions any better, just different. In that case you just go with whichever runs fastest.
Finally! A year of moderation! Ready for 2019?
Basically they should be happy their code ported to two different architectures and ran all the way. Expecting the same results for processes behaving chaotically is asking for too much.
When floating point roundoff errors grow big enough to affect the outcome of the simulation, you have long since reached the point where you are not predicting anything useful any longer.
This is not true. If the model predicts rain at 2 pm two days out and different rounding moves it to 3 pm, that is still a useful forecast in a lot of cases.
Alas, TFA is about a situation where they take the SAME inputs (initial measurements), run the program on ten different sets of hardware, and get ten different results.
I fail to see how the same program + same inputs == "differences in inputs cause most of the error"....
"I do not agree with what you say, but I will defend to the death your right to say it"
That would be a case of solving the wrong problem. Getting the exact same result every time doesn't much matter if that result is dominated by noise and rounding errors. In fact, the diverging results are a good thing, since, once they start to diverge, you know you've reached the point where you can no longer trust any of the results. If all the machines worked exactly the same, you could figure the same thing out, but it would require some very advanced mathematical analysis. With the build-the-machines-slightly-differently approach, the point where your results are becoming meaningless leaps out at you.
Remember, the desired result here is not a set of identical numbers everywhere. It is an accurate simulation. Getting the same results everywhere would not make the simulation one bit more accurate. So really, this is a good thing.
It's called Binary Coded Decimal (BCD) and it works well. Plenty of banks still use it because it's reliable. It's a lot slower, but it's accurate regardless of the processor it's used on.
Anons need not reply. Questions end with a question mark.
WTF are these amateurs doing? This is a solved problem and has been for several decades. Base float is solved. How to condition your computations so that order remains the same or does not impact the results is solved. Pathetic.
Go read up on chaotic systems, then come back to us.
But it does it in a consistent way across platforms.
So are you saying that enforcing predictable and correct answers has a significant performance cost?
He said nothing about "correct."
And yes, enforcing predictable answers across toolchains and architectures has significant performance cost. Even ignoring optimizations, with the x87 FPU (which uses 80-bit registers) it means the compiler needs to emit a rounding operation after every single intermediate operation because the x87 uses 80-bit internal floats but IEEE754 specifies that all operations, even intermediate ones, are always to be performed as if rounded like 32-bit or 64-bit floats.
When you get into the effects of order-of-operations type optimizations even on hardware that only uses 64-bit floats, you find that in most cases (x + y + z) != (z + y + x) even when the same floating point precision is present in each step of the calculation. Even things like common-divisor optimizations (if z is used as a divisor many times, compute 1/z a single time and multiply because multiplication is much faster than division) destroy the chance of equal outcome between compilers that will do it and compilers that will not.
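Both effects described above can be reproduced with a few lines of Python, which uses 64-bit IEEE 754 doubles for its `float` type. This is only a sketch: the values 0.1, 0.2, 0.3 and the 3.0 / 10.0 pair are picked for illustration, and other inputs may happen to agree.

```python
x, y, z = 0.1, 0.2, 0.3

left_to_right = (x + y) + z   # x + y is rounded first
right_to_left = (z + y) + x   # z + y is rounded first

print(left_to_right, right_to_left, left_to_right == right_to_left)

# Replacing a division with multiplication by a precomputed reciprocal.
numerator, divisor = 3.0, 10.0
by_division = numerator / divisor               # a single rounding
by_reciprocal = numerator * (1.0 / divisor)     # two roundings

print(by_division, by_reciprocal, by_division == by_reciprocal)
```

Both comparisons print False here. A compiler that reorders the sum or hoists the reciprocal is therefore producing a slightly different number from one that does not, even though both follow the same rounding rules for each individual operation.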
The best way to get insight into the issues is to become familiar with the single-digit-of-precision estimation technique.
"His name was James Damore."
I once saw a piece of code written by a mathematician which said "pow(x, -1)". Ugh. I wonder if meteorologists know better.
It might be written that way to get a well-defined behavior depending upon the value of x. E.g. what if x is +0?
http://linux.die.net/man/3/pow
Maybe they do know better.
Pretty much most iterative simulation systems like weather simulation will behave this way. When the result of one step of the simulation is the input for another step any rounding error will possibly get amplified.
Also see Butterfly Effect https://en.wikipedia.org/wiki/Butterfly_effect (not the movie!).
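To make the step-feeds-step amplification concrete, here is a toy sketch in Python. It uses the logistic map as a stand-in for an iterative simulation (it is not a weather model), and the parameter 3.9, the starting value 0.4, and the function names are all arbitrary choices for illustration. The two update functions are the same formula on paper; they differ only in where rounding happens.

```python
def iterate(step, x0, n_steps):
    """Feed each step's output back in as the next step's input."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(step(xs[-1]))
    return xs


R = 3.9  # logistic-map parameter in the chaotic regime (toy choice)


def form_a(x):
    return R * x * (1.0 - x)


def form_b(x):
    # Algebraically identical to form_a, but rounded at different points.
    return R * x - R * x * x


run_a = iterate(form_a, 0.4, 200)
run_b = iterate(form_b, 0.4, 200)

gaps = [abs(a - b) for a, b in zip(run_a, run_b)]
first_visible = next((i for i, g in enumerate(gaps) if g > 1e-3), None)

print("largest gap between the two runs:", max(gaps))
print("first step with a gap above 1e-3:", first_visible)
```

The two runs start from the identical value, so the gap at step 0 is exactly zero. Any later gap comes purely from rounding differences that were fed back into the next step.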
Almost nothing you do with IEEE754 floating point numbers is correct in the strict mathematical sense. You can't even represent 0.1 (1/10) as an IEEE754 floating point number. There are entire series of lectures on the topic of scientific computing with floating point numbers. The errors are usually small enough that a few simple rules keep you safe (e.g., never compare floating point numbers for equality), but when you do many iterations, the errors can accumulate and mess with your results, and if in that case you do the calculations in a different order, the accumulated error will mess with your results in a different way. That's what's happening here.
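A short Python illustration of the two rules of thumb above, as a sketch rather than a recipe. The tolerance `rel_tol=1e-9` is an arbitrary choice; a real code would pick it from an error analysis.

```python
from decimal import Decimal
import math

# Decimal(0.1) shows the exact value of the double nearest to one tenth.
print(Decimal(0.1))

total = 0.0
for _ in range(10):
    total += 0.1   # each addition is rounded, and the errors accumulate

print(total == 1.0)                            # exact comparison fails
print(math.isclose(total, 1.0, rel_tol=1e-9))  # tolerance comparison succeeds
```

The stored value of 0.1 is slightly larger than one tenth, and after ten additions the accumulated total lands just below 1.0, which is why the exact comparison is unsafe.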
I am not arguing about that, I know that this is true. What gets me is that this is a surprise to anyone. I mean, have they done optimization without error estimation? Have they completely ignored error when optimizing? You do not just calculate away on these problems and then check whether the results seem to match reality. The results are far too important for that amateur-level approach.
Nice, but no. He's pointing out the obvious: Climate scientists are usually reliant on their own coding skills, which love it or hate it, are not quite on the same level (usually) as a Computer scientist / Software engineer.
And yes, little errors do matter, since a little error in a preceding calculation may be used in the next series of calculations, and so on...the snowball effect.
I am John Hurt.
Floating Point arithmetic is not associative.
Everyone who reads Stack Overflow knows this, because everyone who doesn't know this posts to Stack Overflow asking why they get weird results.
Everyone who does numerical simulation or scientific programming work knows this because they've torn their hair out at least once wondering if they have a subtle bug or if it's just round-off error.
Everyone who does cross-platform work knows this because different platforms implement compilers (and IEEE-754) in slightly different ways.
Everyone who does parallel programming knows this because holy shit will you see round-off differences when you throw many minutes of TFlops at a problem and it sequences differently every time.
Everyone who looks at the standards knows this because for Chrissakes, Fused-Multiply-Add is standards compliant but _optional_.
Why is this remotely news?
Edward Lorenz discovered that floating point truncation causes weather simulations to diverge massively back in 1961.
This was the foundation of Chaos Theory and it was Lorenz who created the term "Butterfly Effect"
http://www.ganssle.com/articles/achaos.htm
*SNIP*
BTW, this is one reason why I take all the global warming predictions with a big grain of salt - they are all based on computer simulations which are difficult if not impossible to validate, and given what I've seen, I don't trust the results from them at all.
In the case of climate simulations, different models (both physics-wise and code-wise) are run with different computers on the same input data, and yield basically the same results.
When simulating chaotic behaviour, very small differences can make a big difference in the outcome of your simulations. As an example, I'm currently working on simulations of sparks in vacuum, which is a "runaway" process. In this case, adding a single particle early in the simulations (before the spark actually happens) can change the time for the spark to appear by several tens of %. This also happens if we are running with different library versions (SuperLU, Lapack), different compilers, and different compiler flags. Once the spark happens, the behaviour is predictable and repeatable - but the time for it to happen, as the system is "balancing on the edge, before falling over", is quite random.
Please mod parent up!
Inaccuracies in the input most likely did cause most of the error. Maybe nobody noticed because that error was the same in all the calculations. Eventually a difference between the calculations started to build up because of differences in rounding between the different runs. This variation was noticed, but it would still be small compared to the differences caused by inaccuracies in the input. In short, by the time you notice the difference between two runs, both of them are already way off compared to the real value, because both have been working on the same inaccurate input.
If you want to do better, then do calculations with a representation that keeps track of uncertainty. Even in those cases where you cannot do a floating point operation and get an exact result, you can still do the calculation and know the possible range of the error. So each number is represented by a minimum and a maximum (or a mean and an error margin). As you do calculations the minimum and maximum values will be going further and further apart. Once they get too far apart, you know the results are no longer useful.
When you start the simulation, you initialize the numbers with an error margin corresponding to the accuracy of the measurements. Different runs on different platforms may not build up errors at the same rate, and that is something you can actually look at. If the ranges from two different runs no longer overlap, then you know there is a bug somewhere. If one simulation says the air temperature is going to be between -10 and +30 and the other simulation says it is going to be between 0 and +20, then they can both be right, but neither simulation result is particularly useful. If one simulation says it is going to be between -10 and 0 and the other says it is going to be between +20 and +30, then you know at least one of them has a bug.
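A rough sketch of that min/max representation in Python. The `Interval` class and its method names are invented for this example. It widens each bound outward by one float using `math.nextafter` (available from Python 3.9), which is a cheap substitute for the directed rounding a real interval library would use, so treat it as an illustration only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        # Step each bound outward so rounding can never shrink the range.
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(math.nextafter(min(products), -math.inf),
                        math.nextafter(max(products), math.inf))

    def width(self):
        return self.hi - self.lo

    def overlaps(self, other):
        return self.lo <= other.hi and other.lo <= self.hi


# A measurement of 20 +/- 0.5 degrees plus a correction of 0 +/- 0.1 degrees.
widened = Interval(19.5, 20.5) + Interval(-0.1, 0.1)
print(widened, widened.width())

# The two temperature scenarios from the comment above.
print(Interval(-10.0, 30.0).overlaps(Interval(0.0, 20.0)))   # both can be right
print(Interval(-10.0, 0.0).overlaps(Interval(20.0, 30.0)))   # at least one has a bug
```

The width only ever grows as operations are chained, which is the signal described above: once it exceeds what you consider useful, the run has stopped telling you anything.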
Do you care about the security of your wireless mouse?
All I was taught about floating point at that level was how wrong results we could get, and that we should avoid it. Several years later on a more advanced course, I learned about how to do floating point calculations, if you really need to.
In the case of climate simulations, different models (both physics-wise and code-wise) are run with different computers on the same input data, and yield basically the same results.
Yes, but how many of those basically same results were achieved by tweaking the model until the output was basically the same?
The problem with climate science is that it's not experimental. You cannot run controlled experiments on the climate. Thus, the quality of climate science research is determined not by how accurately it models reality (since it's impossible to test), but by how accepted your research is by other climate scientists. This can easily lead to the point where the science becomes totally disconnected from the reality. Much like astronomy with its dark matter & energy & ridiculous constants to attempt to fit together the observed structure of the universe into a failing model.
Because chaotic systems make two similar but different initial conditions diverge exponentially (in other words, any accumulated error grows exponentially later on), it is a losing battle. Of course you want to avoid any obvious error sources and do the best you can. But at some point, you could double the number of digits of precision you use and only squeeze out a fraction of a day more of accurate predictions.
I didn't know anyone was still using the old Pentiums anymore.
When a complete re-implementation led by a physicist who cited such coding issues among his reasons for doubt, and funded via the Koch brothers, whose views on global warming are well known, comes to the same positive conclusion as all the other models, then I would say that this is unlikely to be an issue. If these models were so flawed that rounding errors changed the results this much, then they would not be coming to the same conclusion. You would expect such a study to be extra careful to avoid any errors that might tilt it in favour of human-caused global warming. (see https://en.wikipedia.org/wiki/Berkeley_Earth_Surface_Temperature)
Again, the article says that they used the same input. This can be verified with a simple diff. Same input leading to different results means that some other input (that is, the circuitry of the CPU or software libraries) have to be at fault, unless you want to start to argue that computer hardware is non-deterministic. Then you've opened an entirely different can of worms that your error margin system will do little or nothing to address.
And what point are you trying to make? pow(x,-1) is equivalent to 1/x. So it's pretty valid.
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
Good points. In fact, in this case one can say that ALL of the calculations done by the different computer architectures are wrong, to varying degrees. When doing floating point math without any rounding analysis, all bets are off. Measurements always have limited accuracy, and floating point math adds its own inaccuracies.
The Boost library can help: http://www.boost.org/doc/libs/1_54_0/libs/numeric/interval/doc/interval.htm
Of course all this extra interval management costs in terms of development and performance. But what is the cost of having supercomputers coming up with answers with unknown accuracy?
ipv6 is my vpn
This is what chaotic systems do. Not to worry, it doesn't change the accuracy of the forecast.
The Berkeley Earth Surface Temperature (BEST) project
https://en.wikipedia.org/wiki/Berkeley_Earth_Surface_Temperature
was done and funded by people who wanted to show global warming was wrong, or already thought it was. No way would they tweak their model to fit the consensus of other climate researchers, yet they came to the same conclusion.
You can run experiments without changing things: make a prediction based on the current state and see whether it comes true. If it does, the model has passed a test. This is not hard to understand, and it has been happening.
(see http://www.rawstory.com/rs/2013/03/27/climate-change-models-predict-remarkably-accurate-results/ for this, the real article http://www.nature.com/ngeo/journal/v6/n4/full/ngeo1788.html is pay-walled unfortunately)
Also, you can test a climate model's ability to match reality: build it using a limited data set (e.g. 20k to 1k years ago) and then test it on another (e.g. the last 1k years) to see whether they match. Again, this is not a hard method to understand: if the new set does not match the predictions you're wrong, and if it does then you are more likely correct. This method is standard across biology as well as several other fields; not ideal, but good enough.
When floating point roundoff errors grow big enough to affect the outcome of the simulation, you have long since reached the point where you are not predicting anything useful any longer. It is not exactly a problem if the results differ at that point.
Weather model forecasts are run as an ensemble, not a single run. Generally forecast modelers, like climate modelers, start with numerous small variations in the initial state, run the model multiple times, and average the results.
Thing is, reading the abstract (since the article is paywalled), it's not clear that the summary here is correct. To me, anyway, it seems like they may be saying that, in practice, ensemble forecasting solves this problem even though it's present in individual runs.
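For anyone who hasn't seen an ensemble run, here is a toy Python sketch of the idea: perturb the initial state slightly, run the model once per member, then look at the mean and the spread at each step. The "model" is the logistic map, used purely as a chaotic stand-in; the function names, member count, perturbation size, and seed are assumptions made for the example and have nothing to do with GRIMs.

```python
import random
import statistics


def toy_model(x, n_steps, r=3.9):
    """Stand-in for a forecast model: the logistic map, NOT a weather model."""
    trajectory = [x]
    for _ in range(n_steps):
        x = r * x * (1.0 - x)
        trajectory.append(x)
    return trajectory


def run_ensemble(x0, n_members, n_steps, perturbation=1e-6, seed=0):
    """Run the toy model from slightly perturbed starts; return per-step mean and spread."""
    rng = random.Random(seed)
    members = [toy_model(x0 + rng.uniform(-perturbation, perturbation), n_steps)
               for _ in range(n_members)]
    per_step = list(zip(*members))  # regroup values by time step
    means = [statistics.fmean(step) for step in per_step]
    spreads = [statistics.pstdev(step) for step in per_step]
    return means, spreads


means, spreads = run_ensemble(x0=0.4, n_members=20, n_steps=60)
print("spread at start:", spreads[0])
print("spread at end:  ", spreads[-1])
```

The spread is the useful output: while it is small the members agree and the mean is meaningful, and once it has grown to the size of the signal itself, a platform-dependent rounding difference is just one more perturbation among the members.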
#DeleteChrome
A better article...
From what I can gather, although the code was well scrubbed so that the single processor, threaded and message passing (MPI) versions produce the same binary result indicating no vectorization errors, machine rounding differences caused problems.
Since all the platforms were IEEE754 compliant and the code was mostly written in Fortran 90, I'm assuming that one of the main contributors to this rounding is the evaluation order of terms, and perhaps the way that the double Fourier series and spherical harmonics were written.
Both SPH and DFS operations use sine/cosine evaluation, which varies a great deal from platform to platform (since generally they only round within 1 ulp, not within 1/2 ulp, of an infinitely precise result).
I remember many moons ago, when I was working on fixed-point FFT accelerators, we were lazy and generated sine/cosine tables using the host platform (x86), and neglected to worry about the fact that different compilers and different optimization levels on the same platform gave us twiddle-factor tables that were different (off by one).
With one bug report, we eventually tracked it down to different intrinsics (x87 FSIN w/ math or FSINCOS) being used, and sometimes libraries being used. Ack... In later library releases we compiled in a whole bunch of pregenerated tables to avoid this problem.
Of course putting in a table or designing your own FSIN function for a spherical harmonic or Fourier series numerical library solver might be a bit out of scope (not to mention tank the performance), so I'm sure that's why they didn't bother to make the code platform independent w/ respect to transcendental functions, although with Fortran 90, it seems like they could have fixed the evaluation order issues (with appropriate parentheses to force a certain evaluation order, something you can't do in C).
"Remember, the desired result here is not a set of identical numbers everywhere. It is an accurate simulation."
*An* accurate simulation is not the desired result either, an accurate model is. Without reproducibility you don't have a model.
Reproducibility is important always.
I guess you have never written an actual simulation code. The IEEE754 standard tells you what happens and what kind of precision you get when applying basic operations to floats. But that does not guarantee anything at the higher level.
The order of operations is extremely important if you do not want to lose precision. For instance, how do you sum a set of floats to achieve maximal precision? Hint: you do not start from the first one and iterate to the last one. You basically need to keep them sorted by increasing absolute value. Whenever you sum two of them, you need to insert the result back into the set and recurse until only one value is left. So essentially, if you want to sum a set of floats, you need to sort first, which induces a significant overhead.
Now when you think about a complex simulation code, you might not have all the numbers available at once, so you do not actually know in which order the numbers should be summed. If you have a value that you previously computed as a sum and later on you get a new value to add to it, you might need to redo the whole sum to get the best precision.
Obviously, finding the best tradeoff between the precision of the computation, the amount of memory you need to keep, and the time of the calculation is challenging. That is why precision is often mostly ignored in these calculations. Also, most simulation codes pay a lot of attention to how much precision is lost during the computation.
These problems are non-trivial.
Measurement errors are involved once, at the boundary conditions. Precision errors propagate through the computations.
If measurement errors are smaller than precision errors, and precision errors are sufficient to bring out chaos, then changing the initial state by epsilon would also bring chaos.
Getting different results on different architectures is a good thing: it lets you see how chaotic the initial conditions are and evaluate the reliability of the result.
ID: the nose did not occur naturally, how would we wear glasses otherwise? (apologies to Voltaire)
My interpretation of the abstract (I cannot access the actual paper) is that they could not show that any particular compiler or architecture made the predictions any better, just different. In that case you just go with whichever runs fastest.
Or you could, you know, compare the results with reality and go with whichever one is most accurate.
[Fuck Beta]
o0t!
Averaging the result makes sense for climate modeling. But for meteorological forecasts, it makes more sense to report the most commonly occurring prediction in the ensemble, plus something about risks if you're talking about dangerous weather.
xkcd is not in the sudoers file. This incident will be reported.
Climate predictions are not vulnerable to rounding errors the way meteorological predictions are. Meteorologists are solving an initial value problem; climate scientists are solving a boundary value problem.
You can make simple climate models that do not rely on computer simulations (energy budget calculations of various sorts), and those are certainly enough to predict big problems from anthropogenic global warming. Heavy-duty numerical climate models aren't used to "prove" global warming, they're used to get better estimates for various things.
Propagation of rounding errors is not a big problem in climate modeling. These models are run thousands of times in order to establish averages, very different from meteorological models (although they are basically the same!) which are run many times to find the most likely specific events.
It is so unfortunate that academics do not have the wisdom of Slashdot available before they submit papers. Alas, that is the reality they have to live with.
Finally! A year of moderation! Ready for 2019?
You're contradicting yourself.
"Inaccuracies in the input most likely did cause most of the error."
is the exact logical opposite of
"Eventually a difference between the calculations starting to build up because of differences in rounding between the different runs."
It's the differences in rounding based on the same input data that the paper is talking about, not the inaccuracies in input data (testing for which would involve, by definition, different sets of input data varying by a known quantity). If the rounding behaved the same, we would expect the same output given the same program and input. If a system produces different output every time it's run with the same input, then we have a useless system, as we have no way of verifying that what is produced is correct. If you can't unit test the system, then you have a religion, not a scientific simulation.
Yes, it is possible to estimate how well a climate model models reality. The parameters that vary in climate models are not unconstrained, but constrained by physics (experimental evidence). If your climate model accurately hindcasts the climate developments of the 20th century (say), but the parameters are at the extreme range of what's plausible from experimental physics, then it probably isn't a very good model.
Not all climate scientists focus on general circulation models either. If your particular GCM isn't accepted by climate scientists, it's probably because it has trouble accounting for things we know from other sub-disciplines of climate science.
These are pretty old, discredited talking points.
Does it really matter? What am I meant to do with that information, exactly:
A) panic
B) picnic
Handbrake transcodes video as a multi-threaded application. I have yet to try it, but if I re-encoded the same video multiple times from the same source, would I get a different file size based on an MD5 or SHA1 checksum?
Life is not for the lazy.
Here's a simple example. Try 0.5 - 0.4 - 0.1 on a calculator or calculator app. If it is using the FPU it will probably get a non-zero result. This is why calculators, including ours, are normally implemented using decimal arithmetic rather than the FPU.
All IEEE754 would do is ensure that each FPU based calculator would yield the same non-zero result.
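The calculator example can be reproduced in a few lines of Python. This is only a sketch: Python's float stands in for "using the FPU" and the standard decimal module stands in for a decimal calculator; how any particular calculator app is built is not something this shows.

```python
from decimal import Decimal

# Binary floating point: 0.4 and 0.1 have no exact base-2 representation.
binary_result = 0.5 - 0.4 - 0.1

# Decimal arithmetic: all three operands are represented exactly.
decimal_result = Decimal("0.5") - Decimal("0.4") - Decimal("0.1")

print(binary_result)   # a tiny non-zero value
print(decimal_result)  # exactly zero
```

Any IEEE754 double implementation should print the same tiny non-zero value for the first line, which is the point made above: the standard buys agreement, not a zero.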
It is surprising how quickly certain rounding errors can add up. I've had the dubious pleasure of writing an insurance rating algorithm based on multiplying tables of factors. The difference between half-up and banker's round at 6 decimal places makes for rating errors totalling > 50% of the expected premium in a surprisingly small number of calculations. It's one thing to know about error propagation from a theoretical standpoint, but it's quite another to see it happen in real life.
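A small sketch of how the two rounding modes can part ways, using Python's decimal module. The rate helper, the base premium and the factor are hypothetical and chosen so that the product lands exactly on a tie at the sixth decimal place; this is not the rating algorithm described above and makes no claim about the 50% figure.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

SIX_PLACES = Decimal("0.000001")

def rate(base, factors, rounding):
    """Multiply through a list of factors, rounding to 6 places each step."""
    premium = base
    for factor in factors:
        premium = (premium * factor).quantize(SIX_PLACES, rounding=rounding)
    return premium

base = Decimal("1.000001")      # hypothetical base premium
factors = [Decimal("0.5")]      # hypothetical factor; product is 0.5000005

half_up = rate(base, factors, ROUND_HALF_UP)      # tie goes away from zero
half_even = rate(base, factors, ROUND_HALF_EVEN)  # tie goes to the even digit

print(half_up, half_even)  # 0.500001 0.500000
```

With a longer list of factors, each step's rounding feeds the next multiplication, which is how a one-unit difference in the last place gets a chance to grow.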
I sympathize with the weather forecasters.
Yes, guessing always has, and always will be, easier than deriving the correct answer.
Ken
That's correct. Addition is not associative with floating point numbers. So, (1,000,000 + 1,000) + 1,000 is not necessarily the same as 1,000,000 + (1,000 + 1,000); the order in which the terms are accumulated matters.
Also, x * x + x != x * (x + 1) in general, but many compilers make this substitution to reduce code size or increase FPU throughput.
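A quick Python check of both statements. The magnitudes here are chosen for illustration and differ from the ones in the comment, because 1,000,000 and 1,000 are small enough to add exactly in double precision. Python itself does not perform the substitution; the last line only shows that the two expressions are not interchangeable.

```python
big = 1e16      # at this magnitude, adjacent doubles are 2 apart
small = 1.0

# A single addition gives the same result with its operands swapped.
print(big + small == small + big)     # True

# Grouping decides whether the small terms survive.
large_first = (big + small) + small   # each 1.0 is rounded away
small_first = big + (small + small)   # 2.0 is large enough to register
print(large_first == small_first)     # False

# The algebraic identity x*x + x == x*(x + 1) can fail as well.
x = 1e16
print(x * x + x == x * (x + 1))       # False
```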
Didn't we know this? Take forecasts with a grain of salt because they could be wrong?
Be seeing you...
"another one says the earth will absorb the heat. Which one do you trust?"
I think I'd have to go with the one that doesn't redefine "absorb" to mean "magically disappear".
Igor Presnyakov stole my hat
Yes, it is possible to estimate how well a climate model models reality.
It's possible to make a climate model, then wait for reality to happen, then see how well they matched, yes. But you can't run experiments to see if your model is sound. And climate models do diverge from reality as reality happens, see this graph for example.
The parameters that vary in climate models are not unconstrained, but constrained by physics (experimental evidence). If your climate model accurately hindcasts the climate developments of the 20th century (say), but the parameters are at the extreme range of what's plausible from experimental physics, then it probably isn't a very good model.
That hasn't stopped astronomers from positing ridiculous things such as dark matter and dark energy.
The Berkeley Earth Surface Temperature (BEST) project (https://en.wikipedia.org/wiki/Berkeley_Earth_Surface_Temperature) was done and funded by people who wanted to show global warming was wrong or already thought it was. No way would they tweak their model to fit the consensus of other climate researchers, yet they came to the same conclusion.
They didn't make a model, they measured temperatures. I agree that you can measure temperatures accurately. From skimming the article it seems they discredited the 'urban heat bias' hypothesis which is interesting to know.
Also, you can test a climate model's ability to match reality: build it using a limited data set (e.g. 20k to 1k years ago) and then test it on another (e.g. the last 1k years) to see whether they match. Again, this is not a hard method to understand: if the new set does not match the predictions you're wrong, and if it does then you are more likely correct. This method is standard across biology as well as several other fields; not ideal, but good enough.
That doesn't show your model matches reality, it shows that you managed to make a complicated mathematical formula that managed to use some data points to generate some other data points.
This problem is not going to go away unless/until computers start doing their math rationally and symbolically. That is, with fractional results stored as fractions with no rounding. Where irrational constants are used in calculations, they'll have to be carried through in symbolic form as you would using pencil and paper. That is, the computer actually stores a representation of pi/2, NOT 1.570796327.
Of course, that leaves the 'minor matter' of performance.
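To put a rough shape on that performance cost, here is a sketch using Python's fractions module. The logistic map is a hypothetical stand-in for a real time-stepping model, and the starting value and growth parameter are arbitrary. Each exact step roughly doubles the number of digits in the denominator, so memory and time per step grow exponentially with the step count.

```python
from fractions import Fraction

def iterate_exact(x, r, steps):
    """Iterate x -> r*x*(1-x) in exact rational arithmetic.

    Returns the final value and the denominator's digit count per step.
    """
    sizes = []
    for _ in range(steps):
        x = r * x * (1 - x)
        sizes.append(len(str(x.denominator)))
    return x, sizes

# No rounding at all: the calculator example comes out as exactly zero.
print(Fraction(1, 2) - Fraction(2, 5) - Fraction(1, 10))  # 0

# But ten exact time steps already need a denominator hundreds of digits long.
x, sizes = iterate_exact(Fraction(1, 3), Fraction(7, 2), 10)
print(sizes)
```

A ten-day forecast in one-hour steps would be 240 such steps, before even considering the irrational constants that need symbolic handling.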
And what point are you trying to make? pow(x,-1) is equivalent to 1/x. So pretty valid.
Sure, for a mathematician, 1/x = x^{-1}.
But pow(a, b) is implemented as exp(b * log(a)), and both exp() and log() are probably implemented as high-degree Chebyshev polynomial approximations, so pow(x,-1) involves a f**k of a lot more computation than the 1/x which the mathematician intended. Not to mention lots more possibility for numerical error.
No. You are assuming if both calculations produce the same result, then that result is correct. In reality, you can run the same calculation twice and get the same error.
If you take the same source and compile it for two different systems, is it the same program? What the compiled program does is probably within the specs of the language.
That depends very much on what the purpose of the program is. I have worked with cryptography, and for most usages in that field, a program which produces the same output twice is unusable. A program which does floating point operations needs to be written in such a way that you can figure out how large an error you get. Knowing the accuracy is more important than getting the same result twice. If you do get two different results from the same calculation, you can check whether the difference is within the accuracy you were supposed to get.
The complete program is not one unit. You unit test individual units. And unit tests can deal perfectly well with units where the spec allows for more than one possible output. The unit test just needs to verify that the output is within spec. Testing for one specific output value is usable in some cases, but not always.
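As a sketch of what "within spec" can look like in a unit test, assuming a hypothetical mean function as the unit and an arbitrarily chosen relative tolerance of 1e-12 as its spec:

```python
import math

def mean(values):
    """Unit under test: a plain left-to-right average."""
    return sum(values) / len(values)

def test_mean_within_spec():
    data = [0.1, 0.2, 0.3]
    # The same inputs in two different orders may round differently,
    # so the test checks a tolerance instead of one exact bit pattern.
    for ordering in (data, data[::-1]):
        assert math.isclose(mean(ordering), 0.2, rel_tol=1e-12)

test_mean_within_spec()
```

The tolerance is the part that has to come from an error analysis of the unit; picking it by trial and error only documents what the current platform happens to produce.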
Do you care about the security of your wireless mouse?
Distributed systems are inherently non-deterministic. Moreover, it says right there in the title that the different results were produced on different computers.
Once they reach the point where errors have accumulated to this degree, not at all. Everybody knows that.
Climatologists either don't know that or are in denial of that fact.
Apocalypse Cancelled, Sorry, No Ticket Refunds
I thought that was my point? It seemed that you were trying to argue that the input was actually different.
I've seen Microsoft Access do the same thing. Apparently Person-B had loaded a slightly different OS date-handler DLL because they once found a bug in date patterns for a specific country they happened to be interested in. A specific spot on a report that calculated a date difference thus produced slightly different answers than when run on Person-A's PC, so the final totals did not add up the same.
Table-ized A.I.
That's the simulation of climate rather than weather, which is a substantially different problem. It's a problem that's still hard and is still plagued by chaos-theory effects on numerical modeling. Not to worry, though: scientists have understood this problem and its implications for about 7 orders of magnitude longer than you've heard about it.
Remember, the desired result here is not a set of identical numbers everywhere. It is an accurate simulation.
Well, I'd say a useful simulation, which entails some reasonable level of accuracy, but speed and cost are also important.
It isn't helpful if an algorithm gives you a slightly better simulation of tomorrow's weather if it takes a week to run. If your algorithm is faster or less expensive to run then you can run it more often, or use the saved computer time to run other models. Having an ensemble of models or more frequent updates might be more useful to forecasters than having one model that stays coherent for an extra 30 minutes out. The weather is so chaotic that it gets exponentially more expensive to predict further out.
And that doesn't help if you are trying to do operations that produce repeating numbers in base 10. You're just trading one set of problem numbers for a different set of problem numbers.
Yes and no. You get rounding in either base when you have insignificant significant digits. However by not doing a conversion from one base to another you avoid a second opportunity for rounding errors.
Also, numbers with repeating digits can be expressed as a fraction. In our calculator a fraction is a basic data type. If an operation includes a fraction we will try to produce a result that is a fraction. This can sometimes avoid a rounding error.
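Both halves of this exchange can be shown with Python's standard decimal and fractions modules, used here as stand-ins; the calculator's own fraction type is not something this sketch reproduces. Decimal arithmetic runs at its default 28 significant digits.

```python
from decimal import Decimal
from fractions import Fraction

# Base 10 has its own problem numbers: 1/3 repeats forever and gets cut off.
third_decimal = Decimal(1) / Decimal(3)
print(third_decimal * 3 == 1)    # False: 0.999... with a finite number of 9s

# Carrying the value as a fraction avoids the rounding entirely.
third_fraction = Fraction(1, 3)
print(third_fraction * 3 == 1)   # True
```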
... would still not solve some of the larger problems inherent in weather prediction ...
I'm not suggesting a solution to this problem. I am just providing a simple example of how an FPU or IEEE754 can get things wrong.
But tweaking the FP to ensure reproducibility doesn't improve the accuracy of the model. In fact, it hides the inaccuracies of the model. So, while I completely agree with you in principle, I think that what you said has no bearing on this particular case.
"In the case of climate simulations, different models (both physics-wise and code-wise) are run with different computers on the same input data, and yield basically the same results."
Maybe that means that their models are bad and they're all fudging their data?
Much more useful than running your simulation on multiple different supercomputers is to run it multiple times on one supercomputer, but with your input variables perturbed slightly on each run. If you randomly perturb your input measurements proportional to the standard error in those measurements, then the differences between runs will directly tell you how accurate your forecast is. (This should work independent of whether inaccuracy is dominated by initial condition inaccuracy, or by round off. It doesn't help so much if your model is bad.) You probably don't need to do this for every single forecast. After you've done it often enough for different weather conditions, you should get to know what your accuracy profile looks like.
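A toy version of that procedure in Python. Everything here is an assumption made for illustration: the logistic map stands in for the forecast model, 0.3 is the "measured" initial state, and 1e-6 is its standard error. Each ensemble member starts from the measurement plus Gaussian noise of that size, and the spread across members at each step is the accuracy estimate.

```python
import random
import statistics

def toy_model(x, steps):
    """Chaotic stand-in for a forecast model; returns the state at each step."""
    trajectory = []
    for _ in range(steps):
        x = 4.0 * x * (1.0 - x)
        trajectory.append(x)
    return trajectory

def ensemble_spread(measured, std_error, members, steps, seed=0):
    """Standard deviation across perturbed runs, one value per time step."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    runs = [toy_model(rng.gauss(measured, std_error), steps)
            for _ in range(members)]
    return [statistics.pstdev([run[t] for run in runs]) for t in range(steps)]

spread = ensemble_spread(measured=0.3, std_error=1e-6, members=50, steps=40)
print(spread[0], spread[-1])  # tiny at the first step, large by the last
```

The step at which the spread stops being small is the useful forecast horizon for this toy; the same reading applies to a real model, at much greater cost per member.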
Quattuor res in hoc mundo sanctae sunt: libri, liberi, libertas et liberalitas.
No, I was arguing that the input was not actually accurate enough to do the calculation in the first place. Floating point numbers can handle much higher accuracy than the measurements used as input. By the time you notice the difference between two runs, you are already way past the point where the output could be useful.
So there are two sources of errors: inaccurate input data, which leads to reproducible bad output, and rounding errors during calculation, which are smaller and thus only become significant later. The inaccuracy of the output due to inaccurate input data cannot be seen by running the calculation twice with the same input data. But by comparing to the real world, it can be observed that it diverges from the calculation. That divergence can be caused by inaccurate input data, flaws in the algorithm, or simply by the real world having much higher granularity than the discrete datapoints used in the algorithm.
Divergence between two runs of the same algorithm on the same input data can be caused by a number of other factors. Such factors include different rounding due to differences in the platform being used (different hardware and/or software), or non-determinism due to timing in a distributed system. For example, if a node receives three floating point numbers and adds them, the sum can depend on the order in which the three numbers were received.
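The three-number case can be checked exhaustively. In this Python sketch the three values are made up and deliberately extreme (two huge terms that cancel and one small one), so the effect is far larger than what a typical model step would show.

```python
from itertools import permutations

# Three values a node might receive from its neighbours (illustrative only).
received = [1e16, -1e16, 1.0]

totals = set()
for order in permutations(received):
    total = 0.0
    for value in order:
        total += value        # accumulate in arrival order
    totals.add(total)

print(totals)  # two distinct sums from the same three numbers
```

If the small term is added while a huge term is still in the accumulator it is rounded away; if the huge terms cancel first, it survives.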
The differences due to rounding errors are however not of much practical interest. By the time they are large enough to notice, the errors due to inaccurate input are already too large for the output to be of practical interest.
> In that case you just go with whichever runs fastest.
Not quite, optimizing to "result = 1" will be fastest, but obviously not correct. If you know -Ofast will degrade numerics compared to -O0 you do know something.
So you do a sensitivity analysis and learn what parts of the results you can trust and what parts you can't.
Or you re-run your forecast models from 10 days ago with what-you-knew 10 days ago and see which ones got closest to reality. After doing those hindcasts for a while you can build up some confidence about model performance.
That doesn't work so well when trying to model a 1 in 500 year storm which you have no hindcast experience with, but it's better than nothing.
~.~
I'm a peripheral visionary.
Anyone who claims Boost is stable and portable is so full of BS it's not even funny.
for i in `facebook friends "=bday" 2>/dev/null | cut -d " " -f 3-`; do facebook wallpost $i "Happy birthday!"; done
In the case of the boost-interval library, the link I posted has a very clear warning about that, so I don't understand why that quote is relevant here. This warning shows that "floating point is hard" and that is MORE reason to be careful with your intervals!
ipv6 is my vpn
They didn't predict the rain correctly yesterday here, that's why I believe those predictions are obviously incorrect.
And there's really no excuse. I can tell you yesterday's weather with 100% accuracy.*
* - Unless I didn't write it down.
Dark Reflection
256-bit integer math is sufficient to address the observable universe with Planck length precision.
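A back-of-the-envelope check of that claim in Python. The two constants are approximate published values (Planck length about 1.616e-35 m, observable universe diameter about 8.8e26 m) and are assumptions of this sketch, as is the choice to count positions along a single axis.

```python
import math

PLANCK_LENGTH_M = 1.616255e-35           # approximate value
OBSERVABLE_UNIVERSE_DIAMETER_M = 8.8e26  # approximate value

# Number of Planck-length steps across the universe along one axis,
# and the integer width needed to index every one of them.
steps = OBSERVABLE_UNIVERSE_DIAMETER_M / PLANCK_LENGTH_M
bits_needed = math.ceil(math.log2(steps))

print(bits_needed)  # 206
```

So one coordinate fits in 256 bits with room to spare, while 128 bits would fall short at that resolution.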
Unless you intend to run your simulation from components you found in a dumpster.
Climate research and weather research are individually large enough fields to justify developing a customized CPU but considering that the x86 architecture already has support for 256bit integer operations through the AVX instruction set this seems more to be a problem of languages not supporting it in a good way.
Sure, but in the time it takes to process one 256-bit integer you could use SIMD to process several shorter integers.
If you're running a simulation I suspect that processing more points is probably more important than processing each individual point down to the Planck length.
So now we're just arguing over where to draw the line. Oh, and all of this is setting aside all the arguments over order-of-operations and such. If you want a fully deterministic process it probably means far more locking/synchronization/etc. If you can get a perfect model run in 12 hours, and a good-enough one in 2 hours, most would take the latter (and 5 more to go along with it).
1. I calculate my personal and corporation tax using four-digit decimals (with a two-digit Pound/Penny (or Dollar/Cent) system this seems OK).
2. Her Majesty's Revenue & Customs sometimes calculates to two decimals and sometimes does not, within the same tax calculation.
3. Hence, having paid my estimate of the tax due, I have received demands by post for one to four pennies (costing the Revenue, say, 15 pounds to issue the pay-or-be-punished warning letter), or else face punishment and fines. I duly pay the penny in cash at the local Post Office, which charges HMRC 4 pounds or so to transmit the penny due. The Post Office staffers laugh and say this is a very regular occurrence. Thus a penny rounding error costs HMRC, say, 19 or 20 pounds to collect. It would appear they cannot simply 'not balance the books and not collect' [a computer instruction in the calculation to disregard sums due of less than xx pennies] because of the stringency of reporting to Parliament that they have done everything possible to collect the tax due. (A tick-box syndrome among HMRC officials reporting to government.)
Regards Eion MacDonald
You didn't read what I wrote.
To repeat myself:
they could not show that any particular compiler or architecture made the predictions any better, just different.
If one of the architectures/compilers had come up with a constant "result = 1" or indeed any degradation of model performance at all, they would have been able to make much stronger statements in the abstract.
I should have written "repeating decimal" not "repeating digits".
http://en.wikipedia.org/wiki/Repeating_decimal
You don't understand why your proposed solution is bad when it has a negative impact on performance and is not portable between different supercomputers? Where do I even begin...
Let's put it this way: your university/research organization will eventually buy a new supercomputer. It may have a different architecture from the old one. Do you then rewrite your 20,000 SLOC code which is using ? Do you really imagine anyone is going to pay you for that?
The reality is that the original code was not portable between supercomputers and already comes up with incorrect answers but yet people didn't realize it until now! Do you realize that this means that all the weather forecasts from the first supercomputer implementation of this program are now known to be wrong too? What is the cost of having answers that have unknown accuracy?
You don't have to use Boost - but you HAVE TO manage your intervals and accuracy and rounding errors! If you don't then you can not know what the accuracy is of your answers! Note this has relevance beyond supercomputing too - Digital Signal Processing of Audio also is adversely affected by people programming floating point filters incorrectly, causing noise artifacts and inharmonic distortion due to improper noise shaping and bad coefficient rounding and fading.
Jeff
The reality is that the original code was not portable between supercomputers and already comes up with incorrect answers but yet people didn't realize it until now!
Ah, jeez. If you think this is the first time someone noticed that different computers give different results, I would like to introduce you to Edward Lorenz, a prominent meteorologist in the 1960s, and the field of science called Chaos Theory which he fathered. There is nothing new about the fact that this happens. It is taught in Computational Physics 101. The novelty of the study reported is that it quantifies the variation on different supercomputers in a comprehensive way.
And you may want to look up the definition of portable. If you take "portable" to mean "gives exactly the same results on all computers evar", then there are no portable programs in the entire world. By Boost not being portable, I mean that it doesn't even run on your new SGI Altix where there is no GCC compiler available.
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html "What Every Computer Scientist Should Know About Floating-Point Arithmetic", by David Goldberg, published in the March, 1991 issue of Computing Surveys
I don't know why anyone thought this was surprising (it would have been surprising if they didn't get different results, given that some use GPUs, some don't, etc.). What does tend to get "amusing" is that even with the same processor folks get different results (sometimes due to software issues, chip rev issues, or actual hardware bugs that go undetected ... but are minor enough to remain so unless someone gets really careful and whips out the old logic analyzer).
I'll see your trajectory of a ball and raise you a Pioneer Anomaly
Some massive twat marked my comment down from 1? Really? It's almost as if they're political activists who don't like any criticism whatsoever of their "post-modern" scientific methods.
Do you also believe it is impossible to predict that a flipped coin will come up heads 50% of the time, because it is impossible to predict what it will come up as on a single flip?
Do you also believe it is impossible to predict that a flipped coin will come up heads 50% of the time, because it is impossible to predict what it will come up as on a single flip?
No, but that's a nice strawman. What's pi in base 2?
It is not. That is the exact kind of argument you were making.
Well, apparently the people who wrote the software that this whole article was about did not know that their software was broken because of this. http://journals.ametsoc.org/doi/abs/10.1175/MWR-D-12-00352.1