Ask Slashdot: How Reproducible Is Arithmetic In the Cloud?

Posted by timothy on Thursday November 21, 2013 @11:59AM from the irreproducible-results dept.

goodminton writes "I'm researching the long-term consistency and reproducibility of math results in the cloud and have questions about floating point calculations. For example, say I create a virtual OS instance on a cloud provider (doesn't matter which one) and install Mathematica to run a precise calculation. Mathematica generates the result based on the combination of software version, operating system, hypervisor, firmware and hardware that are running at that time. In the cloud, hardware, firmware and hypervisors are invisible to the users but could still impact the implementation/operation of floating point math. Say I archive the virtual instance and in 5 or 10 years I fire it up on another cloud provider and run the same calculation. What's the likelihood that the results would be the same? What can be done to adjust for this? Currently, I know people who 'archive' hardware just for the purpose of ensuring reproducibility and I'm wondering how this translates to the world of cloud and virtualization across multiple hardware types."

Fixed-point arithmetic by mkremer · 2013-11-21 12:01 · Score: 5, Informative

Use Fixed-point arithmetic.
In Mathematica make sure to specify your precision.
Look at 'Arbitrary-Precision Numbers' and 'Machine-Precision Numbers' for more information on how Mathematica does this.
1. Re:Fixed-point arithmetic by Giant Electronic Bra · 2013-11-21 12:44 · Score: 4, Informative
  
  Yes, you can do this, but it's not feasible for all calculations. Things like trig functions are implemented on FP numbers, and once you start using FP it's better to just keep using it; converting back and forth is just bad and defeats the whole purpose anyway. So in reality you end up with applications that DO use FP (believe me, as an old FORTH programmer I can attest to the benefits of scaled integer arithmetic!). It's one of those things: we're stuck with FP, and once we assume that, the whole question comes down to small differences in the results of machine-level instructions or minor differences in libraries on different platforms, etc. You will probably find that arbitrary VMs won't produce exactly identical results when you run on different platforms (AWS, KVM, VMware, some new thing).
  Is it a huge problem though? The results produced should be similar; the parameters being varied were never controlled for anyway. It's a question of how often the rounding errors between two FPUs are identical. Neither the new nor the old results should be considered 'better', and they should generally be about the same if the result is robust. A climate sim, for example, run on two different systems for an ensemble of runs with similar inputs should produce statistically indistinguishable results. If they don't, then you should know what the differences are by comparison. In reality I doubt very many experiments will be in doubt based on this.
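The "know what the differences are by comparison" step can be made concrete. What follows is only a minimal sketch in Python, assuming IEEE 754 doubles and assuming the old result was archived as a hex string; the helper names `ordered_bits` and `ulp_distance` and the stand-in values are made up for illustration.

```python
import struct

def ordered_bits(x: float) -> int:
    """Map a finite double to an integer so that adjacent doubles map to
    adjacent integers (NaN is not handled in this sketch)."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative doubles have the sign bit set; mirror them below zero.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)

def ulp_distance(a: float, b: float) -> int:
    """Number of representable doubles between a and b."""
    return abs(ordered_bits(a) - ordered_bits(b))

# Hypothetical archive step: store the exact bits of the old result.
archived_hex = (0.1 + 0.2).hex()

# Hypothetical reanalysis years later: the rerun lands one double away.
old_result = float.fromhex(archived_hex)
new_result = 0.3  # stand-in for the value produced on the new platform

print("old:", old_result.hex())
print("new:", new_result.hex())
print("distance in ULPs:", ulp_distance(old_result, new_result))
```

Storing `float.hex()` output rather than a rounded decimal string keeps the archived value bit-exact, so a later mismatch can be reported as "off by N ULPs" instead of a vague "slightly different".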
  
  --
  "Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
2. Re:Fixed-point arithmetic by Joce640k · 2013-11-21 14:05 · Score: 4, Informative
  
  Submitter is entirely ignorant of floating point issues in general. Other than the buzzword "cloud" this is no different from any other clueless question about numerical issues in computing. "Help me, I don't know anything about the problem, but I just realized it exists!"
  Wrong.
  In IEEE floating point math, "(a+b)+c" might not be the same as "a+(b+c)".
  The exact results of a calculation can depend on how a compiler optimized the code. Change the compiler and all bets are off. Different versions of the same software can produce different results.
  If you want the exact same results across all compilers you need to write your own math routines which guarantee the order of evaluation of expressions.
  OTOH, operating system, hardware, firmware and hypervisors shouldn't make any difference if they're running the same code. IEEE math *is* deterministic.
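Both halves of that claim are easy to check. A small sketch in Python, assuming the interpreter uses IEEE 754 doubles (true of CPython on mainstream platforms); the values 0.1, 0.2 and 0.3 are just a convenient example:

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # grouping one
right = a + (b + c)   # grouping two

# Same mathematical expression, different rounding at each step.
print(left, right, left == right)

# Determinism: repeating the SAME order of operations gives the same bits.
again = (a + b) + c
print(left.hex(), again.hex(), left.hex() == again.hex())
```

So the order of evaluation is part of the computation: anything that reorders it (compiler flags, a library update, vectorization) can change the last bits, while rerunning identical operations in identical order does not.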
  
  --
  No sig today...
3. Re:Fixed-point arithmetic by Giant Electronic Bra · 2013-11-21 14:40 · Score: 5, Informative
  
  Trust me, it's a subject I've studied. The problem here is that your system is unstable: tiny differences in inputs generate huge differences in output. You cannot simply take one set of inputs that produces what you think is the 'right answer' from that system and ignore all the rest! You have to explore the ensemble behavior of many different sets of inputs, and the overall set of responses of the system is your output, not any one specific run with specific inputs that would produce a totally different result if one was off by a tiny bit.
  Of course Lorenz realized this. Simple experiments with an LDE will show you this kind of result. You simply cannot treat these systems the way you would ones which exhibit function-like behavior (at least within some bounds). Lorenz of course also realized THAT, but sadly not everyone has got the memo yet! lol.
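A toy illustration of that sensitivity, as a hedged sketch only: the logistic map below is a stand-in chosen for brevity, not the system under discussion, and the parameter `r = 3.9`, the starting value and the size of the nudge are all assumptions.

```python
def logistic_orbit(x0, r=3.9, steps=300):
    """Iterate x -> r * x * (1 - x) and return the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

base = logistic_orbit(0.3)
nudged = logistic_orbit(0.3 + 1e-15)   # input differs around the 15th digit

# Gap between the two runs at every step.
gaps = [abs(p - q) for p, q in zip(base, nudged)]

# First step (if any) at which the runs visibly disagree.
first_big = next((i for i, g in enumerate(gaps) if g > 0.1), None)

print("initial gap:", gaps[0])
print("largest gap:", max(gaps))
print("first step with gap > 0.1:", first_big)
```

A difference the size of one rounding error ends up as a difference of order one, which is why a single run of such a system says little and the ensemble is the meaningful output.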
  
  --
  "Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
4. Re:Fixed-point arithmetic by tlhIngan · 2013-11-21 18:09 · Score: 5, Informative
  
  Don't use floating point if you can avoid it.
  If you can't, and the results are EXTREMELY important (remember, floating point is an APPROXIMATION of numbers), then you have to read What Every Computer Scientist Should Know About Floating-Point Arithmetic. (Yes, it's an Oracle link, but if you google it, most of the links are PDFs while the Oracle one is HTML).
  If you're worried about your cloud provider screwing with your results, then you're definitely doing it wrong (read that article).
  And yes, lots of people, even scientists, do it wrong because the idealized notion of what a floating point type is and how it actually works in hardware are completely different. Floating point numbers are tricky - they're VERY easy to use, but they're also VERY easy to use wrongly, and only if you know how the actual hardware is doing the calculations can you structure your programs and algorithms to do it right.
  And no actual hardware FPU or VPU (vector unit - some do floating point) implements the full IEEE spec. Many come close, but none implement it exactly - there's always an omission or two. Especially since a lot of FPUs provide extended precision that goes beyond IEEE spec.
5. Re:Fixed-point arithmetic by goodminton · 2013-11-21 19:04 · Score: 5, Informative
  
  Awesome link! I'm the OP and I really appreciate your response. The reason I'm looking into this is that I work with many scientists who use commercial software packages where they don't control the code or compiler and their results are archived and can be reanalyzed years later. I was recently helping someone revive an old server to perform just such a reanalysis and we had so much trouble getting the machine going again I started planning to clone/virtualize it. That got me thinking about where to put the virtual machine (dedicated hardware, cloud, etc) and it also got me curious about hypervisors. I found some papers indicating that commercial hypervisors can have variability in their floating point math performance and all of that culminated in my post. Thanks again.
Use infinite precision software packages by shutdown -p now · 2013-11-21 12:17 · Score: 4, Informative

What the title says - e.g. bignum for Python etc. It will be significantly slower, but the result is going to be stable at least for a given library version, and that is far easier to archive.
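For a feel of what that looks like, here is a minimal sketch using Python's standard library (built-in arbitrary-precision integers plus `fractions.Fraction`); the specific values are only an example, and this shows exact rational arithmetic, which does not cover irrational results such as square roots or trig functions.

```python
from fractions import Fraction

# Binary floating point: 0.1 and 0.2 are already approximations.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)

# Exact rationals: no rounding happens at any step.
x, y, z = Fraction(1, 10), Fraction(2, 10), Fraction(3, 10)
exact_sum = x + y
print(exact_sum == z)

# With exact arithmetic, regrouping cannot change the answer.
print((x + y) + z == x + (y + z))

# Python integers are arbitrary precision out of the box.
big = 2 ** 200
print(big)
```

The cost is speed and memory (numerators and denominators can grow quickly), but the answer no longer depends on the FPU at all.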
Your chances are pretty darned good by Red Jesus · 2013-11-21 12:18 · Score: 5, Informative

Mathematica in particular uses adaptive precision; if you ask it to compute some quantity to fifty decimal places, it will do so.
In general, if you want bit-for-bit reproducible calculations to arbitrary precision, the MPFR library may be right for you. It computes correctly-rounded special functions to arbitrary accuracy. If you write a program that calls MPFR routines, then even if your own approximations are not correctly-rounded, they will at least be reproducible.
If you want to do your calculations to machine precision, you can probably rely on C to behave reproducibly if you do two things: use a compiler flag like -mpc64 on GCC to force the elementary floating point operations (addition, subtraction, multiplication, division, and square root) to behave predictably, and use a correctly-rounded floating point library like crlibm (Sun also released a version of this at one point) to make the transcendental functions behave predictably.
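Why "correctly rounded" implies "reproducible" can be shown without MPFR or crlibm. The sketch below swaps in Python's standard-library `math.fsum` as a stand-in (it is neither of those libraries) and assumes IEEE 754 doubles with no extended-precision intermediates; `naive_sum` and the sample values are made up for illustration.

```python
import math

def naive_sum(values):
    """Plain left-to-right accumulation, rounding after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total

# The same four numbers in two different orders.
order_a = [1e16, 1.0, -1e16, 1.0]
order_b = [1.0, 1.0, 1e16, -1e16]

# Step-by-step rounding makes the result depend on the order.
print(naive_sum(order_a), naive_sum(order_b))

# fsum tracks the lost low-order bits and rounds once at the end,
# so the order of the inputs no longer matters.
print(math.fsum(order_a), math.fsum(order_b))
```

A routine that always returns the nearest representable value to the true result has only one possible output per input, which is the property a correctly-rounded library gives you for the transcendental functions as well.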
False assumption by bertok · 2013-11-21 15:26 · Score: 4, Informative

This assumption by the OP:

Mathematica generates the result based on the combination of software version, operating system, hypervisor, firmware and hardware that are running at that time.
... is entirely wrong. One of the defining features of Mathematica is symbolic expression rewriting and arbitrary-precision computation to avoid all of those specific issues. For example, the expression:
N[Sin[1], 50]
Will always evaluate to exactly:
0.84147098480789650665250232163029899962256306079837
And, as expected, evaluating to 51 digits yields:
0.841470984807896506652502321630298999622563060798371
Notice how the last digit in the first case remains unchanged, as expected.
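The same digits can be cross-checked outside Mathematica. This is a hedged sketch, not Mathematica's algorithm: it sums the Taylor series for sine with Python's standard-library `decimal` module, carrying ten guard digits (an arbitrary choice) before rounding; the function name `sin_taylor` is made up for illustration.

```python
from decimal import Decimal, localcontext

def sin_taylor(x, digits):
    """Approximate sin(x) to `digits` significant digits via its Taylor series."""
    with localcontext() as ctx:
        ctx.prec = digits + 10                 # guard digits for rounding error
        x = Decimal(x)
        threshold = Decimal(10) ** -(digits + 10)
        term = x                               # first term: x
        total = x
        k = 1
        while abs(term) > threshold:
            # Next term: multiply by -x^2 / ((2k) * (2k + 1)).
            term = -term * x * x / ((2 * k) * (2 * k + 1))
            total += term
            k += 1
        ctx.prec = digits
        return +total                          # unary plus rounds to ctx.prec

print(sin_taylor(1, 50))
print(sin_taylor(1, 51))
```

Because every operation is done in software at a stated precision, the output depends on the algorithm and the requested digits rather than on the host's floating point hardware.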
This is explained at length in the documentation, and also in numerous Wolfram blog articles that go on about the details of the algorithms used to achieve this on a range of processors and operating systems. The (rare) exceptions are marked as such in the help and usually have (slower) arbitrary-precision or symbolic variants. For research purposes, Mathematica comes with an entire bag of tools that can be used to implement numerical algorithms to any precision reliably.
Conclusion: The author of the post didn't even bother to flip through the manual, despite having strict requirements spanning decades. He does however have the spare time to post on Slashdot and waste everybody else's time.