Ask Slashdot: How Reproducible Is Arithmetic In the Cloud?
goodminton writes "I'm research the long-term consistency and reproducibility of math results in the cloud and have questions about floating point calculations. For example, say I create a virtual OS instance on a cloud provider (doesn't matter which one) and install Mathematica to run a precise calculation. Mathematica generates the result based on the combination of software version, operating system, hypervisor, firmware and hardware that are running at that time. In the cloud, hardware, firmware and hypervisors are invisible to the users but could still impact the implementation/operation of floating point math. Say I archive the virutal instance and in 5 or 10 years I fire it up on another cloud provider and run the same calculation. What's the likelihood that the results would be the same? What can be done to adjust for this? Currently, I know people who 'archive' hardware just for the purpose of ensuring reproducibility and I'm wondering how this tranlates to the world of cloud and virtualization across multiple hardware types."
Use Fixed-point arithmetic.
In Mathematica make sure to specify your precision.
Look at 'Arbitrary-Precision Numbers' and 'Machine-Precision Numbers' for more information on how Mathematica does this.
The result is always the same, but the definition of reality is changing. The result of every single calculation is in fact 42 in some units. The hard part is figuring out the units.
Some drink at the fountain of knowledge. Others just gargle.
Just scroll down a couple of posts. "Quite soon the Wolfram Language is going to start showing up in lots of places, notably on the web and in the cloud."
Problem solved!
Thanks to the War on Drugs, it's easier to buy meth than it is to buy cold medicine!
If the value your computing is so dependent of the details of float point implementation that you'er worried about it, you probably have an issue of numerical stability and the results you are computing are likely useless, so this is really a mute point.
What the title says - e.g. bignum for Python etc. It will be significantly slower, but the result is going to be stable at least for a given library version, and that is far easier to archive.
Mathematica in particular uses adaptive precision; if you ask it to compute some quantity to fifty decimal places, it will do so.
In general, if you want bit-for-bit reproducible calculations to arbitrary precision, the MPFR library may be right for you. It computes correctly-rounded special functions to arbitrary accuracy. If you write a program that calls MPFR routines, then even if your own approximations are not correctly-rounded, they will at least be reproducible.
If you want to do your calculations to machine precision, you can probably rely on C to behave reproducibly if you do two things: use a compiler flag like -mpc64 on GCC to force the elementary floating point operations (addition, subtraction, multiplication, division, and square root) to behave predictably, and use a correctly-rounded floating point library like crlibm (Sun also released a version of this at one point) to make the transcendental functions behave predictably.
Different results on different hardware was a major problem up until CPU designers started to implement the IEEE754 standard for floating point arithmetic. IEEE754 conforming implementations should all return identical results for identical calculations
However, x86 systems have an 80-bit extended precision format and if the software uses 80-bit floats on x86 hardware and then you run the same code on an architecture that does not support the x86 80-bit format (say, ARM or Sparc or PowerPC) then you are likely to get different answers.
I think newer revisions of IEEE754 have support for extended precision formats up to 16-bytes, but you need to know your hardware (and how your software uses it) to make sure that you are doing equal work on systems with equal capabilities. You may have to sacrifice precision for portability.
When information is power, privacy is freedom.
If your calculations are processor-dependent, that's a bad sign for your code. If your results really depend on things that can be altered by the specific floating-point implementation, you need to write code that's robust to changes in the way floating-point arithmetic is done, generally by tracking the uncertainty associated with each number in your calculation. (Obviously you don't need real-time performance since you're using cloud computing in the first place.) I'm not an expert on Mathematica, but it probably has such things built in if you go through the documentation, since Mathematica notebooks are supposed to exhibit reproduceable behavior on different machines. (Which is not to say that no matter what you write it's automatically going to be reproduceable.
Archiving hardware to get consistent results is mainly used when there are legal issues and some lawyer can jump in and say, "A-ha! This bit here is different, and therefore there's some kind of fraud going on!"
I used to read Caltizzle. I was a lot cooler than you.
http://www.smbc-comics.com/?id=2999
Let's say you're using C on an x86. float (32-bit) and double (64-bit) are well defined. However, the x86 FPU internally uses long double (80-bit).
So if you do some math on a float or a double, the results can vary depending on if it was done as 80-bit or if the intermediaries were spilled and truncated back to 64/32 bit.
Do you even lift?
These aren't the 'roids you're looking for.
So I'm supposed to do all my calculations without any Pi? How can you have any Pi if you don't eat your machine?
This has pretty much been the bible for many, many, many years now: http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
If you haven't read it, you should - no matter if you're a scientific developer, a games developer, or someone who writes javascript for web apps - it's just something everyone should have read.
Forgive the Oracle-ness (it was originally written by Sun). Covers what you need to know about IEEE754 from a mathematical and programming perspective.
Long story short, determinism across multiple (strictly IEE754 compliant) architectures while possible is hard - and likely not worth it. But if you're doing scientific computing, perhaps it may be worth it to you. (Just be prepared for a messy ride of maintaining LSB error accumulators, and giving up typically 1-3 more LSB of precision for the determinism - and not only having to worry about the math of your algorithms, but the math of tracking IEEE754 floating point error for every calculation you do).
What you can do, easily, however is understand the amount of error in your calculations and state the calculation error with your findings.
My first thought on seeing "tranlate" and "I'm research" was that it's only language, but then I read invalid and incorrect statements about how precision is defined in Mathematica. So now I'm not quite sure it's just language.
Archiving a whole virtual machine as opposed to the code being compiled and run is baffling to me.
Now if you are trying to archive the machine to run your old version of Mathematica and see if you get the same result, you may want to check your license agreement with Wolfram first. Second, you should be able to export the code and run the same code on new versions.
I'm really really confused on why you would want this to begin with though. Precision has increased quite a bit with the advent of 64bit hardware. I'd be more interested in taking some theoretical code and changing "double" to "uberlong" and see if I get the same results than what I solved today on today's hardware.
Unless this is some type of Government work which requires you to maintain the whole system, I simply fail to see any benefit.
Having "Cloud" does not change how precision works in Math languages.
-The wise argue that there are few absolutes, the fool argues that there are no probabilities.
This assumption by the OP:
Mathematica generates the result based on the combination of software version, operating system, hypervisor, firmware and hardware that are running at that time.
... is entirely wrong. One of the defining features of Mathematica is symbolic expression rewriting and arbitrary-precision computation to avoid all of those specific issues. For example, the expression:
N[Sin[1], 50]
Will always evaluate to exactly:
0.84147098480789650665250232163029899962256306079837
And, as expected, evaluating to 51 digits yields:
0.841470984807896506652502321630298999622563060798371
Notice how the last digit in the first case remains unchanged, as expected.
This is explained at length in the documentation, and also in numerous Wolfram blog articles that go on about the details of the algorithms used to achieve this on a range of processors and operating systems. The (rare) exceptions are marked as such in the help and usually have (slower) arbitrary-precision or symbolic variants. For research purposes, Mathematica comes with an entire bag of tools that can be used to implement numerical algorithms to any precision reliably.
Conclusion: The author of the post didn't even bother to flip through the manual, despite having strict requirements spanning decades. He does however have the spare time to post on Slashdot and waste everybody else's time.
They do not. IEEE754 has no "grey area". The results must match bit-exact or you are not IEEE754.
Of course, there can be implementation bugs. For example, Qemu does co-processor emulation only with 64 bit floats instead of the required 80 bit. Nobody seem to really care however. The other thing is of course that if reproducibility is more important than correctness, I suspect the math is done wrong.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.