Ask Slashdot: How Reproducible Is Arithmetic In the Cloud?

Posted by timothy on Thursday November 21, 2013 @11:59AM from the irreproducible-results dept.

goodminton writes "I'm research the long-term consistency and reproducibility of math results in the cloud and have questions about floating point calculations. For example, say I create a virtual OS instance on a cloud provider (doesn't matter which one) and install Mathematica to run a precise calculation. Mathematica generates the result based on the combination of software version, operating system, hypervisor, firmware and hardware that are running at that time. In the cloud, hardware, firmware and hypervisors are invisible to the users but could still impact the implementation/operation of floating point math. Say I archive the virtual instance and in 5 or 10 years I fire it up on another cloud provider and run the same calculation. What's the likelihood that the results would be the same? What can be done to adjust for this? Currently, I know people who 'archive' hardware just for the purpose of ensuring reproducibility and I'm wondering how this tranlates to the world of cloud and virtualization across multiple hardware types."

Fixed-point arithmetic by mkremer · 2013-11-21 12:01 · Score: 5, Informative

Use Fixed-point arithmetic.
In Mathematica make sure to specify your precision.
Look at 'Arbitrary-Precision Numbers' and 'Machine-Precision Numbers' for more information on how Mathematica does this.
1. Re:Fixed-point arithmetic by Anonymous Coward · 2013-11-21 12:21 · Score: 5, Insightful
  
  Submitter is entirely ignorant of floating point issues in general. Other than the buzzword "cloud" this is no different from any other clueless question about numerical issues in computing. "Help me, I don't know anything about the problem, but I just realized it exists!"
2. Re:Fixed-point arithmetic by Giant Electronic Bra · 2013-11-21 12:44 · Score: 4, Informative

  Yes, you can do this, but it's not feasible for all calculations. Things like trig functions are implemented on FP numbers, and once you start using FP it's better to just keep using it; converting back and forth is just bad and defeats the whole purpose anyway. So in reality you end up with applications that DO use FP (believe me, as an old FORTH programmer I can attest to the benefits of scaled integer arithmetic!). It's one of those things: we're stuck with FP, and once we assume that, we run into the whole question of small differences in the results of machine-level instructions, minor differences in libraries on different platforms, etc. You will probably find that arbitrary VMs won't produce exactly identical results when you run on different platforms (AWS, KVM, VMWare, some new thing).
  Is it a huge problem though? The results produced should be similar; the parameters being varied were never controlled for anyway. It's just a matter of how often the rounding errors between two FPUs are identical. Neither the new nor the old results should be considered 'better', and they should generally be about the same if the result is robust. A climate sim, for example, run on two different systems for an ensemble of runs with similar inputs should produce statistically indistinguishable results. If they don't, then you should know what the differences are by comparison. In reality I doubt very many experiments will be in doubt based on this.
  
  --
  "Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
3. Re:Fixed-point arithmetic by Jane Q. Public · 2013-11-21 13:04 · Score: 4, Insightful

  "Is it a huge problem though?"
  If tools like Mathematica are dependent on the floating-point precision of a given processor, They're Doing It Wrong.
4. Re:Fixed-point arithmetic by raddan · 2013-11-21 13:37 · Score: 4, Interesting
  
  Experiments can vary wildly with even small differences in floating-point precision. I recently had a bug in a machine learning algorithm that produced completely different results because I was off by one trillionth! I was being foolish, of course, because I hadn't used an epsilon for doing FP comparisons, but you get the idea.

  But it turns out that even if you're a good engineer and you are careful with your floating point numbers, the fact is: floating point is approximate computation. And for many kinds of mathematical problems, like dynamical systems, this approximation changes the result. One of the founders of chaos theory, Edward Lorenz, of Lorenz attractor fame, discovered the problem by truncating the precision of FP numbers from a printout when he was re-entering them into a simulation. The simulation behaved completely differently despite the difference in precision being in the thousandths. That was a weather simulation. See where I'm going with this?
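A toy illustration of that sensitivity, as a minimal sketch rather than anything Lorenz actually ran. The logistic map below is a stand-in chaotic system (an assumption, not his weather model), the two starting values 0.506127 and 0.506 are the ones commonly quoted in retellings of his rerun, and the function name `logistic_orbit` is made up for this example.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate the logistic map x -> r*x*(1-x), a standard chaotic toy model."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

full = logistic_orbit(0.506127, 50)   # starting value with all six decimals
trunc = logistic_orbit(0.506, 50)     # same value truncated to three decimals

# Distance between the two runs at every step.
gaps = [abs(a - b) for a, b in zip(full, trunc)]
print(f"initial gap: {gaps[0]:.6f}")
print(f"largest gap over 50 steps: {max(gaps):.3f}")
```

The initial gap is far below one thousandth, yet the two orbits end up unrelated within a few dozen steps. No amount of care with rounding removes that; it is a property of the system being simulated.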
5. Re:Fixed-point arithmetic by Joce640k · 2013-11-21 14:05 · Score: 4, Informative
  
  Submitter is entirely ignorant of floating point issues in general. Other than the buzzword "cloud" this is no different from any other clueless question about numerical issues in computing. "Help me, I don't know anything about the problem, but I just realized it exists!"
  Wrong.
  In IEEE floating point math, "(a+b)+c" might not be the same as "a+(b+c)".
  The exact results of a calculation can depend on how a compiler optimized the code. Change the compiler and all bets are off. Different versions of the same software can produce different results.
  If you want the exact same results across all compilers you need to write your own math routines which guarantee the order of evaluation of expressions.
  OTOH, operating system, hardware, firmware and hypervisors shouldn't make any difference if they're running the same code. IEEE math *is* deterministic.
  
  --
  No sig today...
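The associativity point is easy to check. A minimal sketch in Python, whose `float` is an IEEE 754 double on all common platforms; the variable names are only for illustration.

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # evaluates to 0.6000000000000001
right = a + (b + c)   # evaluates to 0.6

print(left, right, left == right)
```

Each grouping is individually deterministic: rerunning either line on any conforming implementation gives the same bits. The difference only shows up when something (a compiler, a library update, a parallel reduction) changes the grouping.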
6. Re:Fixed-point arithmetic by NEDHead · 2013-11-21 14:06 · Score: 4, Funny
  
  I have a mechanical calculator that is extremely reliable, so long as you oil it.
7. Re:Fixed-point arithmetic by Giant Electronic Bra · 2013-11-21 14:30 · Score: 5, Insightful

  I think the problem is that people PERCEIVE it to be a problem. Nothing is any more problematic than it was before; good numerical simulations will be stable over some range of inputs. It shouldn't MATTER if you get slightly different results for one given input. If that's all you tested, well, you did it wrong indeed. Mathematica is fine; people need to A) understand scientific computing and B) understand how to run and interpret models. I think most scientists that are doing a lot of modelling these days DO know these things. It's the occasional users that get it wrong, I suspect.
  
  --
  "Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
8. Re:Fixed-point arithmetic by Giant Electronic Bra · 2013-11-21 14:40 · Score: 5, Informative

  Trust me, it's a subject I've studied. The problem here is that your system is unstable: tiny differences in inputs generate huge differences in output. You cannot simply take one set of inputs that produces what you think is the 'right answer' from that system and ignore all the rest! You have to explore the ensemble behavior of many different sets of inputs, and the overall set of responses of the system is your output, not any one specific run with specific inputs that would produce a totally different result if one was off by a tiny bit.
  Of course Lorenz realized this. Simple experiments with an LDE will show you this kind of result. You simply cannot treat these systems the way you would ones which exhibit function-like behavior (at least within some bounds). Lorenz of course also realized THAT, but sadly not everyone has got the memo yet! lol.
  
  --
  "Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
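A rough sketch of the ensemble idea, assuming a logistic map with r = 3.9 as the unstable system; every name here (`run_member`, `run_ensemble`, the seeds and sizes) is hypothetical. Each member starts from a slightly perturbed input, so individual runs disagree, but the ensemble summary is what gets compared.

```python
import random
import statistics

def run_member(x0, r=3.9, burn_in=100, steps=1000):
    """Iterate the logistic map; return (time-averaged x, final x)."""
    x = x0
    for _ in range(burn_in):          # discard the initial transient
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(steps):
        x = r * x * (1.0 - x)
        total += x
    return total / steps, x

def run_ensemble(center, spread, members, seed):
    """Perturb the input slightly per member and summarize the whole ensemble."""
    rng = random.Random(seed)
    runs = [run_member(center + rng.uniform(-spread, spread)) for _ in range(members)]
    averages = [avg for avg, _ in runs]
    finals = [last for _, last in runs]
    return statistics.mean(averages), max(finals) - min(finals)

mean_a, final_range_a = run_ensemble(0.3, 1e-9, members=50, seed=1)
mean_b, final_range_b = run_ensemble(0.3, 1e-9, members=50, seed=2)

print(f"ensemble means: {mean_a:.4f} vs {mean_b:.4f}")
print(f"range of final states in ensemble A: {final_range_a:.3f}")
```

Perturbations of one part in a billion scatter the final states across the attractor, while the two ensemble means should agree closely. That is the sense in which the ensemble, not the single run, is the result.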
9. Re:Fixed-point arithmetic by immaterial · 2013-11-21 14:49 · Score: 5, Insightful
  
  For a guy who started off a reply with an emphatic "Wrong" you sure do seem to agree with the guy you quoted.
10. Re:Fixed-point arithmetic by gl4ss · 2013-11-21 14:53 · Score: 4, Insightful
  
  the question was not about compilers or indeed about software, but about fpu's, about firing up the same instance, with the same compilers and indeed with the same original binary.
  it sounds like just fishing for reasons to have a budget to keep old power hw around.
  I would think that if the results change enough to matter depending on the fpu, the whole calculation method is suspect to begin with and exploits some feature/bug to get a tuned result (assuming the cpu/vm adheres to the standard, they would be the same; if the old one doesn't and the new one does, then I think an honest scientist would want to know that too).
  
  --
  world was created 5 seconds before this post as it is.
11. Re:Fixed-point arithmetic by RightwingNutjob · 2013-11-21 15:41 · Score: 3, Informative
  
  If you want exact results from a fixed number of significant bits, you want magic.
  
  Whatever calculation you're making, be aware of the dynamic range of the intermediate results. Structure your calculations so that all intermediate results stay well within the dynamic range of the datatype. If you want to compute the standard deviation of 2048x2048 32-bit integers, use a 64-bit integer for the intermediate sum(x) and a 128-bit integer for the intermediate sum(x^2). If you try to use an IEEE double, you'll run past the 53 bits of significand it gives you: even the plain sum can reach 2^11 * 2^11 * 2^32 = 2^54, and the sum of squares can reach 2^11 * 2^11 * 2^64 = 2^86.

  If you can, reformulate your calculation steps so as to minimize the sensitivity to random errors on the order of a machine epsilon.
  
  An electronic computer manual from UNIVAC/Burroughs/IBM written for pure mathematicians in ~1953 will tell you the same thing.
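A small sketch of the accumulator-width point. Python's built-in `int` is arbitrary precision, so here it stands in for the wide integer accumulator (an assumption for illustration; in C you would pick a fixed-width type). The input is a short run of the largest 32-bit unsigned value rather than a full 2048x2048 image.

```python
values = [2**32 - 1] * 1024   # worst-case 32-bit unsigned samples

# Integer accumulator: every square and every partial sum is exact.
exact = sum(v * v for v in values)

# Double accumulator: each square needs 64 bits, a double keeps only 53.
approx = 0.0
for v in values:
    approx += float(v) * float(v)

print("exact sum of squares :", exact)
print("double sum of squares:", int(approx))
print("difference           :", exact - int(approx))
```

The double result is off before the summation even begins, because each individual square already has to be rounded. Widening the accumulator is the fix; reordering the additions is not.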
12. Re:Fixed-point arithmetic by fuzzyfuzzyfungus · 2013-11-21 16:17 · Score: 3, Funny
  
  Or simply don't use the broken "cloud computing" model. If you have some calculations to do, and care the least about the results, how about buying a computer that does those calculations for you?
  In other news, many problems become much easier when you assume a suitably large pile of money.
  
  Incidentally, the same is true of explosives, amphetamines, and hookers.
13. Re:Fixed-point arithmetic by fuzzyfuzzyfungus · 2013-11-21 16:27 · Score: 3, Funny
  
  "How do I test for Turing completeness "on the cloud"?"
  
  This one is actually a nontrivial challenge. Once the tape starts to get damp, you need to keep track of the probability that executing a given head-moving operation will cause the tape to snap and abruptly leave you with a confused finite state machine...
14. Re:Fixed-point arithmetic by philip.paradis · 2013-11-21 17:40 · Score: 4, Funny
  
  Incidentally, the same is true of explosives, amphetamines, and hookers.
  I don't have to be a mathematician to say that sounds like one hell of a party.
  
  --
  Write failed: Broken pipe
15. Re:Fixed-point arithmetic by tlhIngan · 2013-11-21 18:09 · Score: 5, Informative
  
  Don't use floating point if you can avoid it.
  If you can't, and the results are EXTREMELY important (remember, floating point is an APPROXIMATION of numbers), then you have to read What Every Computer Scientist Should Know About Floating-Point Arithmetic. (Yes, it's an Oracle link, but if you google it, most of the links are PDFs while the Oracle one is HTML).
  If you're worried about your cloud provider screwing with your results, then you're definitely doing it wrong (read that article).
  And yes, lots of people, even scientists, do it wrong because the idealized notion of what a floating point type is and how it actually works in hardware are completely different. Floating point numbers are tricky - they're VERY easy to use, but they're also VERY easy to use wrongly, and it's only if you know how the actual hardware is doing the calculations that you can structure your programs and algorithms to do it right.
  And no actual hardware FPU or VPU (vector unit - some do floating point) implements the full IEEE spec. Many come close, but none implement it exactly - there's always an omission or two. Especially since a lot of FPUs provide extended precision that goes beyond IEEE spec.
16. Re:Fixed-point arithmetic by Chalnoth · 2013-11-21 18:19 · Score: 3, Informative
  
  Yup. And if you want to use any kind of parallelism to compute the final result, you're going to have quite a hard time ensuring that the order of operations is always the same.
  That said, there are libraries around that make use of IEEE's reproducibility guarantees to ensure reproducible results. That will likely correct any reproducibility issues that would otherwise be introduced by the compiler, but you still have the order of operations issue (which is a fundamental problem).
  Personally, I think a better solution is to simply assume that you're never going to get reproducible floating-point results, and design the system to handle small, inconsistent rounding errors. I think that's a much easier problem to deal with than making floating-point reproducible in any modestly-complex system.
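Both halves of that can be sketched with the Python standard library. `math.fsum` returns the correctly rounded sum of its inputs regardless of their order, and `math.isclose` is one way to compare with a tolerance. The helper `left_to_right` and the sample data are made up for this example, and the explicit loop is there because the built-in `sum` does not behave the same way for floats across Python versions.

```python
import math

def left_to_right(values):
    """Plain accumulation in the order given, one rounding per addition."""
    total = 0.0
    for v in values:
        total += v
    return total

# The same four numbers, as two workers might happen to deliver them.
order_1 = [1e16, 1.0, -1e16, 1.0]
order_2 = [1.0, 1.0, 1e16, -1e16]

print(left_to_right(order_1), left_to_right(order_2))   # order-dependent
print(math.fsum(order_1), math.fsum(order_2))           # order-independent

# Tolerating small rounding noise instead of demanding exact equality.
tenth_sum = left_to_right([0.1] * 10)
print(tenth_sum == 1.0, math.isclose(tenth_sum, 1.0))
```

A tolerance only absorbs small noise. The first pair of sums differs by a factor of two, which is a conditioning problem that no epsilon will hide.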
17. Re:Fixed-point arithmetic by noh8rz10 · 2013-11-21 18:42 · Score: 3, Funny
  
  that link has a lot of words.
18. Re:Fixed-point arithmetic by goodminton · 2013-11-21 19:04 · Score: 5, Informative
  
  Awesome link! I'm the OP and I really appreciate your response. The reason I'm looking into this is that I work with many scientists who use commercial software packages where they don't control the code or compiler and their results are archived and can be reanalyzed years later. I was recently helping someone revive an old server to perform just such a reanalysis and we had so much trouble getting the machine going again I started planning to clone/virtualize it. That got me thinking about where to put the virtual machine (dedicated hardware, cloud, etc) and it also got me curious about hypervisors. I found some papers indicating that commercial hypervisors can have variability in their floating point math performance and all of that culminated in my post. Thanks again.
19. Re:Fixed-point arithmetic by amck · 2013-11-21 23:56 · Score: 4, Insightful
  
  Getting the result to be deterministic is only the start of the problem. How do you know it is _correct_, or more properly, know the error bounds involved? How much does it matter to your problem?
  e.g. If I am doing a 48-hour weather forecast, I can compare my results with observations next week; I can treat numerical error as a part of "model" error along with input observational uncertainty, etc.
  I might validate part of my solutions by checking that, for example, the total water content of my planet doesn't change. For a 48-hour forecast, I might tolerate methods that slightly lose water over 48 hours in return for a fast solution. For a climate forecast/projection, this would be unacceptable.
  Getting the same answer every time is no comfort if I have no way of knowing if it's the right answer.
  
  --
  Anyone who believes exponential growth can go on forever in a finite world is either a madman or an economist
bend reality by goombah99 · 2013-11-21 12:02 · Score: 5, Funny

The result is always the same, but the definition of reality is changing. The result of every single calculation is in fact 42 in some units. The hard part is figuring out the units.

--
Some drink at the fountain of knowledge. Others just gargle.
Easiest solution by ShaunC · 2013-11-21 12:16 · Score: 3, Funny

Just scroll down a couple of posts. "Quite soon the Wolfram Language is going to start showing up in lots of places, notably on the web and in the cloud."
Problem solved!

--
Thanks to the War on Drugs, it's easier to buy meth than it is to buy cold medicine!
Numerical instability by Anonymous Coward · 2013-11-21 12:16 · Score: 5, Insightful

If the value you're computing is so dependent on the details of the floating point implementation that you're worried about it, you probably have an issue of numerical stability and the results you are computing are likely useless, so this is really a mute point.
1. Re:Numerical instability by brantondaveperson · 2013-11-21 14:25 · Score: 3, Funny
  
  This is the only answer so far that makes sense, which is a pity because
  A) It's an AC
  and
  B) The point is moot, not mute.
  But we all knew that, didn't we.
Use infinite precision software packages by shutdown -p now · 2013-11-21 12:17 · Score: 4, Informative

What the title says - e.g. bignum for Python etc. It will be significantly slower, but the result is going to be stable at least for a given library version, and that is far easier to archive.
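For instance, a minimal sketch with Python's built-in arbitrary-precision `int` and the standard `fractions.Fraction`; the specific numbers are arbitrary.

```python
from fractions import Fraction

big = 2**200
int_result = (big + 1) - big                    # exact integer arithmetic
float_result = (float(big) + 1.0) - float(big)  # the +1 is lost to rounding
print(int_result, float_result)

# Exact rationals: the order of summation cannot change the answer.
terms = [Fraction(1, n) for n in range(1, 21)]
forward = sum(terms)
backward = sum(reversed(terms))
print(forward == backward, forward)
```

This only covers rational arithmetic. As soon as a square root or a sine is needed, you are back to choosing a precision and a rounding rule.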
Your chances are pretty darned good by Red Jesus · 2013-11-21 12:18 · Score: 5, Informative

Mathematica in particular uses adaptive precision; if you ask it to compute some quantity to fifty decimal places, it will do so.
In general, if you want bit-for-bit reproducible calculations to arbitrary precision, the MPFR library may be right for you. It computes correctly-rounded special functions to arbitrary accuracy. If you write a program that calls MPFR routines, then even if your own approximations are not correctly-rounded, they will at least be reproducible.
If you want to do your calculations to machine precision, you can probably rely on C to behave reproducibly if you do two things: use a compiler flag like -mpc64 on GCC to force the elementary floating point operations (addition, subtraction, multiplication, division, and square root) to behave predictably, and use a correctly-rounded floating point library like crlibm (Sun also released a version of this at one point) to make the transcendental functions behave predictably.
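As a rough stand-in for what correct rounding buys you (this uses Python's standard `decimal` module, not MPFR or crlibm, so treat it as an analogy only): the square root below is computed in software to a stated precision with a stated rounding mode, so the digits do not depend on the host FPU.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# 50 significant digits, round-half-even: both are part of the "spec"
# that would have to be archived along with the result.
ctx = Context(prec=50, rounding=ROUND_HALF_EVEN)

root = ctx.sqrt(Decimal(2))
print(root)
```

The trade-off is the same as with any software arithmetic: much slower than hardware doubles, but the answer is pinned down by the precision and rounding mode rather than by the machine.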
IEEE 754 by Jah-Wren Ryel · 2013-11-21 12:19 · Score: 3, Insightful

Different results on different hardware was a major problem up until CPU designers started to implement the IEEE754 standard for floating point arithmetic. IEEE754 conforming implementations should all return identical results for identical calculations.
However, x86 systems have an 80-bit extended precision format and if the software uses 80-bit floats on x86 hardware and then you run the same code on an architecture that does not support the x86 80-bit format (say, ARM or Sparc or PowerPC) then you are likely to get different answers.
I think newer revisions of IEEE754 have support for extended precision formats up to 16-bytes, but you need to know your hardware (and how your software uses it) to make sure that you are doing equal work on systems with equal capabilities. You may have to sacrifice precision for portability.

--
When information is power, privacy is freedom.
You need to know some numerical analysis by daniel_mcl · 2013-11-21 12:22 · Score: 5, Insightful

If your calculations are processor-dependent, that's a bad sign for your code. If your results really depend on things that can be altered by the specific floating-point implementation, you need to write code that's robust to changes in the way floating-point arithmetic is done, generally by tracking the uncertainty associated with each number in your calculation. (Obviously you don't need real-time performance since you're using cloud computing in the first place.) I'm not an expert on Mathematica, but it probably has such things built in if you go through the documentation, since Mathematica notebooks are supposed to exhibit reproducible behavior on different machines. (Which is not to say that no matter what you write it's automatically going to be reproducible.)
Archiving hardware to get consistent results is mainly used when there are legal issues and some lawyer can jump in and say, "A-ha! This bit here is different, and therefore there's some kind of fraud going on!"

--
I used to read Caltizzle. I was a lot cooler than you.
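One way to track the uncertainty attached to each number is interval arithmetic. The sketch below is a deliberately tiny, hypothetical `Interval` class (not a Mathematica feature and not a production library): after each operation it pushes both bounds one representable step outward with `math.nextafter` (Python 3.9+), so the true result stays inside.

```python
import math
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] meant to contain the true value."""
    lo: float
    hi: float

    @staticmethod
    def _outward(lo, hi):
        # Widen by one representable step each way to absorb the rounding
        # error of the operation that produced lo and hi.
        return Interval(math.nextafter(lo, -math.inf), math.nextafter(hi, math.inf))

    def __add__(self, other):
        return Interval._outward(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        corners = [self.lo * other.lo, self.lo * other.hi,
                   self.hi * other.lo, self.hi * other.hi]
        return Interval._outward(min(corners), max(corners))

    @property
    def width(self):
        return self.hi - self.lo

a = Interval(0.1, 0.1)
b = Interval(0.2, 0.2)
total = a + b
product = a * b

# Exact reference values for the two doubles, via rational arithmetic.
true_sum = Fraction(0.1) + Fraction(0.2)
true_product = Fraction(0.1) * Fraction(0.2)
print(total, total.width)
```

If two platforms produce different doubles for the same calculation, both should still fall inside the interval, and the width tells you how much of the answer is actually trustworthy.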
1. Re:You need to know some numerical analysis by rockmuelle · 2013-11-21 13:02 · Score: 5, Insightful
  
  This.
  Reproducibility (what we strive for in science) is not the same as repeatability (what the poster is actually trying to achieve). Results that are not robust on different platforms aren't really scientific results.
  I wish more scientists understood this.
  -Chris
2. Re:You need to know some numerical analysis by Red Jesus · 2013-11-21 14:02 · Score: 4, Interesting
  
  While that's true in many cases, there are some situations in which we need . Read Shewchuk's excellent paper on the subject.
  
  When disaster strikes and a real RAM-correct algorithm implemented in floating-point arithmetic fails to produce a meaningful result, it is often because the algorithm has performed tests whose results are mutually contradictory.
  The easiest way to think about it is with a made-up problem about sorting. Let's say that you have a list of mathematical expressions like sin(pi*e^2), sqrt(14*pi*ln(8)), tan(10/13), etc and you want to sort them, but some numbers in the list are so close to each other that they might compare differently on different computers that round differently, (e.g. one computer says that sin(-10) is greater than ln(100)-ln(58) and the other says it's less).
  Imagine now that this list has billions of elements and you're trying to sort the items using some sort of distributed algorithm. For the sorting to work properly, you *need* to be sure that a < b implies that b > a. There are situations (often in computational geometry) where it's OK if you get the wrong answer for borderline cases (e.g. it doesn't matter whether you can tell whether sin(-10) is bigger than ln(100)-ln(58) because they're close enough for graphics purposes) as long as you get the wrong answer consistently, so the next algorithm out (sorting in my example, or triangulation in Shewchuk's) doesn't get stuck in infinite loops.
Obligatory Comic by ttucker · 2013-11-21 12:24 · Score: 4, Funny

http://www.smbc-comics.com/?id=2999
Re:WTF? by larry bagina · 2013-11-21 12:25 · Score: 3, Informative

Let's say you're using C on an x86. float (32-bit) and double (64-bit) are well defined. However, the x86 FPU internally uses long double (80-bit).
So if you do some math on a float or a double, the results can vary depending on if it was done as 80-bit or if the intermediaries were spilled and truncated back to 64/32 bit.

--
Do you even lift?
These aren't the 'roids you're looking for.
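The same effect can be imitated one size down, as a sketch only: Python has no portable 80-bit type, so here a 64-bit double plays the role of the wide register and a 32-bit float plays the role of the declared type. The helper `to_f32` is made up for this example and rounds through `struct`.

```python
import struct

def to_f32(x):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

a, b, c = 16777216.0, 1.0, 1.0   # 16777216 == 2**24, the edge of float32's integer range

# Intermediate kept in the wider format, rounded to 32 bits once at the end.
kept_wide = to_f32(a + b + c)

# Intermediate "spilled" to 32 bits after every operation.
spilled = to_f32(to_f32(a + b) + c)

print(kept_wide, spilled)
```

Same inputs, same operations, different answer, and the only variable is when the intermediate was narrowed. On x87 that decision is made by the compiler's register allocation, which is why it can change between builds.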
Re:Arbitray precision by weilawei · 2013-11-21 12:27 · Score: 3, Funny

So I'm supposed to do all my calculations without any Pi? How can you have any Pi if you don't eat your machine?
Ye Old Text by Anonymous Coward · 2013-11-21 12:47 · Score: 3, Insightful

This has pretty much been the bible for many, many, many years now: http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
If you haven't read it, you should - no matter if you're a scientific developer, a games developer, or someone who writes javascript for web apps - it's just something everyone should have read.
Forgive the Oracle-ness (it was originally written by Sun). Covers what you need to know about IEEE754 from a mathematical and programming perspective.
Long story short, determinism across multiple (strictly IEEE754 compliant) architectures, while possible, is hard - and likely not worth it. But if you're doing scientific computing, perhaps it may be worth it to you. (Just be prepared for a messy ride of maintaining LSB error accumulators, and giving up typically 1-3 more LSB of precision for the determinism - and not only having to worry about the math of your algorithms, but the math of tracking IEEE754 floating point error for every calculation you do).
What you can do, easily, however is understand the amount of error in your calculations and state the calculation error with your findings.
Is it just a language barrier? by s.petry · 2013-11-21 12:52 · Score: 3, Informative

My first thought on seeing "tranlate" and "I'm research" was that it's only language, but then I read invalid and incorrect statements about how precision is defined in Mathematica. So now I'm not quite sure it's just language.
Archiving a whole virtual machine as opposed to the code being compiled and run is baffling to me.
Now if you are trying to archive the machine to run your old version of Mathematica and see if you get the same result, you may want to check your license agreement with Wolfram first. Second, you should be able to export the code and run the same code on new versions.
I'm really really confused on why you would want this to begin with though. Precision has increased quite a bit with the advent of 64bit hardware. I'd be more interested in taking some theoretical code and changing "double" to "uberlong" and see if I get the same results than what I solved today on today's hardware.
Unless this is some type of Government work which requires you to maintain the whole system, I simply fail to see any benefit.
Having "Cloud" does not change how precision works in Math languages.

--
-The wise argue that there are few absolutes, the fool argues that there are no probabilities.
False assumption by bertok · 2013-11-21 15:26 · Score: 4, Informative

This assumption by the OP:

Mathematica generates the result based on the combination of software version, operating system, hypervisor, firmware and hardware that are running at that time.
... is entirely wrong. One of the defining features of Mathematica is symbolic expression rewriting and arbitrary-precision computation to avoid all of those specific issues. For example, the expression:
N[Sin[1], 50]
Will always evaluate to exactly:
0.84147098480789650665250232163029899962256306079837
And, as expected, evaluating to 51 digits yields:
0.841470984807896506652502321630298999622563060798371
Notice how the last digit in the first case remains unchanged, as expected.
This is explained at length in the documentation, and also in numerous Wolfram blog articles that go on about the details of the algorithms used to achieve this on a range of processors and operating systems. The (rare) exceptions are marked as such in the help and usually have (slower) arbitrary-precision or symbolic variants. For research purposes, Mathematica comes with an entire bag of tools that can be used to implement numerical algorithms to any precision reliably.
Conclusion: The author of the post didn't even bother to flip through the manual, despite having strict requirements spanning decades. He does however have the spare time to post on Slashdot and waste everybody else's time.
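The two digit strings quoted above can be cross-checked without Mathematica. This is a sketch using Python's `decimal` module and a plain Taylor series; it is an assumption-laden stand-in and says nothing about how Mathematica computes `N[Sin[1], 50]` internally. The function name `sin_decimal` and the choice of 10 guard digits are made up for the example.

```python
from decimal import Decimal, localcontext

def sin_decimal(x, digits, guard=10):
    """Taylor-series sine, summed with guard digits, then rounded to `digits`."""
    with localcontext() as ctx:
        ctx.prec = digits + guard
        x = Decimal(x)
        term = x                      # first term of x - x^3/3! + x^5/5! - ...
        total = x
        n = 1
        threshold = Decimal(10) ** -(digits + guard)
        while abs(term) > threshold:
            term = -term * x * x / ((n + 1) * (n + 2))
            total += term
            n += 2
    with localcontext() as ctx:
        ctx.prec = digits
        return +total                 # unary plus rounds to the context precision

print(sin_decimal(1, 50))
print(sin_decimal(1, 51))
```

Because the arithmetic is done in software at a stated precision, the output should be the same digit string on any machine that runs it, which is the property the original question was after.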
Re:WTF? by gweihir · 2013-11-21 17:18 · Score: 3, Informative

They do not. IEEE754 has no "grey area". The results must match bit-exact or you are not IEEE754.
Of course, there can be implementation bugs. For example, Qemu does co-processor emulation only with 64 bit floats instead of the required 80 bit. Nobody seems to really care, however. The other thing is of course that if reproducibility is more important than correctness, I suspect the math is done wrong.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.